I have created a DataFusion DataFrame:
| asin       | vote | verified | unixReviewTime | reviewText      |
+------------+------+----------+----------------+-----------------+
| 0486427706 | 3    | true     | 1381017600     | good            |
| 0486427707 |      | false    | 1376006400     | excellent       |
| 0486427707 | 1    | true     | 1459814400     | Did not like it |
| 0486427708 | 4    | false    | 1376006400     |                 |
+------------+------+----------+----------------+-----------------+
I tried to find a solution in the API documentation, but could not figure out how to:
- Convert the unixReviewTime column into a Rust-native timestamp
- Extract the year, month and day from the newly created column into separate columns
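For context, the calendar arithmetic behind this conversion can be sketched in plain Rust using the well-known civil-from-days algorithm. The helper name `civil_from_seconds` is illustrative, not a DataFusion API; in DataFusion itself you would look for expression functions along the lines of `to_timestamp_seconds` and `date_part`, if your version provides them.

```rust
// Sketch: the year/month/day extraction a date_part-style function performs,
// written out with std only (civil-from-days algorithm, Gregorian, UTC).
// `civil_from_seconds` is an illustrative name, not part of DataFusion.
fn civil_from_seconds(secs: i64) -> (i64, u32, u32) {
    let days = secs.div_euclid(86_400);      // whole days since 1970-01-01 UTC
    let z = days + 719_468;                  // shift epoch to 0000-03-01
    let era = z.div_euclid(146_097);         // 400-year eras
    let doe = z - era * 146_097;             // day of era [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of (March-based) year
    let mp = (5 * doy + 2) / 153;            // March-based month [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

fn main() {
    // 1381017600 is the first unixReviewTime in the sample data.
    let (y, m, d) = civil_from_seconds(1_381_017_600);
    println!("{y}-{m:02}-{d:02}"); // 2013-10-06
}
```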
Here is what the JSON data file looks like:
{"asin": "0486427706", "vote": 3, "verified": true, "unixReviewTime": 1381017600, "reviewText": "good", "overall": 5.0}
{"asin": "0486427707", "vote": null, "verified": false, "unixReviewTime": 1376006400, "reviewText": "excellent", "overall": 5.0}
{"asin": "0486427707", "vote": 1, "verified": true, "unixReviewTime": 1459814400, "reviewText": "Did not like it", "overall": 2.0}
{"asin": "0486427708", "vote": 4, "verified": false, "unixReviewTime": 1376006400, "reviewText": null, "overall": 4.0}
It is very easy to do in PySpark:
from pyspark.sql import functions as fn
from pyspark.sql.functions import col
main_df = (
    main_df
    .withColumn(
        'reviewed_at',
        fn.from_unixtime(col('unixReviewTime'))
    )
)
main_df = main_df.withColumn("reviewed_year", fn.year(col("reviewed_at")))
main_df = main_df.withColumn("reviewed_month", fn.month(col("reviewed_at")))
main_df = main_df.withColumn("reviewed_day", fn.dayofmonth(col("reviewed_at")))
This produces the new timestamp and date-part columns alongside the original ones. How can I achieve the same with the DataFusion DataFrame API?