Remove milliseconds from timestamp pyspark SQL

947 Views Asked by At

I have two fields, both in string type. One field is called date, the other time.

The date field has values like below: 20220328,20220329,20220330,20220331

The timefield has values like below: 043313,045546,043313,044147

What I need is for these values to be converted to timestamp datatype and thus should look something like this:

2022-03-28T04:33:13
2022-03-29T04:55:46
2022-03-30T04:33:13
2022-03-28T04:41:47

I have the following code:

to_timestamp(concat(to_date(table.date,'yyyyMMdd'),COALESCE(nullif(table.time, ''),'000000')) , 'yyyy-MM-ddHHmmss')

which works fine, but returns the timestamp like below: 2022-03-28T04:33:13.000+0000

Does anyone know how I can remove the trailing 000+0000 so that I would only keep 2022-03-28T04:33:13?

Kind regards

1

There are 1 best solutions below

0
On

You want to format the timestamp, you can use date_format function for that:

spark.sql("""
    select date_format(
                to_timestamp(
                    concat(table.date, coalesce(nullif(table.time, ''), '000000')),
                     'yyyyMMddHHmmss'
                ),
                "yyyy-MM-dd'T'HH:mm:ss"
            ) as new_date
    from values ('20220328', '043313'), ('20220329', '045546'), ('20220330', '043313'), ('20220331', '044147') as table (date, time)
""").show()

#+-------------------+
#|           new_date|
#+-------------------+
#|2022-03-28T04:33:13|
#|2022-03-29T04:55:46|
#|2022-03-30T04:33:13|
#|2022-03-31T04:41:47|
#+-------------------+