Changing a string to datetime format in where clause for comparison

520 Views Asked by At

I have a dataset that looks like this:

datetime                                 count        
18:28:20.602 UTC DEC 08 2016                1
20:42:32.017 UTC DEC 08 2016                5
15:33:40.691 UTC DEC 08 2016                1
17:11:54.008 UTC DEC 08 2016                3
20:28:57.861 UTC DEC 08 2016                0
.
.
.
.

The datetime column is in the string format. I'm having difficulty in converting it to a timestamp.

How do I write a Impala/Hive query so that I get the data between '18:28:00.000 UTC DEC 08 2016' to '18:33:00.000 UTC DEC 08 2016'

1

There are 1 best solutions below

0
On

With Hive:

cast(from_unixtime(unix_timestamp(SHITTY_FORMAT, 'HH:mm:ss.SSS zzz MMM dd yyyy'), 'yyyy-MM-dd HH:mm:ss.SSS') as Timestamp)

...will translate your shitty String format into a UNIX timestamp, then into String standard format (in local timezone because that's the Hive convention), then into a Timestamp.
There is no easier way, unfortunately. And you may have some edge cases because of the 1h overlap in summer/winter times.

Source: the Hive documentation, of course...


With Impala (which does not support the zzz format modifier):

cast(from_unixtime(unix_timestamp(regexp_replace(SHITTY_FORMAT, ' UTC ', ' '), 'HH:mm:ss.SSS MMM dd yyyy'), 'yyyy-MM-dd HH:mm:ss.SSS') as Timestamp) 

...will translate your shitty String format into a UNIX timestamp, assuming that all your inputs are in UTC, then into String standard format (in UTC timezone because that's the Impala convention), then into a Timestamp.