How to change string to date-time format in sql/r?

99 Views Asked by At

I am trying to find the earliest and latest timestamp of a text message. I know I have to change the following column to date format to be able to order by:

Fri May 26 17:30:01 +0000 2017  
Fri May 26 17:30:05 +0000 2017  
Fri May 26 17:30:05 +0000 2017  
Fri May 26 17:30:04 +0000 2017  
Fri May 26 17:30:12 +0000 2017

I have tried using the 'substr' function to convert into YYYY-MM-DD HH:MM:SS

This is what I have for now:

dbGetQuery(db2, "SELECT text,
  CAST(
      SUBSTR(created_at,-4) || '-' ||
      CASE SUBSTR(created_at,5,3)
         WHEN 'Jan' THEN 01
         WHEN 'Feb' THEN 02
         WHEN 'Mar' THEN 03
         WHEN 'Apr' THEN 04
         WHEN 'May' THEN 05
         WHEN 'Jun' THEN 06
         WHEN 'Jul' THEN 07
         WHEN 'Aug' THEN 08
         WHEN 'Sep' THEN 09
         WHEN 'Oct' THEN 10
         WHEN 'Nov' THEN 11
         WHEN 'Dec' THEN 12
      END || '-' ||
      SUBSTR(created_at, 9,2)
   AS date)
           FROM tweets")

I only get the YYYY. Any help will be much appreciated. Thank you.

1

There are 1 best solutions below

0
On

Using tweets defined reproducibly in the Note at the end extract and concatenate the datetime components in a CTE (i.e. with clause) and then using that find the minimum and maximum value of created_at. We have used sqldf to compactly represent it in a reproducible way but the same SQL statement should work directly with RSQLite.

library(sqldf)

sqldf("with d as (
   select format('%s-%02d-%02d %s',
     substr(created_at, -4, 4),
     instr('  JanFebMarAprMayJunJulAugSepOctNovDec', substr(created_at, 5, 3))/3,
     substr(created_at, 9, 2),
     substr(created_at, 12, 8)) as created_at
   from tweets
  )
  select min(created_at) as min, max(created_at) as max from d")
##                   min                 max
## 1 2017-05-26 17:30:01 2017-05-26 17:30:12

This variation also works.

sqldf("with d as (
   select 
     substr(created_at, -4, 4) || '-' ||
     substr(0 || (instr('  JanFebMarAprMayJunJulAugSepOctNovDec', 
           substr(created_at, 5, 3))/3), -2, 2) || '-' ||
     substr(0 || substr(created_at, 9, 2), -2, 2) || ' ' ||
     substr(created_at, 12, 8) as created_at
   from tweets
  )
  select min(created_at) as min, max(created_at) as max from d")
##                   min                 max
## 1 2017-05-26 17:30:01 2017-05-26 17:30:12

Note

Lines <- "created_at 
Fri May 26 17:30:01 +0000 2017  
Fri May 26 17:30:05 +0000 2017  
Fri May 26 17:30:05 +0000 2017  
Fri May 26 17:30:04 +0000 2017  
Fri May 26 17:30:12 +0000 2017"
tweets <- read.csv(text = Lines, strip.white = TRUE)