I'm new to R, but I'm trying to analyze a dataset. Here is the original link: https://cache-default03g.cdn.yandex.net/download.yandex.ru/company/jobs/test_data_dreams.txt
My code is below (I use RStudio 0.99.903 and R 3.3.1):
```r
# get the data from the URL
url <- "https://cache-default03g.cdn.yandex.net/download.yandex.ru/company/jobs/test_data_dreams.txt"
testdata <- read.table(url, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# install text-mining packages to analyze the queries (only needed once)
install.packages("slam")
install.packages("tm")
library(tm)
# convert Unix timestamps (seconds since 1970-01-01) to GMT date-times
testdata$timestamp..unix. <- as.POSIXct(as.numeric(as.character(testdata$timestamp..unix.)),
                                        origin = "1970-01-01", tz = "GMT")
# delete the recurring phrase "к чему снится" ("what does it mean to dream of"),
# including the common misspelling "сниться"; "" removes the match entirely
testdata$query <- gsub("к чему снится ", "", testdata$query)
testdata$query <- gsub("к чему сниться ", "", testdata$query)
testdata$query <- gsub(" к чему снится", "", testdata$query)
testdata$query <- gsub(" к чему сниться", "", testdata$query)
testdata$query <- gsub("снится ", "", testdata$query)
testdata$query <- gsub(" к чему", "", testdata$query)
```
Now my data frame looks like this:
```
> head(testdata)
     timestamp..unix.                          query         city
1 2016-02-04 10:15:13         волна вынесла на берег       Москва
2 2016-02-24 10:28:53             бегать наперегонки Екатеринбург
3 2016-02-07 15:31:51 свадьба мужчине со своей женой  Владикавказ
4 2016-02-05 08:06:24             иголка медицинская       Тамбов
5 2016-02-16 15:21:16                давняя знакомая  Калининград
6 2016-02-27 03:38:46        белый маленький котенок  Новосибирск
```
Now I'm trying to plot the queries to see how they are distributed over the hours of the day (and over the days of the month), both overall and for each city.
Which tool should I use to extract days and hours separately, and to plot not the queries themselves but just their distribution?
Thanks!
You can use the lubridate package: it makes it quite easy to extract days and hours from your dates and then filter or aggregate on them. For example:
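A minimal sketch of the extraction and counting step. To stay runnable without extra packages, it uses base R's `POSIXlt` fields (`$hour`, `$mday`), which give the same values as lubridate's `hour()` and `mday()`. The toy data frame is an assumption: it is built only from the six rows printed by `head(testdata)` in the question, standing in for the full dataset.

```r
# Hypothetical toy data: the six rows printed by head(testdata) in the question
# (query text omitted); replace with your real testdata.
testdata <- data.frame(
  timestamp..unix. = as.POSIXct(c("2016-02-04 10:15:13", "2016-02-24 10:28:53",
                                  "2016-02-07 15:31:51", "2016-02-05 08:06:24",
                                  "2016-02-16 15:21:16", "2016-02-27 03:38:46"),
                                tz = "GMT"),
  city = c("Москва", "Екатеринбург", "Владикавказ",
           "Тамбов", "Калининград", "Новосибирск"),
  stringsAsFactors = FALSE
)

# Split each timestamp into its parts (POSIXlt keeps the GMT time zone).
# With lubridate loaded, hour(x) and mday(x) return the same values.
lt <- as.POSIXlt(testdata$timestamp..unix.)
testdata$hour <- lt$hour   # 0-23
testdata$day  <- lt$mday   # 1-31

# Number of queries per hour of the day and per day of the month;
# fixed factor levels keep hours/days with no queries in the table as zeros.
hour_counts <- table(factor(testdata$hour, levels = 0:23))
day_counts  <- table(factor(testdata$day, levels = 1:31))

# Same hourly distribution broken down by city (rows = cities, columns = hours)
city_hour <- table(testdata$city, factor(testdata$hour, levels = 0:23))

print(hour_counts)
print(city_hour)
```

The fixed `levels` are a design choice: without them, `table()` would silently drop hours that have no queries, which distorts a distribution plot.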
For the plot, I recommend the ggplot2 package; here you will find an example of plotting time series.
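A minimal ggplot2 sketch of the plots asked about: a bar chart of query counts per hour of the day, overall and with one panel per city. It rebuilds the same hypothetical six-row toy data frame (taken from the `head(testdata)` output in the question) so it runs on its own; the derived `hour` column is an assumption, not part of the original data.

```r
library(ggplot2)

# Hypothetical toy data: the six rows shown by head(testdata) in the question
testdata <- data.frame(
  timestamp..unix. = as.POSIXct(c("2016-02-04 10:15:13", "2016-02-24 10:28:53",
                                  "2016-02-07 15:31:51", "2016-02-05 08:06:24",
                                  "2016-02-16 15:21:16", "2016-02-27 03:38:46"),
                                tz = "GMT"),
  city = c("Москва", "Екатеринбург", "Владикавказ",
           "Тамбов", "Калининград", "Новосибирск"),
  stringsAsFactors = FALSE
)

# Hour of day in GMT (lubridate::hour() would give the same values)
testdata$hour <- as.POSIXlt(testdata$timestamp..unix.)$hour

# Overall distribution: one bar per hour, bar height = number of queries
p_all <- ggplot(testdata, aes(x = hour)) +
  geom_bar() +
  scale_x_continuous(breaks = seq(0, 23, by = 3)) +
  labs(x = "Hour of day (GMT)", y = "Number of queries")

# The same plot split into one panel per city
p_city <- p_all + facet_wrap(~ city)

print(p_all)
print(p_city)
```

For the distribution over the month, the same pattern should work with `as.POSIXlt(...)$mday` (or `lubridate::mday()`) mapped to `x` instead of the hour.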