Ggplot with yearmonth in x-axis

I have a dataset with information on where individuals work over time, where time is defined as year/month (and shown as numeric values YYYYMM in my dataset). I run a ggplot to visualise how long individuals stay in a given workplace as well as how they move around. I used position_dodge to make it visible when the same individual works in more than one place during the same month.

In the simple example below:

  • individual A works in place 1 from Jan/2012 (i.e., 201201) until Dec/2012
  • individual B works in place 1 from Jan/2012 until Jun/2012 and then switches to place 2 from Jul/2012 until Nov/2012
  • individual C works in place 1 from Jan/2012 until Apr/2012 and in place 2 from Feb/2012 until Jun/2012
  • individual D works in place 1 only during Jan/2012

My query is related to how to use time intervals. In my dataset, the time period variable refers to the entire month. For instance, individual A actually works in workplace 1 from 01/01/2012 until 31/12/2012 and individual D works in workplace 1 from 01/01/2012 until 31/01/2012.

# individual A
a_id <- c(rep('A',12))
a_period <- c(seq(201201, 201212))
a_workplace <-c(rep(1,12))

# individual B
b_id <- c(rep('B',11))
b_period <- c(seq(201201,201206), seq(201207,201211))
b_workplace <-c(rep(1,6), rep(2,5))

# individual C
c_id <- c(rep('C',9))
c_period <- c(seq(201201,201204), seq(201202,201206))
c_workplace <-c(rep(1,4), rep(2,5))

# individual D
d_id <- c(rep('D',1))
d_period <- c(seq(201201,201201))
d_workplace <-c(rep(1,1))

# final data frame
id <- c(a_id, b_id, c_id, d_id)
period <- c(a_period, b_period, c_period, d_period)
workplace <- as.factor(c(a_workplace, b_workplace, c_workplace, d_workplace))
mydata <- data.frame(id, period, workplace)

library(ggplot2)

ggplot(mydata, aes(x = id, y = period, color = workplace)) +
  geom_line(position = position_dodge(width = 0.1), size = 2) +
  scale_x_discrete(limits = rev) +
  scale_y_continuous(breaks = seq(201201, 201212, by = 1)) +
  coord_flip() +
  theme(axis.text.x = element_text(angle=45, hjust=1),
        legend.position   = c(.8, .2), 
        legend.direction  = "vertical",
        legend.background = element_rect(linetype = "solid", colour = "black"), 
        panel.background  = element_rect(fill = "grey97")) +
  labs(y = "time", title = "Work affiliation")

The plot above treats each year/month as a single point in time, so, for instance, it shows no work history for individual D. How can I make each consecutive sequence at the individual-workplace level begin on the first day of its first month and end on the last day of its last month? I would also like to convert the year/month variable from numeric to date format to make manipulation easier.

PS: I emphasise each consecutive sequence in the paragraph above because the same individual may work in a given place for a few months, leave for a while and then return to the same workplace later on. In these cases, the two time intervals during which the individual works in that workplace should be considered separately.

There are 1 best solutions below

For the second question, converting the numeric period to a date type:

library(lubridate)  # date handling and conversion
# lubridate::ymd() parses year-month-day values into Date objects
# as.Date() converts a character string into a Date, which is the type
# ggplot2 expects for time-related data

ymd() needs a day component, so append "01" to each YYYYMM value first (a bare 201201 would be read as yy-mm-dd, i.e. 2020-12-01). This should do it for your code:

mydata$period <- lubridate::ymd(paste0(mydata$period, "01"))
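For the first question (drawing each consecutive sequence from the first day of its first month to the last day of its last month), the following is a minimal sketch rather than a definitive solution. It rebuilds the toy data from the question, flags a new spell whenever the individual, the workplace, or the run of consecutive months changes, and then draws one interval per spell with geom_linerange. The names month_start, month_index, spell and spells are introduced here for illustration only and are not part of the original code.

```r
library(ggplot2)
library(lubridate)

# Toy data from the question: one row per individual, workplace and month (YYYYMM)
mydata <- data.frame(
  id        = c(rep("A", 12), rep("B", 11), rep("C", 9), "D"),
  period    = c(201201:201212,
                201201:201211,
                201201:201204, 201202:201206,
                201201),
  workplace = factor(c(rep(1, 12),
                       rep(1, 6), rep(2, 5),
                       rep(1, 4), rep(2, 5),
                       1))
)

# YYYYMM -> Date on the first day of that month
mydata$month_start <- ymd(paste0(mydata$period, "01"))

# Sort so months are in order within each individual-workplace pair
mydata <- mydata[order(mydata$id, mydata$workplace, mydata$month_start), ]

# Running month count, so consecutive months differ by exactly 1
month_index <- year(mydata$month_start) * 12 + month(mydata$month_start)

# A new spell starts when the individual or workplace changes, or a month is skipped
n <- nrow(mydata)
new_spell <- c(TRUE,
               mydata$id[-1] != mydata$id[-n] |
                 mydata$workplace[-1] != mydata$workplace[-n] |
                 diff(month_index) != 1)
mydata$spell <- cumsum(new_spell)

# One row per spell: first day of its first month, last day of its last month
first_row <- !duplicated(mydata$spell)
last_row  <- !duplicated(mydata$spell, fromLast = TRUE)
spells <- data.frame(
  id        = mydata$id[first_row],
  workplace = mydata$workplace[first_row],
  start     = mydata$month_start[first_row],
  end       = mydata$month_start[last_row] + months(1) - days(1)
)

# linewidth needs ggplot2 >= 3.4.0; on older versions use size = 2 instead
p <- ggplot(spells, aes(x = id, ymin = start, ymax = end, colour = workplace)) +
  geom_linerange(position = position_dodge(width = 0.3), linewidth = 2) +
  scale_x_discrete(limits = rev) +
  scale_y_date(date_breaks = "1 month", date_labels = "%b %Y") +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(y = "time", title = "Work affiliation")
print(p)
```

Because every spell now has a start and an end, individual D is drawn as an interval from 2012-01-01 to 2012-01-31 instead of collapsing to a single point. If someone leaves a workplace and returns later, the skipped month starts a new spell, so the two stays are drawn as separate segments.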