Let's assume that a group of people is followed over time and that at 3 time points they are asked whether they would like to become a judge or not. Over time they may change their opinion. I would like to show graphically how the opinion about becoming a judge changes over time. Here is an idea of how it could be shown:
Here is how to read the plot:
- 1,462 students were sampled and 864 (400+295+22+147) of these would like to become a judge (first bunch of lines going upwards).
- A blue path means that in the end they became a judge.
- A black path means that in the end they did something else.
- Line goes up: they want to become a judge.
- Line goes down: they don't want to become a judge.
- The thickness of a line is proportional to the number of persons who followed this specific path (= the number plotted at the end of the path).
For example:
(a) 118 persons didn't want to become a judge during high school and university, but during practice they decided to become a judge.
(b) Up to practice, 695 had decided to become a judge, but after practice 400 became judges and 295 did something else.
The main idea is to explore which kinds of decision paths exist and which are the most used.
I have several questions:
1. Is there a name for this kind of graph?
2. Is there already an R function which can plot this graph?
3. If there is no R function: any idea how I can make this plot prettier? For example: (3.1) I would like the curves to be adjacent (without gaps between them and without overlapping). (3.2) The start and end of the curves should be parallel to the y-axis.
Any suggestions?
Edit 1:
I found a plot which is similar to the one above: riverplot; see, for example, the R library riverplot or R-bloggers. The drawback of riverplot is that at the crossing points the individual threads or paths are lost.
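To illustrate why individual paths can get lost: flow-style diagrams are usually specified as totals per edge between adjacent nodes rather than as whole paths. The following base-R sketch uses the path counts from this question and does not call riverplot itself. The yes/no columns s1, s2, s3 and the objects edges_12 and edges_23 are names made up for illustration, and the sketch assumes that each stage has only two nodes ("yes" and "no").

```r
# Path counts from the question; each row is one complete decision path.
# s1, s2, s3 are hypothetical yes/no codings of the three successive decisions.
paths <- data.frame(
  s1 = rep(c("no", "yes"), each = 4),
  s2 = rep(c("no", "yes", "no", "yes"), each = 2),
  s3 = rep(c("no", "yes"), times = 4),
  n  = c(409, 118, 38, 33, 147, 22, 295, 400)
)

# Totals per edge between adjacent stages: this is all an edge-based diagram keeps.
edges_12 <- aggregate(n ~ s1 + s2, data = paths, FUN = sum)
edges_23 <- aggregate(n ~ s2 + s3, data = paths, FUN = sum)

print(edges_12)
print(edges_23)

# In edges_23 the edge no -> yes holds 140 persons: the 118 from example (a)
# are merged with 22 persons who said "yes" at the first stage, so the two
# paths can no longer be told apart.
```

With eight distinct paths but only four edges between the last two stages, the per-path counts cannot be recovered from the edge totals alone.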
Here are the data:
library(reshape2)
library(ggplot2)
# Data: one row per decision path
wide <- data.frame(
  grp        = 1:8,
  time1_orig = rep(8, 8),
  time2_orig = rep(c(4, 12), each = 4),
  time3_orig = rep(c(2, 6, 10, 14), each = 2),
  time4_orig = seq(1, 15, 2),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400),  # number of persons
  d          = c(1, 0, 1, 0, 1, 0, 1, 0)                # final decision: 0 = became a judge, 1 = did something else
)
wide
  grp time1_orig time2_orig time3_orig time4_orig   n d
1   1          8          4          2          1 409 1
2   2          8          4          2          3 118 0
3   3          8          4          6          5  38 1
4   4          8          4          6          7  33 0
5   5          8         12         10          9 147 1
6   6          8         12         10         11  22 0
7   7          8         12         14         13 295 1
8   8          8         12         14         15 400 0
What follows is the transformation of the data needed for the plot:
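# w scales the counts into y-axis units
w <- 500
# Shift each path by the midpoint of its band in the cumulative stack of n,
# so that paths sharing a node are drawn next to each other
wide$time1 <- wide$time1_orig + (cumsum(wide$n) - wide$n / 2) / w
wide$time2 <- wide$time2_orig + (cumsum(wide$n) - wide$n / 2) / w
wide$time3 <- wide$time3_orig + (cumsum(wide$n) - wide$n / 2) / w
wide$time4 <- wide$time4_orig + (cumsum(wide$n) - wide$n / 2) / w
# Drop the *_orig columns (2 to 5) and reshape to long format:
# one row per path and time point
long <- melt(wide[, -c(2:5)], id = c("d", "grp", "n"))
long$d <- as.character(long$d)  # treat the decision as categorical
str(long)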
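The offset (cumsum(n) - n/2)/w used above can be read as the midpoint of each path's band when the counts are stacked on top of each other. Here is a small sketch of that reading, using only the counts from the question; lower, upper and mid are names chosen here for illustration. If each path were drawn as a band from lower to upper in data units (for instance with a ribbon-type geom rather than a line whose thickness is set in millimetres), neighbouring bands would touch without gaps or overlap, and the band ends would be vertical.

```r
# Counts per path, in the stacking order used in the question.
n <- c(409, 118, 38, 33, 147, 22, 295, 400)
w <- 500  # same scaling constant as in the question

# Lower and upper edge of each band in the cumulative stack.
upper <- cumsum(n) / w
lower <- (cumsum(n) - n) / w

# The offset from the question is exactly the centre of each band.
mid <- (cumsum(n) - n / 2) / w

bands <- data.frame(path = seq_along(n), lower, mid, upper)
print(bands)
```

Each band starts where the previous one ends, which is the "adjacent, no overlap" property asked for in (3.1). This only holds for paths that share a node, since different nodes also differ in their base position.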
And here is the ggplot:
gg1 <- ggplot(long, aes(x = variable, y = value, group = grp, colour = d)) +
  # line width encodes the number of persons on each path
  # (the linewidth aesthetic needs ggplot2 >= 3.4.0; older versions use size = n)
  geom_line(aes(linewidth = n)) +
  # one label per row of long (4 time points x 8 paths); "" prints nothing
  geom_text(aes(label = c("1462", "",    "",    "",   "",    "",    "",    "",
                          "",     "",    "598", "",   "",    "864", "",    "",
                          "527",  "",    "",    "71", "169", "",    "",    "695",
                          "409",  "118", "38",  "33", "147", "22",  "295", "400")),
            size = 5, vjust = -1.5) +
  scale_colour_manual(name = "", labels = c("Yes", "No"), values = c("royalblue", "black")) +
  theme(legend.position = c(0, 1), legend.justification = c(0, 1),
        legend.text = element_text(size = 12),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 15),
        plot.title = element_text(size = 15)) +
  guides(linewidth = "none") +
  labs(x = "", y = "Consider a judge career as an option:") +
  # value is numeric, so the y scale is continuous; hide its tick labels
  scale_y_continuous(labels = NULL) +
  scale_x_discrete(labels = c("during high school",
                              "during university",
                              "during practice",
                              ""))
gg1
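The label vector passed to geom_text above is typed in by hand. As a rough sketch, the same subtotals can be derived from n. The placement rule used here (the label goes on the first path of each node) is an assumption and differs slightly from the hand-placed labels. The names counts, tot2, tot3, first_of and labels are illustrative only.

```r
# Minimal copy of the columns needed from the question's data.
counts <- data.frame(
  time2_orig = rep(c(4, 12), each = 4),
  time3_orig = rep(c(2, 6, 10, 14), each = 2),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400)
)

# Subtotal of n within each node, repeated on every row of that node.
tot2 <- ave(counts$n, counts$time2_orig, FUN = sum)
tot3 <- ave(counts$n, counts$time3_orig, FUN = sum)

# TRUE for the first path of each node, so every subtotal is printed once.
first_of <- function(node) !duplicated(node)

# Same layout as the long data: 8 rows per time point, time1 to time4.
labels <- c(
  c(sum(counts$n), rep("", nrow(counts) - 1)),      # time1: grand total
  ifelse(first_of(counts$time2_orig), tot2, ""),   # time2: two nodes
  ifelse(first_of(counts$time3_orig), tot3, ""),   # time3: four nodes
  counts$n                                          # time4: one count per path
)
print(labels)
```

The resulting character vector has one entry per row of the long data and could be supplied as the label in place of the hand-typed one.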
I found a solution thanks to the riverplot library, which gives me this plot:
Here is the code:
There is an alternative for plotting sequences of categorical information:
TraMineR - Mining sequence data