R: Plot Density Graph for data in tables with respect to Labels in tables


I have data in table form that looks like this in R:

      V1    V2
   1  19 -1539
   2   7 -1507
   3   3 -1446
   4   7 -1427
   5   8 -1401
   6   2  -422
   7  22  4178
   8   5  4277
   9  10  4303
   10 18  4431


   ....200 million more lines to go

I would like to draw a density plot of the values in the second column, grouped by the label in the first column (i.e. one density curve per label, all on the same graph). But I don't know how. Any suggestions?


BEST ANSWER

OK, I figured it out myself:

ggplot(data, aes(x=V2, color=V1)) + geom_density(aes(group=V1))

That should do it. However, there are two things I need to make sure of first for it to run:

  1. V1 is a factor
  2. V2 is a numerical value

read.table did not read the data in with the types I wanted, so I had to do the following before using ggplot:

data$V1 = as.factor(data$V1)
data$V2 = as.numeric(as.character(data$V2))
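
Putting the conversion and the plot together, a minimal sketch might look like the following. The data frame here is simulated purely for illustration (three made-up labels with made-up distributions); it is not the asker's table, and V2 is deliberately stored as text to mimic a column that was read in with the wrong type.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the real table (an assumption for this example):
# three labels in V1, 200 values each in V2, with V2 stored as character.
data <- data.frame(
  V1 = rep(c(2, 7, 19), each = 200),
  V2 = as.character(round(c(rnorm(200, -1500, 300),
                            rnorm(200, 0, 500),
                            rnorm(200, 4200, 300))))
)

# The two conversions from the answer: labels to factor, values to numeric.
data$V1 <- as.factor(data$V1)
data$V2 <- as.numeric(as.character(data$V2))

# Check both preconditions before plotting.
stopifnot(is.factor(data$V1), is.numeric(data$V2))

# One density curve per label, all on the same axes.
p <- ggplot(data, aes(x = V2, color = V1)) +
  geom_density(aes(group = V1))
print(p)
```

Going through as.character() first matters when V2 was read as a factor: calling as.numeric() directly on a factor returns the internal level codes rather than the original numbers.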

If I understood the question correctly, this would end up looking somewhat like a density heatmap, considering there are 200 million observations in total and V1 has a fairly wide range of values.

For that I would try ggplot and stat_binhex:

df <- read.table(text="V1    V2
1  19 -1539
2   7 -1507
3   3 -1446
4   7 -1427
5   8 -1401
6   2  -422
7  22  4178
8   5  4277
9  10  4303
10 18  4431")

library(ggplot2)

ggplot(data=df,aes(V1,V2)) + 
  stat_binhex() +
  scale_fill_gradient(low="red", high="steelblue") +
  scale_y_continuous() + 
  theme_bw()

stat_binhex should work well with large data and has several parameters that help with presentation, such as bins and binwidth (see ?stat_binhex).
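
As a rough illustration of those parameters, the sketch below sets binwidth explicitly. The data is simulated for the example (5000 random rows, not the asker's table), and the bin sizes are arbitrary choices that would need tuning on the real data.

```r
library(ggplot2)  # stat_binhex() also needs the hexbin package to be installed

set.seed(42)
# Simulated data (an assumption for this example): labels 1..22, values around 0.
df <- data.frame(
  V1 = sample(1:22, 5000, replace = TRUE),
  V2 = rnorm(5000, mean = 0, sd = 2500)
)

# binwidth takes one width per axis: 1 unit along V1, 500 units along V2.
p <- ggplot(df, aes(V1, V2)) +
  stat_binhex(binwidth = c(1, 500)) +
  theme_bw()
print(p)
```

Passing bins = 20 instead of binwidth would ask for roughly that many hexagons in each direction, which can be easier when the range of the data is not known in advance.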