I am trying to create a coordinate vector to later plot with ggplot.
Suppose I have a data frame that looks like:
keys = c("aa", "aa", "ac", "ag", "gg", "at", "ca", "gc", "cc", "cg", "gt", "gg", "tt", "ta", "ga", "tg")
values = c(9.318796e-05, 1.863759e-04, 5.591278e-04, 1.863759e-04, 2.795639e-04, 9.318796e-05, 9.318796e-05, 1.863759e-04, 1.863759e-04, 2.795639e-04, 2.795639e-04, 1.863759e-04, 2.795639e-04, 9.318796e-05, 9.318796e-05, 5.591278e-04)
df = data.frame(keys, values)
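Before plotting, it helps to check the keys themselves. With k = 2 there are 4^2 = 16 possible 2-mers, but this data repeats some keys and lacks others. A quick base-R check (the names `repeated` and `absent` are only illustrative):

```r
# Keys from the question
keys <- c("aa", "aa", "ac", "ag", "gg", "at", "ca", "gc",
          "cc", "cg", "gt", "gg", "tt", "ta", "ga", "tg")

bases <- c("a", "c", "g", "t")
all_2mers <- as.vector(outer(bases, bases, paste0))  # all 16 possible 2-mers

key_counts <- table(keys)
repeated <- names(key_counts)[as.vector(key_counts) > 1]  # keys seen more than once
absent <- setdiff(all_2mers, keys)                         # 2-mers with no entry

repeated  # "aa" "gg"
absent    # "tc" "ct"
```

So "aa" and "gg" each appear twice, and "ct" and "tc" have no value, which leaves those two cells empty in a 4 x 4 plot. Whether repeated keys should be summed or deduplicated depends on how the table was built.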
Now I want to create a matrix that gives each letter its own corner, specifically:
A (-1, 1) [upper left],
T (1, -1) [lower right],
G (1, 1) [upper right] and
C (-1, -1) [lower left].
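Since the goal is a coordinate vector for ggplot, here is a minimal sketch that maps each k-mer directly to x/y coordinates using the corner positions listed above. `corners` and `kmer_to_xy()` are hypothetical names. The first letter picks the quadrant (weight 1/2) and each later letter halves the step, mirroring the maxx/2 halving in the loop below. In a classic chaos-game walk the last letter gets the largest weight instead, so apply the steps to `rev(chars)` if that is the convention you need.

```r
# Corner for each base, as listed in the question: A(-1, 1), T(1, -1), G(1, 1), C(-1, -1)
corners <- rbind(
  a = c(-1,  1),
  t = c( 1, -1),
  g = c( 1,  1),
  c = c(-1, -1)
)
colnames(corners) <- c("x", "y")

# Hypothetical helper: centre of a k-mer's cell inside the square [-1, 1] x [-1, 1].
# Letter i moves the point by (corner of letter i) * (1/2)^i.
# Letters other than a, c, g, t will raise a "subscript out of bounds" error.
kmer_to_xy <- function(kmer) {
  chars <- strsplit(tolower(kmer), "")[[1]]
  steps <- 0.5^seq_along(chars)                 # 1/2, 1/4, 1/8, ...
  colSums(corners[chars, , drop = FALSE] * steps)
}

# Usage on a few keys: one row of coordinates per key
keys <- c("aa", "at", "tt")
xy <- vapply(keys, kmer_to_xy, numeric(2), USE.NAMES = FALSE)  # 2 x n matrix
coords <- data.frame(keys = keys, x = xy[1, ], y = xy[2, ])
coords
```

For k letters each cell is 2 / 2^k wide, so with ggplot2 something like `geom_tile(aes(x, y, fill = values), width = 2 / 2^k, height = 2 / 2^k)` should tile the square once the values are joined on.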
For this, I have written:
array_size = sqrt(4^k) #Where k = 2
graph_coord = c()
for(i in range(array_size)){
  graph_coord = append(graph_coord, array_size[1])
} ##Give the graph_coord its size
maxx = array_size
maxy = array_size
posx = 1
posy = 1
for(i in df$keys){
  ##This part is for getting each individual letter of each element of keys.
  for(j in i[[1]]){
    ##If the individual letter is a T then the actual position on x should be maxx/2
    if (i == "T"){
      posx = maxx/2
    }else if(i == "C"){
      posy = maxy/2
    }else if(i == "G"){
      posx = maxx/2
      posy = maxy/2
    }
    ###Up until this point I think that the code is doing well:
    ###I can grab individual letters of each element of keys,
    ###see which one they are and then decide to move them according
    ###to the initial coordinates maxx and maxy. The next part escapes me:
    maxx = maxx/2
    maxy /= 2 ##This /= is Python syntax; what would be the R equivalent?
    ##Append the graph coordinates with the df$values.
    ##This part is especially hard for me to grasp, so I have only sketched the idea; this line is definitely incorrect.
    graph_coord = append(graph_coord, posy-1, posx-1, prob)
  }
}
This code is still a work in progress. I am trying to recreate what was done here: Frequency table extracted from Chaos Game Representation.
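For comparison, here is a hedged R sketch of the same loop that fills a 2^k x 2^k matrix. It changes a few things from the attempt above: `seq_len(n)` is R's counterpart of Python's `range(n)` (R's `range()` returns the minimum and maximum), `strsplit(x, "")[[1]]` splits a key into letters (`i[[1]]` does not), keys are upper-cased before comparing because they are lowercase, positions are incremented (`posx + maxx / 2`) rather than overwritten, and the matrix is preallocated and indexed directly because `append()` takes `(x, values, after)`, not coordinates. `fcgr_matrix()` is a hypothetical name, and summing repeated keys is an assumption. The branches follow the corner list above (row 1 drawn at the top), which means T moves in both x and y and G only in x, unlike the attempt.

```r
# Sketch: fill a 2^k x 2^k frequency matrix, one cell per k-mer (row 1 = top).
# Corners as listed in the question: A upper left, G upper right,
# C lower left, T lower right.
fcgr_matrix <- function(keys, values, k) {
  n <- 2^k                                  # same as sqrt(4^k)
  chaos <- matrix(0, nrow = n, ncol = n)    # preallocate instead of append()
  keys <- as.character(keys)                # guard against factor columns
  for (idx in seq_along(keys)) {
    # reset the window size and position for every key
    maxx <- n; maxy <- n
    posx <- 1; posy <- 1
    for (ch in strsplit(toupper(keys[idx]), "")[[1]]) {
      if (ch == "G") {                      # upper right: move right
        posx <- posx + maxx / 2
      } else if (ch == "C") {               # lower left: move down
        posy <- posy + maxy / 2
      } else if (ch == "T") {               # lower right: move right and down
        posx <- posx + maxx / 2
        posy <- posy + maxy / 2
      }                                     # "A" (upper left): no move
      maxx <- maxx / 2                      # R has no /=, so write x <- x / 2
      maxy <- maxy / 2
    }
    # R indices start at 1, so no "- 1"; repeated keys are summed
    chaos[posy, posx] <- chaos[posy, posx] + values[idx]
  }
  chaos
}

m <- fcgr_matrix(c("aa", "aa", "tt", "ag"), c(0.1, 0.2, 0.3, 0.4), k = 2)
m
```

With the question's data this would be called as `fcgr_matrix(df$keys, df$values, k = 2)`.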
Here's a tidyverse approach:
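Whatever pipeline produces the frequencies, ggplot2 wants one row per cell. A base-R sketch of that last step, separate from the tidyverse code and using a toy matrix whose values are purely illustrative, turns a 2^k x 2^k matrix into long format and draws it with `geom_tile()`:

```r
# Toy 4 x 4 matrix (k = 2) standing in for a frequency matrix; values are illustrative only
m <- matrix(seq_len(16), nrow = 4, ncol = 4)

# Long format: one row per cell. as.vector() reads the matrix column by column,
# which matches expand.grid() varying `row` fastest.
cells <- expand.grid(row = seq_len(nrow(m)), col = seq_len(ncol(m)))
cells$value <- as.vector(m)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cells, aes(x = col, y = row, fill = value)) +
    geom_tile() +
    scale_y_reverse() +   # draw row 1 at the top, as the matrix prints
    coord_equal()
  print(p)
}
```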
Your example data has two "aa" and two "gg" keys, so this is the output:
EDIT: here's a general approach for any k granularity. Here are two example data frames, with k = 3 and 4, respectively. Then we could plug those into the following code:
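To build test inputs at any granularity, one hedged option is to generate every k-mer and attach placeholder values. `all_kmers()` is a hypothetical helper, and the uniform frequencies are placeholders, not real data:

```r
# Hypothetical helper: all 4^k k-mers over a, c, g, t (first letter varies fastest)
all_kmers <- function(k, bases = c("a", "c", "g", "t")) {
  kmers <- bases
  for (i in seq_len(k - 1)) {
    # append one more base to every existing k-mer
    kmers <- as.vector(outer(kmers, bases, paste0))
  }
  kmers
}

# Placeholder (uniform) frequencies, only so there is something to plot
df3 <- data.frame(keys = all_kmers(3), values = 1 / 4^3)
df4 <- data.frame(keys = all_kmers(4), values = 1 / 4^4)
nrow(df3)  # 64
nrow(df4)  # 256
```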
Which produces the outputs below: