I need help guys!
Imagine I have a vector fp with n elements.
Each element is a string with 64 characters (always), and I want to build a distance matrix of all the elements. Each character of each element is either hexadecimal (0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f), - or X, where - means absence and X means any value.
The distance between two characters must be the Hamming distance of the binary representation of each character, with the exception of - and X:
- if the characters are equal, the distance remains the same
- if either character is X, the distance also remains the same
- if either character is - and they are different, 5 is added to the distance
- otherwise, if they are different, the Hamming distance between the binary representations of the characters is added to the distance
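To make the rules concrete, the per-character cost can be written as an 18 x 18 lookup table. This is only a sketch in base R: the names symbols, bits and cost are made up for illustration, and it assumes the X rule takes priority over the - rule when both apply (so - against X costs 0).

```r
# All symbols that can appear: 16 lowercase hex digits, then "-" and "X".
symbols <- c(as.character(0:9), letters[1:6], "-", "X")

# 4 x 16 matrix: column v + 1 holds the 4-bit pattern of the value v.
bits <- sapply(0:15, function(v) (v %/% c(8, 4, 2, 1)) %% 2)

# cost[a, b] is what gets added to the distance for the character pair (a, b).
cost <- matrix(0, 18, 18, dimnames = list(symbols, symbols))
for (i in 1:16) {
  for (j in 1:16) {
    cost[i, j] <- sum(bits[, i] != bits[, j])  # Hamming distance of the bits
  }
}
cost["-", ] <- 5; cost[, "-"] <- 5  # "-" against a different character
cost["X", ] <- 0; cost[, "X"] <- 0  # "X" matches anything (assumed to win over "-")
cost["-", "-"] <- 0                 # equal characters add nothing

cost["a", "5"]  # 1010 vs 0101: all four bits differ
cost["-", "f"]  # absence vs a value
```

For example, a against 5 adds 4, 3 against 1 adds 1, - against f adds 5, and X against anything adds 0.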
I was able to build a script to functionally calculate this:
library(stringdist)   # stringdist()
library(binaryLogic)  # as.binary() is assumed here to come from this package

dist = data.frame()
for(m in 1:length(fp)){
  for(l in 1:length(fp)){
    d = 0
    for(k in 1:nchar(fp[l])){
      a = substr(fp[m],k,k)  # k-th character of element m
      b = substr(fp[l],k,k)  # k-th character of element l
      if(a == b){d = d}                      # equal: nothing added
      else if(a == "X" | b == "X"){d = d}    # X matches any value
      else if(a == "-" | b == "-"){d = d+5}  # absence against a value
      else{
        # Hamming distance between the two 4-bit representations
        d = d+sum(stringdist(as.character(as.binary(as.hexmode(a),n=4)),as.character(as.binary(as.hexmode(b),n=4))))
      }
    }
    dist[l,m] = d
  }
}
But when fp has 200 or more elements, it gives me an error message.
I already tried Sys.setenv('R_MAX_VSIZE'=32000000000) and it still gives the error.
Any idea of what to do?
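One direction that might be worth trying is to precompute the cost of every character pair once and then fill a preallocated numeric matrix, instead of growing a data.frame cell by cell. The following is a rough sketch under assumptions, not a confirmed fix for the error: fingerprint_dist, symbols, bits and cost are hypothetical names, the hex digits are assumed to be lowercase, all strings are assumed to have the same length, and X is assumed to take priority over -.

```r
# Symbols: 16 lowercase hex digits at positions 1..16, "-" at 17, "X" at 18.
symbols <- c(as.character(0:9), letters[1:6], "-", "X")

# Per-character cost table (same rules as in the question).
bits <- sapply(0:15, function(v) (v %/% c(8, 4, 2, 1)) %% 2)
cost <- matrix(0, 18, 18)
for (i in 1:16) for (j in 1:16) cost[i, j] <- sum(bits[, i] != bits[, j])
cost[17, ] <- 5; cost[, 17] <- 5  # "-" against a different character
cost[18, ] <- 0; cost[, 18] <- 0  # "X" matches anything (assumed priority)
cost[17, 17] <- 0                 # "-" against "-" is equal

fingerprint_dist <- function(fp) {
  # One row per string, one column per character position.
  chars <- do.call(rbind, strsplit(fp, ""))
  idx <- matrix(match(chars, symbols), nrow = nrow(chars))
  stopifnot(!anyNA(idx))  # stop on characters outside the 18 symbols

  # Preallocated n x n result; add one position's pairwise costs at a time.
  d <- matrix(0, nrow = length(fp), ncol = length(fp))
  for (k in seq_len(ncol(idx))) {
    d <- d + cost[idx[, k], idx[, k], drop = FALSE]
  }
  d
}

# Small made-up input: four strings of 64 identical characters each.
fp_demo <- c(strrep("a", 64), strrep("5", 64), strrep("-", 64), strrep("X", 64))
fingerprint_dist(fp_demo)
```

The loop here runs once per character position (64 times) rather than once per pair of strings per character, and the result is a plain n x n numeric matrix. Whether this avoids the error on the real fp is untested.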