Write out results of for-loop of distance measures in matrix form in R


Suppose I have something like the following vector:

text <- as.character(c("string1", "str2ing", "3string", "stringFOUR", "5tring", "string6", "s7ring", "string8", "string9", "string10"))

I want to run a loop that computes the pairwise edit distance for every combination of these strings (e.g., string 1 to string 2, string 1 to string 3, and so forth). The output should be a matrix with one row and one column per string.

I have the following code:

#Matrix of pair-wise combinations
m <- expand.grid(text,text)

#Define number of strings
n <- c(1:10)

#Begin loop; "method='osa'" in stringdist is default
for (i in 1:10) {
  n[i] <- stringdist(m[i,1], m[i,2], method="osa")
  write.csv(data.frame(distance=n[i]),file="/File/Path/output.csv",append=TRUE)
  print(n[i])
  flush.console()
}

The stringdist() function is from the stringdist package. Base R's utils package has a related function, adist(), which computes the generalized Levenshtein distance.

My question is: why is my loop not writing the results as a matrix, and how do I stop the loop from overwriting each individual distance calculation (i.e., save all results in matrix form)?

Best answer

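I would suggest using stringdistmatrix() instead of stringdist(), especially since you are already building every pair with expand.grid(). It computes all pairwise distances in one call and returns them as a matrix.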

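library(stringdist)
# Rows and columns both follow the order of text; the default method is "osa"
res <- stringdistmatrix(text, text)
dimnames(res) <- list(text, text)  # label rows and columns with the strings
write.csv(res, "file.csv")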
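
If installing stringdist is not an option, the adist() function mentioned in the question can produce a matrix as well. The following is only a sketch on a shortened version of the vector: adist() computes the generalized Levenshtein distance rather than OSA, so the numbers may differ from stringdistmatrix() whenever adjacent characters are transposed.

```r
# Shortened example vector (an assumption for brevity, not the full data)
text <- c("string1", "str2ing", "3string")

# With a single argument, adist() compares x against itself
# and returns a length(x) by length(x) matrix.
res_base <- adist(text)

# Label rows and columns with the strings themselves
dimnames(res_base) <- list(text, text)
```

The result can be passed to write.csv() in the same way as res above.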

As for your concrete question, "why is my loop not writing the results as a matrix":
Nothing in the loop ever builds a matrix. You calculate one element at a time, save it to a vector, and then write that single element to disk. Note also that the loop covers only the first 10 of the 100 rows in m.
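
If you do want to start from the expand.grid() pairs, the vector of distances can be reshaped into a matrix afterwards. This is a minimal sketch on a shortened vector; the column names a and b are made up for the illustration, and it relies on stringdist() being vectorised over its two arguments.

```r
library(stringdist)

# Shortened example vector (an assumption for brevity)
text <- c("string1", "str2ing", "3string")

# All ordered pairs; keep the columns as character rather than factor
m <- expand.grid(a = text, b = text, stringsAsFactors = FALSE)

# One vectorised call replaces the explicit loop
d <- stringdist(m$a, m$b, method = "osa")

# expand.grid() varies its first column fastest, which matches the
# column-major fill order of matrix(): res[i, j] compares text[i] with text[j]
res <- matrix(d, nrow = length(text), dimnames = list(text, text))
```

This should give the same values as stringdistmatrix(text, text), just by a longer route.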

Also, you should be aware that write.csv ignores any attempt to change append, col.names, sep, dec, or qmethod, and issues a warning, so append=TRUE has no effect and every call overwrites the file. Use write.table instead.
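
To illustrate the appending behaviour in isolation, here is a small sketch that uses only base R. The temporary file and the two example rows are invented for the demonstration.

```r
# Hypothetical temporary file used only for this illustration
out <- tempfile(fileext = ".csv")

# The first write creates the file and emits the header row
write.table(data.frame(i = 1L, distance = 0), file = out, sep = ",",
            row.names = FALSE, col.names = TRUE)

# Later writes append a row and suppress the header
write.table(data.frame(i = 2L, distance = 3), file = out, sep = ",",
            row.names = FALSE, col.names = FALSE, append = TRUE)

# Read the file back to confirm both rows were kept
result <- read.csv(out)
```

Both rows survive because the second call appends instead of replacing the file.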

If you want to do this iteratively, I would do the following:

# Write the column names once; append=FALSE so the first write creates (or overwrites) the file
write.table(rbind(c("i", "distance")),
            file="~/Desktop/output.csv", append=FALSE,
            sep=",", col.names=FALSE, row.names=FALSE)

# seq_len(nrow(m)) covers every pair in m; 1:10 would stop after the first 10 rows
for (i in seq_len(nrow(m))) {
  n[i] <- stringdist(m[i,1], m[i,2], method="osa")
  # Append one row per iteration instead of overwriting the file
  write.table(data.frame(i=i, distance=n[i]), file="~/Desktop/output.csv",
              append=TRUE, sep=",", col.names=FALSE, row.names=FALSE)
  print(n[i])
  flush.console()
}