this is my first R Code, and it is a very simple deduplication, but it is working so slowly I can't believe it! My question is: Is it normal that it is working so slowly or is my code just bad? Here it is:
file1=c(read.delim("file.txt", header=TRUE))
dedupes<-0
i<-1
n<-1
while (i<=100) {
while (n<=100) {
if (file1$email[i]==file1$email[n] && i!=n) {
#Remember amount of deduces
dedupes=dedupes+1
#Show dedupes
print(file1$email[i]) }
n<-n+1
}
n<-1
i<-i+1
}
#Show amount of dedupes
cat("There are ", dedupes/2, " deduces")
Many thanks in advance, Saitam
Imbricated loops are well known to be slow in R. You need to vectorize your calculus or use existing optimized functions such as in the suggestion of BondedDust