Optimize R code containing a for loop


Here is my code. Basically I am trying to convert data to wide format, and the data may or may not contain duplicates. I tried foreach and parallel, but it still takes too much time. Can anyone suggest a change?

This is the data I am processing:

Param1: 1, , 753360af0c8949c0aeab64d520599656
Param2: Value2
Param3: value3
Param4: Value4
Param1: 2, , 8c8c659813d842c5bab2ddba9483ea5a
Param2: Value5
Param4: value6
Param3: Value7

So basically I need a wide format. The above example contains 4 parameters, but the number of parameters varies (up to 10) because the text file comes from different sources.

The result should look like this:

Param1                                    Param2  Param3  Param4    
1 753360af0c8949c0aeab64d520599656        Value2  Value3  Value4    
2 8c8c659813d842c5bab2ddba9483ea5a        Value5  Value6  Value7

Here is the code for the same:

f <- read.table("./Sample.txt", header = FALSE, sep = ":", fill = TRUE, row.names = NULL)

f[, 2] <- paste(f[, 2], f[, 3], f[, 4])   # glue the value columns split by extra ":" back together

params <- unique(f[, 1])                  # parameter names become the wide columns
                                          # (renamed from `c`, which shadows base::c)

rw <- round(nrow(f) / length(params)) + 1

result <- data.frame(matrix(0, ncol = length(params), nrow = rw))
colnames(result) <- params

wh <- which(f[, 1] == params[1])          # start row of each record
ends <- c(wh[-1] - 1, nrow(f))            # matching end rows; the original loop ran only to
                                          # length(wh) - 1 and silently dropped the last record

for (i in seq_along(wh)) {
  # print(i) removed: printing once per record is itself a big slowdown at this scale
  tmp <- f[wh[i]:ends[i], ]
  result[i, ] <- tmp[, 2][match(colnames(result), tmp[, 1])]
}
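The loop can be avoided entirely: tag each row with a record id (incremented whenever Param1 appears) and let base R's reshape pivot the whole thing to wide in one call. A minimal sketch, using a small inline sample in place of Sample.txt; the column names key and value are my own, not from the original code:

```r
# Hypothetical inline sample mirroring the file's layout
txt <- c("Param1: 1, , 753360af0c8949c0aeab64d520599656",
         "Param2: Value2",
         "Param3: value3",
         "Param4: Value4",
         "Param1: 2, , 8c8c659813d842c5bab2ddba9483ea5a",
         "Param2: Value5",
         "Param4: value6",
         "Param3: Value7")

f <- read.table(text = txt, sep = ":", strip.white = TRUE,
                col.names = c("key", "value"), stringsAsFactors = FALSE)

f$id <- cumsum(f$key == "Param1")   # new record every time Param1 starts a block

# Pivot long -> wide: one row per id, one column per parameter;
# match() by key means the Param3/Param4 order swap in record 2 is harmless
wide <- reshape(f, idvar = "id", timevar = "key", direction = "wide")
```

Because the id is computed with a single vectorized cumsum and the pivot is one reshape call, the cost no longer grows with an R-level loop over records.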

I have more than 1,000,000 rows to process, and the above code does not finish even after a day.
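At a million-plus rows, data.table is usually the pragmatic choice (assuming the package is available): fread reads the file quickly and dcast does the long-to-wide pivot with no explicit loop. A sketch using the same record-id trick, again with a tiny inline sample standing in for the real file:

```r
library(data.table)  # assumed installed

txt <- "Param1: 1\nParam2: Value2\nParam1: 2\nParam2: Value5"
dt <- fread(text = txt, sep = ":", header = FALSE,
            col.names = c("key", "value"))

dt[, id := cumsum(key == "Param1")]   # record id, as before

# One row per id, one column per parameter name
wide <- dcast(dt, id ~ key, value.var = "value")
```

For the real file, replace the text argument with fread("./Sample.txt", ...). If a record repeats a parameter (the duplicates mentioned above), dcast will ask for an aggregation function, e.g. fun.aggregate = first from data.table.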

Thanks in advance
