Here is my code. Basically, I am trying to convert data to wide format; the data may or may not contain duplicates. I tried foreach and parallel processing, but it still takes too much time. Can anyone suggest a change?
This is the data I am processing:-
Param1: 1, , 753360af0c8949c0aeab64d520599656
Param2: Value2
Param3: value3
Param4: Value4
Param1: 2, , 8c8c659813d842c5bab2ddba9483ea5a
Param2: Value5
Param4: value6
Param3: Value7
So basically I need a wide format, but the above example contains only 4 parameters. The files can have up to 10 parameters, and the number varies because the text files come from different sources.
The result should look like this:-
Param1                               Param2  Param3  Param4
1 753360af0c8949c0aeab64d520599656   Value2  value3  Value4
2 8c8c659813d842c5bab2ddba9483ea5a   Value5  Value7  value6
Here is the code for the same:-
f <- read.table("./Sample.txt", header = FALSE, sep = ":", fill = TRUE, row.names = NULL)

# Collapse everything after the parameter name into a single value column
f[, 2] <- paste(f[, 2], f[, 3], f[, 4])

params <- unique(f[, 1])          # avoid "c" as a name, it masks base::c
wh <- which(f[, 1] == params[1])  # each record starts at a "Param1" line

# One output row per record, NA where a parameter is missing
result <- data.frame(matrix(NA, ncol = length(params), nrow = length(wh)))
colnames(result) <- params

# Append an end boundary so the last record is not dropped by the loop
bounds <- c(wh, nrow(f) + 1)

for (i in seq_along(wh)) {
  if (i %% 10000 == 0) print(i)  # progress, printed sparingly to avoid slowing the loop
  tmp <- f[bounds[i]:(bounds[i + 1] - 1), ]
  # Match this record's values against the output columns
  result[i, ] <- tmp[, 2][match(colnames(result), tmp[, 1])]
}
I have more than 1,000,000 rows to process, and the above code does not complete even after a day.
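For reference, below is a vectorized sketch I have been experimenting with instead of the loop. It assumes data.table is acceptable and that every record begins with a Param1 line (the running count of those lines becomes a record id, and dcast then reshapes everything in one pass). I am not sure it handles all my files correctly, so corrections are welcome.

library(data.table)

dt <- fread("./Sample.txt", header = FALSE, sep = ":", fill = TRUE)

# Collapse everything after the parameter name into one value column
# (the column count depends on how many ":" each line contains)
dt[, value := do.call(paste, .SD), .SDcols = setdiff(names(dt), "V1")]

# A new record starts at every "Param1" line, so a running count gives a record id
dt[, id := cumsum(V1 == "Param1")]

# Reshape to wide; keep the first value if a parameter repeats within a record
wide <- dcast(dt, id ~ V1, value.var = "value", fun.aggregate = function(x) x[1])

On the small sample above this should give one row per Param1 block with one column per parameter, and NA where a parameter is missing from a record.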
Thanks in advance