I want to loop through a data frame and create a new column that says 'YES' if the 2nd to 4th elements in the row are all 'ANOMALY' and 'NO' otherwise.
for (j in 1:nrow(residual_anomalies)) {
  if (all(residual_anomalies[j, 2:4] == 'ANOMALY')) {
    residual_anomalies$Prediction_Anomaly[j] <- 'YES'
  } else {
    residual_anomalies$Prediction_Anomaly[j] <- 'NO'
  }
}
The above is what I'm currently using. It works, but it is slow, so I'm trying to vectorize it. So far I have written a function that returns 'YES' or 'NO' depending on whether all elements of the row are 'ANOMALY'.
vote_for_anomaly <- function(x){
if (all(x)=='ANOMALY') return('YES') else
return('NO')}
And then I try to use the apply function in R
aggregates <- apply(residual_anomalies[,2:4],1,vote_for_anomaly)
but then I'm getting the following errors/warnings
Error in if (all(x) == "ANOMALY") return("ANOMALY") else return("NO SIGNAL") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In all(x) : coercing argument of type 'character' to logical
Can someone tell me why this isn't working and how I should change this?
You can use this data for testing and call it residual_anomalies
1 ANOMALY ANOMALY ANOMALY ANOMALY
2 ANOMALY NO SIGNAL ANOMALY ANOMALY
3 ANOMALY ANOMALY ANOMALY ANOMALY
4 NO SIGNAL ANOMALY NO SIGNAL ANOMALY
5 ANOMALY ANOMALY ANOMALY ANOMALY
6 NO SIGNAL NO SIGNAL ANOMALY ANOMALY
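Per @lukeA, there's a typo in your code: the closing parenthesis of all() is in the wrong place. It should be all(x == 'ANOMALY') rather than all(x) == 'ANOMALY'. As written, all(x) tries to coerce the character vector to logical (hence the warning), which gives NA, and if (NA == 'ANOMALY') then fails with the "missing value where TRUE/FALSE needed" error. With that fixed the apply call works,
but it would be faster to do: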
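A minimal sketch of the rowSums approach, assuming columns 2 to 4 hold the labels. The column names V1 to V4 are made up here to rebuild the sample data from the question; only Prediction_Anomaly comes from the original code.

```r
# Sample data from the question; column names V1..V4 are assumed for illustration
residual_anomalies <- data.frame(
  V1 = c('ANOMALY', 'ANOMALY', 'ANOMALY', 'NO SIGNAL', 'ANOMALY', 'NO SIGNAL'),
  V2 = c('ANOMALY', 'NO SIGNAL', 'ANOMALY', 'ANOMALY', 'ANOMALY', 'NO SIGNAL'),
  V3 = c('ANOMALY', 'ANOMALY', 'ANOMALY', 'NO SIGNAL', 'ANOMALY', 'ANOMALY'),
  V4 = c('ANOMALY', 'ANOMALY', 'ANOMALY', 'ANOMALY', 'ANOMALY', 'ANOMALY'),
  stringsAsFactors = FALSE
)

# Comparing the data frame to a string gives a logical matrix;
# a row is all 'ANOMALY' exactly when its count of TRUE values equals 3
n_anomaly <- rowSums(residual_anomalies[, 2:4] == 'ANOMALY')
residual_anomalies$Prediction_Anomaly <- ifelse(n_anomaly == 3, 'YES', 'NO')
# rows 1, 3 and 5 come out as 'YES', the rest as 'NO'
```

If the number of columns may change, comparing against ncol(residual_anomalies[, 2:4]) instead of the literal 3 would avoid hard-coding it.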
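rowSums is very fast because it processes the whole matrix in compiled code instead of calling an R function once per row.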
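For comparison, here is a sketch of the apply version with the parenthesis moved, checked against the rowSums result on the sample data. As before, the column names V1 to V4 are assumptions, and by_apply and by_rowsums are just illustrative variable names.

```r
# Sample data from the question; column names V1..V4 are assumed for illustration
residual_anomalies <- data.frame(
  V1 = c('ANOMALY', 'ANOMALY', 'ANOMALY', 'NO SIGNAL', 'ANOMALY', 'NO SIGNAL'),
  V2 = c('ANOMALY', 'NO SIGNAL', 'ANOMALY', 'ANOMALY', 'ANOMALY', 'NO SIGNAL'),
  V3 = c('ANOMALY', 'ANOMALY', 'ANOMALY', 'NO SIGNAL', 'ANOMALY', 'ANOMALY'),
  V4 = c('ANOMALY', 'ANOMALY', 'ANOMALY', 'ANOMALY', 'ANOMALY', 'ANOMALY'),
  stringsAsFactors = FALSE
)

# Corrected function: compare element-wise first, then reduce with all()
vote_for_anomaly <- function(x) {
  if (all(x == 'ANOMALY')) 'YES' else 'NO'
}

# Row-wise apply (one R function call per row)
by_apply <- unname(apply(residual_anomalies[, 2:4], 1, vote_for_anomaly))

# Vectorized equivalent using rowSums
by_rowsums <- unname(ifelse(rowSums(residual_anomalies[, 2:4] == 'ANOMALY') == 3, 'YES', 'NO'))

# Both approaches should agree
print(identical(by_apply, by_rowsums))
```

On a data frame this small the difference in speed is not noticeable; it should only start to matter with many rows.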