I'm trying to correlate sulfate and nitrate values in my dataset (a) by ID value, subject to the conditions specified below. The dataset contains three columns (ID, sulfate, nitrate). The code works when I run each ID value individually, but now I'm trying to set up a loop that runs through all the ID values and collects the correlations into a single vector. The loop prints nothing, presumably because I am not saving the results correctly. How can I modify the code below to print out a vector of correlation values, one per ID value?
for (i in 1:5) {
if (a$ID==i && length(a$ID==i) > 10) {
cor(a$sulfate[a$ID==i], a$nitrate[a$ID==i])
}
}
Try instead:
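A minimal sketch of such a loop, under a few assumptions: the toy data frame stands in for the real dataset, and the names ids, res, and min_n are made up for illustration. It counts the rows for each ID with sum() rather than length(), because length(a$ID == i) is the length of the whole column, and it stores each correlation in a named vector instead of discarding it.

```r
# Toy data standing in for the real dataset (assumption): 5 IDs, 12 rows each
set.seed(1)
a <- data.frame(
  ID      = rep(1:5, each = 12),
  sulfate = rnorm(60),
  nitrate = rnorm(60)
)

min_n <- 10                       # hypothetical threshold: need more than 10 rows
ids   <- sort(unique(a$ID))
res   <- setNames(rep(NA_real_, length(ids)), ids)  # one slot per ID

for (i in ids) {
  rows <- a$ID == i               # logical vector, one element per row
  if (sum(rows) > min_n) {        # sum() counts the TRUEs, giving a single number
    res[as.character(i)] <- cor(a$sulfate[rows], a$nitrate[rows])
  }
}

print(res)                        # IDs with too few rows stay NA
```

A value computed inside a for loop is not printed automatically and is lost unless it is assigned, which is why the result is written into res on each pass.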
Explanation
We attempt a logical test. Return the output of 'yes' if ID equals 1:
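The test described above might look like the following sketch. The three-row data frame is an assumption for illustration. Because the behaviour depends on the R version (a warning in older releases, an error from R 4.2.0 onward), the call is wrapped in tryCatch() so the script runs either way and reports what R signalled.

```r
# Toy data (assumption): the ID column has more than one element
a <- data.frame(ID = c(1, 1, 2), sulfate = c(3.1, 2.4, 5.0), nitrate = c(0.7, 0.9, 1.2))

# The condition a$ID == 1 has three elements, so if() cannot use it cleanly.
# tryCatch() records whether R signalled a warning or an error.
outcome <- tryCatch(
  if (a$ID == 1) "yes",
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e))
)
print(outcome)
```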
In older versions of R we get the result of 'yes', but we also get a warning; since R 4.2.0 the same call is an error. Because:
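A small sketch of what the comparison itself returns, using a made-up ID column. The last two lines show two standard ways to collapse the result into the single value that if() expects.

```r
a <- data.frame(ID = c(1, 1, 2, 3))   # toy ID column (assumption)

test <- a$ID == 1
print(test)          # one TRUE/FALSE per row
print(length(test))  # same length as the column, not a single value

print(any(test))     # TRUE if at least one row matches
print(all(test))     # TRUE only if every row matches
```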
The test checks whether each element of a$ID is equal to 1. That's a problem for the if statement. How does R know which TRUE or FALSE value to use for the test? So it just uses the first.
In your code, you are passing vectors like that in your if statement. You want the condition in your if statement to be a single TRUE or FALSE. Or avoid the if statement altogether.
Vectorization
As you become more advanced, you can avoid this loop with a vectorized function call.
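One base R way to do this, as a sketch rather than the only option: split the data frame by ID, keep the groups with more than 10 rows, and apply cor() to each piece. The toy data and the names by_id and big are assumptions for illustration.

```r
# Toy data standing in for the real dataset (assumption)
set.seed(1)
a <- data.frame(ID = rep(1:5, each = 12), sulfate = rnorm(60), nitrate = rnorm(60))

by_id <- split(a, a$ID)                            # list of data frames, one per ID
big   <- Filter(function(d) nrow(d) > 10, by_id)   # drop IDs with 10 or fewer rows

# vapply() returns a named numeric vector, one correlation per remaining ID
res <- vapply(big, function(d) cor(d$sulfate, d$nitrate), numeric(1))
print(res)
```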
Some R users have written great packages to deal with these types of problems. You will need
dplyr and data.table. Here are two quick alternatives.
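The two alternatives could look like the sketch below, assuming both packages are installed. The toy data and the result names res_dplyr, res_dt, and r are assumptions for illustration; each version keeps only IDs with more than 10 rows and returns one correlation per ID.

```r
library(dplyr)
library(data.table)

# Toy data standing in for the real dataset (assumption)
set.seed(1)
a <- data.frame(ID = rep(1:5, each = 12), sulfate = rnorm(60), nitrate = rnorm(60))

# dplyr: keep groups with more than 10 rows, then one correlation per ID
res_dplyr <- a %>%
  group_by(ID) %>%
  filter(n() > 10) %>%
  summarise(r = cor(sulfate, nitrate))

# data.table: .N is the number of rows in the current group;
# groups where the if() yields NULL are dropped from the result
dt     <- as.data.table(a)
res_dt <- dt[, if (.N > 10) .(r = cor(sulfate, nitrate)), by = ID]

print(res_dplyr)
print(res_dt)
```

Both return a table with an ID column and a correlation column rather than a bare vector; res_dplyr$r or res_dt$r extracts the vector if that is what you need.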