I am an R neophyte and this one really has me stumped. I hope more experienced people can help me out. I am estimating a simple weighted average. My data have counties coded numerically (FIPS), a soil carbon value, and the area covered by each soil within the county. Soils of different types and areas occur in a county, so a given county may have soil types X1, X2, X3 covering areas Y1, Y2, Y3. I want the overall weighted average soil carbon for each county, weighting each soil's carbon value by the area it covers. When I developed some code on a smaller subset of the data (200 rows), it returned values that matched my hand estimates. When I applied the code to the full data set (111,000+ rows), I received the error message:
1: In sum(soc[which(soc[, 1] == FIPS[i]), 8]) : integer overflow - use sum(as.numeric(.))
I did not receive this error message with the subset. When I tried sum(as.numeric(...)) as suggested, I received a different error message. The weighted averages calculated from the subset also differ from those calculated from the full data set, even for the same county.
Interestingly, when I saved the subsetted data under a different file name but left the underlying data unchanged, I received the same error message as I did with the full data set. This makes me think it isn't the code or an as.numeric issue but something to do with the file itself. But I have only been working with R for about a year, so I really don't know.
Thanks in advance! This is the first time I've posted, so I am not sure how to attach data. I'd be happy to send it if needed.
My code:
Subsetted data:
socT<-read.table("R_SOC8.txt", header=TRUE)
FIPS<-unique(socT[,1])
WA<-c()
for(i in 1:length(FIPS)){
WA[i]<-crossprod((socT[which(socT[,1]==FIPS[i]),3]),
(socT[which(socT[,1]==FIPS[i]),8]))/
(sum(socT[which(socT[,1]==FIPS[i]),8]))
}
test8<-cbind(FIPS, WA)
print(test8)
Full data code:
soc<-read.table("R_SOC20.txt", header=TRUE)
FIPS<-unique(soc[,1])
WA<-c()
for(i in 1:length(FIPS)){
WA[i]<-crossprod((soc[which(soc[,1]==FIPS[i]),3]),
(soc[which(soc[,1]==FIPS[i]),8]))/
(sum(soc[which(soc[,1]==FIPS[i]),8]))
}
fipsoc20<-cbind(FIPS, WA)
print(fipsoc20)
Sample output:
Subset:
FIPS WA
[1,] 10001 825.0657
[2,] 10003 1327.9600
[3,] 10005 767.9470
[4,] 10007 731.9469
Full data:
FIPS WA
[1,] 10001 825.0657
[2,] 10003 NA
[3,] 10005 NA
[4,] 10007 731.9469
It looks like you're using integer types and should be using double. You can see this easily from the integer help page, ?integer:
Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.
I can't guarantee this since you didn't post the structure of your data. Try typeof() to confirm.
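To illustrate the integer-versus-double point, here is a minimal sketch on made-up data. The column names fips, carbon and area and all the values are assumptions for illustration, not your actual file layout. It checks the column type, shows the integer sum overflowing to NA, and then computes the area-weighted mean per county after converting the area column to double.

```r
# Hypothetical toy data: column names and values are invented for illustration.
soc <- data.frame(
  fips   = c(10001L, 10001L, 10003L, 10003L),
  carbon = c(800, 900, 1200, 1400),
  area   = c(1000L, 3000L, 2000000000L, 2000000000L)  # stored as integer
)

# Confirm how the area column is stored.
area_type_before <- typeof(soc$area)

# Summing integers past the 32-bit limit yields NA (with an overflow warning,
# suppressed here so the script runs quietly).
int_total <- suppressWarnings(sum(soc$area[soc$fips == 10003L]))

# Convert the area column to double once, up front.
soc$area <- as.numeric(soc$area)

# Area-weighted mean carbon per county, using stats::weighted.mean.
wa <- sapply(split(soc, soc$fips),
             function(d) weighted.mean(d$carbon, d$area))
print(wa)
```

If typeof() on your real area column also reports "integer", converting that column right after read.table() should avoid the overflow. Alternatively, the colClasses argument of read.table() can force the column to be read as numeric in the first place.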