Error with as.numeric: weighted average works on a subset but not on the full data set


I am an R neophyte and this one really has me stumped, so I hope more experienced people can help me out. I am estimating a simple weighted average. My data have counties coded numerically (FIPS), a soil carbon value, and the area covered by each soil within the county. Soils of different types and areas occur in a county, so a given county may have soil types X1, X2, X3 covering areas Y1, Y2, Y3. I want the overall area-weighted average soil carbon for each county. When I developed some code on a smaller subset of the data (200 rows), it returned values that matched my hand estimates. When I applied the code to the full data set (111,000+ rows), I received the error message:

1: In sum(soc[which(soc[, 1] == FIPS[i]), 8]) : integer overflow - use sum(as.numeric(.))

I did not receive this error message with the subset. When I tried sum(as.numeric(...)) as suggested, I received a different error message. The weighted averages calculated from the subset also differed from those calculated from the full data set, even for the same county.

Interestingly, when I saved the subsetted data under a different file name but left the underlying data unchanged, I received the same error message as I did with the full data set. This makes me think the problem isn't the code or as.numeric but something to do with the file itself. But I have only been working with R for about a year, so I really can't be sure.

Thanks in advance! This is the first time I've posted, so I am not sure how to attach data, but I'd be happy to send it if needed.

My code:

Subsetted data:

socT<-read.table("R_SOC8.txt", header=TRUE) 
FIPS<-unique(socT[,1])
WA<-c()


for(i in 1:length(FIPS)){
    WA[i]<-crossprod((socT[which(socT[,1]==FIPS[i]),3]),
    (socT[which(socT[,1]==FIPS[i]),8]))/
    (sum(socT[which(socT[,1]==FIPS[i]),8]))
}


test8<-cbind(FIPS, WA)

print(test8)

Full data code:

soc<-read.table("R_SOC20.txt", header=TRUE)
FIPS<-unique(soc[,1]) 
WA<-c()


for(i in 1:length(FIPS)){
    WA[i]<-crossprod((soc[which(soc[,1]==FIPS[i]),3]),
    (soc[which(soc[,1]==FIPS[i]),8]))/
    (sum(soc[which(soc[,1]==FIPS[i]),8]))
 }


fipsoc20<-cbind(FIPS, WA)

print(fipsoc20)

Sample output:

Subset:

     FIPS        WA
[1,] 10001  825.0657
[2,] 10003 1327.9600
[3,] 10005  767.9470
[4,] 10007  731.9469

Full data:

           FIPS       WA
   [1,]  10001 825.0657
   [2,]  10003       NA
   [3,]  10005       NA
   [4,]  10007 731.9469

There is 1 best solution below


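It looks like you're using integer types where you should be using double. The warning comes from sum() over column 8: that column appears to be stored as integer, and when the total for a county exceeds the integer range, sum() returns NA with that warning, which would explain the NA rows in your full-data output. The limit is documented on the integer help page: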

?integer

Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.
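To make the limit from the help page concrete, here is a minimal sketch that reproduces the overflow in isolation. The variable names are made up for illustration and are not from the question.

```r
# Largest value an R integer vector can hold (2^31 - 1)
big <- .Machine$integer.max

# Summing integers past that limit gives NA plus the
# "integer overflow - use sum(as.numeric(.))" warning (suppressed here)
int_sum <- suppressWarnings(sum(c(big, 1L)))

# Converting to double before summing gives the exact total
dbl_sum <- sum(as.numeric(c(big, 1L)))

print(int_sum)   # NA
print(dbl_sum)
```

The same thing would happen inside the loop whenever one county's values in column 8 add up to more than about 2.1 billion.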

I can't guarantee this is the cause since you didn't post the structure of your data. Try typeof() on the column being summed (or str() on the whole data frame) to confirm.
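As a rough sketch of the check and one possible fix, assuming column 8 really is integer: the data frame below is an invented stand-in for your file (only the column positions 1, 3 and 8 match your code; the names and values are hypothetical). It also uses weighted.mean() from base R's stats package in place of the crossprod()/sum() expression, which should compute the same quantity.

```r
# Hypothetical stand-in for the real file: column 1 = FIPS, column 3 = soil
# carbon, column 8 = area (stored as integer on purpose to mimic the problem)
soc <- data.frame(
  FIPS = c(10001L, 10001L, 10003L, 10003L),
  c2   = 0,
  SOC  = c(800, 900, 1200, 1400),
  c4 = 0, c5 = 0, c6 = 0, c7 = 0,
  AREA = c(100L, 300L, 2000000000L, 2000000000L)
)

# Check the storage type of the column that sum() complained about
area_type <- typeof(soc[, 8])
print(area_type)   # "integer"

# Convert the areas to double once, up front
soc[, 8] <- as.numeric(soc[, 8])

FIPS <- unique(soc[, 1])
WA <- numeric(length(FIPS))            # preallocate instead of growing c()
for (i in seq_along(FIPS)) {
  rows <- soc[, 1] == FIPS[i]          # rows belonging to this county
  WA[i] <- weighted.mean(soc[rows, 3], soc[rows, 8])
}

result <- cbind(FIPS, WA)
print(result)
```

In this toy data the areas for county 10003 sum to 4e9, which is past the integer limit, so the original sum() would have returned NA for that county; after the conversion both counties get a value.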