Firstly, I am a student very new to what I am doing. I'd like to say that the model I am building is very simplistic at this stage. Please don't judge my work for what you see here as it's totally incomplete and terrible right now and I know it... I am just asking for coding help.
I built this model in WINBUGS to get me going. It iterates over a 1000 observations and estimates beta0 and beta1. From what I understand, in this model:
- beta0 is baseline risk of disease when there is no exposure
- beta1 is the change in the odds of disease occurrence for a one-unit increase in the exposure variable
model {
# Likelihood
for (i in 1:999) {
disease[i] ~ dbern(p[i]) # prob of disease given exposure
p[i] <- 1 / (1 + exp(- (beta0 + (beta1 * exposure[i])))) # inverse logit function
}
# Prior for beta
beta0 ~ dnorm(0, 1.0E-4)
beta1 ~ dnorm(0, 1.0E-4)
}
Init Values
list(beta0=0, beta1=0)
Data
exposure[] disease[]
0.3 0
0.4 0
0.6 0
...
This model works in WINBUGS for me. Now, I need to extend this model further because I have actually ragged data which I have not considered in the model above.
Details about my ragged data:
- 1000 observations for 50 individuals
- observations per individual can range anywhere from 1 to 32. One person could have 1 obs, another 3 obs and so on...
** What does my actual data look like? **
ID | Exposure | Disease |
---|---|---|
1 | 0.1 | 1 |
1 | 0.2 | 1 |
1 | 0.4 | 1 |
2 | 0.4 | 0 |
2 | 0.9 | 0 |
3 | 0.1 | 1 |
3 | 0.3 | 1 |
3 | 0.6 | 1 |
4 | 0.2 | 1 |
**Places I've already been for help with little success: ** https://costa.eeb.ucsc.edu/wp-content/uploads/2015/10/hierarchical-slides.pdf https://www.mrc-bsu.cam.ac.uk/wp-content/uploads/manual14.pdf and more, but these were most relevant to my goal.
What am I trying to achieve at this stage?
Simply, a model in which I want to
- Sum up my exposures for each individual
- Use the summed-up exposures as "new" exposure values in the model I've shown above
- Get betas and probability of disease as I've done with the model I've shown above...
Now, I know I could just sum up my exposure values for each individual using R and pre-process the data, but I am purposely trying to avoid this. The idea is to code a model that incorporates this even though it is unnecessarily complex.
Anyway, so what have I actually tried?
Tried to add this:
for (i in 1:50) {
sumexposure[i] <- sum(exposure[id == i])
}
It didn't work. Error: "Expected a left square bracket" (sorry I forgot what it said exactly)...
I thought I might declare the new variable sumexposure
for (i in 1:50) {
sumexposure[i] <- 0
}
for (i in 1:50) {
sumexposure[i] <- sum(sumexposure[id == i])
}
It didnt work. Error: "Expected a left square bracket" (Again, sorry I forgot what it said exactly)
The list goes on of variations of the codes above. I've tried this too:
for(i in 1:50) {
for(j in 1:32) {
sumexposure[i] <- sum(exposure[i,j])
}
}
I think this is just wrong completely to be honest.
for (i in 1:50) {
sumexposure[i] <- 0
}
# Loop for individuals and exposure observations
for (i in 1:50) {
for (j in 1:32) {
sumexposure[i] <- sumexposure[i] + exposure[i, j]
}
}
In the future, I want to avoid this and just account for the varying number of observations per individual using bayesian hierarchal models but for now, I really just need to work this part out. I apologise if anything above is confusing and am willing to have questions to clarify.