R: simulation of longitudinal data with an ordered categorical variable

I'm trying to simulate longitudinal data for my research. The data must contain an unordered categorical variable and an ordered categorical variable. To do this, I'm using the simstudy R package, which can simulate longitudinal data.

1- For the simple (unordered) categorical variable (var_c_3m), the following code works:

library(tidyverse)
library(simstudy)

def <- defData(varname = "id", formula = "1:10") # Creating identifiers

## Longitudinal data with varying observation and interval times 
## Source: https://kgoldfeld.github.io/simstudy/articles/longitudinal.html
def <- defData(def, varname = "nCount", dist = "noZeroPoisson", formula = 6) 
def <- defData(def, varname = "mInterval", dist = "gamma", formula = 30, variance = 0.01)
def <- defData(def, varname = "vInterval", dist = "nonrandom", formula = 0.07)
n <- 10 # number of subjects; matches the 10 ids defined above
df <- genData(n, def)
df <- addPeriods(df)

# nCount defines the number of measurements for an individual
# mInterval specifies the average time between intervals for a subject
# vInterval specifies the variance of those interval times

# Simulating a categorical variable with 3 categories according to the distribution (.5, .3, .2)
def_ <- defDataAdd(varname = "var_c_3m", dist = "categorical", 
                   formula = ".5;.3;.2",
                   variance = "Ibuprofen;Paracetamol;Aspirin")

df <- addColumns(def_, df)
df

    id period time timeID    var_c_3m
 1:  1      0    0      1     Aspirin
 2:  1      1   19      2   Ibuprofen
 3:  1      2   47      3     Aspirin
 4:  1      3   66      4     Aspirin
 5:  2      0    0      5 Paracetamol
 6:  2      1   33      6 Paracetamol
 7:  2      2   81      7   Ibuprofen
 8:  2      3  126      8   Ibuprofen
 9:  2      4  156      9 Paracetamol
10:  2      5  199     10   Ibuprofen
11:  2      6  254     11 Paracetamol
12:  2      7  292     12 Paracetamol
...
48: 10      0    0     48   Ibuprofen
49: 10      1   32     49 Paracetamol
50: 10      2   68     50     Aspirin
51: 10      3   94     51   Ibuprofen
52: 10      4  122     52 Paracetamol

2- For the ordered categorical variable, I would like to generate it in the df data frame for each id at each time point (here, period). The simstudy package offers the genOrdCat() function, but it seems to work only for cross-sectional data, i.e., when id is not repeated as it is in longitudinal data.

Is there a way to add an ordered categorical variable with 3 categories, following the distribution (.5, .3, .2), to my df data frame? Any other approach would also be greatly appreciated. The desired result would look like this:

    id period time timeID    var_c_3m var_ord_3m
 1:  1      0    0      1     Aspirin          1
 2:  1      1   19      2   Ibuprofen          1
 3:  1      2   47      3     Aspirin          2
 4:  1      3   66      4     Aspirin          3
 5:  2      0    0      5 Paracetamol          1
 6:  2      1   33      6 Paracetamol          1
 7:  2      2   81      7   Ibuprofen          1
 8:  2      3  126      8   Ibuprofen          2
 9:  2      4  156      9 Paracetamol          2
10:  2      5  199     10   Ibuprofen          2
11:  2      6  254     11 Paracetamol          3
12:  2      7  292     12 Paracetamol          3
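
One possible workaround, sketched here in base R rather than with simstudy, is to draw the ordered category row by row with sample() and store it as an ordered factor. The toy data frame, the seed, and the column name var_ord_mono are assumptions made for illustration only; toy stands in for df and is already sorted by period within id. Because the desired output above never decreases within an id, the sketch also shows an optional step that sorts the draws within each id. That step keeps the overall category counts but makes the rows within an id dependent on each other.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Hypothetical long-format data standing in for `df`:
# 3 subjects with 4, 8 and 5 measurements, sorted by period within id
n_obs <- c(4, 8, 5)
toy <- data.frame(
  id     = rep(seq_along(n_obs), times = n_obs),
  period = sequence(n_obs) - 1
)

probs <- c(0.5, 0.3, 0.2)  # target distribution of categories 1, 2, 3

# Independent draw for every row, stored as an ordered factor
toy$var_ord_3m <- factor(
  sample(1:3, size = nrow(toy), replace = TRUE, prob = probs),
  levels = 1:3, ordered = TRUE
)

# Optional: make the category non-decreasing over time within each id
# by sorting each subject's draws (relies on rows being in period order)
toy$var_ord_mono <- factor(
  ave(as.integer(toy$var_ord_3m), toy$id, FUN = sort),
  levels = 1:3, ordered = TRUE
)

toy
```

With only a handful of rows the observed proportions will differ noticeably from (.5, .3, .2); they approach the target as the number of rows grows.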