I'm trying to simulate longitudinal data for my research. The data must contain both an unordered categorical variable and an ordered categorical variable. To do this, I'm using the simstudy R package, which can simulate longitudinal data.
1- For the simple categorical variable (var_c_3m), I am able to do so with the following code:
library(tidyverse)
library(simstudy)
def <- defData(varname = "id", formula = "1:10") # Creating identifiers
## Longitudinal data with varying observation and interval times
## Source: https://kgoldfeld.github.io/simstudy/articles/longitudinal.html
def <- defData(def, varname = "nCount", dist = "noZeroPoisson", formula = 6)
def <- defData(def, varname = "mInterval", dist = "gamma", formula = 30, variance = 0.01)
def <- defData(def, varname = "vInterval", dist = "nonrandom", formula = 0.07)
df <- genData(10, def) # 10 individuals
df <- addPeriods(df)
# nCount defines the number of measurements for an individual
# mInterval specifies the average time between measurements for a subject
# vInterval specifies the variance of those interval times
# Simulating a categorical variable with 3 categories according to the distribution (.5, .3, .2)
def_ <- defDataAdd(varname = "var_c_3m", dist = "categorical",
formula = ".5;.3;.2",
variance = "Ibuprofen;Paracetamol;Aspirin")
df <- addColumns(def_, df)
df
id period time timeID var_c_3m
1: 1 0 0 1 Aspirin
2: 1 1 19 2 Ibuprofen
3: 1 2 47 3 Aspirin
4: 1 3 66 4 Aspirin
5: 2 0 0 5 Paracetamol
6: 2 1 33 6 Paracetamol
7: 2 2 81 7 Ibuprofen
8: 2 3 126 8 Ibuprofen
9: 2 4 156 9 Paracetamol
10: 2 5 199 10 Ibuprofen
11: 2 6 254 11 Paracetamol
12: 2 7 292 12 Paracetamol
...
48: 10 0 0 48 Ibuprofen
49: 10 1 32 49 Paracetamol
50: 10 2 68 50 Aspirin
51: 10 3 94 51 Ibuprofen
52: 10 4 122 52 Paracetamol
2- For the ordered categorical variable, I want to generate it within the df data frame, so that each id gets one value at each time point (here, period). The simstudy package offers the genOrdCat() function, but it seems to work only for cross-sectional data, i.e., when id is not repeated as it is in longitudinal data.
Is there a way to add an ordered categorical variable with 3 categories, following the distribution (.5, .3, .2), to my df data frame? Any other approach would also be greatly appreciated. The desired output would look like this:
id period time timeID var_c_3m var_ord_3m
1: 1 0 0 1 Aspirin 1
2: 1 1 19 2 Ibuprofen 1
3: 1 2 47 3 Aspirin 2
4: 1 3 66 4 Aspirin 3
5: 2 0 0 5 Paracetamol 1
6: 2 1 33 6 Paracetamol 1
7: 2 2 81 7 Ibuprofen 1
8: 2 3 126 8 Ibuprofen 2
9: 2 4 156 9 Paracetamol 2
10: 2 5 199 10 Ibuprofen 2
11: 2 6 254 11 Paracetamol 3
12: 2 7 292 12 Paracetamol 3
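
For reference, here is a minimal base R sketch of the kind of result I am after. It is not a simstudy solution. The df_demo data frame is a hypothetical stand-in for df, and sorting the draws within each id is my own assumption, based on the desired output above, where the category never decreases over time.

```r
set.seed(123)

# Hypothetical stand-in for the longitudinal `df`: 10 ids, each with 3 to 8 periods
n_obs <- sample(3:8, 10, replace = TRUE)
df_demo <- data.frame(
  id     = rep(1:10, times = n_obs),
  period = sequence(n_obs) - 1   # periods start at 0 within each id
)

# Draw one category per row from the target distribution (.5, .3, .2)
draws <- sample(1:3, nrow(df_demo), replace = TRUE, prob = c(.5, .3, .2))

# Assumption: sort the draws within each id so the category never decreases
# over time. Rows are already ordered by id and period, so the sorted values
# line up with increasing period.
sorted_draws <- ave(draws, df_demo$id, FUN = sort)

# Store the result as an ordered factor with levels 1 < 2 < 3
df_demo$var_ord_3m <- factor(sorted_draws, levels = 1:3, ordered = TRUE)

head(df_demo, 12)
```

Sorting only rearranges values within an id, so the overall category counts are the same as in the raw draws, but the values are no longer independent across periods. If genOrdCat() can be pointed at the row-level identifier (for example through its idname argument with timeID), that might be a cleaner route, but I have not been able to confirm it.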