I am running a structural topic model using the stm package in R. My model includes an interaction between faction_id and numeric_date (a measure of time). I use the following code to estimate and then plot the proportion of topic 9 over time for both factions in the data (see plot).
# Model fit
model.fac.dat.int <- stm(
  documents = docs,
  vocab = vocab,
  K = 20,
  prevalence = ~ faction_id * s(numeric_date),
  max.em.its = 75,
  data = meta,
  reportevery = 50,
  verbose = TRUE,
  init.type = "Spectral"
)
# Estimate effect for topic 9
# (linear in numeric_date here, whereas the stm() call above uses s(numeric_date))
est.fac.dat.int <- estimateEffect(
  formula = c(9) ~ faction_id * numeric_date,
  stmobj = model.fac.dat.int,
  metadata = meta,
  uncertainty = "None"
)
# Plot for faction_id = Greens
plot(
  est.fac.dat.int,
  covariate = "numeric_date",
  model = model.fac.dat.int,
  method = "continuous",
  moderator = "faction_id",
  moderator.value = "Greens",
  linecol = "green",
  xlab = "Time",
  ylim = c(0, 0.1),
  printlegend = FALSE
)
# Add plot for faction_id = CDU
plot(
  est.fac.dat.int,
  covariate = "numeric_date",
  model = model.fac.dat.int,
  method = "continuous",
  moderator = "faction_id",
  moderator.value = "CDU",
  linecol = "red",
  add = TRUE,
  printlegend = FALSE
)
As a next step, I would like to allow for a break in the linear plots at numeric_date = 7000 (the election date). I have theoretical reasons to believe the lines shift to a lower level after this cutoff, and I suspect the current plot hides this effect. Essentially, I would like to create a plot in the style of a regression discontinuity design (RDD).
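To make concrete the kind of plot I have in mind, here is a minimal sketch on simulated data, using plain lm() rather than estimateEffect(). Everything in it is made up for illustration: the data frame toy, the columns post, date_c and theta9, the date range 6000 to 8000, and the size of the jump. It only shows the shape I am after (a separate intercept and slope on each side of the cutoff, per faction), not a result from my stm model.

```r
# Toy illustration only: simulated topic proportions, NOT output from stm.
set.seed(1)
n      <- 400
cutoff <- 7000  # election date

toy <- data.frame(
  numeric_date = runif(n, 6000, 8000),  # hypothetical date range
  faction_id   = factor(sample(c("Greens", "CDU"), n, replace = TRUE))
)
toy$post   <- as.numeric(toy$numeric_date >= cutoff)  # 1 on or after the cutoff
toy$date_c <- toy$numeric_date - cutoff               # time centred at the cutoff

# Simulated outcome with an assumed downward jump of 0.02 at the cutoff
toy$theta9 <- 0.06 + 0.00001 * toy$date_c - 0.02 * toy$post +
  0.01 * (toy$faction_id == "Greens") + rnorm(n, sd = 0.005)

# Separate intercept and slope on each side of the cutoff, for each faction
fit <- lm(theta9 ~ faction_id * date_c * post, data = toy)

# Draw each side separately so the lines are not joined across the break
plot(toy$numeric_date, toy$theta9, col = "grey80", pch = 16, cex = 0.5,
     xlab = "Time", ylab = "Topic 9 proportion (simulated)")
cols <- c(Greens = "green", CDU = "red")
for (f in names(cols)) {
  for (p in c(0, 1)) {
    x <- if (p == 0) seq(6000, cutoff, length.out = 50) else
      seq(cutoff, 8000, length.out = 50)
    nd <- data.frame(
      date_c     = x - cutoff,
      post       = p,
      faction_id = factor(f, levels = levels(toy$faction_id))
    )
    lines(x, predict(fit, newdata = nd), col = cols[[f]], lwd = 2)
  }
}
abline(v = cutoff, lty = 2)  # mark the election date
```

Because date_c is centred at the cutoff, the coefficient on post is the estimated jump at numeric_date = 7000 for the reference faction (CDU here, since factor levels are sorted alphabetically). What I do not know is how to get the equivalent from estimateEffect() and plot it with stm's plotting method.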
I am not sure how to go about this, as the stm package does not specifically provide a function for this scenario. I have also considered using the rdd package, but I do not know how to combine it with my stm setup.
Would it make more sense to simply estimate the effect separately for numeric_date < 7000 and numeric_date >= 7000, and then overlay the two resulting plots?
Thank you, and feel free to ask if you need me to explain more.