Consider this code:
# Load libraries
library(RCurl)
library(TraMineR)
library(PST)
# Get data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/c2539d06771317c5f4c8d3a2052a73fc485a09c6/challenge_level.csv")
data <- read.csv(text = x)
# Load and transform data
data <- read.table("thread_level.csv", sep = ",", header = F, stringsAsFactors = F)
# Create sequence object
data.seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = NA, right= NA, nr = "*")
# Make a tree
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = FALSE, with.missing = TRUE)
# Look at contexts
cmine(S1, pmin = 0, state = "N3", l = 3)
I can then calculate the significance thresholds for lift values for two particular "association rules" in the following manner:
# Calculate lift threshold for N2-QU->N3
ngood_idea <- sum(data.seq == "N3")
nn <- nrow(data.seq)*ncol(data.seq)
p_good_idea <- ngood_idea/nn
x <- seqdef("N2-QU")
p_context <- predict(S1, x, decomp = F, output = "prob")
p_not_context_good_idea <- (1-p_context)*(1-(p_good_idea))
p_context_good_idea <- p_context*p_good_idea
N2_QU_N3_threshold <- 1+1.645*sqrt(((1/nn)*(p_not_context_good_idea/p_context_good_idea)))
# Calculate lift threshold for N2-QU->N1
nbad_idea <- sum(data.seq == "N1")
nn <- nrow(data.seq)*ncol(data.seq)
p_bad_idea <- nbad_idea/nn
p_not_context_bad_idea <- (1-p_context)*(1-(p_bad_idea))
p_context_bad_idea <- p_context*p_bad_idea
N2_QU_N1_threshold <- 1+1.645*sqrt(((1/nn)*(p_not_context_bad_idea/p_context_bad_idea)))
# Print lift thresholds
N2_QU_N3_threshold
N2_QU_N1_threshold
However, what if I want to compare two lift values with each other, to see if they are significantly different from each other (in a manner similar to how I can compare two regression coefficients to each other to see if they are significantly different from each other)? How can I accomplish this?
Utilizing this equation:
Where
$SE\beta$is the standard error of$\beta$.This equation is provided by Clogg et al (1995)
Source: https://stats.stackexchange.com/questions/93540/testing-equality-of-coefficients-from-two-different-regressions
We can analogize, using the lifts as the coefficients, and the calculation of the variance of each lift based on Lenca et al (2008, p. 619)
The z-value of the difference is
0.2556881.References
Clogg, C. C., Petkova, E., & Haritou, A. (1995). Statistical methods for comparing regression coefficients between models. American Journal of Sociology, 100(5), 1261-1293.]
Lenca, P., Meyer, P., Vaillant, B., and Lallich, S. 2008. “On selecting interestingness measures for association rules: User oriented description and multiple criteria decision aid,” European Journal of Operational Research (184:2), pp. 610–626 (doi: 10.1016/j.ejor.2006.10.059).