Kaplan-Meier estimator not functioning properly

87 Views Asked by Clay Gibbons At 18 August 2025 at 07:04

I am trying to fit a survivorship curve to field data, and the resulting curve is clearly incorrect. At the last field assessment, the survivorship for each site should be:

Utah: 0.5505464480874317

Ridgecrest: 0.10817941952506596

Cactus Mine: 0.22146739130434784

Amargosa: 0.005361930294906166

Here is the code I used to generate the above numbers:

print('Utah:', UT['Garden'].loc[(UT['dummy']==True)&(UT['period']==5)].count() / UT['Garden'].loc[UT['period']==5].count())
print('Ridgecrest:', RC['Garden'].loc[(RC['dummy']==True)&(RC['period']==5)].count() / RC['Garden'].loc[RC['period']==5].count())
print('Cactus Mine:', CM['Garden'].loc[(CM['dummy']==True)&(CM['period']==5)].count() / CM['Garden'].loc[CM['period']==5].count())
print('Amargosa:', AM['Garden'].loc[(AM['dummy']==True)&(AM['period']==5)].count() / AM['Garden'].loc[AM['period']==5].count())

period is the assessment column, and dummy is my indicator column for alive/dead (True means alive, as in the counts above).
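The four print statements above can be condensed into one grouped calculation. The sketch below assumes a single frame holding all gardens (like df2 later in the question) and a boolean dummy column with no missing values; the helper name survivorship_at and the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd

def survivorship_at(df, period, alive_col="dummy"):
    """Fraction of rows marked alive at a given assessment, per garden."""
    at_period = df.loc[df["period"] == period]
    # The mean of a boolean column is the proportion of True values.
    return at_period.groupby("Garden")[alive_col].mean()

# Hypothetical toy data, not the real field data.
toy = pd.DataFrame({
    "Garden": ["Utah"] * 4 + ["Amargosa"] * 4,
    "period": [5] * 8,
    "dummy": [True, True, False, False, True, False, False, False],
})
result = survivorship_at(toy, 5)
```

On the real data this would be called as survivorship_at(df2, 5), assuming df2 contains the same rows as UT, RC, CM and AM combined.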

However, the graphs show Utah as the worst site and Amargosa as the best:

[Figure: survivorship curve with all gardens over 5 assessment periods]

I tried to generate this graph with both the scikit-survival package and the lifelines package, and both gave the same result. What am I doing wrong?

scikit-survival code:

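import matplotlib.pyplot as plt
from sksurv.nonparametric import kaplan_meier_estimator

# One step curve per garden; the first argument is the event indicator, the second is time.
for value in df2["Garden"].unique():
    mask = df2["Garden"] == value
    time_cell, survival_prob_cell = kaplan_meier_estimator(df2["dummy"][mask],
                                                           df2["doyr"][mask])
    plt.step(time_cell, survival_prob_cell, where="post",
             label="%s (n = %d)" % (value, mask.sum()))
plt.ylabel(r"est. probability of survival $\hat{S}(t)$")
plt.xlabel("time $t$")
plt.legend(loc="best")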
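One thing worth checking here: Kaplan-Meier estimators expect the event indicator to be True when the event (death) occurred, whereas dummy is True for plants that are alive. The sketch below is a hand-rolled estimator in plain Python, not the scikit-survival implementation, run on hypothetical toy data. It only illustrates how passing an "alive" flag as the event can invert the picture; whether this is the cause on the real data is an assumption.

```python
from collections import Counter

def kaplan_meier(times, died):
    """Return [(t, S(t))] for each distinct time with at least one death."""
    deaths = Counter(t for t, d in zip(times, died) if d)
    curve, surv = [], 1.0
    for t in sorted(deaths):
        # Individuals whose recorded time is >= t are still at risk at t.
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical site: 10 plants, 2 die (at t=1 and t=2), 8 still alive at t=5.
times = [1, 2, 5, 5, 5, 5, 5, 5, 5, 5]
alive = [False, False, True, True, True, True, True, True, True, True]

died = [not a for a in alive]
correct = kaplan_meier(times, died)    # event = death
flipped = kaplan_meier(times, alive)   # event = "alive" (the suspected mix-up)
```

With death as the event, the toy curve ends at 0.8, matching the 8 of 10 survivors. With the alive flag as the event, it drops to 0 at the last time point, so a site with many survivors looks like the worst one.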

lifelines code:

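import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

kmf = KaplanMeierFitter()
# Durations and event indicator for the Utah garden (both taken from 'period' here)
X = df2['period'].loc[df2['Garden']=='Utah']
Y = df2['period'].loc[df2['Garden']=='Utah']
kmf.fit(X, event_observed=Y)
kmf.plot()
plt.title("Kaplan Meier estimates")
plt.xlabel("Time")
plt.ylabel("Survival")
plt.show()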

And here is the lifelines survivorship curve for the total data, where the survivorship is 0 at the last assessment for some reason:

[Figure: lifelines survivorship curve]
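A second thing to check is the shape of the data. The counts at period 5 suggest one row per plant per assessment, while Kaplan-Meier fitting expects one row per individual with a duration and an event flag. The sketch below collapses such a long table under loudly stated assumptions: the plant_id column is hypothetical (the question does not show an identifier column), dummy is True for alive, and a plant's duration is taken as the first period it was recorded dead, or its last assessment if it never died.

```python
import pandas as pd

def to_one_row_per_plant(long_df, id_col="plant_id"):
    """Collapse repeated assessments into one (duration, died) row per plant."""
    keys = ["Garden", id_col]
    alive = long_df["dummy"].astype(bool)
    # Last assessment at which each plant appears in the data.
    last_seen = long_df.groupby(keys)["period"].max().rename("last_period")
    # First assessment at which each plant was recorded dead (missing if never).
    first_dead = long_df.loc[~alive].groupby(keys)["period"].min().rename("first_dead")
    out = last_seen.to_frame().join(first_dead)
    out["died"] = out["first_dead"].notna()
    out["duration"] = out["first_dead"].fillna(out["last_period"])
    return out[["duration", "died"]].reset_index()

# Hypothetical toy data: plant A survives all 3 assessments, plant B dies at period 2.
toy = pd.DataFrame({
    "Garden": ["Utah"] * 6,
    "plant_id": ["A", "A", "A", "B", "B", "B"],
    "period": [1, 2, 3, 1, 2, 3],
    "dummy": [True, True, True, True, False, False],
})
per_plant = to_one_row_per_plant(toy)
```

The resulting columns could then be passed to lifelines as kmf.fit(per_plant["duration"], event_observed=per_plant["died"]), so that survivors are treated as censored rather than as events.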
