I have some values below showing weekly results over a period of time. In Week 19, a new process was implemented that was supposed to lower the results further.
However, it is clear that there was already a week-over-week reduction in the results before Week 19. What is the best way to quantify the impact of the 'New Process' versus the rate of improvement that already existed before Week 19? I do not want to 'double-count' the effect of the New Process.
Week # Result Status
Week 1 849.27 NA
Week 2 807.59 NA
Week 3 803.59 NA
Week 4 849.7 NA
Week 5 852.19 NA
Week 6 845.06 NA
Week 7 833.77 NA
Week 8 788.46 NA
Week 9 800.32 NA
Week 10 814.66 NA
Week 11 829.21 NA
Week 12 799.49 NA
Week 13 812.24 NA
Week 14 772.62 NA
Week 15 782.13 NA
Week 16 779.66 NA
Week 17 752.86 NA
Week 18 758.39 NA
Week 19 738.47 New Process
Week 20 721.11 New Process
Week 21 642.04 New Process
Week 22 718.72 New Process
Week 23 743.47 New Process
Week 24 709.57 New Process
Week 25 704.48 New Process
Week 26 673.51 New Process
Trying out the example, it looks like the improvement is around 6%, but with a wide confidence interval; a break in trend does not look statistically significant.
The first models below are estimated by OLS with a shift in the constant; the first specification also includes a shift in the trend.
I use Poisson in the last model, since the dependent variable is positive and Poisson estimates an exponential mean function. The standard errors are correct if we use a robust covariance matrix. (We are using Poisson only to estimate an exponential model; we do not assume that the underlying distribution is actually Poisson.)
Notes: it's a pure numpy version; I didn't bother with pandas or patsy formulas. Poisson can run into optimization problems if some of the explanatory variables are too large.