I want to segment a sequence into n subsequences (n is known), where the points in each subsequence should be similar enough that a piecewise linear function fits them well (minimizing the fitting error within each subsequence and over the whole sequence).
I have tried the ruptures package with the Binseg algorithm, which lets you specify the number of segments, and I have also tried numpy/scipy to directly fit piecewise linear functions.
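Roughly what the ruptures attempt looks like (a sketch; the signal is the Shock column from the example data further down):

import ruptures as rpt

signal = df['Shock'].to_numpy().reshape(-1, 1)
algo = rpt.Binseg(model='l2').fit(signal)
# predict() takes the number of breakpoints, i.e. n - 1 for n segments
breakpoints = algo.predict(n_bkps=5)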
Then I realized I need to apply weights to my points, otherwise the result is not what I want to achieve. How could I coax either solution into using weights, or do you know another solution that directly takes an array of weights as an argument?
Edit for more context:
- The curve is usually flat, steepening, flattening, concave or convex, or a mix, and consists of at most 40 points (i.e. the maximum sequence length is 40).
- There can be 1 or 2 outliers (usually the first point, but not necessarily).
- I am aiming at n between 4 and 8 (i.e. between 4 and 8 subsequences, defined as a parameter).
- My sequence is at most 40 points, and it sits in a loop with ~20 iterations. The whole loop must take less than 30 sec, so at most ~1.5 sec per sequence.
Here is an example of the data:
import pandas as pd

df = pd.DataFrame({
    'Term Dummy': [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32],
    'Shock': [131.759276601612,-5.28111953539055,-5.30412333137685,6.19553924065018,-5.97658803726517,-7.8325986545673,-9.50784210778306,-15.7385664305344,-23.3182508381464,-29.4897840376819,-31.467551725682,-33.4723203935889,-34.6650947285782,-35.7471724234754,-36.4799776375108,-37.3264043303424,-37.4155331344124,-37.8155991350952,-38.7550833588797,-38.3608088160098,-36.7211814243519,-35.7477615422699,-34.1458248652337,-32.8287847811565,-31.4018236645802,-29.9742754473972,-28.6193854123123,-24.90985538625,-21.3217573325541,-18.7350606702909,-16.0799516664911,-16.1549305201347,-16.1433168994669],
    'Weight': [1924,41170,120247,289092,311692,50265,127579,38255,225164,300420,96928,189792,177827,511969,417120,17840,72257,160679,89074,186051,102120,53770,662958,100838,765414,820977,533239,113092,60063,174082,238152,215960,115665]
})
Shock is the sequence that needs to be reduced to n subsequences.
For example, a solution with n=6 could be (grouping by index / Term Dummy):
[
[0],
[1,2],
[3],
[4,5,6,7,8,9,10,11,12,13,14,15,16,17,18],
[19,20,21,22,23,24,25,26,27,28,29],
[30,31,32]
]
The plot below shows 3 curves of Shock on top, along with the Sensitivity (Delta in the plot) and the Impact. Weight is the absolute value of the Sensitivity.


This question is all over the place. I'm fairly convinced that it's an XY problem and you should actually be doing something else; but here is what you originally asked for:
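In sketch form, assuming the interior breakpoints are optimized with scipy.optimize.minimize and each candidate segment is fitted with a weighted least-squares line (the helper names below are illustrative):

import numpy as np
from scipy.optimize import minimize

def weighted_line(x, y, w):
    # Weighted least-squares straight line. np.polyfit weights the
    # unsquared residuals, so pass the square root of the weights.
    coeffs = np.polyfit(x, y, 1, w=np.sqrt(w))
    return np.polyval(coeffs, x)

def total_cost(breaks, x, y, w):
    # Total weighted squared error of a piecewise linear fit whose
    # interior breakpoints are the (continuous) values in `breaks`.
    cuts = np.clip(np.sort(breaks), 1, len(x) - 1).round().astype(int)
    edges = np.concatenate(([0], cuts, [len(x)]))
    cost = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        if b - a < 2:                      # degenerate segment: heavy penalty
            return 1e12
        fit = weighted_line(x[a:b], y[a:b], w[a:b])
        cost += np.sum(w[a:b] * (y[a:b] - fit) ** 2)
    return cost

def fit_segments(x, y, w, n_seg, x0=None):
    if x0 is None:
        # naive initialization: evenly spaced interior breakpoints
        x0 = np.linspace(0, len(x), n_seg + 1)[1:-1]
    res = minimize(total_cost, x0, args=(x, y, w), method='Nelder-Mead')
    cuts = np.clip(np.sort(res.x), 1, len(x) - 1).round().astype(int)
    return np.concatenate(([0], cuts, [len(x)]))

On your data it would be called with the Weight column as the weights:

edges = fit_segments(df['Term Dummy'].to_numpy(dtype=float),
                     df['Shock'].to_numpy(),
                     df['Weight'].to_numpy(dtype=float),
                     n_seg=6)
groups = [list(range(a, b)) for a, b in zip(edges[:-1], edges[1:])]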
It doesn't do anything clever with regard to global optimization so this may be prone to local solutions, but for your provided data it works well.
Well-ish. It works better if you use a 2nd-order differential initialization vector (especially for an example like n=8):
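One way to build such an initialization vector, assuming it means placing the initial breakpoints where the absolute second difference of the signal is largest (a sketch, reusing fit_segments from above):

def second_diff_init(y, n_seg):
    # Place the n_seg - 1 initial interior breakpoints where the curve
    # bends most sharply, i.e. at the largest absolute second differences.
    d2 = np.abs(np.diff(y, n=2))
    idx = np.sort(np.argsort(d2)[-(n_seg - 1):]) + 1  # diff(n=2)[i] sits at point i+1
    return idx.astype(float)

x0 = second_diff_init(df['Shock'].to_numpy(), n_seg=8)
edges = fit_segments(df['Term Dummy'].to_numpy(dtype=float),
                     df['Shock'].to_numpy(),
                     df['Weight'].to_numpy(dtype=float),
                     n_seg=8, x0=x0)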