I have created the following pandas dataframe:
import numpy as np
import pandas as pd

ds = {'col1' : [11,22,33,24,15,6,7,68,79,10,161,12,113,147,115]}
df = pd.DataFrame(data=ds)

# running counter 1, 2, 3, ... used as the predictive feature
predFeature = []
for i in range(len(df)):
    predFeature.append(0)
    predFeature[i] = predFeature[i-1]+1
df['predFeature'] = predFeature

# for each row, collect the 4 preceding values of each column
arrayTarget = []
arrayPred = []
target = np.array(df['col1'])
predFeature = np.array(df['predFeature'])
for i in range(len(df)):
    arrayTarget.append(target[i-4:i])
    arrayPred.append(predFeature[i-4:i])
df['arrayTarget'] = arrayTarget
df['arrayPred'] = arrayPred
Which looks like this:
    col1  predFeature          arrayTarget         arrayPred
0     11            1                   []                []
1     22            2                   []                []
2     33            3                   []                []
3     24            4                   []                []
4     15            5     [11, 22, 33, 24]      [1, 2, 3, 4]
5      6            6     [22, 33, 24, 15]      [2, 3, 4, 5]
6      7            7      [33, 24, 15, 6]      [3, 4, 5, 6]
7     68            8       [24, 15, 6, 7]      [4, 5, 6, 7]
8     79            9       [15, 6, 7, 68]      [5, 6, 7, 8]
9     10           10       [6, 7, 68, 79]      [6, 7, 8, 9]
10   161           11      [7, 68, 79, 10]     [7, 8, 9, 10]
11    12           12    [68, 79, 10, 161]    [8, 9, 10, 11]
12   113           13    [79, 10, 161, 12]   [9, 10, 11, 12]
13   147           14   [10, 161, 12, 113]  [10, 11, 12, 13]
14   115           15  [161, 12, 113, 147]  [11, 12, 13, 14]
I need to generate a new column called slope, which corresponds to the coefficient of a linear regression trained for each row and for which:
- target = each array contained in arrayTarget
- predictive features = each array contained in arrayPred
For example:
- the slope for the first 4 rows is null.
- the slope for the 5th row is given by the coefficient of the linear regression which considers the following values:
  - independent (or predictive) values: [1, 2, 3, 4]
  - dependent (or predicted) values: [11, 22, 33, 24]
  The result would be: 0.10204081632653061.
- the slope for the 6th row is given by the coefficient of the linear regression which considers the following values:
  - independent (or predictive) values: [2, 3, 4, 5]
  - dependent (or predicted) values: [22, 33, 24, 15]
  The result would be: -0.09090909090909091.
And so on.
Can anyone help me, please?
You can define a function that uses sklearn.linear_model.LinearRegression and apply it row by row with axis=1. It won't be very efficient if your dataframe is large.
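A minimal sketch of that approach, assuming scikit-learn is installed. The helper name window_slope and the output column names slope and slope_as_quoted are made up for illustration. One thing to watch: the values quoted in the question (about 0.102 and -0.091) come out when arrayTarget is passed as X and arrayPred as y. With the roles as described in the prose (arrayPred predicting arrayTarget), the same rows give 5.0 and -3.0, so the sketch computes both and you can keep whichever you actually need.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rebuild the dataframe from the question.
ds = {'col1': [11, 22, 33, 24, 15, 6, 7, 68, 79, 10, 161, 12, 113, 147, 115]}
df = pd.DataFrame(data=ds)
df['predFeature'] = np.arange(1, len(df) + 1)

target = df['col1'].to_numpy()
pred = df['predFeature'].to_numpy()
# Same windows as in the question: the 4 values preceding each row
# (empty for the first 4 rows).
df['arrayTarget'] = [target[i - 4:i] for i in range(len(df))]
df['arrayPred'] = [pred[i - 4:i] for i in range(len(df))]


def window_slope(row, x_col, y_col):
    """Hypothetical helper: slope of y_col regressed on x_col for one row."""
    x = np.asarray(row[x_col], dtype=float)
    y = np.asarray(row[y_col], dtype=float)
    if x.size < 2:
        # Not enough points to fit a line: leave the slope null.
        return np.nan
    model = LinearRegression().fit(x.reshape(-1, 1), y)
    return model.coef_[0]


# Roles as described in the text: arrayPred predicts arrayTarget.
df['slope'] = df.apply(
    lambda row: window_slope(row, 'arrayPred', 'arrayTarget'), axis=1)

# Roles that reproduce the numbers quoted in the question:
# arrayTarget is used as X and arrayPred as y.
df['slope_as_quoted'] = df.apply(
    lambda row: window_slope(row, 'arrayTarget', 'arrayPred'), axis=1)

print(df[['col1', 'predFeature', 'slope', 'slope_as_quoted']])
```

Each row triggers a separate model fit, which is where the inefficiency on large dataframes comes from.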