Input batching for a linear regression model in Python


I am building a linear regression model in Python and want to batch the inputs from 2000-2020 into 3-year windows, to see how each three-year window relates to the following (fourth) year's data.

Input data example: [2000, 2001, 2002], [2001, 2002, 2003], [2002, 2003, 2004]

Output data: [2003], [2004], [2005]
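The windowing in the example can be sketched in plain Python. This is a minimal illustration that windows the year labels themselves, exactly as in the example above; for the actual regression the same slicing would be applied to the yearly values. The helper name make_windows is hypothetical.

```python
def make_windows(years, width=3):
    """Split a list into (window of `width` items, following item) pairs."""
    inputs, outputs = [], []
    # Stop early enough that every window still has a following year as output.
    for start in range(len(years) - width):
        inputs.append(years[start:start + width])
        outputs.append([years[start + width]])
    return inputs, outputs


years = list(range(2000, 2021))  # 2000..2020 inclusive, 21 years
inputs, outputs = make_windows(years)

print(inputs[:3])   # [[2000, 2001, 2002], [2001, 2002, 2003], [2002, 2003, 2004]]
print(outputs[:3])  # [[2003], [2004], [2005]]
print(len(inputs))  # 18 windows, the last being [2017, 2018, 2019] -> [2020]
```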

I want the model to go through the entire 2000-2020 data set this way and fit the linear regression to all of these windows as a whole.

I don't have any code to show yet. I thought about a for loop over the years, but if I understand correctly, calling reg.fit(x, y) inside the loop would reset the fit on every iteration, so each fit would overwrite the previous one and the final model would only reflect the last window: input [2017, 2018, 2019] with output [2020].
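That understanding is correct if reg is scikit-learn's LinearRegression (an assumption; the question does not name the library): each call to fit re-estimates the model from scratch. A small check on hypothetical toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy batches with the same inputs but different slopes.
X_a = np.array([[1.0], [2.0], [3.0]])
y_a = np.array([2.0, 4.0, 6.0])  # slope 2
X_b = np.array([[1.0], [2.0], [3.0]])
y_b = np.array([3.0, 6.0, 9.0])  # slope 3

reg = LinearRegression()
reg.fit(X_a, y_a)
reg.fit(X_b, y_b)  # refits from scratch; nothing learned from batch A is kept

# Same result as a fresh model trained only on the second batch.
fresh = LinearRegression().fit(X_b, y_b)
print(reg.coef_, fresh.coef_)  # both approximately [3.]
```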

Answer:

I'm going to assume that the goal is to predict something like your example, i.e. you have 3 data points as input:

  • X1: value v over a sliding window from years N-4 to N-2
  • X2: value v over a sliding window from years N-3 to N-1
  • X3: value v over a sliding window from years N-2 to N

and you want to predict Y representing v over the next sliding window from years N-1 to N+1.

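In ML terms, this setup means the linear regression model has three features, i.e. three input dimensions. In other words, a single instance is (X1, X2, X3, Y). As usual, the training set is made of many such instances, one per position of the window over 2000-2020, and you train a single model on all of them at once. This also resolves the concern about reg.fit: the loop should only collect the instances into an input matrix X (one row per instance) and a target vector y, and reg.fit(X, y) is then called once, after the loop, so nothing gets overwritten.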
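A minimal sketch of building all instances and fitting once, assuming scikit-learn's LinearRegression and the simpler reading of the question where each feature is one year's value, i.e. (X1, X2, X3, Y) = (v[N-3], v[N-2], v[N-1], v[N]). The series v below is randomly generated placeholder data, not real data; replace it with one value per year.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

WINDOW = 3  # number of past years used as features

years = np.arange(2000, 2021)  # 2000..2020 inclusive (21 years)

# Hypothetical placeholder series v, one value per year.
rng = np.random.default_rng(0)
v = 100 + np.cumsum(rng.normal(0, 5, size=len(years)))

# Collect every instance; the loop builds rows and never calls fit.
X_rows, y_rows, target_years = [], [], []
for i in range(WINDOW, len(years)):
    X_rows.append(v[i - WINDOW:i])  # values for years N-3, N-2, N-1
    y_rows.append(v[i])             # value for year N
    target_years.append(years[i])

X = np.array(X_rows)  # shape (18, 3): one row per target year 2003..2020
y = np.array(y_rows)  # shape (18,)

# One fit over all instances at once.
reg = LinearRegression()
reg.fit(X, y)

# Predict 2021 from the last three known years (2018, 2019, 2020).
next_value = reg.predict(v[-WINDOW:].reshape(1, -1))[0]
print(f"coefficients: {reg.coef_}, predicted 2021 value: {next_value:.2f}")
```

On NumPy 1.20 or later, the loop can be replaced by X = np.lib.stride_tricks.sliding_window_view(v, WINDOW)[:-1] and y = v[WINDOW:], which produce the same 18 rows. The explicit loop is kept here because it mirrors the for loop described in the question.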

This option has the advantage that the model learns how 3 years of v relate to the next year of v from every window across the whole period, rather than from a single window. However, this model can only predict the year immediately after a known 3-year window, whereas a model trained on a single input dimension X (for example the year itself) could in principle predict any year.
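For contrast, a sketch of the single-dimension alternative, assuming "single dimension X" means the year itself is the only feature. It again uses scikit-learn's LinearRegression and the same hypothetical placeholder series as before.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.arange(2000, 2021)  # 2000..2020 inclusive

# Hypothetical placeholder series v, one value per year.
rng = np.random.default_rng(0)
v = 100 + np.cumsum(rng.normal(0, 5, size=len(years)))

# One feature: the year. scikit-learn expects a 2-D X, hence reshape(-1, 1).
X_year = years.reshape(-1, 1)
reg_year = LinearRegression().fit(X_year, v)

# Any year can be queried directly, not only the one after the data ends.
query_years = np.array([[2021], [2025]])
print(reg_year.predict(query_years))
```

The three-lag model, by contrast, needs the values for 2022-2024 before it can predict 2025, so those would have to be predicted first and fed back in as inputs. The trade-off is that a straight line in the year ignores the year-to-year dynamics the windowed model is meant to capture.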