How to predict a multidimensional time series using python, sklearn with unknown X values

Question

614 Views Asked by user7337539 At 28 July 2025 at 02:48

While trying to predict future Bitcoin prices, I ran into the following problem:

I can only predict the y label (for instance, the Open price) by providing all the X features that I used to train my model. However, what I need is a prediction into the future, where the X feature values are also unknown.

Here is a snippet of my data (6 feature columns, 1 label):

| Timestamp | Open | High | Low | HL-PCT | PCT-change | Volume (BTC) | Label |
|---|---|---|---|---|---|---|---|
| 2016-01-01 00:00:00 | 430.89 | 432.58 | 429.82 | 0.642129 | -0.030161 | 41.32 | 434.87 |
| 2016-01-01 01:00:00 | 431.51 | 432.01 | 429.08 | 0.682856 | 0.348829 | 31.21 | 434.44 |
| 2016-01-01 02:00:00 | 430.00 | 431.69 | 430.00 | 0.393023 | -0.132383 | 12.25 | 433.47 |
| 2016-01-01 03:00:00 | 430.50 | 433.37 | 430.03 | 0.776690 | -0.662252 | 74.98 | 431.80 |
| 2016-01-01 04:00:00 | 433.34 | 435.72 | 432.55 | 0.732863 | -0.406794 | 870.80 | 433.28 |
| 2016-01-01 05:00:00 | 435.11 | 436.00 | 434.47 | 0.352153 | -0.066605 | 78.53 | 433.31 |
| 2016-01-01 06:00:00 | 435.44 | 435.44 | 430.08 | 1.246280 | 0.440569 | 177.11 | 433.39 |
| 2016-01-01 07:00:00 | 434.71 | 436.00 | 433.50 | 0.576701 | 0.126681 | 158.45 | 432.61 |
| 2016-01-01 08:00:00 | 433.82 | 434.19 | 431.00 | 0.740139 | -0.059897 | 210.59 | 432.80 |
| 2016-01-01 09:00:00 | 433.99 | 433.99 | 431.23 | 0.640030 | 0.460648 | 129.68 | 432.17 |
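The `HL-PCT` column is derived in the code as `(High - Low) / Low * 100`. As a quick sanity check, recomputing it from the first rows of the sample should reproduce the printed values. Only `HL-PCT` can be checked this way, because the `Close` column that `PCT-change` depends on is not shown:

```python
import pandas as pd

# First three rows of the sample above (only the columns needed for HL-PCT)
sample = pd.DataFrame(
    {
        "High": [432.58, 432.01, 431.69],
        "Low": [429.82, 429.08, 430.00],
        "HL-PCT": [0.642129, 0.682856, 0.393023],
    },
    index=pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 02:00"]),
)

# Same formula as in the question's code
recomputed = (sample["High"] - sample["Low"]) / sample["Low"] * 100.0
print(recomputed.round(6).tolist())
```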

Here is my code:

```python
import math

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression

import bf  # the asker's own data-loading module (not shown in the question)

# First get my own data
symbols = ["bitstamp_hourly_2016"]
timestamp = pd.date_range(start='2016-01-01 00:00', end='2016-12-23 09:00',
                          freq='1h', periods=None)

df_all = bf.get_data2(symbols, timestamp)
# Feature slicing (.copy() avoids pandas' SettingWithCopyWarning below)
df = df_all[['Open', 'High', 'Low', 'Close', 'Volume (BTC)']].copy()

df['HL-PCT'] = (df['High'] - df['Low']) / df['Low'] * 100.0
df['PCT-change'] = (df['Open'] - df['Close']) / df['Close'] * 100.0

# Only relevant features
df = df[['Open', 'High', 'Low', 'HL-PCT', 'PCT-change', 'Volume (BTC)']].copy()

df.fillna(-99999, inplace=True)

# Forecast horizon: ceil(0.27 % of the rows), which works out to 24 hours here
forecast_out = int(math.ceil(0.0027 * len(df)))
print('How many Hours were predicted: ', forecast_out)

forecast_col = 'Open'
# Label at time t = Open price forecast_out hours later
df['Label'] = df[forecast_col].shift(-forecast_out)

# X features and y label (pandas >= 2.0 requires the axis as a keyword)
X = np.array(df.drop(columns=['Label']))
X = preprocessing.scale(X)

# The last forecast_out rows have no label yet: keep them for forecasting
X_lately = X[-forecast_out:]
X = X[:-forecast_out]
y = np.array(df['Label'])
y = y[:-forecast_out]

# Chronological train and test split (no shuffling for time series)
test_size = int(math.ceil(0.3 * len(df)))
X_train, y_train = X[:-test_size], y[:-test_size]
X_test, y_test = X[-test_size:], y[-test_size:]

# Use linear regression
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)

# BIG QUESTION: WHAT TO INSERT HERE TO GET THE REAL FUTURE VALUES
prediction = clf.predict(X_lately)

# The coefficients
print('Coefficients: \n', clf.coef_)
# The mean squared error
print("Mean squared error: %.4f"
      % np.mean((clf.predict(X_test) - y_test) ** 2))
# R^2 (coefficient of determination): 1 is perfect prediction
print('Variance score: %.4f' % clf.score(X_test, y_test))
```
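Because `Label` is `Open` shifted up by `forecast_out` rows, each row of `X_lately` is paired with the `Open` price `forecast_out` hours after its own timestamp. So `clf.predict(X_lately)` already returns predictions for the `forecast_out` hours following the last row of `df`, which is the actual future. It only needs timestamps attached. The sketch below illustrates this on synthetic hourly data. The random-walk prices and the reduced feature set are assumptions standing in for `bf.get_data2`:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic hourly prices standing in for the real Bitstamp data (assumption)
index = pd.date_range("2016-01-01 00:00", periods=500, freq="h")
open_ = 430 + np.cumsum(rng.normal(0, 1, len(index)))
df = pd.DataFrame(
    {
        "Open": open_,
        "High": open_ + rng.uniform(0, 2, len(index)),
        "Low": open_ - rng.uniform(0, 2, len(index)),
    },
    index=index,
)

forecast_out = 24
# Label at time t is the Open price at time t + forecast_out hours
df["Label"] = df["Open"].shift(-forecast_out)

X = df.drop(columns=["Label"]).to_numpy()
X_lately = X[-forecast_out:]  # last 24 rows: their labels lie in the future
X_train = X[:-forecast_out]
y_train = df["Label"].to_numpy()[:-forecast_out]

clf = LinearRegression().fit(X_train, y_train)
prediction = clf.predict(X_lately)

# Shift the timestamps of X_lately forward by the horizon to get the target times
future_index = df.index[-forecast_out:] + pd.Timedelta(hours=forecast_out)
forecast = pd.Series(prediction, index=future_index, name="Predicted Open")
print(forecast.head())
```

The first prediction belongs to one hour after the last observed timestamp, and the last prediction belongs to `forecast_out` hours after it.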

Outcome:

```
How many Hours were predicted:  24
Coefficients: [ 5.30676009e+00  1.05641430e+02  1.44632212e+01  1.47255264e+00
               -1.52247332e+00 -6.26777634e-03]
Mean squared error: 133.4017
Variance score: 0.9717
```
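A variance score (R²) of 0.9717 on hourly prices may be less impressive than it looks. Prices 24 hours apart are strongly correlated, so even a naive "persistence" forecast, which says the `Open` price in 24 hours equals the current one, can score highly. If the regression does not clearly beat that baseline, the features are adding little. A minimal sketch of the comparison on a synthetic random walk (an assumption, not the real data). With the real data, the baseline would be the current `Open` values aligned with `y_test`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)

# Synthetic random-walk "Open" prices (assumption, not the real data)
open_ = 430 + np.cumsum(rng.normal(0, 1, 5000))
horizon = 24

# Feature: current Open; target: Open `horizon` steps later
X = open_[:-horizon].reshape(-1, 1)
y = open_[horizon:]

split = int(len(X) * 0.7)  # chronological split, no shuffling
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = LinearRegression().fit(X_train, y_train)
model_pred = model.predict(X_test)

# Persistence baseline: the price in 24 h equals the current price
naive_pred = X_test[:, 0]

print("model  MSE %.4f  R2 %.4f" % (mean_squared_error(y_test, model_pred), r2_score(y_test, model_pred)))
print("naive  MSE %.4f  R2 %.4f" % (mean_squared_error(y_test, naive_pred), r2_score(y_test, naive_pred)))
```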

What I want to do is: give just a new date, then use the trained model and its knowledge of the past to get a reasonable outcome for, let's say, the next 24 hours (the actual future, for which I do not have data). So far, I can only use `clf.predict()` on past data.

This should somehow be possible with the regression line, but how? I could also just use the date as my X DataFrame, but wouldn't that make my model useless?
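One common alternative, which is not from the original post, is to make the model autoregressive. It uses the previous `n_lags` `Open` prices as features, predicts one hour ahead, appends that prediction to the history, and repeats. This needs nothing but the history up to the chosen start date. The cost is that errors compound with each step. A minimal sketch, assuming a `pd.Series` of hourly `Open` prices with a `DatetimeIndex`. `make_lagged`, `recursive_forecast`, and `n_lags` are hypothetical names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def make_lagged(series: np.ndarray, n_lags: int):
    """Build (X, y) where each row of X holds the n_lags values preceding y (oldest first)."""
    X = np.column_stack([series[i : len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def recursive_forecast(history: pd.Series, n_lags: int = 24, steps: int = 24) -> pd.Series:
    """Fit a linear model on lagged values and forecast `steps` hours ahead, one step at a time."""
    values = history.to_numpy(dtype=float)
    X, y = make_lagged(values, n_lags)
    model = LinearRegression().fit(X, y)

    window = list(values[-n_lags:])  # most recent n_lags observations, oldest first
    preds = []
    for _ in range(steps):
        next_value = model.predict(np.array(window).reshape(1, -1))[0]
        preds.append(next_value)
        window = window[1:] + [next_value]  # drop the oldest value, append the prediction

    future_index = pd.date_range(history.index[-1] + pd.Timedelta(hours=1), periods=steps, freq="h")
    return pd.Series(preds, index=future_index, name="Predicted Open")


# Usage on synthetic hourly data (assumption; replace with df['Open'])
rng = np.random.default_rng(1)
idx = pd.date_range("2016-01-01", periods=1000, freq="h")
open_series = pd.Series(430 + np.cumsum(rng.normal(0, 1, len(idx))), index=idx)
forecast = recursive_forecast(open_series, n_lags=24, steps=24)
print(forecast.tail())
```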

Thanks


**G. Iacono** · Accepted Answer

If you want to stick with linear regression rather than using only the date, you can first predict the regressors of your model (with whatever model you like) and then run the linear regression on the forecasted values.
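A minimal sketch of that two-stage idea follows. Each regressor is forecast with the simple drift method, which extends the last value by the average historical change per step. That choice, the synthetic data, and the helper names `drift_forecast` and `forecast_features` are illustrative assumptions, and any forecaster could be swapped in. Errors in the feature forecasts carry straight into the final prediction. If the features were standardised with `preprocessing.scale` as in the question, the forecasted features need the same transform. A `StandardScaler` fitted on the training data would keep the mean and standard deviation for that purpose.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def drift_forecast(values: pd.Series, steps: int) -> np.ndarray:
    """Drift method: extend the last value by the average change per step."""
    slope = (values.iloc[-1] - values.iloc[0]) / (len(values) - 1)
    return values.iloc[-1] + slope * np.arange(1, steps + 1)


def forecast_features(features: pd.DataFrame, steps: int) -> pd.DataFrame:
    """Stage 1: forecast every regressor independently for the next `steps` hours."""
    future_index = pd.date_range(
        features.index[-1] + pd.Timedelta(hours=1), periods=steps, freq="h"
    )
    return pd.DataFrame(
        {col: drift_forecast(features[col], steps) for col in features.columns},
        index=future_index,
    )


# Synthetic hourly data (assumption): two regressors and a target built from them
rng = np.random.default_rng(7)
idx = pd.date_range("2016-01-01", periods=800, freq="h")
features = pd.DataFrame(
    {
        "Open": 430 + np.cumsum(rng.normal(0, 1, len(idx))),
        "Volume (BTC)": 100 + rng.gamma(2.0, 50.0, len(idx)),
    },
    index=idx,
)
target = 0.9 * features["Open"] + 0.01 * features["Volume (BTC)"] + rng.normal(0, 0.5, len(idx))

# The main regression, trained on historical (known) regressors
clf = LinearRegression().fit(features, target)

# Stage 2: feed the forecasted regressors into the trained regression
future_X = forecast_features(features, steps=24)
prediction = pd.Series(clf.predict(future_X), index=future_X.index, name="Predicted")
print(prediction.head())
```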

Anyway, the advice you need seems to be statistical rather than programming-related; I think your question is more appropriate for https://stats.stackexchange.com/
