I get an error when subsetting a DataFrame in pandas. Here's my code:
`import pandas as pd
import matplotlib.pyplot as plt
import scipy
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split
from gluonts.torch import DeepAREstimator

# Load data from a CSV file into a PandasDataset
df = pd.read_csv(
"city_temperature.csv"
)
df.head()
df.dropna()
df2 = df[df["City"]=="Algiers"]
dataset = PandasDataset(df2)`
I am trying to select the rows whose city is "Algiers" from a global city temperature data set. `City` is a column of the data set, and I want to pass the resulting subset to `PandasDataset()`. I am not sure how to use this function: I searched the help files but could not find it. When I run the code, I get this error:
/tmp/ipykernel_712/289762879.py:9: DtypeWarning: Columns (2) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv(
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 15
13 df.dropna()
14 df2 = df[df["City"]=="Algiers"]
---> 15 dataset = PandasDataset(df2)
17 # Split the data for training and testing
18 #training_data, test_gen = split(df, offset=-36)
19 #test_data = test_gen.generate_instances(prediction_length=12, windows=3)
(...)
31 # forecast.plot()
32 #plt.legend(["True values"], loc="upper left", fontsize="xx-large")
File <string>:12, in __init__(self, dataframes, target, feat_dynamic_real, past_feat_dynamic_real, timestamp, freq, static_features, future_length, unchecked, assume_sorted, dtype)
File /opt/conda/lib/python3.10/site-packages/gluonts/dataset/pandas.py:119, in PandasDataset.__post_init__(self, dataframes, static_features)
114 if self.freq is None:
115 assert (
116 self.timestamp is None
117 ), "You need to provide `freq` along with `timestamp`"
--> 119 self.freq = infer_freq(first(pairs)[1].index)
121 static_features = Maybe(static_features).unwrap_or_else(pd.DataFrame)
123 object_columns = static_features.select_dtypes(
124 "object"
125 ).columns.tolist()
File /opt/conda/lib/python3.10/site-packages/gluonts/dataset/pandas.py:319, in infer_freq(index)
316 if isinstance(index, pd.PeriodIndex):
317 return index.freqstr
--> 319 freq = pd.infer_freq(index)
320 # pandas likes to infer the `start of x` frequency, however when doing
321 # df.to_period("<x>S"), it fails, so we avoid using it. It's enough to
322 # remove the trailing S, e.g `MS` -> `M
323 if len(freq) > 1 and freq.endswith("S"):
File /opt/conda/lib/python3.10/site-packages/pandas/tseries/frequencies.py:193, in infer_freq(index, warn)
191 if isinstance(index, Index) and not isinstance(index, DatetimeIndex):
192 if isinstance(index, (Int64Index, Float64Index)):
--> 193 raise TypeError(
194 f"cannot infer freq from a non-convertible index type {type(index)}"
195 )
196 index = index._values
198 if not isinstance(index, DatetimeIndex):
TypeError: cannot infer freq from a non-convertible index type <class 'pandas.core.indexes.numeric.Int64Index'>
Can somebody help me with this?
I tried googling this error but could not find an answer.
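I think the index of your DataFrame is causing the problem, because it is of type Int64Index. As the traceback shows, you did not pass `freq`, so `PandasDataset` calls `pd.infer_freq` on the index, and that function refuses a plain integer index.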
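To see the diagnosis in isolation, here is a minimal sketch that uses only pandas. The tiny DataFrame and its `AvgTemperature` column are made up for illustration and are not taken from your CSV. It shows that a filtered frame keeps its integer row labels, that `pd.infer_freq` rejects them, and that the same call succeeds on a datetime index.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; the column names are assumptions.
df = pd.DataFrame(
    {
        "City": ["Algiers", "Paris", "Algiers", "Paris", "Algiers", "Paris"],
        "AvgTemperature": [64.2, 40.1, 49.4, 38.0, 48.8, 37.5],
    }
)

# Boolean filtering keeps the original integer row labels.
df2 = df[df["City"] == "Algiers"]
print(df2.index)

# Frequency inference on an integer index raises TypeError.
try:
    pd.infer_freq(df2.index)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# The same call works on a datetime index.
daily = pd.date_range("2020-01-01", periods=3, freq="D")
print(pd.infer_freq(daily))
```

The exact wording of the error message differs between pandas versions, but it is the same `TypeError` that surfaces at the bottom of your traceback.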
To resolve the problem, try giving your DataFrame a datetime index before passing it to the `PandasDataset` constructor. Note that resetting the index with `reset_index(drop=True)` is not enough on its own: it removes the existing index but creates a new integer-based index starting from 0, which is still not an index a frequency can be inferred from. Alternatively, the constructor signature in your traceback has `timestamp` and `freq` parameters, so you can name a date column through `timestamp` as long as you also provide `freq`. Let me know if it works!
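Since the traceback ends in `pd.infer_freq(index)`, what the constructor ultimately needs is a datetime-like index. Below is a minimal sketch of building one, assuming the CSV stores the date in separate `Year`, `Month`, and `Day` columns and the temperature in `AvgTemperature`. These column names are assumptions, so adjust them to whatever `df.columns` shows. The data here is synthetic, and the `PandasDataset` call is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Synthetic stand-in for city_temperature.csv.
# Year/Month/Day/AvgTemperature are assumed column names.
df = pd.DataFrame(
    {
        "City": ["Algiers", "Paris", "Algiers", "Paris", "Algiers", "Paris"],
        "Year": [1995] * 6,
        "Month": [1] * 6,
        "Day": [1, 1, 2, 2, 3, 3],
        "AvgTemperature": [64.2, 40.1, 49.4, 38.0, 48.8, 37.5],
    }
)

# Copy the subset so later modifications do not touch the original frame.
df2 = df[df["City"] == "Algiers"].copy()

# Assemble one timestamp per row from the date columns and use it as the index.
df2.index = pd.to_datetime(df2[["Year", "Month", "Day"]])
df2 = df2.sort_index()

# This is the check that previously failed on the integer index.
inferred = pd.infer_freq(df2.index)
print(type(df2.index).__name__, inferred)

# With a datetime index in place, the constructor call would look like this
# (`target` appears in the signature shown in the traceback):
# dataset = PandasDataset(df2, target="AvgTemperature")
```

On the real data, inference can still return `None` if dates are missing or duplicated. In that case, cleaning the dates first or passing `freq="D"` explicitly may be needed.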