How Can I Fix the Error I'm Getting When Using Pandas in Jupyter?


I get an error when subsetting a DataFrame in pandas. Here's my code:

import pandas as pd
import matplotlib.pyplot as plt
import scipy

from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split
from gluonts.torch import DeepAREstimator

# Load data from a CSV file into a DataFrame
df = pd.read_csv(
    "city_temperature.csv"
)
df.head()
# Keep only the rows for Algiers, then wrap them in a PandasDataset
df2 = df[df["City"]=="Algiers"]
dataset = PandasDataset(df2)

I am trying to select the rows whose city is "Algiers" from a global city temperature data set. City is a column of the data set, and I want to pass the result to PandasDataset(). I am not sure how to use this function: I searched the documentation but could not find it. When I ran the code, I got this error:

/tmp/ipykernel_712/289762879.py:9: DtypeWarning: Columns (2) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv(
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 15
     13 df.dropna()
     14 df2 = df[df["City"]=="Algiers"]
---> 15 dataset = PandasDataset(df2)
     17 # Split the data for training and testing
     18 #training_data, test_gen = split(df, offset=-36)
     19 #test_data = test_gen.generate_instances(prediction_length=12, windows=3)
   (...)
     31 #  forecast.plot()
     32 #plt.legend(["True values"], loc="upper left", fontsize="xx-large")

File <string>:12, in __init__(self, dataframes, target, feat_dynamic_real, past_feat_dynamic_real, timestamp, freq, static_features, future_length, unchecked, assume_sorted, dtype)

File /opt/conda/lib/python3.10/site-packages/gluonts/dataset/pandas.py:119, in PandasDataset.__post_init__(self, dataframes, static_features)
    114 if self.freq is None:
    115     assert (
    116         self.timestamp is None
    117     ), "You need to provide `freq` along with `timestamp`"
--> 119     self.freq = infer_freq(first(pairs)[1].index)
    121 static_features = Maybe(static_features).unwrap_or_else(pd.DataFrame)
    123 object_columns = static_features.select_dtypes(
    124     "object"
    125 ).columns.tolist()

File /opt/conda/lib/python3.10/site-packages/gluonts/dataset/pandas.py:319, in infer_freq(index)
    316 if isinstance(index, pd.PeriodIndex):
    317     return index.freqstr
--> 319 freq = pd.infer_freq(index)
    320 # pandas likes to infer the `start of x` frequency, however when doing
    321 # df.to_period("<x>S"), it fails, so we avoid using it. It's enough to
    322 # remove the trailing S, e.g `MS` -> `M
    323 if len(freq) > 1 and freq.endswith("S"):

File /opt/conda/lib/python3.10/site-packages/pandas/tseries/frequencies.py:193, in infer_freq(index, warn)
    191 if isinstance(index, Index) and not isinstance(index, DatetimeIndex):
    192     if isinstance(index, (Int64Index, Float64Index)):
--> 193         raise TypeError(
    194             f"cannot infer freq from a non-convertible index type {type(index)}"
    195         )
    196     index = index._values
    198 if not isinstance(index, DatetimeIndex):

TypeError: cannot infer freq from a non-convertible index type <class 'pandas.core.indexes.numeric.Int64Index'>

Can somebody help me with this?

I tried googling this error but could not find an answer.


There is 1 solution below


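I think the index of your DataFrame is causing the problem. It is an Int64Index, and as the traceback shows, PandasDataset calls pd.infer_freq on the index when no freq is given, which raises a TypeError for an integer index.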
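To see the cause in isolation, here is a minimal sketch that uses only pandas. The toy frame and its column names are made up for illustration. It filters rows the same way the question does and then calls pd.infer_freq on the resulting integer index:

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV.
toy = pd.DataFrame({
    "City": ["Algiers", "Tunis", "Algiers", "Tunis", "Algiers"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Boolean filtering keeps the original integer row labels (0, 2, 4).
subset = toy[toy["City"] == "Algiers"]

try:
    pd.infer_freq(subset.index)
    error_message = None
except TypeError as exc:
    # pandas refuses to infer a frequency from integer labels.
    error_message = str(exc)

print(error_message)
```

The exact wording of the message may differ between pandas versions, but the point is the same: integer row labels carry no date information to infer a frequency from.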

To resolve the problem, you can try resetting the index of your DataFrame before passing it to the PandasDataset constructor. Just try something like this:

# Subset the DataFrame for the desired city
df2 = df[df["City"] == "Algiers"]

# Reset the index of the DataFrame
df2.reset_index(drop=True, inplace=True)

# Create the PandasDataset
dataset = PandasDataset(df2)

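reset_index(drop=True) discards the existing index and creates a new integer-based index starting from 0. Keep in mind that this new index is still made of integers rather than dates, so pandas still has no dates to infer a frequency from. If the error persists, or the inferred frequency looks wrong, the DataFrame probably needs a DatetimeIndex built from its date columns, or the frequency has to be passed explicitly through the timestamp and freq parameters that appear in the constructor signature in your traceback. Let me know if it works!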
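Since the traceback shows pd.infer_freq being called on the DataFrame's index, one option is to give the subset a DatetimeIndex before creating the dataset. The following is only a sketch under assumptions: the column names Year, Month, Day and AvgTemperature are guesses about what city_temperature.csv contains, and the small frame below is made-up data so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for city_temperature.csv; the column names
# Year, Month, Day and AvgTemperature are assumptions, not confirmed.
df = pd.DataFrame({
    "City": ["Algiers"] * 5 + ["Tunis"] * 5,
    "Year": [1995] * 10,
    "Month": [1] * 10,
    "Day": [1, 2, 3, 4, 5] * 2,
    "AvgTemperature": [64.2, 49.4, 48.8, 46.4, 47.9,
                       50.1, 51.0, 52.3, 49.8, 48.7],
})

# Subset one city and copy, so later changes do not touch the original frame.
df2 = df[df["City"] == "Algiers"].copy()

# pd.to_datetime can assemble dates from year/month/day columns.
parts = df2[["Year", "Month", "Day"]].rename(columns=str.lower)
df2.index = pd.DatetimeIndex(pd.to_datetime(parts))
df2 = df2.sort_index()

# With a regular daily DatetimeIndex, the frequency can be inferred.
inferred = pd.infer_freq(df2.index)
print(inferred)
```

On this toy data the inferred frequency is daily ("D"). With an index like this, PandasDataset(df2, target="AvgTemperature") should be able to infer the frequency on its own, assuming that is the column you want to forecast. Real data may contain gaps or duplicate dates, in which case pd.infer_freq can return None and you may need to clean the dates or pass freq yourself.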