Why does df.shift() not work when using modin?


In the example code below I call df.shift(), which works without problems in plain pandas. With modin, however, assigning the shifted column raises the ValueError shown after the code. Is there any way to fix this?

import modin.pandas as pd
import ray
ray.init(runtime_env={'env_vars': {'__MODIN_AUTOIMPORT_PANDAS__': '1'}})

df = pd.read_csv('dataframe.csv')

df['test'] = 1
df['shift'] = df['test'].shift()

ValueError: Length mismatch: Expected axis has 2 elements, new values have 604402 elements

There are 2 best solutions below

Answer by Mahesh Vashishtha:

This is a known bug in Modin. As a workaround, convert the Modin dataframe to pandas with Modin's _to_pandas method, call shift on the pandas dataframe, and then convert the result back to Modin, e.g.:

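import modin.pandas as pd
import numpy as np

# Small Modin dataframe with a non-unique index
df = pd.DataFrame(np.arange(16).reshape(-1,4), columns=list(range(4)), index=[0,1,2,0])
# Convert to a plain pandas dataframe
pandas_df = df._to_pandas()
# Run shift in pandas, where it works as expected
pandas_result = pandas_df[0].shift(1)
# Wrap the pandas result in a Modin Series again
modin_result = pd.Series(pandas_result)
print(modin_result)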
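
The same workaround can be wrapped in a small helper. This is only a sketch: shift_via_pandas is a hypothetical name, not part of Modin or pandas, and the demo uses a plain pandas Series so that it runs without Modin installed. The only Modin-specific call it relies on is the _to_pandas method used above.

```python
import pandas  # plain pandas, not modin.pandas


def shift_via_pandas(obj, periods=1):
    """Hypothetical helper: run shift() in plain pandas and return a pandas object."""
    # Modin objects expose _to_pandas(); plain pandas objects do not,
    # so they are passed through unchanged.
    to_pandas = getattr(obj, "_to_pandas", None)
    native = to_pandas() if callable(to_pandas) else obj
    return native.shift(periods)


# Demo with plain pandas as a stand-in for a Modin Series.
demo = pandas.Series([1, 1, 1], name="test")
shifted = shift_via_pandas(demo)
print(shifted)
```

With Modin, the assignment from the question could then presumably be written as df['shift'] = pd.Series(shift_via_pandas(df['test'])). Note that _to_pandas materializes the data in the calling process, so this trades Modin's parallelism for correctness on that one column.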

Answer by Karthik Velayutham:

Following up on Mahesh's answer (since I don't have enough reputation to add a comment), a fix has now been merged into Modin's main branch: https://github.com/modin-project/modin/pull/5823
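
To see whether an installed version already behaves correctly, a quick check along these lines may help. It is a sketch under assumptions: shift_assignment_works and the _shift_probe column name are made up for illustration, and the demo passes a plain pandas dataframe as a stand-in. To test Modin, pass a dataframe created with modin.pandas, ideally the one that originally failed, since the bug may not show up on every dataset.

```python
import pandas  # stand-in for the demo; pass a modin.pandas dataframe to test Modin


def shift_assignment_works(df, column):
    """Hypothetical check: True if assigning a shifted column succeeds."""
    probe = df.copy()  # leave the caller's dataframe untouched
    try:
        probe["_shift_probe"] = probe[column].shift()
    except ValueError:
        # The bug in the question surfaced as "ValueError: Length mismatch".
        return False
    return len(probe["_shift_probe"]) == len(df)


frame = pandas.DataFrame({"test": [1, 1, 1]})
print(shift_assignment_works(frame, "test"))
```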