cannot reshape array of size 4 into shape (4,4) in DataFrame where clause


Can anyone explain what is going on in the code below? If the DataFrame has exactly 4 rows, the statement in the try clause throws an exception. With any other number of rows, it works. Moreover, if I remove the column 'temp2' and make the DataFrame 3 rows long, it also raises an exception: cannot reshape array of size 3 into shape (3,3). All I want to do is put zeroes in those columns for the rows where staleness is less than or equal to 1. I'd rather use the list than create a new DataFrame of all zeroes. Thanks in advance.

import pandas as pd

zeroeble_cols = ['volume','trade_count', 'temp', 'temp2']

stalenesses = [0.3, 0.4, 1.2, 3.4]
length = len(stalenesses)
df = pd.DataFrame(data = {'volume' : [10 for i in range(length)],
                          'trade_count' : [10 for i in range(length)],
                          'temp' : [1 for i in range(length)],
                          'temp2' : [1 for i in range(length)],
                          'staleness' : stalenesses})


try:
    df[zeroeble_cols] = df[zeroeble_cols].where(df['staleness'] <= 1, [0 for i in range(len(zeroeble_cols))], axis = 1)
except Exception as e:
    print(f'Exception: {e}')
    zeroDF = pd.DataFrame(data = {k : [0] for k in zeroeble_cols})
    df[zeroeble_cols] = df[zeroeble_cols].where(df['staleness'] <= 1, zeroDF, axis = 1)

print(df)


Answer by e-motta

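In the try block, the selected subset df[zeroeble_cols] has shape (4, 4), while the replacement list [0, 0, 0, 0] has only 4 elements. The where method tries to reshape that list into shape (4, 4) to match the subset, which fails with the error you see. This is consistent with your observation that the problem only appears when the number of rows equals the number of columns.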
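If you would still like to keep the replacement values in a plain list, one option is to expand the list to the full (rows, columns) shape yourself, so that where does not have to guess how to broadcast it. The following is a minimal sketch of that idea rather than part of the original answer; the names fill_values and fill_block are made up for illustration, and it assumes NumPy is installed alongside pandas.

```python
import numpy as np
import pandas as pd

zeroeble_cols = ["volume", "trade_count", "temp", "temp2"]
stalenesses = [0.3, 0.4, 1.2, 3.4]
n = len(stalenesses)
df = pd.DataFrame({
    "volume": [10] * n,
    "trade_count": [10] * n,
    "temp": [1] * n,
    "temp2": [1] * n,
    "staleness": stalenesses,
})

# Hypothetical per-column replacement values, one entry per column.
fill_values = [0 for _ in zeroeble_cols]

# Repeat the list once per row so its shape matches df[zeroeble_cols] exactly.
fill_block = np.tile(fill_values, (len(df), 1))

# where keeps the rows whose condition is True and takes the rest from fill_block.
df[zeroeble_cols] = df[zeroeble_cols].where(df["staleness"] > 1, fill_block)
print(df)
```

Because fill_block has the same shape as the selected subset, the number of rows no longer matters. For an all-zero replacement, a scalar 0 is simpler.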

Using where, replace the list [0 for i in range(len(zeroeble_cols))] with the scalar 0 and drop the axis argument, which a scalar replacement does not need. Pandas will then set the affected rows to 0 in the selected columns (zeroeble_cols). Also note that where keeps the rows for which the condition is True and replaces the others, so the condition should describe the rows you want to keep as they are:

df[zeroeble_cols] = df[zeroeble_cols].where(df["staleness"] > 1, 0)

You can also use mask, which is the inverse of where: it replaces the rows for which the condition is True, so the condition describes the rows you want to change:

df[zeroeble_cols] = df[zeroeble_cols].mask(df["staleness"] <= 1, 0)
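One caveat when swapping between where and mask: the conditions > 1 and <= 1 are exact complements only if staleness contains no missing values, because any comparison with NaN is False. The small sketch below uses made-up data (it is not from the question) to show the difference.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the last staleness value is missing.
staleness = pd.Series([0.3, 1.2, np.nan])
values = pd.Series([10, 10, 10])

# NaN > 1 is False, so where replaces the last row with 0.
via_where = values.where(staleness > 1, 0)

# NaN <= 1 is also False, so mask leaves the last row untouched.
via_mask = values.mask(staleness <= 1, 0)

print(via_where.tolist())  # [0, 10, 0]
print(via_mask.tolist())   # [0, 10, 10]
```

If staleness can be missing in your data, pick the variant whose treatment of those rows matches what you want.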

Or you can use loc, which I tend to prefer:

df.loc[df["staleness"] <= 1, zeroeble_cols] = 0

All these options will yield the same result:

   volume  trade_count  temp  temp2  staleness
0       0            0     0      0        0.3
1       0            0     0      0        0.4
2      10           10     1      1        1.2
3      10           10     1      1        3.4
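To check this yourself, the sketch below rebuilds the DataFrame from the question three times and applies one option to each copy. The helper make_df and the variable names are only for illustration.

```python
import pandas as pd

zeroeble_cols = ["volume", "trade_count", "temp", "temp2"]

def make_df():
    """Rebuild the example DataFrame from the question."""
    stalenesses = [0.3, 0.4, 1.2, 3.4]
    n = len(stalenesses)
    return pd.DataFrame({
        "volume": [10] * n,
        "trade_count": [10] * n,
        "temp": [1] * n,
        "temp2": [1] * n,
        "staleness": stalenesses,
    })

# Option 1: where, with the condition describing the rows to keep.
a = make_df()
a[zeroeble_cols] = a[zeroeble_cols].where(a["staleness"] > 1, 0)

# Option 2: mask, with the condition describing the rows to change.
b = make_df()
b[zeroeble_cols] = b[zeroeble_cols].mask(b["staleness"] <= 1, 0)

# Option 3: loc, assigning directly into the selected rows and columns.
c = make_df()
c.loc[c["staleness"] <= 1, zeroeble_cols] = 0

# Compare the resulting values across the three variants.
same_values = a.values.tolist() == b.values.tolist() == c.values.tolist()
print(same_values)
print(a)
```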