Pandas: Adding zero values where no rows exist (sparse)

1.5k Views Asked by At

I have a Pandas DataFrame with a MultiIndex. The MultiIndex has values in the range (0,0) to (1000,1000), and the column has two fields p and q.

However, the DataFrame is sparse. That is, if there was no measurement corresponding to a particular index (say (3,2)), there won't be any row for it (3,2). I'd like to make it not sparse, by filling in these rows with p=0 and q=0. Continuing the example, if I do df.loc[3].loc[2], I want it to return p=0 q=0, not No Such Record (as it currently does).

Clarification: By "sparse", I mean it only in the sense I used it, that zero values are omitted. I'm not referring to anything in Pandas or Numpy internals.

1

There are 1 best solutions below

5
On BEST ANSWER

Consider this df

data = {
    (1, 0): dict(p=1, q=1),
    (3, 2): dict(p=1, q=1),
    (5, 4): dict(p=1, q=1),
    (7, 6): dict(p=1, q=1),
}
df = pd.DataFrame(data).T
df

     p  q
1 0  1  1
3 2  1  1
5 4  1  1
7 6  1  1

Use reindex with fill_value=0 from a constructed pd.MultiIndex.from_product

mux = pd.MultiIndex.from_product([range(8), range(8)])
df.reindex(mux, fill_value=0)

     p  q
0 0  0  0
  1  0  0
  2  0  0
  3  0  0
  4  0  0
  5  0  0
  6  0  0
  7  0  0
1 0  1  1
  1  0  0
  2  0  0
  3  0  0
  4  0  0
  5  0  0
  6  0  0
  7  0  0
2 0  0  0
  1  0  0
  2  0  0
  3  0  0

response to comment
You can get min, max of index levels like this

def mn_mx(idx):
    return idx.min(), idx.max()

mn0, mx0 = mn_mx(df.index.levels[0])
mn1, mx1 = mn_mx(df.index.levels[1])

mux = pd.MultiIndex.from_product([range(mn0, mx0 + 1), range(mn1, mx1 + 1)])
df.reindex(mux, fill_value=0)