Converting pandas DataFrame to datacube?

I have a DataFrame with four columns: X, Y, Z, and t. The values in the first three columns are discrete and represent a 3D index. The fourth column is a floating-point number. For example,

import numpy as np
import pandas as pd

df = pd.DataFrame({'X':[1,2,3,2,3,1],
                   'Y':[1,1,2,2,3,3],
                   'Z':[1,2,1,2,1,2],
                   't':np.random.rand(6)})
#   X  Y  Z         t
#0  1  1  1  0.410462
#1  2  1  2  0.385973
#2  3  2  1  0.434947
#3  2  2  2  0.880702
#4  3  3  1  0.297190
#5  1  3  2  0.750949

How can I efficiently extend df into a 3D datacube? (With 18 vertices in this case.) The values of t in the new rows should be np.nan. In other words, I want to add all the "missing" rows, such as:

...
#6  1  1  2  nan
#7  1  2  1  nan
#8  1  2  2  nan
...

The extents of X, Y, and Z are large but not huge (say, 10, 200, and 1000 unique values). Numpy-based solutions are welcome, too!
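
As a quick check on the target size, the number of vertices is the product of the unique-value counts per axis. This is a minimal sketch, assuming the example frame above; the names axes, n_vertices, and n_missing are illustrative only, and the t values are copied from the printout.

```python
import math

import pandas as pd

# Example frame from above, with the t values shown in the printout.
df = pd.DataFrame({
    "X": [1, 2, 3, 2, 3, 1],
    "Y": [1, 1, 2, 2, 3, 3],
    "Z": [1, 2, 1, 2, 1, 2],
    "t": [0.410462, 0.385973, 0.434947, 0.880702, 0.297190, 0.750949],
})

axes = ["X", "Y", "Z"]

# Vertices in the full cube: product of unique-value counts per axis.
n_vertices = math.prod(df[col].nunique() for col in axes)

# Rows that would have to be added with t = NaN.
n_missing = n_vertices - len(df.drop_duplicates(subset=axes))

print(n_vertices)  # 18  (3 * 3 * 2)
print(n_missing)   # 12

# For the extents mentioned above (10, 200, and 1000 unique values):
print(10 * 200 * 1000)  # 2000000
```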


Here is one way to do it with product from the itertools module of the Python standard library:

from itertools import product

import pandas as pd


axis = ["X", "Y", "Z"]

df = (
    pd.concat(
        [
            df,
            # Full grid: every combination of each axis's own unique values
            pd.DataFrame(
                product(*(df[col].unique() for col in axis)),
                columns=axis,
            ),
        ]
    )
    # Where a combination already exists, keep the original row and its t value
    .drop_duplicates(subset=axis, keep="first")
    .sort_values(axis, ignore_index=True)
)

Then:

print(df)
# Output
    X  Y  Z         t
0   1  1  1  0.994531
1   1  1  2       NaN
2   1  2  1       NaN
3   1  2  2       NaN
4   1  3  1       NaN
5   1  3  2  0.937584
6   2  1  1       NaN
7   2  1  2  0.168245
8   2  2  1       NaN
9   2  2  2  0.362854
10  2  3  1       NaN
11  2  3  2       NaN
12  3  1  1       NaN
13  3  1  2       NaN
14  3  2  1  0.634389
15  3  2  2       NaN
16  3  3  1  0.953114
17  3  3  2       NaN
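
For the larger extents mentioned in the question, another option may be to let pandas build the grid as an index and reindex onto it, which avoids materializing Python tuples. This is only a sketch under stated assumptions: each (X, Y, Z) combination occurs at most once in df (reindexing a non-unique MultiIndex raises an error), and the names axes, levels, full_index, full, and cube are illustrative. It uses pd.MultiIndex.from_product and DataFrame.reindex, then reshapes t into a NumPy array for the NumPy-style view.

```python
import numpy as np
import pandas as pd

# Same example data as in the question; the t values are random.
df = pd.DataFrame({
    "X": [1, 2, 3, 2, 3, 1],
    "Y": [1, 1, 2, 2, 3, 3],
    "Z": [1, 2, 1, 2, 1, 2],
    "t": np.random.rand(6),
})

axes = ["X", "Y", "Z"]

# Sorted unique values of each axis, taken separately.
levels = [np.sort(df[col].unique()) for col in axes]

# Every (X, Y, Z) combination, in lexicographic order.
full_index = pd.MultiIndex.from_product(levels, names=axes)

# Reindexing inserts NaN in t for combinations absent from df.
full = df.set_index(axes).reindex(full_index).reset_index()

# Optional dense view: a 3D array of shape (n_X, n_Y, n_Z).
# Because full is in lexicographic order, a plain reshape lines up
# with the axes: cube[i, j, k] is t at (levels[0][i], levels[1][j], levels[2][k]).
cube = full["t"].to_numpy().reshape(tuple(len(lv) for lv in levels))

print(full.shape)  # (18, 4)
print(cube.shape)  # (3, 3, 2)
```

The long-format frame full has the same layout as the result above, while cube can be handy when the next step is array arithmetic rather than row-wise lookups.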