Python panda multi-index dataframes: How to associate specific columns to a specific value from a new index
I have meteorological data in a multi-index panda dataframe. For simplicity let’s limit the example to temperatures at different pressure levels (you can imagine these as corresponding to altitudes) and precipitation. Presently the df has 3 indices: Latitude, longitude and time. The temperature values are organised as one column per pressure level. Hence the df looks like:
Idx1 | idx2 | idx3 | Precipitation(mm) | T@1000hPa | T@900hPa | T@800hPa … | .. | T@10hPa | comment |
---|---|---|---|---|---|---|---|---|---|
Lat(deg.) | Long(deg.) | T(hrs) | |||||||
30 | 20 | 0 | 0.0 | 25.1 | 13.7 | 2.1 .. | -30.5 | ||
30 | 21 | 0 | 0.1 | 25.0 | 13.6 | 1.8 .. | -30.3 | ||
30 | 22 | 0 | 0.5 | 24.6 | 13.2 | 1.6 .. | -30.4 | ||
30 | 23 | 0 | 1.2 | 22.1 | 13.0 | 1.9 .. | -30.8 | ||
30 | 24 | 0 | 1.7 | 21.9 | 12.8 | 2.1 .. | -30.5 | ||
31 | 20 | 0 | 0.9 | 23.3 | 13.1 | 2.2 .. | -31.2 | # next latitude | |
31 | 21 | 0 | 0.2 | 23.7 | 13.1 | 2.0 .. | -30.9 | ||
… | … | … | … | … | … | … | … | … | |
30 | 20 | 1 | 0.2 | 24.7 | 13.8 | 2.3 .. | -30.4 | # next time step | |
30 | 21 | 1 | 0.8 | 23.6 | 13.7 | 2.0 .. | -30.9 | ||
… | … | … | … | … | … | … | … | … |
What I would like to achieve is the following:
Idx1 | idx2 | idx3 | idx4 | T(degC) | comment |
---|---|---|---|---|---|
Lat(deg.) | Long(deg.) | T(hrs) | pressureLevel(hPa) | ||
30.0 | 20.0 | 0 | 1000 | 25.1 | |
30.0 | 21.0 | 0 | 1000 | 25.0 | |
30.0 | 22.0 | 0 | 1000 | 24.6 | |
30.0 | 23.0 | 0 | 1000 | 22.1 | |
30.0 | 24.0 | 0 | 1000 | 21.9 | |
31.0 | 20.0 | 0 | 1000 | 23.3 | |
31.0 | 21.0 | 0 | 1000 | 23.7 | |
… | … | … | … | … | |
30 | 20 | 1 | 1000 | 24.7 | |
30 | 21 | 1 | 1000 | 23.6 | |
… | … | … | … | … | |
39 | 23 | 48 | 1000 | 26.2 | |
39 | 24 | 48 | 1000 | 26.0 | # last time step (48hrs) |
… | … | … | … | … | |
30.0 | 20.0 | 0 | 900 | 13.7 | start 900hPa pressure level |
30.0 | 21.0 | 0 | 900 | 13.6 | |
30.0 | 22.0 | 0 | 900 | 13.2 | |
30.0 | 23.0 | 0 | 900 | 13.0 | |
30.0 | 24.0 | 0 | 900 | 12.8 | |
31.0 | 20.0 | 0 | 900 | 13.1 | next latitude |
31.0 | 21.0 | 0 | 900 | 13.1 | |
… | … | … | … | … | |
30 | 20 | 1 | 900 | 13.8 | next time step |
30 | 21 | 1 | 900 | 13.7 | |
… | … | … | … | … | |
39 | 23 | 48 | 900 | 14.0 | |
39 | 24 | 48 | 900 | 14.2 | # last time step (48hrs) |
… | … | … | … | … | |
30.0 | 20.0 | 0 | 10 | -30.5 | start 10hPa pressure level |
30.0 | 21.0 | 0 | 10 | - 30.3 | |
… | … | … | … | … | |
39 | 23 | 48 | 10 | -32.4 | |
39 | 24 | 48 | 10 | -32.9 | # last time step (48hrs) |
… | … | … | … | … |
Idx1 | idx2 | idx3 | idx4 | precipitation(mm) | tCloudCover | (+many others) |
---|---|---|---|---|---|---|
Lat(deg.) | Long(deg.) | T(hrs) | pressureLevel(hPa) | |||
30.0 | 20.0 | 0 | 9999 | 0.0 | 0.12 | |
30.0 | 21.0 | 0 | 9999 | 0.1 | 0.25 | |
… | … | … | … | … | … | |
39 | 23 | 48 | 9999 | 0.2 | 0.82 | |
39 | 24 | 48 | 9999 | 0.8 | 0.84 | # last time step (48hrs) |
Presently I can for example plot the temperate over a geographic area for a specific pressure level, but I cannot plot a vertical temperature profile at a specific lat/long location let alone a time series of such a profile.
How can I associate all the existing ‘T@1000hPa’ column values with the index for 1000hPa, the ‘T@900hPa’ column with the 900hPa index, the ‘T@10hPa’ column with the 10hPa index and so on? There are other parameters like wind gusts that are only available at lower levels yet not at all pressure levels. And lastly how can I associate the precipitation column (and other quantities) with 9999hPa (meaning that data is not available at different pressure levels, but only as a sum over all levels or at ground)?
One of the first stumble stones is creating the index. An attempt to insert 16 pressure levels as data column with the intention to then convert that data column into an index fails with: “ValueError: Length of values (16) does not match length of index (12400)”.
I tried different (cumbersome) approaches, but could not get it to work thus far, let alone in an elegant way. Or would I have to create a second, independent dataframe to achieve this (speaking of elegance…)? Is there anyone with a good idea who can spare the 2 minutes to give me some pointers? Thanks!