I have 2 tables in consideration:
Table 1:
| Country | Age | Marks |
|---|---|---|
| A | 25 | 7 |
| B | 45 | 8 |
Table 2:
| Age Band From | Age Band To | Country A | Country B |
|---|---|---|---|
| 20 | 30 | ||
| 40 | 50 |
The output I want to have:
| Age Band From | Age Band To | Country A | Country B |
|---|---|---|---|
| 20 | 30 | 7 | NaN |
| 40 | 50 | NaN | 8 |
Here's what I tried:
for index,row in table2.iterrows():
table2.loc[index,'Country A'] = table1[(table1['Country']=='A')&
(table1['Age']>=row[0])&
(table1['Age']<=row[1])]['Marks'].values[0]
But this gives the following error:
index 0 is out of bounds for axis 0 with size 0
I think I've probably guessed where the error occurs: Whenever the compiler encounters a Age Band from Table2 for which a corresponding Age doesn't exist in Table1.
Any help on this problem is much appreciated! Thank You in advance...
There is no efficient builtin solution in pandas to perform this operation in a generic way. A cross-merge will work on small dataframes but will have poor to terrible efficiency on large datasets. In fact as it has a quadratic complexity, this will even crash python for anything above a few thousand rows.
A robust option is to
pivotthen perform aconditional_joinusingpyjanitor:Output:
Comparison of efficiencies
Using a length
Nfor both tables with random data (and deduplication):Alternatively, if you only have a one to one mapping (i.e. no overlapping intervals, only one match in table1 per row of table2), you can use
merge_asoffor a pure pandas solution:Output: