pandas fillna based on condition from another dataframe

85 Views Asked by At

Table 1

ID Condition Access1 Access2
1 aa nan nan
2 aa nan nan
3 bb nan nan
4 bb nan nan
5 aa nan nan
6 bb nan nan
7 aa nan nan

Table 2

Name Condition Accessor1 Accessor2
John aa Yes No
Mary aa No Yes
Bob aa Yes Yes
Ben bb Yes Yes
Peter bb No Yes

Table 1 contains the condition.

Table 2 contains the condition and the name of the accessors with their authorization. Either they can access to 1 or 2 or both.

For example, John can access to 1 but not 2, so we have to match the condition of Table 1. ID 1 condition is 'aa', so John can access1 and his name is not fill in access2 and continue to find for other name who has access to it. Mary's condition is 'aa' and has access to 2 so her name is filled in access2.

Bob's condition is 'aa' and ID 2 is 'aa'. Bob has access to 1 or 2. Hence, his name is filled in ID 2's access1 and access2.

ID 5, since Bob is the last name with condition 'aa'. Hence, it will loop back to John again. Same goes to the rest.

Result Table

ID Condition Access1 Access2
1 aa John Mary
2 aa Bob Bob
3 bb Ben Ben
4 bb Ben Peter
5 aa John Mary
6 bb Ben Ben
7 aa Bob Bob
1

There are 1 best solutions below

0
Davinder Singh On

My idea is to create a cyclic lists for a condition using table 2 and then fill table 1. My solution is mix of use of conditional statements and pandas broadcasting concept.

Let take example we want to fill table 1 column Access1 with condition aa,

  • From table 2 with condition aa we have three name to fill [John, Mary, Bob]
  • But with Accessor1 column condition we are only able fill table 1 with [John, Bob]

1. Convert Accessor columns of table 2 into dtype bool so that we can use pandas boolean indexing to create conditons

import numpy as np
import pandas as pd
from itertools import cycle

table_1 = pd.DataFrame(
    {
        "ID": list(range(1, 7)),
        "Condition": ['aa', 'aa', 'bb', 'bb', 'aa', 'bb'],
        "Access1": [np.nan]*6,
        "Access2": [np.nan]*6
    }
)

table_2 = pd.DataFrame(
    {
        "Name": ['John', 'Mary', 'Bob', 'Ben', 'Peter'],
        "Condition": ['aa', 'aa', 'aa', 'bb', 'bb'],
        "Accessor1": ['Yes', 'No', 'Yes', 'Yes', 'No'],
        "Accessor2": ['No', 'Yes', 'Yes', 'Yes', 'Yes']
    }
)

table_2['Accessor1'] = table_2['Accessor1'].apply(lambda x: True if x == 'Yes' else False)
table_2['Accessor2'] = table_2['Accessor2'].apply(lambda x: True if x == 'Yes' else False)

2. Creating lists from which we are allowed to fill the column

condition_aa = table_2[table_2['Condition'] == 'aa']
condition_bb = table_2[table_2['Condition'] == 'bb']
condition_aa_access1 = condition_aa['Name'][condition_aa['Accessor1']].to_list()
condition_aa_access2 = condition_aa['Name'][condition_aa['Accessor2']].to_list()
condition_bb_access1 = condition_bb['Name'][condition_bb['Accessor1']].to_list()
condition_bb_access2 = condition_bb['Name'][condition_bb['Accessor2']].to_list()

3. Converting list to circular list so that we can fill values of column which require more row to fill than what we have in our hand

condition_aa_access1_cycle = cycle(condition_aa_access1)
condition_bb_access1_cycle = cycle(condition_bb_access1)
condition_aa_access2_cycle = cycle(condition_aa_access2)
condition_bb_access2_cycle = cycle(condition_bb_access2)
condition_access1 = []

4. Creating column size list based on condition

for i in range(6):
    if table_1.loc[i, 'Condition'] == 'aa':
        condition_access1.append(next(condition_aa_access1_cycle))
    else:
        condition_access1.append(next(condition_bb_access1_cycle))

condition_access2 = []
for i in range(6):
    if table_1.loc[i, 'Condition'] == 'aa':
        condition_access2.append(next(condition_aa_access2_cycle))
    else:
        condition_access2.append(next(condition_bb_access2_cycle))

4. Finally filling na with above list

table_1.loc[table_1.Access1.isnull(), 'Access1'] = condition_access1
table_1.loc[table_1.Access2.isnull(), 'Access2'] = condition_access2

table_1

Results

 ID Condition Access1 Access2
  1        aa    John    Mary
  2        aa     Bob     Bob
  3        bb     Ben     Ben
  4        bb     Ben   Peter
  5        aa    John    Mary
  6        bb     Ben     Ben

Note: Lot of things are here to optimize here, would love to know new solution that full depends upon pandas concept at all.