Pandas->Styler: Hide specific cells when two or more rows have cells with the same value


The following data frame has three rows describing the same person, each with a different phone number:

import pandas as pd

data = {'Name':['tom', 'tom', 'tom', 'nick', 'krish', 'jack'],
        'Age':[20, 20, 20, 21, 19, 18],
        'Phone':[1234, 2345, 4576, 7890, 6767, 7676]}
df = pd.DataFrame(data)
In [16]: df
Out[16]:
    Name  Age  Phone
0    tom   20   1234
1    tom   20   2345
2    tom   20   4576
3   nick   21   7890
4  krish   19   6767
5   jack   18   7676

I would like to generate HTML whose styling hides the cells that repeat a value from the first matching row, so that only the differing values remain visible:

table_output

Name    Age  Phone
tom     20   1234
             2345
             4576
nick    21   7890
krish   19   6767
jack    18   7676

How can I:

  1. identify duplicate values (I am currently using the approach shown below, but perhaps there is a better way), and
  2. hide them in a Styler object so that the result looks like table_output above?

In [22]: df.duplicated("Name")
Out[22]:
0    False
1     True
2     True
3    False
4    False
5    False
dtype: bool

In [23]: df.duplicated("Age")
Out[23]:
0    False
1     True
2     True
3    False
4    False
5    False
dtype: bool

I have managed to build a frame containing only the duplicated rows:

df_dup = df[df["Name"].duplicated()]
df_dup.style.hide_columns([0,1])

But I couldn't find a way to combine df with df_dup in terms of styling.

Thanks.

Best answer

With the following dataframe in a Jupyter notebook:

import pandas as pd

df = pd.DataFrame(
    data={
        "Name": ["tom", "tom", "tom", "nick", "krish", "jack"],
        "Age": [20, 20, 20, 21, 19, 18],
        "Phone": [1234, 2345, 4576, 7890, 6767, 7676],
    }
)
df

Output: [image: the dataframe as rendered in the notebook]

You can do this:

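def mask_values(val):
    # Make the cell fully transparent; its value is still present in the HTML.
    return "opacity: 0"


# Style "Name" and "Age" only in rows that repeat an earlier (Name, Age) pair.
# Note: Styler.applymap is deprecated since pandas 2.1 in favour of Styler.map.
df.style.applymap(
    mask_values,
    subset=(
        df[df.duplicated(subset=["Name", "Age"], keep="first")].index,
        ["Name", "Age"],
    ),
)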

Output: [image: the styled table, with the repeated Name and Age cells hidden]
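
Since the question asks for HTML output, the same idea can also be written with Styler.apply(axis=None), which takes a function that returns a dataframe of CSS strings, and then rendered with Styler.to_html() (available from pandas 1.3). The following is only a minimal sketch, not the code from this answer: hide_repeats and render_html are hypothetical helper names, and the data is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "Name": ["tom", "tom", "tom", "nick", "krish", "jack"],
        "Age": [20, 20, 20, 21, 19, 18],
        "Phone": [1234, 2345, 4576, 7890, 6767, 7676],
    }
)


def hide_repeats(frame, cols=("Name", "Age")):
    """Return a dataframe of CSS strings with the same index and columns as `frame`."""
    cols = list(cols)
    # True for every row whose (Name, Age) pair already appeared in an earlier row.
    dup_rows = frame.duplicated(subset=cols, keep="first")
    css = pd.DataFrame("", index=frame.index, columns=frame.columns)
    css.loc[dup_rows, cols] = "opacity: 0"
    return css


def render_html(frame):
    """Hypothetical helper: style the frame and return the HTML as a string."""
    # axis=None makes Styler.apply pass the whole dataframe to hide_repeats.
    return frame.style.apply(hide_repeats, axis=None).to_html()


css = hide_repeats(df)
```

Calling render_html(df) should give a string that can be written to a file. Note that opacity only hides the text visually, so the repeated values are still present in the markup.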

You can check that the underlying dataframe is unchanged:

df.loc[0, "Name"]  # Output: 'tom'
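
One caveat on identifying the duplicates: df.duplicated flags a row whenever the same values appeared anywhere earlier in the frame, not only in the row directly above it. If the repeated rows are not guaranteed to be adjacent, comparing each row with its predecessor may match the desired table better. The sketch below uses made-up data (a fourth "tom" row placed after "nick") purely to show the difference; the names anywhere and adjacent are illustrative.

```python
import pandas as pd

# Hypothetical data: the last "tom" row is separated from the first two.
df = pd.DataFrame(
    {
        "Name": ["tom", "tom", "nick", "tom"],
        "Age": [20, 20, 21, 20],
        "Phone": [1234, 2345, 7890, 4576],
    }
)
cols = ["Name", "Age"]

# Flags any row whose (Name, Age) pair occurred in some earlier row.
anywhere = df.duplicated(subset=cols, keep="first")

# Flags only rows whose (Name, Age) pair equals that of the row directly above.
adjacent = df[cols].eq(df[cols].shift()).all(axis=1)

print(anywhere.tolist())  # [False, True, False, True]
print(adjacent.tolist())  # [False, True, False, False]
```

Either boolean mask can be used to select the row labels for the Styler subset, for example df[adjacent].index in place of df[df.duplicated(...)].index.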