Double check on columns from two DataFrame

65 Views Asked by At

I have an issue implementing a check from a column of one DataFrame to a column of a different DataFrame. I have two Data Frame, one is the child of the other. First one has index with info of surface and building id, and a column with solar radiation values, here a sample:

> Index                                         Irradiation
> #Igh                                            1221900.0
> 1589(Building1589):233687():Irradiance(W/m2)     371071.0
> 1589(Building1589):233688():Irradiance(W/m2)     373723.0
> 1589(Building1589):233689():Irradiance(W/m2)     374250.0

and the second one has basically the same info, but a lower number of rows (I filtered the rows to get only ones with a minimum value of irradiation, here is the sample:

     Irradiation building_id surface_id
1       744146.0        1589     233720
2       756753.0        1589     233721
3       764957.0        1589     233722
4       739676.0        1589     233723
5       755673.0        1589     233724

Basically, I would like to check that everything worked properly by looking at the irradiation of DataFrame2 and checking if the building id value and the surface id value of that rows are in the Index of the DataFrame 1, so that no mistake occurred. I can easily turn the index of DataFrame1 into a column, but then I don't know how to set a boolean check between the two DataFrames.

1

There are 1 best solutions below

0
Colim On BEST ANSWER
df1 = pd.DataFrame(
    {
        "Index": [
            "#Igh",
            "1589(Building1589):233687():Irradiance(W/m2)",
            "1589(Building1589):233688():Irradiance(W/m2)",
            "1589(Building1589):233689():Irradiance(W/m2)",
            "0000(Building0000):233689():Irradiance(W/m2)",
            "1589(Building1589):000000():Irradiance(W/m2)",
            "1589(Building0000):233724():Irradiance(W/m2)"
        ],
        "Irradiation": [1221900.0, 371071.0, 373723.0, 374250.0, 1, 2 , 3],
    }
)

# df2
#      Irradiation building_id surface_id
# 1       744146.0        1589     233720
# 2       756753.0        1589     233721
# 3       764957.0        1589     233722
# 4       739676.0        1589     233723
# 5       755673.0        1589     233724

df2 = pd.DataFrame(
    {
        "Irradiation": [744146.0, 756753.0, 764957.0, 739676.0, 755673.0],
        "building_id": [1589, 1589, 1589, 1589, 1589],
        "surface_id": [233720, 233721, 233722, 233723, 233724]
    }
)

df1["building_id"] = df1["Index"].str[:4].astype('str')
df1["surface_id"] = df1["Index"].str[19:25].astype('str')

#check if building_id surface_id of df2 are in df1, and add to df2["check"]
df2["check"] = df2.building_id.astype("str").isin(df1.building_id) & \
            df2.surface_id.astype("str").isin(df1.surface_id)
print(df2)