Data quality rules with pandera for PV timeseries


I am trying to apply some data quality rules using the pandera library. I want to check the quality of a PV timeseries against two rules: there must be no negative values, and no value may exceed a particular threshold. I tried the code below, but it seems that only the first rule is executed. Am I missing something?


import pandas as pd
import pandera as pa

df = pd.read_excel('Axxx PV data values only modified.xlsx')
# Convert 'timestamp' column to datetime objects
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%dT%H:%M:%S.%fZ')


#df = pd.DataFrame(data)

# Define the schema with element-wise Pandera rules
schema = pa.DataFrameSchema({
    "timestamp": pa.Column(pa.DateTime, required=True),
    "value": pa.Column(pa.Float, checks=[
        pa.Check(lambda s: (s >= 100), element_wise=True, error="Value exceeds threshold"),
        pa.Check(lambda s: (s >= 0), element_wise=True, error="Negative values not allowed"),
    ])
})

Thanks in advance.
