Why am I not able to generate a schema using tfdv.infer_schema()?

I get the error "TypeError: statistics is of type StatsOptions, should be a DatasetFeatureStatisticsList proto." when I try to generate a schema with tfdv.infer_schema() after filtering the relevant features with feature_allowlist in tfdv.StatsOptions. Can anyone help me with this?

features_remove= {"region","fiscal_week"}

columns= [col for col in df.columns if col not in features_remove]
stat_Options= tfdv.StatsOptions(feature_allowlist=columns)
print(stat_Options.feature_allowlist)


schema= tfdv.infer_schema(stat_Options)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-e61b2454028e> in <module>
----> 1 schema= tfdv.infer_schema(stat_Options)
      2 schema

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_data_validation\api\validation_api.py in infer_schema(statistics, infer_feature_shape, max_string_domain_size, schema_transformations)
     95   """
     96   if not isinstance(statistics, statistics_pb2.DatasetFeatureStatisticsList):
---> 97     raise TypeError(
     98         'statistics is of type %s, should be '
     99         'a DatasetFeatureStatisticsList proto.' % type(statistics).__name__)

TypeError: statistics is of type StatsOptions, should be a DatasetFeatureStatisticsList proto.

Answer by Amine_h

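The reason is simple: tfdv.infer_schema expects a statistics_pb2.DatasetFeatureStatisticsList proto, not a StatsOptions object. StatsOptions only configures how the statistics are computed, so you first have to generate the statistics with those options and then pass the result to tfdv.infer_schema.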

You should do it this way:

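import tensorflow_data_validation as tfdv

features_remove = {"region", "fiscal_week"}

# Keep every column of df (the DataFrame from the question) except the excluded ones
columns = [col for col in df.columns if col not in features_remove]
stat_Options = tfdv.StatsOptions(feature_allowlist=columns)
print(stat_Options.feature_allowlist)

# Compute the statistics first, then infer the schema from them
stats = tfdv.generate_statistics_from_dataframe(df, stats_options=stat_Options)
schema = tfdv.infer_schema(stats)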
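
If you need this in more than one place, the two-step flow (statistics first, schema second) could be wrapped in a small helper. This is only a sketch: the function names select_allowlist and infer_schema_for_columns are made up for illustration and are not part of TFDV, and it assumes df is a pandas DataFrame and that tensorflow_data_validation is installed when the second function is called.

```python
from typing import Iterable, List


def select_allowlist(all_columns: Iterable[str],
                     features_remove: Iterable[str]) -> List[str]:
    """Return the columns to keep, preserving their original order.

    Hypothetical helper, not a TFDV function.
    """
    excluded = set(features_remove)
    return [col for col in all_columns if col not in excluded]


def infer_schema_for_columns(df, features_remove):
    """Compute statistics for the kept columns, then infer a schema.

    Hypothetical helper; `df` is assumed to be a pandas DataFrame.
    """
    # Imported lazily so select_allowlist can be used without TFDV installed.
    import tensorflow_data_validation as tfdv

    allowlist = select_allowlist(df.columns, features_remove)
    options = tfdv.StatsOptions(feature_allowlist=allowlist)

    # Step 1: StatsOptions only configures the computation; this call
    # produces the DatasetFeatureStatisticsList proto.
    stats = tfdv.generate_statistics_from_dataframe(df, stats_options=options)

    # Step 2: infer_schema takes the statistics proto, not the options object.
    return tfdv.infer_schema(statistics=stats)


# Example call, assuming `df` already exists:
# schema = infer_schema_for_columns(df, {"region", "fiscal_week"})
```

Keeping the column filtering in its own function makes it easy to check the allowlist before running the (slower) statistics computation.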