I want to create a new TensorFlow Data Validation schema from scratch, with fixed feature names, types and presence.
import tensorflow_data_validation as tfdv
from tensorflow_metadata.proto.v0 import schema_pb2
# Initialise an empty schema
my_schem = schema_pb2.Schema()
# Add one new feature per available type;
# items() yields (enum name, enum number) pairs
for type_name, type_number in schema_pb2.FeatureType.items():
    my_schem.feature.add(name=f'feat_{type_number}', type=type_name)
tfdv.display_schema(schema=my_schem)
The code above returns the following schema:
| Feature name | Type | Presence | Valency | Domain |
|---|---|---|---|---|
| 'feat_0' | TYPE_UNKNOWN | - | | |
| 'feat_1' | BYTES | - | | |
| 'feat_2' | INT | - | | |
| 'feat_3' | FLOAT | - | | |
| 'feat_4' | STRUCT | - | | |
How can I set the Presence property on my features?
As mentioned in the FeaturePresence documentation, two arguments are possible:
- min_fraction: minimum fraction of examples that have this feature
- min_count: minimum number of examples that have this feature

If min_fraction=1, 100% of examples need to have this feature, i.e. the feature is required. If not, the feature is optional.