In TFX, is it possible to infer Schema with dates?

I'm using TFX (more precisely, TensorFlow Data Validation) with the infer_schema method documented at https://www.tensorflow.org/tfx/data_validation/api_docs/python/tfdv/infer_schema. It generates a schema describing the column types of a CSV file.

It works well for floats, bytes, categories... But I would also like to detect dates. I haven't found anything about this in the tutorials or guides. The generated proto message supports dates (see TimeDomain in https://github.com/tensorflow/metadata/blob/master/tensorflow_metadata/proto/v0/schema.proto), so representing them would not be an issue.

I tried with a CSV file in the following format (non-US date format), and the date column is recognized as BYTES :(

date, amount
15/08/2001, 0.3120682494
16/08/2001, 0.9310268917
17/08/2001, 0.902986235

The code is the same as in the tutorial, so more or less:

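import tensorflow_data_validation as tfdv

# Compute statistics from the CSV, then infer and display a schema from them.
train_stats = tfdv.generate_statistics_from_csv(data_location="/content/csv_with_dates.csv")
schema = tfdv.infer_schema(statistics=train_stats)
tfdv.display_schema(schema=schema)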

which displays:

Feature name  Type   Presence  Valency  Domain
'date'        BYTES  required           -
'amount'      FLOAT  required           -

Could I make it work? How?

Best answer

Not at the moment; maybe in an upcoming version. If you check the link that you mentioned, you'll find that features support only the following types (dates are not included):

enum FeatureType {
  TYPE_UNKNOWN = 0;
  BYTES = 1;
  INT = 2;
  FLOAT = 3;
  STRUCT = 4;
}
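
Since infer_schema only reports the date column as BYTES, one possible workaround is to check for date-like columns yourself before or after inference. The sketch below is not part of TFDV: it uses only the Python standard library, and the function name detect_date_columns and the list of candidate formats are hypothetical choices for illustration. It reports a column as a date only if every value parses with the same strptime format.

```python
import csv
import io
from datetime import datetime

# Hypothetical candidate formats, tried in order; extend to match your data.
# Order matters for ambiguous values such as 01/02/2001.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%m/%d/%Y")


def detect_date_columns(csv_text, formats=CANDIDATE_FORMATS):
    """Map each column whose values all parse with one format to that format."""
    # skipinitialspace handles the space after each comma in the sample file.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    detected = {}
    for name in reader.fieldnames or []:
        values = [row[name].strip() for row in rows if row[name] is not None]
        if not values:
            continue
        for fmt in formats:
            try:
                for value in values:
                    datetime.strptime(value, fmt)
            except ValueError:
                continue  # at least one value did not match; try the next format
            detected[name] = fmt
            break
    return detected


sample = """date, amount
15/08/2001, 0.3120682494
16/08/2001, 0.9310268917
17/08/2001, 0.902986235
"""

print(detect_date_columns(sample))  # {'date': '%d/%m/%Y'}
```

The detected format string could then be recorded by hand on the 'date' feature of the inferred schema, for example through the TimeDomain message the question points to. Whether validation actually uses that annotation depends on the TFDV version, so treat this as an assumption to verify.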