TensorFlow io.decode_csv and select data along one dimension

I have a set of comma-separated records spread across a collection of files, and I call the TensorFlow function tf.io.decode_csv() on them. Each record looks like this:

tf.Tensor(b'249,EMR,2019-09-13,65.55,65.58,66.2099,65.16', shape=(), dtype=string)

I pass the following list as the defaults (the record_defaults argument):

defaults = [tf.constant([0])] + [tf.constant([], dtype=tf.string)] + [tf.constant([], dtype=tf.string)] + [tf.constant([0.0])]*4
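For reference, tf.io.decode_csv treats an empty tensor in its defaults list as marking that column as required, while a one-element tensor supplies a fallback value for an empty field. A minimal sketch, assuming TF 2.x with eager execution: it decodes the sample record above with the question's defaults, then decodes a hypothetical record whose second field is empty, once with the question's defaults and once with one-element string defaults like those in the answer below.

```python
import tensorflow as tf

# Defaults from the question: tf.constant([], dtype=tf.string) is an EMPTY
# tensor, which decode_csv treats as "this column is required".
required_defaults = ([tf.constant([0])]
                     + [tf.constant([], dtype=tf.string)] * 2
                     + [tf.constant([0.0])] * 4)

# Variant with one-element string defaults: an empty field falls back to b''.
optional_defaults = ([tf.constant([0])]
                     + [tf.constant([''], dtype=tf.string)] * 2
                     + [tf.constant([0.0])] * 4)

# Sample record from the question: every field is present, so it decodes
# into 7 scalar tensors (int32, string, string, then four float32).
record = b'249,EMR,2019-09-13,65.55,65.58,66.2099,65.16'
fields = tf.io.decode_csv(record, required_defaults)
print(fields[1].numpy())  # b'EMR'

# Hypothetical record with the ticker column left empty.
missing = b'249,,2019-09-13,65.55,65.58,66.2099,65.16'
required_error = None
try:
    tf.io.decode_csv(missing, required_defaults)
except tf.errors.InvalidArgumentError as err:
    # decode_csv reports that field 1 is required but missing.
    required_error = err
print('required defaults raise an error:', required_error is not None)

# With a one-element default, the empty field is filled with b''.
fallback = tf.io.decode_csv(missing, optional_defaults)
print('optional defaults give:', fallback[1].numpy())
```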

I then map decode_csv() over the dataset:

ds = SP500fileNamesShuffle.map(lambda fn : tf.io.decode_csv(fn, defaults))

As expected, I get a dataset of the following type:

<DatasetV1Adapter shapes: ((), (), (), (), (), (), ()), types: (tf.int32, tf.string, tf.string, tf.float32, tf.float32, tf.float32, tf.float32)>

Each record has 7 fields, hence the 7-element tuple. I don't know how to iterate over a single field, say the second element of each tuple. I would be grateful for your help. I have tried:

for e in ds.take(10):
    print(e[1])

but I get the following error message:

{{function_node __inference_Dataset_map_<lambda>_6530}} Expect 7 fields but have 1 in record 0
     [[{{node DecodeCSV}}]] [Op:IteratorGetNextSync]
Answer

To close this topic, since the solution turned out to be simple: I had not specified the defaults list properly. In this particular case it should be:

defaults = [tf.constant([0])] + [tf.constant([''], dtype=tf.string)] + [tf.constant([''], dtype=tf.string)] + [tf.constant([0.0])]*4

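With these defaults, the decoding works. The difference is that tf.constant([], dtype=tf.string) is an empty tensor, which tf.io.decode_csv treats as marking a required column, whereas tf.constant([''], dtype=tf.string) supplies an empty string as the default value.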
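Once decoding succeeds, the loop from the question works as written, because each dataset element is a plain tuple of 7 tensors. A minimal sketch, assuming TF 2.x with eager execution and using a one-line in-memory dataset as a hypothetical stand-in for the asker's file-based pipeline (SP500fileNamesShuffle is not shown in the question), also shows how to turn the second column into its own dataset with map:

```python
import tensorflow as tf

# Corrected defaults from the answer: int32, two strings, four float32 columns.
defaults = ([tf.constant([0])]
            + [tf.constant([''], dtype=tf.string)] * 2
            + [tf.constant([0.0])] * 4)

# Hypothetical stand-in for the asker's dataset of CSV lines.
lines = tf.data.Dataset.from_tensor_slices(
    [b'249,EMR,2019-09-13,65.55,65.58,66.2099,65.16'])
ds = lines.map(lambda line: tf.io.decode_csv(line, defaults))

# Each element of ds is a 7-tuple, so indexing inside the loop works.
for e in ds.take(10):
    print(e[1].numpy())

# Alternatively, keep only the second column as its own dataset.
# map() unpacks each tuple into 7 positional arguments, hence *fields.
tickers = ds.map(lambda *fields: fields[1])
ticker_values = [t.numpy() for t in tickers]
print(ticker_values)
```

Selecting the column with map keeps the rest of the tf.data pipeline (batching, prefetching) working on just that field, while indexing inside the Python loop is simpler for quick inspection.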