TensorFlow TFDV does not work with images

I'm trying to get TFDV working with RGB images as feature inputs, reading from a TFRecords file. I can read and write the image data to TFRecord files fine. Here are the relevant code snippets for writing, where img is a NumPy array of shape [32,32,3]:

feature = {'train/label': _int64_feature(y_train[i]),
           'train/image': _bytes_feature(tf.compat.as_bytes(img.tobytes()))
          }

And reading back:

read_features = {'train/label': tf.FixedLenFeature([], tf.int64),
                 'train/image': tf.FixedLenFeature([], tf.string)}

I can then use frombuffer and reshape to get my image back correctly.
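For reference, here is a minimal sketch of that byte round trip using NumPy only. The TFRecord write and read steps are left out, and the array contents and the uint8 dtype are assumptions made for illustration, not details from my actual data:

```python
import numpy as np

# Hypothetical stand-in for one training image: 32x32 RGB, dtype uint8 (assumed).
img = (np.arange(32 * 32 * 3) % 256).astype(np.uint8).reshape(32, 32, 3)

# Serialize the raw pixel buffer. tobytes() is used here; tostring() is its
# deprecated alias and produces the same bytes.
raw = img.tobytes()

# Rebuild the array: frombuffer needs the original dtype, reshape the original shape.
restored = np.frombuffer(raw, dtype=np.uint8).reshape(32, 32, 3)

print(np.array_equal(img, restored))
```

The dtype and shape are not stored in the byte string, so both have to be known (or stored as separate features) when reading back.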

The issue is that when I run tfdv.generate_statistics_from_tfrecord() on that TFRecords file, it throws an error:

ValueError: '\xff ...... \x87' has type str, but isn't valid UTF-8 encoding. Non-UTF-8 strings must be converted to unicode objects before being added. [while running 'GenerateStatistics/RunStatsGenerators/TopKStatsGenerator/TopK_ConvertToSingleFeatureStats']
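The UTF-8 complaint itself can be reproduced outside TFDV, since arbitrary binary data is generally not valid UTF-8. A minimal sketch in plain Python follows; the byte values are made up for illustration, and this only shows the decoding failure, not what TFDV does internally:

```python
# Made-up bytes standing in for raw image data (an assumption for illustration).
raw = bytes([0xFF, 0xD8, 0x10, 0x87])

# The byte 0xFF can never appear in well-formed UTF-8, so decoding fails.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(is_valid_utf8)
```

This suggests the bytes feature is being treated as text somewhere in the statistics pipeline, which would match the TopKStatsGenerator step named in the error.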

I've tried all kinds of different ways of writing the images, using astype(unicode) and more, but I can't get this working.

Any ideas please?

Thanks, Paul

There is 1 best solution below

Try the following, which stores the encoded image file's bytes instead of the raw array buffer:

# Read the image file as raw bytes; the with-block closes the file afterwards.
with open(image_location, 'rb') as f:
    image_string = f.read()
feature = {'train/label': _int64_feature(y_train[i]),
           'train/image': _bytes_feature(image_string)
          }

This is adapted from the official tutorial.