I have 100k pictures, and they don't fit into RAM, so I need to read them from disk while training.
def extract_fn(x):
    x = tf.read_file(x)
    x = tf.image.decode_jpeg(x, channels=3)
    x = tf.image.resize_images(x, [64, 64])
    return x

dataset = tf.data.Dataset.from_tensor_slices(in_pics)
dataset = dataset.map(extract_fn)
But when I try to train, I get this error:
File system scheme '[local]' not implemented (file: '/content/anime-faces/black_hair/danbooru_2629248_487b383a8a6e7cc0e004383300477d66.jpg')
Can I work around this somehow? I also tried the TFRecord API and got the same error.
The Cloud TPU you use in this scenario is not colocated on the same VM where your Python runs, so it cannot read from the VM's local file system. The easiest fix is to stage your data on GCS and point the TPU at it with a gs:// URI.
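For example, after copying the images to a bucket, the same pipeline works once the file names are gs:// URIs. A minimal sketch (the bucket name and path pattern below are placeholders, replace them with your own GCS location):

import tensorflow as tf

# Placeholder GCS pattern -- substitute your own bucket and folder layout.
gcs_pattern = 'gs://your-bucket/anime-faces/*/*.jpg'

def extract_fn(x):
    x = tf.read_file(x)                      # reads gs:// paths transparently
    x = tf.image.decode_jpeg(x, channels=3)
    x = tf.image.resize_images(x, [64, 64])
    return x

# tf.gfile.Glob understands gs:// URIs, unlike the standard glob module.
in_pics = tf.gfile.Glob(gcs_pattern)

dataset = tf.data.Dataset.from_tensor_slices(in_pics)
dataset = dataset.map(extract_fn)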
To optimize performance when reading from GCS, add prefetch(AUTOTUNE) to your tf.data pipeline, and for small (< 50 GB) datasets also use cache().
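Continuing the sketch above (batch_size is assumed to be defined elsewhere), the full input pipeline would then look roughly like this:

dataset = (tf.data.Dataset.from_tensor_slices(in_pics)
           .map(extract_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .cache()                                  # decoded 64x64 images; only if they fit in memory
           .shuffle(1024)
           .repeat()
           .batch(batch_size, drop_remainder=True)   # TPUs need static batch shapes
           .prefetch(tf.data.experimental.AUTOTUNE))

cache() stores the decoded, resized images after the first epoch, so the JPEGs are only fetched from GCS once, while prefetch(AUTOTUNE) overlaps input preparation with training.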