Accessing a downloaded ImageNet 2012 dataset using TensorFlow 2.0


I want to load the 2012 version of the ImageNet dataset using TensorFlow 2.0 (via tensorflow_datasets). I followed the steps described in "Preparing the ImageNet dataset with TensorFlow".

My final code is as follows:

import tensorflow_datasets as tfds
import os

dataset_dir = '/home/imagenet'  # directory where you downloaded the tar files to
temp_dir = '/home/temp'  # a temporary directory where the data will be stored intermediately

download_config = tfds.download.DownloadConfig(
    extract_dir=os.path.join(temp_dir, 'extracted'),
    manual_dir=dataset_dir
)

builder = tfds.builder("imagenet2012")
builder.download_and_prepare(download_config=download_config)

  • My ImageNet tar files are located in dataset_dir (/home/imagenet).

  • The tar file names are ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar.

  • Whenever I execute the above code, I get the following error:

DownloadError: Failed to get url http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_train.tar. HTTP code: 404.

I am not sure why it is trying to download the ImageNet files. The DownloadConfig sets the manual_dir parameter, which points to the directory containing the already downloaded ImageNet tar files.
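One way to rule out a simple path problem is to confirm that the directory passed as manual_dir really contains both archives under the exact names listed above. The sketch below uses only the standard library; the helper name `missing_archives` is hypothetical, and an empty result only shows that the files are in place, not that a given tensorflow_datasets version will actually read them from manual_dir.

```python
import os

# Archive names as listed in the question.
EXPECTED_ARCHIVES = ("ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar")


def missing_archives(manual_dir, expected=EXPECTED_ARCHIVES):
    """Return the expected archive names that are not regular files in manual_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(manual_dir, name))
    ]
```

For example, `missing_archives('/home/imagenet')` should return an empty list if both tar files are where the question says they are.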

Any help is appreciated.

Two answers were posted:

Answer 1:

The TensorFlow Datasets DatasetBuilder documentation states that the download_and_prepare method "downloads and prepares the dataset for reading".

Alternatively, you can use the tfds.load() function to load a dataset that has already been downloaded and prepared.

According to the documentation, tfds.load() consists of the following steps:

  1. Fetch the tfds.core.DatasetBuilder by name.
  2. Generate the data (when download=True it tries to download it; otherwise it looks for the data in local files).
  3. Load the tf.data.Dataset object using the as_dataset method.
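These three steps map roughly onto the DatasetBuilder API. The following is a sketch, not the actual tfds.load source: it assumes tensorflow_datasets is installed and that the prepared data lives in data_dir. The function name `load_like_tfds` is hypothetical.

```python
def load_like_tfds(name, split, data_dir, download=False):
    """Approximate the three steps of tfds.load() with the DatasetBuilder API."""
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    # 1. Fetch the tfds.core.DatasetBuilder by name.
    builder = tfds.builder(name, data_dir=data_dir)

    # 2. Generate the data only when download=True; otherwise the
    #    already-prepared files under data_dir are used as-is.
    if download:
        builder.download_and_prepare()

    # 3. Build the tf.data.Dataset object(s) from the prepared files.
    return builder.as_dataset(split=split)
```

For example, `load_like_tfds('imagenet2012', ['train', 'validation'], data_dir)` would return one dataset per requested split, provided the data has already been prepared.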

You can use tfds.load() as follows:

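import tensorflow_datasets as tfds

# With a list of splits, tfds.load returns a list of datasets in the same order
dataset = tfds.load(
    'imagenet2012',
    split=['train', 'validation'],  # or 'train' or 'validation' to load a single split
    data_dir=dataset_dir,
    download=False,
)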

More details can be found on the tfds.load() documentation.
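Because download=False skips preparation entirely, tfds.load only works if data_dir already contains the prepared dataset, not just the raw tar files. A rough standard-library check is sketched below; it assumes TFDS's usual layout, in which each prepared version directory under `<data_dir>/<name>/` holds a dataset_info.json file, and the helper name `has_prepared_dataset` is hypothetical.

```python
import glob
import os


def has_prepared_dataset(data_dir, name="imagenet2012"):
    """Return True if data_dir/<name>/ contains at least one dataset_info.json.

    Assumes the usual TFDS layout <data_dir>/<name>/[<config>/]<version>/,
    where each prepared version directory holds a dataset_info.json file.
    """
    pattern = os.path.join(data_dir, name, "**", "dataset_info.json")
    return bool(glob.glob(pattern, recursive=True))
```

If this returns False for the directory passed as data_dir, the download=False call above has nothing to read yet.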

Answer 2:

The method that finally worked for me is detailed in the following link.

The gist is as follows:

Download the ImageNet 2012 files (ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar) and copy them into ~/tensorflow_datasets/downloads.

Rename ILSVRC2012_img_train.tar to:

imag-net.org_chal_LSVR_2012_nnou_ILSV_img_sIIAonqONCGKDlj942sP6Pc7w3f0rOotkWAgV8PKRbs.tar

Create the following file:

imag-net.org_chal_LSVR_2012_nnou_ILSV_img_sIIAonqONCGKDlj942sP6Pc7w3f0rOotkWAgV8PKRbs.tar.INFO

Add the following metadata to the .INFO file:

{"dataset_names": ["imagenet2012_corrupted", "imagenet2012"], "original_fname": "ILSVRC2012_img_train.tar", "urls": ["http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_train.tar"]}

For ILSVRC2012_img_val.tar, rename the file to:

imag-net.org_chal_LSVR_2012_nnou_ILSV_img_x-BqbAuszwbY2-tld9ce__hGc6Xb3VBjOrRPjqBFauA.tar

Create the following file:

imag-net.org_chal_LSVR_2012_nnou_ILSV_img_x-BqbAuszwbY2-tld9ce__hGc6Xb3VBjOrRPjqBFauA.tar.INFO

Add the following metadata to the .INFO file:

{"dataset_names": ["imagenet2012"], "original_fname": "ILSVRC2012_img_val.tar", "urls": ["http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar"]}
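The copy, rename, and .INFO steps can be scripted. The sketch below copies each archive into the downloads directory under the name given above and writes the matching .INFO file with the same JSON metadata. The target names and metadata are taken verbatim from this answer and are presumably tied to the tensorflow_datasets version the author used; the helper name `stage_archives` and its arguments are hypothetical.

```python
import json
import os
import shutil

# Target names and .INFO metadata exactly as given in this answer.
ARCHIVES = {
    "ILSVRC2012_img_train.tar": (
        "imag-net.org_chal_LSVR_2012_nnou_ILSV_img_sIIAonqONCGKDlj942sP6Pc7w3f0rOotkWAgV8PKRbs.tar",
        ["imagenet2012_corrupted", "imagenet2012"],
    ),
    "ILSVRC2012_img_val.tar": (
        "imag-net.org_chal_LSVR_2012_nnou_ILSV_img_x-BqbAuszwbY2-tld9ce__hGc6Xb3VBjOrRPjqBFauA.tar",
        ["imagenet2012"],
    ),
}
URL_PREFIX = "http://www.image-net.org/challenges/LSVRC/2012/nnoupb/"


def stage_archives(source_dir, downloads_dir):
    """Copy each archive to its renamed target and write its .INFO file."""
    os.makedirs(downloads_dir, exist_ok=True)
    for original_fname, (target_name, dataset_names) in ARCHIVES.items():
        target = os.path.join(downloads_dir, target_name)
        # Copying keeps the originals; shutil.move would save disk space instead.
        shutil.copyfile(os.path.join(source_dir, original_fname), target)
        info = {
            "dataset_names": dataset_names,
            "original_fname": original_fname,
            "urls": [URL_PREFIX + original_fname],
        }
        # json.dump keeps the key order above, matching the metadata shown.
        with open(target + ".INFO", "w") as f:
            json.dump(info, f)
```

For example, `stage_archives('/home/imagenet', os.path.expanduser('~/tensorflow_datasets/downloads'))` would stage both archives from the question's directory.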

I then used the following code, which picks up the pre-downloaded ImageNet files:

import os
import tensorflow_datasets as tfds

data_dir = '/home/tensorflow_datasets/downloads/'  # directory holding the renamed tar files and their .INFO files
write_dir = '/home/temp'  # a temporary directory where the data will be stored intermediately

download_config = tfds.download.DownloadConfig(
    extract_dir=os.path.join(write_dir, 'extracted'),
    manual_dir=data_dir
)

builder = tfds.builder("imagenet2012")
builder.download_and_prepare(download_config=download_config)