Converting an image folder to a numpy array is consuming all of my RAM


I am trying to convert the CelebA dataset (https://www.kaggle.com/jessicali9530/celeba-dataset) images folder into a numpy array, to later be saved as a .pkl file (so the data can be used as simply as MNIST or CIFAR).

I would like to find a better way of converting, because this method consumes the entire RAM.

from PIL import Image
import pickle
from glob import glob
import numpy as np

TARGET_IMAGES = "img_align_celeba/*.jpg"

def generate_dataset(glob_files):
    dataset = []
    for file_name in sorted(glob(glob_files)):
        img = Image.open(file_name)
        pixels = list(img.getdata())
        dataset.append(pixels)
    return np.array(dataset)
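For comparison, a memory-friendlier variant of the loop above could preallocate a single uint8 array instead of accumulating Python lists (`list(img.getdata())` stores every pixel as a Python tuple, which takes many times more memory than the raw bytes). This is only a sketch, not the asker's code: the default `(218, 178, 3)` shape assumes the aligned CelebA images, and the `img_shape` parameter and function name are my own additions.

```python
from glob import glob

import numpy as np
from PIL import Image


def generate_dataset_preallocated(glob_files, img_shape=(218, 178, 3)):
    """Load images into one preallocated uint8 array.

    Memory stays close to the raw pixel size (N * H * W * 3 bytes)
    instead of ballooning into nested Python lists of tuples.
    """
    files = sorted(glob(glob_files))
    # One contiguous block, 1 byte per channel value.
    dataset = np.empty((len(files), *img_shape), dtype=np.uint8)
    for i, file_name in enumerate(files):
        with Image.open(file_name) as img:
            dataset[i] = np.asarray(img, dtype=np.uint8)
    return dataset
```

The key difference is that `np.asarray(img)` copies the decoded pixels straight into the big array, so no intermediate per-image Python objects pile up.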

celebAdata = generate_dataset(TARGET_IMAGES)

I am rather curious how the MNIST authors did this themselves, but any approach that works is welcome.

Best answer, by bugo99iot:

You can transform any kind of data on the fly in Keras and load one batch at a time into memory during training. See the documentation and search for 'Example of using .flow_from_directory(directory)'.
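If you would rather not depend on Keras, the same one-batch-at-a-time idea can be sketched with plain numpy memory mapping. This assumes the images have already been converted once into a single .npy file on disk; `batch_iter` is a name I made up for illustration.

```python
import numpy as np


def batch_iter(npy_path, batch_size):
    """Yield batches lazily from a .npy file via memory mapping.

    With mmap_mode="r", np.load does not read the file into RAM;
    only the slice requested for each batch is actually fetched.
    """
    data = np.load(npy_path, mmap_mode="r")
    for start in range(0, len(data), batch_size):
        # np.asarray materializes just this batch in memory.
        yield np.asarray(data[start:start + batch_size])
```

A training loop can then consume `batch_iter("celeba.npy", 32)` without ever holding the full dataset in RAM, which is the same principle `flow_from_directory` applies to folders of raw images.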