How do I use the given XML annotation files in my CNN to classify images


I have been learning about Convolutional Neural Networks over the last month and am finally trying to understand how to use annotated images when doing some sort of categorical classification. I am currently using the images/annotations found here:

http://web.mit.edu/torralba/www/indoor.html

After downloading the tar file linked for the annotations, I don't understand how I'm supposed to use the extracted XML files to help my CNN classify images. Do they need to be converted to another format, or combined somehow with the images I already have? I have been looking for references on how this is supposed to be done, but so far I haven't found anything.

This is my current code that I am using to build my original image set without the annotations.

I would appreciate any guidance on what I need to do.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import os
import cv2
import pickle
import random


DATADIR = "C:/Users/cadan/OneDrive/Desktop/IndoorImages/Images"
CATEGORIES = os.listdir(DATADIR)
#CATEGORIES = ["airport_inside","artstudio","auditorium","bakery","bar","bathroom","bedroom","bookstore","bowling","buffet"]

new_shape = len(CATEGORIES)

IMG_SIZE = 100
enc = OneHotEncoder(handle_unknown='ignore', categories = 'auto')
NEW_CATEGORIES = np.array(CATEGORIES).reshape(new_shape,1)
transformed = enc.fit_transform(NEW_CATEGORIES[:]).toarray()
training_data = []


def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, transformed[class_num]])
            except Exception:
                pass  # skip images that fail to load or resize
            
create_training_data()



random.shuffle(training_data)

X = []
y = []

for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
y = np.array(y)

with open("images", "wb") as pickle_out:
    pickle.dump(X, pickle_out)

with open("categories", "wb") as pickle_out:
    pickle.dump(y, pickle_out)

1 Answer


It really depends on the task that you want to solve, and your description is not completely clear.

Since you are starting to get into DL, I would suggest you start with a simple classification task where you have a set of images as input and a set of single labels as output (in this case, you can use the categories provided by the dataset). To solve this, you can start with a CNN architecture, for example ResNet. In Keras, you can just import the model architecture and change the top layers to match your desired output shape (that is two lines of code!). The examples published by the Keras community are a very good entry point for a simple classification task from scratch.
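To illustrate the "change the top layers" part, here is a minimal sketch assuming TensorFlow/Keras is installed. The input size `(100, 100, 3)` matches the `IMG_SIZE` from your code, and `num_classes = 67` assumes the full indoor-scenes category set; adjust both to your data (with `weights="imagenet"` instead of `weights=None` you would get a pretrained backbone):

```python
import tensorflow as tf

num_classes = 67  # number of scene categories in your dataset

# Load ResNet50 without its classification head; pooling="avg" flattens
# the final feature map into a single vector per image.
base = tf.keras.applications.ResNet50(
    weights=None, include_top=False,
    input_shape=(100, 100, 3), pooling="avg")

# Attach a new softmax head sized to your categories.
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
model = tf.keras.Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The `categorical_crossentropy` loss matches the one-hot labels your `OneHotEncoder` already produces.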

For your specific dataset, I would go in the following way (oversimplified):

  • Build an XML parser that extracts the class of each image and collect the results in a Pandas DataFrame: one column for the filename, another for the label.
  • Build the CNN as in the previous links.
  • Create a Keras ImageDataGenerator and read the images from the DataFrame (e.g. with flow_from_dataframe).
  • Train the model using .fit().