I'm building a simple Cats vs Dogs classification model and I'm struggling. I was following a tutorial that did exactly this, so I don't know what I'm doing wrong.
- Read images (2000 images, 1000 cats, 1000 dogs, original dataset has 25000 images)
- Resize to 470x320
- Flatten (end up with 2000 rows of length 451,200, i.e. 470 x 320 x 3)
- Split and fit
This is the code:
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

path = 'data/train/'
data = []    # flattened image vectors
labels = []  # 1 = dog, 0 = cat
count = 0

# Load the training images: keep two 1000-file windows so I get
# 1000 cats and 1000 dogs out of the 25000 files
for filename in os.listdir(path):
    count += 1
    # Check if the file is an image (you can add more image extensions if needed)
    if filename.endswith('.jpg') and (count <= 1000 or 12500 < count <= 13500):
        img = cv2.imread(os.path.join(path, filename))
        img = cv2.resize(img, (470, 320)).flatten()
        data.append(img)  # append each flattened image to the list
        if filename.startswith('dog'):
            labels.append(1)
        else:
            labels.append(0)

# Convert lists to NumPy arrays
data = np.array(data)
labels = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, shuffle=True)

clf = SVC()
parameters = [{'gamma': [0.01, 0.001, 0.0001], 'C': [1, 10, 100, 1000]}]
gridSearch = GridSearchCV(clf, parameters, verbose=True)
gridSearch.fit(X_train, y_train)
I'm working in a Jupyter notebook, so I know my code works up until the fit line. My memory usage just skyrockets to 99% and my laptop freezes. Why can't I train such a simple model?
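For scale, here's a rough back-of-the-envelope check on the arrays above (a sketch assuming scikit-learn converts the uint8 matrix to float64 internally, which I believe SVC does):

rows, cols = 2000, 470 * 320 * 3     # 2000 images x 451,200 features each
uint8_gb = rows * cols / 1024**3     # raw uint8 array: ~0.84 GB
float64_gb = uint8_gb * 8            # same matrix as float64: ~6.7 GB
print(f"uint8: {uint8_gb:.2f} GB, float64: {float64_gb:.2f} GB")

On top of that, cross-validation slicing creates additional copies of subsets of the data, so it seems plausible this simply exceeds my laptop's RAM.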
Side note: I tried resizing to 170x100 so I'd end up with far fewer columns (51,000), and now the fit function has been running for hours. It hasn't completed, but it also hasn't crashed my memory. I don't understand any of this.
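For what it's worth, the grid above has 3 gamma values x 4 C values = 12 candidates, and GridSearchCV's default 5-fold cross-validation fits each one 5 times, so gridSearch.fit is really 60 separate SVC fits (plus a final refit). Here's a small timing sketch one could run to see how a single fit scales with sample count (hypothetical subsample sizes, reusing X_train and y_train from the code above):

import time
# Time a single SVC fit on growing subsamples to see how runtime scales
for n in (200, 400, 800):
    t0 = time.time()
    SVC().fit(X_train[:n], y_train[:n])
    print(n, 'samples:', round(time.time() - t0, 1), 's')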