I am working with the MNIST dataset and trying different classification methods on it, but my runtimes are ridiculous, so I am looking for a way to use only a portion of the training part of the set while keeping the test portion at 10K. I have tried a number of different options but nothing is working.
I need to take a sample either from the entire set, or lower the training x and y from 60000 to maybe 20000.
My current code:
library(keras)
mnist <- dataset_mnist()
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images <- mnist$test$x
test_labels <- mnist$test$y
I have tried to use the sample() function and other types of splits to no avail.
In the following example I'm downloading MNIST myself and loading it through reticulate/numpy. This shouldn't make much difference. When you want to get a sample with sample(), you usually take a sample of indices that you then use for subsetting. To get a balanced sample, you might want to draw a specific number or proportion from each label group.
Created on 2024-03-29 with reprex v2.1.0
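A minimal sketch of both approaches described above, in base R. It assumes the `train_images` / `train_labels` layout from the question (a 3-D array indexed by image in the first dimension, plus a label vector); small synthetic stand-in data and the names `n_keep` and `per_class` are illustrative assumptions so the snippet runs without downloading MNIST.

```r
set.seed(42)  # make the draws reproducible

# Stand-in data with the same layout as mnist$train, only smaller.
# With the real data, skip these three lines and use the objects from the question.
n <- 3000
train_images <- array(sample(0:255, n * 28 * 28, replace = TRUE), dim = c(n, 28, 28))
train_labels <- sample(0:9, n, replace = TRUE)

# 1) Simple random sample: draw row indices once, then subset images and
#    labels with the same indices so they stay aligned.
n_keep <- 1000
idx <- sample(length(train_labels), n_keep)
small_images <- train_images[idx, , , drop = FALSE]
small_labels <- train_labels[idx]

# 2) Balanced sample: group the indices by label and draw the same number
#    from each group.
per_class <- 100
idx_by_label <- split(seq_along(train_labels), train_labels)
idx_bal <- unlist(
  lapply(idx_by_label, function(i) i[sample.int(length(i), per_class)]),
  use.names = FALSE
)
bal_images <- train_images[idx_bal, , , drop = FALSE]
bal_labels <- train_labels[idx_bal]

dim(small_images)   # dimensions of the reduced image array
table(bal_labels)   # number of sampled images per label
```

With the real data you would set, for example, `n_keep <- 20000` or `per_class <- 2000`, and leave `test_images` / `test_labels` untouched so the test portion stays at 10K. Sampling indices rather than the arrays themselves is what keeps images and labels matched.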