TFLearn: Create a train/test split using only tflearn


I'm using my own dataset and I want to build a deep neural network with tflearn.

This is part of my code:

import tflearn
from tflearn.data_utils import load_csv

#Load the CSV File    
X, Y = load_csv('data.csv')

#Split Data in train and Test with tflearn

How can I write a function in TFLearn that splits X and Y into train_X, test_X, train_Y and test_Y?

I know how to do this with numpy and other libraries, but I would like to do it using only tflearn.


There are 2 best solutions below

1
On

The fit method of the tflearn.DNN model (http://tflearn.org/models/dnn/) accepts a validation_set option. If you set it to a float between 0 and 1, the model automatically splits your input into a training set and a validation set during training.

Example

import tflearn
from tflearn.data_utils import load_csv

#Load the CSV File    
X, Y = load_csv('data.csv')

# Define some network
network = ... 

# Training
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit(X, Y, n_epoch=20, validation_set=0.1) # will use 10% for validation
     

This creates a validation set during training, which is not the same as a test set. If you just want a train set and a test set, I recommend looking at the train_test_split function from sklearn, which can also split your data for you.
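If you would rather not pull in another library just to hold out a test set, the split itself can be sketched in plain Python. This is only an illustration, not part of tflearn: the helper name split_train_test, its parameters and the toy data are assumptions made for the example, and it assumes X and Y are equal-length sequences that can be indexed by position.

```python
import random


def split_train_test(X, Y, test_fraction=0.2, seed=None):
    """Shuffle X and Y together and hold out a fraction as a test set.

    Hypothetical helper for illustration; it is not a tflearn function.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same length")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Shuffle the row indices once so samples and labels stay aligned.
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)

    # The first n_test shuffled indices become the test set.
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    train_X = [X[i] for i in train_idx]
    test_X = [X[i] for i in test_idx]
    train_Y = [Y[i] for i in train_idx]
    test_Y = [Y[i] for i in test_idx]
    return train_X, test_X, train_Y, test_Y


# Toy data standing in for the output of load_csv('data.csv').
X = [[i, i * 2] for i in range(10)]
Y = [i % 2 for i in range(10)]

train_X, test_X, train_Y, test_Y = split_train_test(X, Y, test_fraction=0.2, seed=42)
print(len(train_X), len(test_X))  # 8 2
```

The training part can then go to model.fit(train_X, train_Y, validation_set=0.1) as in the example above, and the held-out part can be scored afterwards, for instance with model.evaluate(test_X, test_Y). The test rows are never seen during training, which is what separates them from the validation split.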

0
On

I think the answer from Nicki is the simplest solution.

Another easy option is to use train_test_split() from sklearn:

from sklearn.model_selection import train_test_split 
data, target = load_raw_data(data_size) # own method, data := ['hello','...'] target := [1 0 -1] label
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.33, random_state=42)

Or the numpy version:

import numpy as np
texts, target = load_raw_data(data_size) # own method, texts := ['hello','...'] target := [1 0 -1] label
# Randomly pick 80% of the indices for training; the remaining 20% form the test set
# (sets keep the membership checks below fast)
train_indices = set(np.random.choice(len(target), round(0.8 * len(target)), replace=False))
test_indices = set(range(len(target))) - train_indices
# Keep the original order of the samples within each subset
x_train = [x for ix, x in enumerate(texts) if ix in train_indices]
x_test = [x for ix, x in enumerate(texts) if ix in test_indices]
y_train = np.array([y for ix, y in enumerate(target) if ix in train_indices])
y_test = np.array([y for ix, y in enumerate(target) if ix in test_indices])

So it's your choice, happy coding :)