RNN with numeric data in SKFLOW


I want to try an SKFLOW recurrent neural network on some time sequence data with real values for a binary classification problem. Each row of my data contains 57 features (variables) and I would like to look at the previous 2 samples and the next 2 samples to make predictions on each row.

My data looks like this:

sample -2: f1, f2, f3, f4, ... f57
sample -1: f1, f2, f3, f4, ... f57
current sample: f1, f2, f3, f4, ... f57
sample +1: f1, f2, f3, f4, ... f57
sample +2: f1, f2, f3, f4, ... f57

I started with the SKFLOW example RNN for text classification.

MAX_DOCUMENT_LENGTH = 10

vocab_processor = skflow.preprocessing.VocabularyProcessor(MAX_DOCUMENT_LENGTH)
X_train = np.array(list(vocab_processor.fit_transform(X_train)))
X_test = np.array(list(vocab_processor.transform(X_test)))

n_words = len(vocab_processor.vocabulary_)
print('Total words: %d' % n_words)

### Models

EMBEDDING_SIZE = 50

# Customized function to transform batched X into embeddings
def input_op_fn(X):
    # Convert indexes of words into embeddings.
    # This creates embeddings matrix of [n_words, EMBEDDING_SIZE] and then
    # maps word indexes of the sequence into [batch_size, sequence_length,
    # EMBEDDING_SIZE].
    word_vectors = skflow.ops.categorical_variable(X, n_classes=n_words,
        embedding_size=EMBEDDING_SIZE, name='words')
    # Split into list of embedding per word, while removing doc length dim.
    # word_list results to be a list of tensors [batch_size, EMBEDDING_SIZE].
    word_list = skflow.ops.split_squeeze(1, MAX_DOCUMENT_LENGTH, word_vectors)
    return word_list

# Single direction GRU with a single layer
classifier = skflow.TensorFlowRNNClassifier(rnn_size=EMBEDDING_SIZE, 
    n_classes=15, cell_type='gru', input_op_fn=input_op_fn,
    num_layers=1, bidirectional=False, sequence_length=None,
    steps=1000, optimizer='Adam', learning_rate=0.01, continue_training=True)

It looks as if I should be able to just modify input_op_fn to make it work, but I am not sure how to correctly convert my numpy array into a tensor for skflow.TensorFlowRNNClassifier. This is what the shapes look like in the text classification example:

>>> word_vectors.get_shape() 

TensorShape([Dimension(560000), Dimension(10), Dimension(50)])

>>> len(word_list)

10

If I am interpreting the text problem correctly, then for my problem the shape would be TensorShape([Dimension(# rows), Dimension(57), Dimension(3)]).

There is 1 best solution below

Check out this unit test for RNN.

Suppose this is the numeric data:

data = np.array(list([[2, 1, 2, 2, 3], [2, 2, 3, 4, 5], [3, 3, 1, 2, 1], [2, 4, 5, 4, 1]]), dtype=np.float32)
labels = np.array(list([1, 0, 1, 0]), dtype=np.float32)

data has shape (4, 5), where 4 is the batch_size and 5 is the sequence_length. In that case you want tf.split(1, 5, X) in input_op_fn(). Hope this helps. You are welcome to submit a PR adding an example that deals with this.
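
For the 57-feature case in the question, each time step is a vector rather than a single number, so the input would presumably be 3-D: [batch_size, sequence_length, n_features] with sequence_length = 5 (2 previous samples, the current one, 2 next samples). The sketch below uses plain NumPy only to illustrate the shapes; it is not skflow code. The helper names make_windows and split_time_steps are made up for this illustration, and the random data is a placeholder.

```python
import numpy as np

N_FEATURES = 57   # features per row, taken from the question
HALF_WINDOW = 2   # look 2 samples back and 2 samples ahead


def make_windows(rows, half_window=HALF_WINDOW):
    """Stack every row with its neighbours.

    [n_rows, n_features] -> [n_windows, seq_len, n_features],
    where seq_len = 2 * half_window + 1. Rows too close to either
    end of the data (no full window available) are dropped.
    """
    seq_len = 2 * half_window + 1
    n_windows = rows.shape[0] - seq_len + 1
    return np.stack([rows[i:i + seq_len] for i in range(n_windows)])


def split_time_steps(windows):
    """Return a list of seq_len arrays, each [batch_size, n_features].

    This mirrors the list-of-tensors layout that word_list has in the
    text example, with the sequence dimension removed.
    """
    return [windows[:, t, :] for t in range(windows.shape[1])]


# Placeholder data: 100 rows of 57 real-valued features.
rows = np.random.rand(100, N_FEATURES).astype(np.float32)

windows = make_windows(rows)
steps = split_time_steps(windows)

print(windows.shape)               # (96, 5, 57)
print(len(steps), steps[0].shape)  # 5 (96, 57)
```

The labels would need to be trimmed the same way, e.g. labels[HALF_WINDOW:-HALF_WINDOW], so that each window lines up with the label of its centre row. Inside input_op_fn, the call skflow.ops.split_squeeze(1, 5, X), which the text example already uses with MAX_DOCUMENT_LENGTH, should play the role of split_time_steps here, though I have not verified that against a particular skflow version.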