I am new to probabilistic programming and ML. I am following the deep Markov model example code on Pyro's website. The link to the GitHub page for that code is:
https://github.com/pyro-ppl/pyro/blob/dev/examples/dmm/dmm.py
I understand most of the code. The part I don't understand is the mini-batch idea they are using from line 175 onward.
Question 1: Could someone explain what they are doing there when they use mini-batches?
In the Pyro documentation they say:
"mini_batch is a three dimensional tensor, with the first dimension being the batch dimension, the second dimension being the temporal dimension, and the final dimension being the features (88-dimensional in our case)"
Question 2: What does the temporal dimension mean here?
I ask because I want to use this code on my own dataset, which is sequential data. I have one-hot encoded my data so that its dimensions are (10000, 500, 20), where 10000 is the number of examples/sequences, 500 is the length of each of these sequences, and 20 is the number of features.
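The encoding was done along these lines (a rough sketch; `sequences` here is a hypothetical stand-in for my integer-encoded sequences):

```python
import torch
import torch.nn.functional as F

# Hypothetical integer-encoded sequences: 10000 sequences of length 500,
# each element a class id in [0, 20).
sequences = torch.randint(0, 20, (10000, 500))

# One-hot encode the last dimension: shape becomes (10000, 500, 20).
data = F.one_hot(sequences, num_classes=20).float()
print(data.shape)  # torch.Size([10000, 500, 20])
```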
Question 3: How can I use my one-hot encoded data as the mini_batch here?
I'm sorry if this is a really basic question, but any insights will be appreciated.
The link to that documentation is:
To optimize most deep learning models, we use mini-batch gradient descent. Here, a mini_batch refers to a small number of examples. Let's say we have 10,000 training examples and we want to create mini-batches of 50 examples each. In total there will be 200 mini-batches, and we will perform 200 parameter updates during one pass over the entire dataset.

In your data (10000, 500, 20), the second dimension is the temporal dimension: you can consider each example to have 500 timesteps (t1, t2, ..., t500).
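To make the temporal dimension concrete, here is a minimal sketch in PyTorch (`data` is a zero tensor standing in for your one-hot dataset):

```python
import torch

# Stand-in for your one-hot encoded data: (examples, timesteps, features).
data = torch.zeros(10000, 500, 20)

# The second dimension indexes timesteps within a sequence:
# data[i, t] is the 20-dimensional feature vector of sequence i at timestep t.
one_sequence = data[0]      # shape: (500, 20), all 500 timesteps of sequence 0
one_timestep = data[0, 42]  # shape: (20,), sequence 0 at timestep index 42
print(one_sequence.shape, one_timestep.shape)
```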
In your scenario, you can split your data of shape (10000, 500, 20) into 200 small batches of shape (50, 500, 20), where 50 is the number of examples/sequences in the mini-batch, 500 is the length of each of these sequences, and 20 is the number of features.

How do we decide the mini-batch size? Basically, we can tune the batch size just like any other hyperparameter of our model.
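One way to produce those (50, 500, 20) mini-batches is with PyTorch's DataLoader; a minimal sketch, assuming your one-hot data is already a torch.Tensor (the names `data` and `loader` are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for your one-hot encoded dataset of shape (10000, 500, 20).
data = torch.zeros(10000, 500, 20)

# batch_size=50 yields 10000 / 50 = 200 mini-batches per epoch.
loader = DataLoader(TensorDataset(data), batch_size=50, shuffle=True)

for (mini_batch,) in loader:
    # mini_batch has shape (50, 500, 20): 50 sequences, 500 timesteps, 20 features.
    print(mini_batch.shape)
    break
```

Here shuffle=True reshuffles the examples each epoch, which is the usual choice for mini-batch gradient descent; each mini_batch can then be fed to the model for one parameter update.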