What does BUFFER_SIZE do in TensorFlow Dataset shuffling?


So I've been playing around with this code: https://www.tensorflow.org/tutorials/generative/dcgan and have a fairly good idea of how it works. However, I can't figure out what the BUFFER_SIZE variable is used for. I suspect it creates a subset of the dataset of size BUFFER_SIZE and the batches are then taken from this subset, but I don't see the point of that and can't find anyone explaining it.

So, if someone could explain to me what BUFFER_SIZE does, I would be thankful ❤


There are 2 best solutions below

0
On BEST ANSWER

It's used as the buffer_size argument in tf.data.Dataset.shuffle. Have you read the docs?

This dataset fills a buffer with buffer_size elements, then randomly samples elements from this buffer, replacing the selected elements with new elements. For perfect shuffling, a buffer size greater than or equal to the full size of the dataset is required.

For instance, if your dataset contains 10,000 elements but buffer_size is set to 1,000, then shuffle will initially select a random element from only the first 1,000 elements in the buffer. Once an element is selected, its space in the buffer is replaced by the next (i.e. 1,001-st) element, maintaining the 1,000 element buffer.
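To make the mechanism concrete, here is a minimal pure-Python sketch of a shuffle buffer as described above. It is not TensorFlow's implementation; the function name shuffle_with_buffer and its seed parameter are made up for illustration, and the final "drain" step is an assumption about how the leftover buffer is emitted once the input runs out.

```python
import random


def shuffle_with_buffer(items, buffer_size, seed=None):
    """Yield items in an order produced by a fixed-size shuffle buffer.

    Illustrative sketch only (hypothetical helper, not a TensorFlow API).
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    source = iter(items)
    buffer = []
    # Fill the buffer with the first buffer_size elements.
    for item in source:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # Each step: read the next element, emit a random buffered element,
    # and put the newly read element into the freed slot.
    for item in source:
        slot = rng.randrange(len(buffer))
        yield buffer[slot]
        buffer[slot] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


order = list(shuffle_with_buffer(range(10_000), buffer_size=1_000, seed=0))
print(order[:5])  # early outputs can only come from the start of the dataset
```

With buffer_size=1 this sketch returns the elements in their original order, and with buffer_size at least as large as the dataset it reduces to an ordinary full shuffle. In between, the element emitted at position k (counting from 0) always has an original index below buffer_size + k, which is why a small buffer only shuffles locally.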

0
On

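According to the TensorFlow documentation, buffer_size sets how many elements shuffle keeps in its buffer. The first output element is picked at random from the first buffer_size elements of the dataset. The slot of the picked element is then refilled with the next unread element, and the next output is again picked at random from the whole buffer, so the output is not a contiguous run starting at the first pick.

samples = 1000
buffer_size = 100

pick 1: a random index in [0, 100), say 37
element 37 leaves the buffer and element 100 takes its slot
pick 2: a random index among 0..100, excluding 37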