Autoencoder to reduce input data size


Currently, I want to use an autoencoder to reduce the input data size so that the reduced data can be fed to another neural network. My task is to take a video and feed its frames to the autoencoder. When I use only a few images as input, the autoencoder works well, but when I use a sequence of images, it does not.

Imagine taking a video of a moving ball, giving, say, 200 images. If I train the autoencoder on all 200 images the error is large, but if I use only 5 images, the reconstruction error is small and acceptable. It seems that the autoencoder does not learn the sequence, i.e. the temporal movement of the ball. I also tried a stacked denoising autoencoder, but the results were not good.
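A minimal sketch of this kind of setup, assuming Keras and frames already extracted into a (200, 64, 64, 3) array scaled to [0, 1]; the layer sizes and the random placeholder data are illustrative assumptions, not taken from the original post:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder for the ~200 extracted frames; in practice, load them from
# the video and scale pixel values to [0, 1].
frames = np.random.rand(200, 64, 64, 3).astype("float32")

autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(32, activation="relu"),      # bottleneck: the reduced representation
    layers.Dense(256, activation="relu"),
    layers.Dense(64 * 64 * 3, activation="sigmoid"),
    layers.Reshape((64, 64, 3)),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(frames, frames, epochs=50, batch_size=16, verbose=0)
print("reconstruction MSE:", autoencoder.evaluate(frames, frames, verbose=0))
```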

Does anyone know what the problem is, or whether it is possible to use an autoencoder for this task?

1 Answer
Autoencoders/variational autoencoders do not learn about sequences; they learn to "map" the input data to a latent space with fewer dimensions. For example, if the image is 64x64x3, you could map it to a 32-dimensional tensor/array.
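A minimal Keras sketch of that mapping. The 64x64x3 → 32-dim sizes follow the example above; the particular layer layout is an illustrative assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(64, 64, 3))

# Encoder: compress the image down to a 32-dim latent code.
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)  # 32x32
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)    # 16x16
x = layers.Flatten()(x)
latent = layers.Dense(32, name="latent")(x)

# Decoder: reconstruct the 64x64x3 image from the code.
x = layers.Dense(16 * 16 * 64, activation="relu")(latent)
x = layers.Reshape((16, 16, 64))(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
out = layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid")(x)

autoencoder = Model(inp, out)
encoder = Model(inp, latent)   # the encoder part is reusable on its own once trained
autoencoder.compile(optimizer="adam", loss="mse")
```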

For learning a sequence of images, you would need to connect the output of the encoder part of the autoencoder to an RNN (LSTM/GRU), which can learn the sequence of encoded frames (consecutive frames in latent space). The output of the RNN can then be connected to the decoder part of the autoencoder, so you can see the reconstructed frames.
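A hedged sketch of that encoder → LSTM → decoder wiring, again in Keras; SEQ_LEN, the layer sizes, and the next-frame training target are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN = 10  # assumed number of consecutive frames per training sample

# Per-frame encoder (same shape convention as above): image -> 32-dim code.
frame_encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(32),
])

# Per-frame decoder: 32-dim code -> reconstructed image.
frame_decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    layers.Dense(16 * 16 * 64, activation="relu"),
    layers.Reshape((16, 16, 64)),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid"),
])

seq_in = layers.Input(shape=(SEQ_LEN, 64, 64, 3))
codes = layers.TimeDistributed(frame_encoder)(seq_in)   # (batch, SEQ_LEN, 32)
h = layers.LSTM(64, return_sequences=True)(codes)       # models frame-to-frame dynamics
pred = layers.TimeDistributed(layers.Dense(32))(h)      # predicted latent codes
seq_out = layers.TimeDistributed(frame_decoder)(pred)   # decoded frames

seq_model = Model(seq_in, seq_out)
seq_model.compile(optimizer="adam", loss="mse")
# Train e.g. with targets shifted by one frame so the LSTM learns to predict motion.
```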

Here you can find a GitHub project that tries to encode the video frames and then predict sequences.