For extracting features from video frames (2 samples/sec) I use the Keras framework in Python and load VGG16, whose input size is (150, 150, 3) and output size is (4, 4, 512). After the feature extraction step I want to cluster the frame features with hierarchical k-means.
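Roughly, the extraction step looks like this (a simplified sketch; the exact preprocessing I use may differ):

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input

# Convolutional base only: input (150, 150, 3) -> output feature map (4, 4, 512)
model = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

def extract_features(frame):
    """frame: an RGB frame already resized to (150, 150, 3)."""
    x = preprocess_input(frame.astype('float32')[np.newaxis, ...])
    return model.predict(x)[0]  # shape (4, 4, 512)
```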
My problems are as follows:
I save each frame's features in a vector of size 8192. For a video with 8000 frames, if I only resize each sampled frame to (150, 150) and extract features, I get a feature matrix of size (640, 8192). As you can see, the feature matrix for even a single video is very large and, on top of that, sparse. What is the best way to reduce its dimensionality?
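For context, this is roughly how the matrix is built, together with one illustrative reduction step (PCA here is only an example I am considering, not a settled choice; the `features` list below is a random stand-in for the real extracted maps):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the per-frame (4, 4, 512) maps produced by the extractor above.
features = [np.random.rand(4, 4, 512) for _ in range(640)]

# Flatten each map to a vector of length 4*4*512 = 8192 and stack them.
feature_matrix = np.stack([f.reshape(-1) for f in features])  # (640, 8192)

# Example only: project onto a few hundred components before clustering.
pca = PCA(n_components=256)
reduced = pca.fit_transform(feature_matrix)                   # (640, 256)
```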
What is the best metric for computing the distance between a pair of frame feature vectors? The space is very sparse and the feature values themselves are very small, so Euclidean distance does not seem like a wise choice.
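To illustrate the concern, here is a small comparison of Euclidean and cosine distance on two sparse, small-valued stand-in vectors (cosine is just one possible alternative shown for comparison, not a claim about what is best):

```python
import numpy as np
from scipy.spatial.distance import euclidean, cosine

rng = np.random.default_rng(0)

# Two stand-ins for flattened frame-feature vectors: small values, mostly zeros.
a = rng.random(8192) * 0.01
b = rng.random(8192) * 0.01
a[rng.random(8192) < 0.9] = 0.0
b[rng.random(8192) < 0.9] = 0.0

print('euclidean:', euclidean(a, b))
print('cosine   :', cosine(a, b))  # insensitive to the overall scale of a and b
```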