I am using Magenta's Polyphony RNN to generate MIDI music. I have the dataset, but Magenta's preprocessing step produces fewer outputs than I expected (I expected as many outputs as inputs). I can see that some metrics are used to discard inputs, but I don't understand which ones apply or why they are used.
I also don't understand how to work out exactly how many inputs fail to produce outputs, since some pipelines generate multiple variations of each input.
Below is the log the pipeline prints once it has processed everything:
Polyphony RNN pipeline log:
INFO:tensorflow:Processed 3902 inputs total. Produced 819 outputs.
I0426 09:22:31.260396 140551050180416 pipeline.py:388] Processed 3902 inputs total. Produced 819 outputs.
INFO:tensorflow:DAGPipeline_PolyExtractor_eval_polyphonic_track_lengths_in_bars:
[1,10): 9
[10,20): 45
I0426 09:22:31.260454 140551050180416 statistics.py:137] DAGPipeline_PolyExtractor_eval_polyphonic_track_lengths_in_bars:
[1,10): 9
[10,20): 45
INFO:tensorflow:DAGPipeline_PolyExtractor_eval_polyphonic_tracks_discarded_more_than_1_program: 13317
I0426 09:22:31.260496 140551050180416 statistics.py:137] DAGPipeline_PolyExtractor_eval_polyphonic_tracks_discarded_more_than_1_program: 13317
INFO:tensorflow:DAGPipeline_PolyExtractor_eval_polyphonic_tracks_discarded_too_long: 81
I0426 09:22:31.260534 140551050180416 statistics.py:137] DAGPipeline_PolyExtractor_eval_polyphonic_tracks_discarded_too_long: 81
INFO:tensorflow:DAGPipeline_PolyExtractor_eval_polyphonic_tracks_discarded_too_short: 9135
I0426 09:22:31.260581 140551050180416 statistics.py:137] DAGPipeline_PolyExtractor_eval_polyphonic_tracks_discarded_too_short: 9135
INFO:tensorflow:DAGPipeline_PolyExtractor_training_polyphonic_track_lengths_in_bars:
[1,10): 369
[10,20): 234
[20,30): 162
I0426 09:22:31.260628 140551050180416 statistics.py:137] DAGPipeline_PolyExtractor_training_polyphonic_track_lengths_in_bars:
[1,10): 369
[10,20): 234
[20,30): 162
INFO:tensorflow:DAGPipeline_PolyExtractor_training_polyphonic_tracks_discarded_more_than_1_program: 119675
I0426 09:22:31.260667 140551050180416 statistics.py:137] DAGPipeline_PolyExtractor_training_polyphonic_tracks_discarded_more_than_1_program: 119675
INFO:tensorflow:DAGPipeline_PolyExtractor_training_polyphonic_tracks_discarded_too_long: 711
I0426 09:22:31.260702 140551050180416 statistics.py:137] DAGPipeline_PolyExtractor_training_polyphonic_tracks_discarded_too_long: 711
INFO:tensorflow:DAGPipeline_PolyExtractor_training_polyphonic_tracks_discarded_too_short: 133902
I0426 09:22:31.260736 140551050180416 statistics.py:137] DAGPipeline_PolyExtractor_training_polyphonic_tracks_discarded_too_short: 133902
INFO:tensorflow:DAGPipeline_RandomPartition_eval_poly_tracks_count: 388
I0426 09:22:31.260771 140551050180416 statistics.py:137] DAGPipeline_RandomPartition_eval_poly_tracks_count: 388
INFO:tensorflow:DAGPipeline_RandomPartition_training_poly_tracks_count: 3514
I0426 09:22:31.260805 140551050180416 statistics.py:137] DAGPipeline_RandomPartition_training_poly_tracks_count: 3514
INFO:tensorflow:DAGPipeline_TranspositionPipeline_eval_skipped_due_to_range_exceeded: 30
I0426 09:22:31.260839 140551050180416 statistics.py:137] DAGPipeline_TranspositionPipeline_eval_skipped_due_to_range_exceeded: 30
INFO:tensorflow:DAGPipeline_TranspositionPipeline_eval_transpositions_generated: 22587
I0426 09:22:31.260874 140551050180416 statistics.py:137] DAGPipeline_TranspositionPipeline_eval_transpositions_generated: 22587
INFO:tensorflow:DAGPipeline_TranspositionPipeline_training_skipped_due_to_range_exceeded: 187
I0426 09:22:31.260908 140551050180416 statistics.py:137] DAGPipeline_TranspositionPipeline_training_skipped_due_to_range_exceeded: 187
INFO:tensorflow:DAGPipeline_TranspositionPipeline_training_transpositions_generated: 255053
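To make the counters above easier to compare, here is a minimal sketch that tallies them from the log text. It assumes the log has been saved as a plain string and reads only the `INFO:tensorflow:` lines, so the duplicated `statistics.py` lines are not counted twice. The function names `parse_counters` and `discarded_by_split` are hypothetical helpers, not part of Magenta.

```python
import re
from collections import defaultdict

# Matches single-value counters such as
# "INFO:tensorflow:DAGPipeline_..._discarded_too_long: 81".
# Histogram headers end with ":" and no number, so they are skipped.
COUNTER_RE = re.compile(r"^INFO:tensorflow:(DAGPipeline_\w+): (\d+)$")


def parse_counters(log_text):
    """Return {counter_name: value} for every single-value counter line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def discarded_by_split(counters):
    """Sum all '*_discarded_*' counters separately for eval and training."""
    totals = defaultdict(int)
    for name, value in counters.items():
        if "_discarded_" in name:
            split = "eval" if "_eval_" in name else "training"
            totals[split] += value
    return dict(totals)
```

Calling `discarded_by_split(parse_counters(log_text))` gives one total per split. Note that these totals count discarded tracks, and in this log they far exceed the 3902 inputs, so they cannot simply be subtracted from the input count.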
The relevant code, where these songs are filtered, is at lines 447-453; it updates the metric called DAGPipeline_PolyExtractor_training_polyphonic_tracks_discarded_more_than_1_program.
Thank you in advance.