I'm trying to train a Magenta model on a set of hi-hat MIDI patterns. When I run:
drums_rnn_train --config='one_drum' --run_dir=/tmp/drums_rnn/logdir/run2 --sequence_example_file=/tmp/drums_rnn/sequence_examples/training_drum_tracks.tfrecord --hparams="batch_size=32,rnn_layer_sizes=[32,32]" --num_training_steps=1000
I see the following logs after a series of deprecation warnings:
I1003 13:21:29.452953 4436757952 events_rnn_train.py:103] Starting training loop...
I1003 13:21:29.453077 4436757952 basic_session_run_hooks.py:546] Create CheckpointSaverHook.
W1003 13:21:29.549679 4436757952 deprecation.py:323] From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I1003 13:21:29.589996 4436757952 monitored_session.py:246] Graph was finalized.
2020-10-03 13:21:29.590419: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-03 13:21:29.609557: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fd08e7ae700 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-03 13:21:29.609573: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
I1003 13:21:29.672456 4436757952 session_manager.py:505] Running local_init_op.
I1003 13:21:29.678084 4436757952 session_manager.py:508] Done running local_init_op.
W1003 13:21:29.695948 4436757952 deprecation.py:323] From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py:906: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
I1003 13:21:30.106312 4436757952 basic_session_run_hooks.py:614] Calling checkpoint listeners before saving checkpoint 0...
I1003 13:21:30.106546 4436757952 basic_session_run_hooks.py:618] Saving checkpoints for 0 into ./tmp/drums_rnn/logdir/run5/train/model.ckpt.
I1003 13:21:30.187100 4436757952 basic_session_run_hooks.py:626] Calling checkpoint listeners after saving checkpoint 0...
Training stays stuck on this first "Calling checkpoint listeners after saving checkpoint 0..." line and never progresses. I've verified it's not a performance issue, since I can easily train models with larger batch sizes for polyphonic melodies. Has anyone seen an issue like this? Could it be caused by Magenta relying on an older version of TensorFlow?
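
One thing that may be worth ruling out, purely as an assumption since nothing in the logs above confirms it: the queue-based input pipeline could be blocking because the TFRecord file holds fewer examples than batch_size (32 here). The sketch below counts the records in a TFRecord file by walking its length-prefixed framing with the standard library only. The helper name count_tfrecords is hypothetical, and the sketch skips the CRC fields rather than verifying them.

```python
import struct


def count_tfrecords(path):
    """Count the records in a TFRecord file without importing TensorFlow.

    Each record is framed as:
      uint64 payload length (little-endian)
      uint32 CRC of the length
      payload bytes
      uint32 CRC of the payload
    The CRCs are skipped here, not verified.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC, the payload, and the payload CRC.
            body = f.read(4 + length + 4)
            if len(body) < 4 + length + 4:
                raise ValueError("truncated record body")
            count += 1
    return count
```

Calling count_tfrecords("/tmp/drums_rnn/sequence_examples/training_drum_tracks.tfrecord") and comparing the result with the batch_size passed in --hparams would show whether the dataset is large enough to fill a single batch. If the count is below 32, retrying with a smaller batch_size is a cheap experiment.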