Setup:
keras-tuner==1.1.0
tensorflow==2.8.0
Python 3.10.2
Chief and Tuner0 running on one machine
Tuner1 running on another machine
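For context, the three processes are wired together with KerasTuner's standard environment variables, set before constructing the tuner; a minimal sketch with placeholder address and port (each worker uses its own tuner ID):
import os

# On the chief (machine 1). The ID must be "chief" so this process
# serves the Oracle; the workers use "tuner0" and "tuner1" instead.
os.environ["KERASTUNER_TUNER_ID"] = "chief"
os.environ["KERASTUNER_ORACLE_IP"] = "10.0.0.1"   # placeholder: chief's address
os.environ["KERASTUNER_ORACLE_PORT"] = "8000"     # placeholder: any free port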
Hyperband initialization:
from keras_tuner import Hyperband

hp = Hyperband(
    hypermodel=em.get_model,
    objective='val_accuracy',
    max_epochs=int(config.get(eid, 'epochs')),
    project_name=project_folder,
    hyperband_iterations=int(config.get(eid, 'tuner_iterations'))
)
# search_space_summary() prints the summary itself and returns None,
# which is why a stray "None" appears after the summary in the logs below
print(hp.search_space_summary())
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler

# TensorBoard logs
# tlogs = 'tboard_logs/' + eid
lr_schedule = LearningRateScheduler(exp_scheduler)
early_stop = int(config.get(eid, 'early_stop'))
if len(output_keys) > 1:
    hp.search(train, steps_per_epoch=train_steps,
              validation_data=test, validation_steps=test_steps, verbose=2,
              callbacks=[EarlyStopping(patience=early_stop), lr_schedule,
                         Combined_Accuracy(len(output_keys))])
else:
    hp.search(train, steps_per_epoch=train_steps,
              validation_data=test, validation_steps=test_steps, verbose=2,
              callbacks=[EarlyStopping(patience=early_stop), lr_schedule])
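Here, exp_scheduler and Combined_Accuracy are project-specific helpers not shown above. For reference, a LearningRateScheduler schedule function takes the epoch index and current learning rate and returns the new rate; a minimal exponential-decay sketch (the real exp_scheduler may differ):
import math

# Minimal sketch: hold the rate for the first few epochs,
# then decay it exponentially on each subsequent epoch.
def exp_scheduler(epoch, lr):
    if epoch < 5:
        return lr
    return lr * math.exp(-0.1)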
Issue:
After Tuner0 and Tuner1 complete the search, the chief starts running trials itself. Ideally, the chief is supposed to only serve hyperparameter values for the trials conducted by the workers. Also, because I have restricted the chief to run on CPU only, its trials are very slow.
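(For reference, one way to restrict a TensorFlow process to CPU is to hide the GPUs before any other TF call; a minimal sketch, which may differ from my exact setup:)
import tensorflow as tf

# Hide all GPUs from this process so it runs on CPU only.
tf.config.set_visible_devices([], "GPU")
Here are the logs from the chief script: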
Oracle server on chief is exiting in 10s. The chief will go on with post-search code.
Search space summary
Default search space size: 18
enc_dropout (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.4, 'step': None, 'sampling': None}
enc_layer_norm (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
enc_l2_reg (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
pos_dropout (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.4, 'step': None, 'sampling': None}
pos_layer_norm (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
pos_l2_reg (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
decoder_dropout (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.4, 'step': None, 'sampling': None}
decoder_layer_norm (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
decoder_l2_reg (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
learning_rate (Float)
{'default': 1e-05, 'conditions': [], 'min_value': 1e-05, 'max_value': 9e-05, 'step': None, 'sampling': None}
enc_dense_stack (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
bert_url (Choice)
{'default': 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/2', 'conditions': [], 'values': ['https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/2'], 'ordered': False}
pos_enc_blocks (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
pos_attn_heads (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
pos_dense_stack (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
decoder_enc_blocks (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
decoder_attn_heads (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
decoder_dense_stack (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
None
Search: Running Trial #218
Hyperparameter |Value |Best Value So Far
enc_dropout |0.37332 |0.10642
enc_layer_norm |0.15571 |0.12288
enc_l2_reg |0.48613 |0.57864
pos_dropout |0.17162 |0.14473
pos_layer_norm |0.11009 |0.26961
pos_l2_reg |0.49191 |0.20803
decoder_dropout |0.24864 |0.051037
decoder_layer_norm|0.46016 |0.57878
decoder_l2_reg |0.41414 |0.013985
learning_rate |7.8417e-05 |6.716e-05
enc_dense_stack |4 |3
bert_url |https://tfhub.d...|https://tfhub.d...
pos_enc_blocks |2 |4
pos_attn_heads |4 |4
pos_dense_stack |2 |4
decoder_enc_blocks|2 |3
decoder_attn_heads|2 |3
decoder_dense_s...|2 |2
tuner/epochs |50 |50
tuner/initial_e...|0 |17
tuner/bracket |0 |2
tuner/round |0 |2
Epoch 1/50
85/85 - 215s - loss: 149.9310 - accuracy: 0.8909 - val_loss: 103.2796 - val_accuracy: 0.9896 - lr: 6.4203e-05 - 215s/epoch - 3s/step
Epoch 2/50
85/85 - 220s - loss: 94.1549 - accuracy: 0.9897 - val_loss: 83.6212 - val_accuracy: 0.9896 - lr: 6.4203e-05 - 220s/epoch - 3s/step
Epoch 3/50
85/85 - 210s - loss: 75.2738 - accuracy: 0.9897 - val_loss: 67.1717 - val_accuracy: 0.9896 - lr: 6.4203e-05 - 210s/epoch - 2s/step
Epoch 4/50
85/85 - 190s - loss: 60.2264 - accuracy: 0.9898 - val_loss: 53.5418 - val_accuracy: 0.9896 - lr: 6.4203e-05 - 190s/epoch - 2s/step
According to Keras Tuner - Distributed Tuning, you should add the distribution_strategy parameter to the Hyperband constructor.
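A minimal sketch of that change, assuming a tf.distribute.MirroredStrategy for single-machine data parallelism (constructor arguments otherwise as above):
import tensorflow as tf
from keras_tuner import Hyperband

# Each trial's model is built and fit under the strategy's scope, so a
# single trial can use every GPU visible to the worker process.
hp = Hyperband(
    hypermodel=em.get_model,
    objective='val_accuracy',
    max_epochs=int(config.get(eid, 'epochs')),
    project_name=project_folder,
    hyperband_iterations=int(config.get(eid, 'tuner_iterations')),
    distribution_strategy=tf.distribute.MirroredStrategy()
)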