Keras Tuner - Chief running trials instead of the workers

Setup
keras-tuner==1.1.0
tensorflow==2.8.0
Python 3.10.2

Chief and Tuner0 running on one machine
Tuner1 running on another machine
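Each process is assigned its role through keras-tuner's environment variables, set before the tuner is created (the address and port below are placeholders, not my real values):

import os

# Chief process: should run the oracle only, no trials
os.environ["KERASTUNER_TUNER_ID"] = "chief"
os.environ["KERASTUNER_ORACLE_IP"] = "10.0.0.1"   # placeholder address
os.environ["KERASTUNER_ORACLE_PORT"] = "8000"     # placeholder port

# Worker processes: unique IDs, pointing at the same oracle address
os.environ["KERASTUNER_TUNER_ID"] = "tuner0"      # "tuner1" on the second machine
os.environ["KERASTUNER_ORACLE_IP"] = "10.0.0.1"
os.environ["KERASTUNER_ORACLE_PORT"] = "8000"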

Hyperband initialization:

from keras_tuner import Hyperband

# em.get_model builds the model from a HyperParameters instance;
# config is a ConfigParser-style object (both defined elsewhere)
hp = Hyperband(
    hypermodel=em.get_model,
    objective='val_accuracy',
    max_epochs=int(config.get(eid, 'epochs')),
    project_name=project_folder,
    hyperband_iterations=int(config.get(eid, 'tuner_iterations'))
)

# search_space_summary() prints the summary itself and returns None,
# which is why a stray "None" appears after the summary in the logs below
hp.search_space_summary()

# TensorBoard logs
# tlogs = 'tboard_logs/' + eid

from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler

# exp_scheduler and Combined_Accuracy are project code
# (a sketch of exp_scheduler follows below)
lr_schedule = LearningRateScheduler(exp_scheduler)
early_stop = int(config.get(eid, 'early_stop'))

callbacks = [EarlyStopping(patience=early_stop), lr_schedule]
if len(output_keys) > 1:
    # multi-output models also report a combined accuracy
    callbacks.append(Combined_Accuracy(len(output_keys)))

hp.search(train, steps_per_epoch=train_steps,
          validation_data=test, validation_steps=test_steps, verbose=2,
          callbacks=callbacks)
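
The exp_scheduler passed to LearningRateScheduler is project code that isn't shown. For completeness, a minimal sketch of such a schedule (the warm-up length and decay rate are assumptions, not the original values); LearningRateScheduler calls it with the epoch index and current learning rate and applies the returned value:

import math

def exp_scheduler(epoch, lr):
    # Hypothetical: hold the tuned LR during a short warm-up,
    # then decay it exponentially by ~2% per epoch
    if epoch < 5:
        return lr
    return lr * math.exp(-0.02)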

Issue:
After Tuner0 and Tuner1 complete the search, the chief starts running trials itself. Ideally the chief should only act as the oracle, serving hyperparameter values to the workers. And because I have restricted the chief to CPU only, its trials are very slow. Here are the logs from the chief script:

Oracle server on chief is exiting in 10s.The chief will go on with post-search code.
Search space summary
Default search space size: 18
enc_dropout (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.4, 'step': None, 'sampling': None}
enc_layer_norm (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
enc_l2_reg (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
pos_dropout (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.4, 'step': None, 'sampling': None}
pos_layer_norm (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
pos_l2_reg (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
decoder_dropout (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.4, 'step': None, 'sampling': None}
decoder_layer_norm (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
decoder_l2_reg (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.6, 'step': None, 'sampling': None}
learning_rate (Float)
{'default': 1e-05, 'conditions': [], 'min_value': 1e-05, 'max_value': 9e-05, 'step': None, 'sampling': None}
enc_dense_stack (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
bert_url (Choice)
{'default': 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/2', 'conditions': [], 'values': ['https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/2'], 'ordered': False}
pos_enc_blocks (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
pos_attn_heads (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
pos_dense_stack (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
decoder_enc_blocks (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
decoder_attn_heads (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
decoder_dense_stack (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
None

Search: Running Trial #218

Hyperparameter    |Value             |Best Value So Far 
enc_dropout       |0.37332           |0.10642           
enc_layer_norm    |0.15571           |0.12288           
enc_l2_reg        |0.48613           |0.57864           
pos_dropout       |0.17162           |0.14473           
pos_layer_norm    |0.11009           |0.26961           
pos_l2_reg        |0.49191           |0.20803           
decoder_dropout   |0.24864           |0.051037          
decoder_layer_norm|0.46016           |0.57878           
decoder_l2_reg    |0.41414           |0.013985          
learning_rate     |7.8417e-05        |6.716e-05         
enc_dense_stack   |4                 |3                 
bert_url          |https://tfhub.d...|https://tfhub.d...
pos_enc_blocks    |2                 |4                 
pos_attn_heads    |4                 |4                 
pos_dense_stack   |2                 |4                 
decoder_enc_blocks|2                 |3                 
decoder_attn_heads|2                 |3                 
decoder_dense_s...|2                 |2                 
tuner/epochs      |50                |50                
tuner/initial_e...|0                 |17                
tuner/bracket     |0                 |2                 
tuner/round       |0                 |2                 

Epoch 1/50
85/85 - 215s - loss: 149.9310 - accuracy: 0.8909 - val_loss: 103.2796 - val_accuracy: 0.9896 - lr: 6.4203e-05 - 215s/epoch - 3s/step
Epoch 2/50
85/85 - 220s - loss: 94.1549 - accuracy: 0.9897 - val_loss: 83.6212 - val_accuracy: 0.9896 - lr: 6.4203e-05 - 220s/epoch - 3s/step
Epoch 3/50
85/85 - 210s - loss: 75.2738 - accuracy: 0.9897 - val_loss: 67.1717 - val_accuracy: 0.9896 - lr: 6.4203e-05 - 210s/epoch - 2s/step
Epoch 4/50
85/85 - 190s - loss: 60.2264 - accuracy: 0.9898 - val_loss: 53.5418 - val_accuracy: 0.9896 - lr: 6.4203e-05 - 190s/epoch - 2s/step

1 Answer

According to Keras Tuner - Distributed Tuning, you should add the distribution_strategy parameter (note the exact name; there is no distributed_strategy argument) to the Hyperband constructor.
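
A minimal sketch of that change, reusing the constructor from the question (MirroredStrategy is just an example; pick whichever tf.distribute strategy matches your hardware):

import tensorflow as tf
from keras_tuner import Hyperband

hp = Hyperband(
    hypermodel=em.get_model,
    objective='val_accuracy',
    max_epochs=int(config.get(eid, 'epochs')),
    project_name=project_folder,
    hyperband_iterations=int(config.get(eid, 'tuner_iterations')),
    # run each trial under a tf.distribute strategy
    distribution_strategy=tf.distribute.MirroredStrategy(),
)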