How to tune hyperparameters for CIFAR-100 in TensorFlow Federated (TFF) without the accuracy dropping?

I'm trying to adapt this tutorial https://www.tensorflow.org/federated/tutorials/tff_for_federated_learning_research_compression to the CIFAR-100 dataset, but the accuracy drops every round!

Is my hyperparameter tuning the reason?

Here is my code:

cifar_train, cifar_test = tff.simulation.datasets.cifar100.load_data()

MAX_CLIENT_DATASET_SIZE = 418

CLIENT_EPOCHS_PER_ROUND = 1
CLIENT_BATCH_SIZE = 20
TEST_BATCH_SIZE = 500

def reshape_cifar_element(element):
  return (tf.expand_dims(element['image'], axis=-1), element['label'])

def preprocess_train_dataset(dataset):
  """Preprocessing function for the CIFAR-100 training dataset."""
  return (dataset
          # Shuffle according to the largest client dataset
          .shuffle(buffer_size=MAX_CLIENT_DATASET_SIZE)
          # Repeat to do multiple local epochs
          .repeat(CLIENT_EPOCHS_PER_ROUND)
          # Batch to a fixed client batch size
          .batch(CLIENT_BATCH_SIZE, drop_remainder=False)
          # Preprocessing step
          .map(reshape_cifar_element))

cifar_train = cifar_train.preprocess(preprocess_train_dataset)

# defining a model 
def create_original_fedavg_cnn_model():
  data_format = 'channels_last'

  max_pool = functools.partial(
      tf.keras.layers.MaxPooling2D,
      pool_size=(2, 2),
      padding='same',
      data_format=data_format)
  conv2d = functools.partial(
      tf.keras.layers.Conv2D,
      kernel_size=5,
      padding='same',
      data_format=data_format,
      activation=tf.nn.relu)

  model = tf.keras.models.Sequential([
      tf.keras.layers.InputLayer(input_shape=(32, 32, 3)),
      conv2d(filters=32),
      max_pool(),
      conv2d(filters=64),
      max_pool(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(512, activation=tf.nn.relu),
      tf.keras.layers.Dense(100, activation=None),
      tf.keras.layers.Softmax(),
  ])
  return model

input_spec = cifar_train.create_tf_dataset_for_client(
    cifar_train.client_ids[0]).element_spec

def tff_model_fn():
  keras_model = create_original_fedavg_cnn_model()
  return tff.learning.from_keras_model(
      keras_model=keras_model,
      input_spec=input_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# training the model 
federated_averaging = tff.learning.build_federated_averaging_process(
    model_fn=tff_model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.05),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

# utility function
def format_size(size):
  size = float(size)
  for unit in ['bit','Kibit','Mibit','Gibit']:
    if size < 1024.0:
      return "{size:3.2f}{unit}".format(size=size, unit=unit)
    size /= 1024.0
  return "{size:.2f}{unit}".format(size=size, unit='Tibit')

def set_sizing_environment():
  sizing_factory = tff.framework.sizing_executor_factory()
  context = tff.framework.ExecutionContext(executor_fn=sizing_factory)
  tff.framework.set_default_context(context)

  return sizing_factory

def train(federated_averaging_process, num_rounds, num_clients_per_round, summary_writer):
  environment = set_sizing_environment()

  # Initialize the Federated Averaging algorithm to get the initial server state.
  state = federated_averaging_process.initialize()

  with summary_writer.as_default():
    for round_num in range(num_rounds):
      # Sample the clients participating in this round.
      sampled_clients = np.random.choice(
          cifar_train.client_ids,
          size=num_clients_per_round,
          replace=False)
      # Create a list of `tf.Dataset` instances from the data of sampled clients.
      sampled_train_data = [
          cifar_train.create_tf_dataset_for_client(client)
          for client in sampled_clients
      ]
      state, metrics = federated_averaging_process.next(state, sampled_train_data)

      size_info = environment.get_size_info()
      broadcasted_bits = size_info.broadcast_bits[-1]
      aggregated_bits = size_info.aggregate_bits[-1]

      print('round {:2d}, metrics={}, broadcasted_bits={}, aggregated_bits={}'.format(round_num, metrics, format_size(broadcasted_bits), format_size(aggregated_bits)))

      # Add metrics to Tensorboard.
      for name, value in metrics['train'].items():
          tf.summary.scalar(name, value, step=round_num)

      # Add broadcasted and aggregated data size to Tensorboard.
      tf.summary.scalar('cumulative_broadcasted_bits', broadcasted_bits, step=round_num)
      tf.summary.scalar('cumulative_aggregated_bits', aggregated_bits, step=round_num)
      summary_writer.flush()

# Clean the log directory to avoid conflicts.
try:
  tf.io.gfile.rmtree('/tmp/logs/scalars')
except tf.errors.OpError as e:
  pass  # Path doesn't exist

# Set up the log directory and writer for Tensorboard.
logdir = "/tmp/logs/scalars/original/"
summary_writer = tf.summary.create_file_writer(logdir)

train(federated_averaging_process=federated_averaging, num_rounds=10,
      num_clients_per_round=100, summary_writer=summary_writer)

And this is the output:

round  0, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0299), ('loss', 15.586388), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=6.56Gibit, aggregated_bits=6.56Gibit
round  1, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0046), ('loss', 16.042076), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=13.13Gibit, aggregated_bits=13.13Gibit
round  2, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0107), ('loss', 15.945647), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=19.69Gibit, aggregated_bits=19.69Gibit
round  3, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0104), ('loss', 15.950482), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=26.26Gibit, aggregated_bits=26.26Gibit
round  4, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0115), ('loss', 15.932754), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=32.82Gibit, aggregated_bits=32.82Gibit
round  5, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0111), ('loss', 15.9391985), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=39.39Gibit, aggregated_bits=39.39Gibit
round  6, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0112), ('loss', 15.937586), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=45.95Gibit, aggregated_bits=45.95Gibit
round  7, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.012), ('loss', 15.924692), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=52.52Gibit, aggregated_bits=52.52Gibit
round  8, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0105), ('loss', 15.948869), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=59.08Gibit, aggregated_bits=59.08Gibit
round  9, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('sparse_categorical_accuracy', 0.0096), ('loss', 15.963377), ('num_examples', 10000), ('num_batches', 500)]))]), broadcasted_bits=65.64Gibit, aggregated_bits=65.64Gibit

Here is the input structure:

OrderedDict([('coarse_label', TensorSpec(shape=(), dtype=tf.int64, name=None)), ('image', TensorSpec(shape=(32, 32, 3), dtype=tf.uint8, name=None)), ('label', TensorSpec(shape=(), dtype=tf.int64, name=None))])

I don't know where my mistake is!

  • Are the hyperparameters defined in the layers of create_original_fedavg_cnn_model() wrong? Or the ones in preprocess_train_dataset()?

  • How should I tune the parameters in this tutorial for the CIFAR-100 dataset?

Appreciate any help! Thanks.

Accepted answer:

A couple of notes here:

  1. Since it seems to be training (though not well), some of the most salient hyperparameters will be the learning rates for the client and server optimizers. You may want to try other learning rates there, or even other optimizers (a small sweep sketch follows this list).

  2. The CNN model you're using is fairly small and may simply not do well on CIFAR-100 as a whole. One helpful step is to first train the model on the dataset in a centralized manner as a sanity check, and only then move on to federated training (a minimal sketch follows this list).

  3. A nice rule of thumb for initializing hyperparameter settings is to take the optimizer and hyperparameters that work well in centralized training (see point 2) and use them as the client optimizer, while keeping the server optimizer as SGD with learning rate 1.0. This may not be optimal, but it often does pretty well.
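
For point 2, a minimal centralized sanity check could look like the sketch below. It assumes you reuse create_original_fedavg_cnn_model() from the question and load CIFAR-100 directly through Keras rather than through the federated split; the optimizer settings, batch size, and epoch count are only placeholders to tune.

import tensorflow as tf

# Hypothetical centralized sanity check: if this model cannot learn CIFAR-100
# centrally, federated training will not fare better.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar100.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = create_original_fedavg_cnn_model()  # same CNN as defined in the question
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
model.fit(x_train, y_train, batch_size=20, epochs=5,
          validation_data=(x_test, y_test))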

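For points 1 and 3, once a client optimizer works centrally, plug it into the federated averaging process with the server optimizer kept at SGD with learning rate 1.0, and sweep a few client learning rates over a handful of rounds. A rough sketch, assuming the tff_model_fn and train() helper from the question (the learning-rate grid and momentum value are placeholders):

import tensorflow as tf
import tensorflow_federated as tff

def build_process(client_lr):
  # Same builder as in the question, parameterized by the client learning rate.
  return tff.learning.build_federated_averaging_process(
      model_fn=tff_model_fn,
      client_optimizer_fn=lambda: tf.keras.optimizers.SGD(
          learning_rate=client_lr, momentum=0.9),
      server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

for client_lr in [0.01, 0.05, 0.1]:
  process = build_process(client_lr)
  # Run a few rounds with the train() helper and compare the training accuracy
  # curves before committing to a long run, e.g.:
  # train(federated_averaging_process=process, num_rounds=5,
  #       num_clients_per_round=10, summary_writer=summary_writer)
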
Unfortunately, model training is still an art, not a science, and federated training can behave differently from centralized training. As a result, some trial and error will probably be needed. Best of luck.