I'm building a model with PyTorch Lightning and using the Distributed Data Parallel (DDP) strategy on 2 GPUs to speed up the process. After fitting the model, I need to return 4 pairs of predictions with the structure (output, ground_truth), where both elements are 2D arrays (images). The input of the predict_step method is a batch containing a single element, loaded one at a time by a DataLoader object.
With these settings, each GPU runs 2 predictions, one at a time, and the results are returned.
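The split described above can be sketched as follows. This is only an illustration, not taken from the actual dataloader: it assumes a non-shuffled distributed sampler in which each rank takes every world_size-th index starting at its own rank, and `indices_for_rank` is a hypothetical helper, not part of PyTorch Lightning.

```python
from typing import List

# Hypothetical illustration of how 4 samples could be divided between
# 2 DDP processes. Assumption: a non-shuffling distributed sampler where
# rank r handles indices r, r + world_size, r + 2 * world_size, ...


def indices_for_rank(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Return the dataset indices handled by the given rank."""
    return list(range(rank, num_samples, world_size))


num_samples = 4  # four (output, ground_truth) pairs in total
world_size = 2   # two GPUs, one process per GPU

split = {
    rank: indices_for_rank(num_samples, world_size, rank)
    for rank in range(world_size)
}

for rank, idxs in split.items():
    print(f"rank {rank} predicts samples {idxs}")
# rank 0 predicts samples [0, 2]
# rank 1 predicts samples [1, 3]
```

Under this assumption, each process sees only its own 2 of the 4 samples, so any per-batch hook runs twice on each process.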
With the following code I'm trying to store all the prediction outputs with the Weights & Biases library, which supports this natively, but only 2 images are stored.
def on_predict_batch_end(self, outputs, batch, batch_idx):
    gt, pred, axis, idx_slice = outputs
    gt = gt.squeeze().cpu().numpy()
    pred = pred.squeeze().cpu().numpy()
    if axis != 0:
        gt = np.rot90(gt, 2)
        gt = np.flip(gt, 1)
        pred = np.rot90(pred, 2)
        pred = np.flip(pred, 1)
    self.logger.log_image(
        key="eval_output",
        images=[gt, pred],
        caption=[
            f"gt_data_{axis}_{idx_slice}",
            f"output_model_{axis}_{idx_slice}",
        ],
    )
Do you have any ideas on how to fix this problem? The image names are all different, so the images are not being overwritten.
I also tried using the Callback class provided by PyTorch Lightning, but the result is the same. With only one GPU, the logging works as expected.