I'm working with the Stable Diffusion XL (SDXL) model from Hugging Face's diffusers library and encountering an issue where my callback function, intended to generate preview images during the diffusion process, only produces black images. This setup used to work with Stable Diffusion 1.5, but seems to have issues with SDXL.
The main difference I've noticed is in the handling of callbacks in SDXL, where latents are now stored in callback_kwargs. I've tried to adapt my code accordingly, but the previews are still not generated correctly.
Here's a minimal example of my current implementation:
from diffusers import StableDiffusionXLPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
def callback(pipe, step_index, timestep, callback_kwargs):
    latents = callback_kwargs.get("latents")
    with torch.no_grad():
        latents = 1 / 0.18215 * latents
        image = pipe.vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().permute(0, 2, 3, 1).float().numpy()
    image = pipe.numpy_to_pil(image)[0]
    image.save(f"./imgs/{step_index}.png")
    return callback_kwargs
image = pipe(prompt=prompt, callback_on_step_end=callback).images[0]
The resulting images saved in ./imgs/ are all black. I suspect the issue is in how I handle the latents or in the latent-to-image conversion, but I can't pin down what specifically is going wrong.
Has anyone experienced a similar issue or can provide insight into why this might be happening with the SDXL model?
This code works with other models (e.g. Stable Diffusion 1.5). With StableDiffusionXLPipeline, you need to change the normalization factor used to scale the latents back into image space: the SDXL VAE uses a scaling factor of 0.13025, whereas SD 1.5 uses 0.18215, so dividing by the hard-coded 0.18215 decodes garbage. The robust fix is to read the factor from pipe.vae.config.scaling_factor instead of hard-coding it.
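Here is a sketch of a corrected callback along those lines. The helper names (latents_to_rgb_array, normalize_decoded) are my own; the decode path mirrors your original code, except the scaling factor is read from the VAE config. One caveat: the stock SDXL VAE is known to overflow in float16 and produce NaNs (which also show up as black images), so if previews stay black with fp16, a common workaround is to load the community fp16-fix VAE (madebyollin/sdxl-vae-fp16-fix) into the pipeline.

```python
import torch


def normalize_decoded(image):
    # VAE output is in [-1, 1]; map it to [0, 1], move channels last,
    # and return a float32 numpy array ready for numpy_to_pil.
    image = (image / 2 + 0.5).clamp(0, 1)
    return image.cpu().permute(0, 2, 3, 1).float().numpy()


def latents_to_rgb_array(latents, vae):
    # Divide by the VAE's own scaling factor (0.13025 for SDXL,
    # 0.18215 for SD 1.5) instead of hard-coding the SD 1.5 value.
    with torch.no_grad():
        latents = latents.to(vae.dtype) / vae.config.scaling_factor
        image = vae.decode(latents).sample
    return normalize_decoded(image)


def callback(pipe, step_index, timestep, callback_kwargs):
    latents = callback_kwargs.get("latents")
    arr = latents_to_rgb_array(latents, pipe.vae)
    pipe.numpy_to_pil(arr)[0].save(f"./imgs/{step_index}.png")
    return callback_kwargs
```

The rest of your setup stays the same: pass callback_on_step_end=callback to the pipeline call, exactly as in the question.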