Generating preview images with the Stable Diffusion XL pipeline results in black images


I'm working with the Stable Diffusion XL (SDXL) model from Hugging Face's diffusers library and am running into an issue where my callback function, which is meant to generate preview images during the diffusion process, only produces black images. The same approach worked with Stable Diffusion 1.5, but fails with SDXL.

The main difference I've noticed is how callbacks are handled: with callback_on_step_end, which I'm using for SDXL, the latents are passed inside the callback_kwargs dict instead of as a direct argument. I've tried to adapt my code accordingly, but the previews are still not generated correctly.
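To make the callback_on_step_end contract concrete, here is a pure-Python toy loop. It is a hypothetical stand-in, not diffusers code; it is based on my reading of the SDXL pipeline source, where after each step the pipeline builds callback_kwargs from callback_on_step_end_tensor_inputs (default ["latents"]), calls the callback, and then pops "latents" from the returned dict. That is why the callback must return the dict: returning None would fail on the .pop call.

```python
from typing import Any, Callable, Dict, List, Optional

# Signature used by callback_on_step_end: (pipe, step_index, timestep, callback_kwargs) -> dict
StepEndCallback = Callable[[Any, int, int, Dict[str, Any]], Dict[str, Any]]


def toy_denoise_loop(latents: List[float], num_steps: int,
                     callback_on_step_end: Optional[StepEndCallback] = None) -> List[float]:
    """Hypothetical stand-in for a pipeline's denoising loop (not the diffusers source)."""
    for i in range(num_steps):
        timestep = num_steps - 1 - i
        latents = [x * 0.5 for x in latents]  # placeholder for a real scheduler step
        if callback_on_step_end is not None:
            callback_kwargs = {"latents": latents}
            outputs = callback_on_step_end(None, i, timestep, callback_kwargs)
            # Whatever the callback returns under "latents" replaces the current latents
            latents = outputs.pop("latents", latents)
    return latents


seen_steps = []


def preview_callback(pipe, step_index, timestep, callback_kwargs):
    seen_steps.append(step_index)  # a real callback would decode and save a preview here
    return callback_kwargs         # must return the dict, not None


result = toy_denoise_loop([1.0, 2.0], num_steps=3, callback_on_step_end=preview_callback)
print(seen_steps, result)  # [0, 1, 2] [0.125, 0.25]
```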

Here's a minimal example of my current implementation:

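from diffusers import StableDiffusionXLPipeline
import os
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
os.makedirs("./imgs", exist_ok=True)  # make sure the output folder exists

def callback(pipe, step_index, timestep, callback_kwargs):
    # callback_on_step_end passes the current latents inside callback_kwargs
    latents = callback_kwargs.get("latents")

    with torch.no_grad():
        # Undo the latent scaling (0.18215 is the factor used by the SD 1.5 VAE)
        latents = 1 / 0.18215 * latents
        image = pipe.vae.decode(latents).sample
        # Map the decoder output from [-1, 1] to [0, 1]
        image = (image / 2 + 0.5).clamp(0, 1)

        # NCHW tensor -> NHWC float32 NumPy array
        image = image.cpu().permute(0, 2, 3, 1).float().numpy()

        image = pipe.numpy_to_pil(image)[0]
        image.save(f"./imgs/{step_index}.png")

    # callback_on_step_end must return the (possibly modified) kwargs dict
    return callback_kwargs

image = pipe(prompt=prompt, callback_on_step_end=callback).images[0]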

The resulting images saved in ./imgs/ are just black. I suspect the issue might be related to the handling of latents or the image conversion process, but I'm not sure what specifically is going wrong.
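One way to narrow this down is to check the decoded tensor for non-finite values before converting it. The sketch below mirrors the conversion steps from the code above (x / 2 + 0.5, clamp, channel-last, scale to 0-255) on a NumPy array standing in for the result of pipe.vae.decode(...).sample after .cpu().float().numpy(); the function name decoded_to_uint8 is hypothetical. If the decode overflowed, the array contains NaN or inf. Casting NaN to uint8 is undefined and in practice often ends up as 0, which would be consistent with an all-black image, so the sketch raises instead.

```python
import numpy as np


def decoded_to_uint8(decoded: np.ndarray) -> np.ndarray:
    """Convert a decoded VAE batch (N, C, H, W) in roughly [-1, 1] to uint8 (N, H, W, C).

    Raises ValueError if the batch contains NaN or inf, which is what an
    overflowing float16 decode would produce.
    """
    if not np.isfinite(decoded).all():
        bad = int((~np.isfinite(decoded)).sum())
        raise ValueError(f"decoded batch contains {bad} non-finite value(s)")
    image = np.clip(decoded / 2 + 0.5, 0.0, 1.0)  # map [-1, 1] -> [0, 1]
    image = image.transpose(0, 2, 3, 1)           # NCHW -> NHWC, like .permute(0, 2, 3, 1)
    return (image * 255).round().astype(np.uint8)


# A healthy batch: 1 image, 3 channels, 2x2 pixels, all zeros (mid-gray after conversion)
ok = np.zeros((1, 3, 2, 2), dtype=np.float32)
print(decoded_to_uint8(ok).shape)  # (1, 2, 2, 3)

# The same batch poisoned by a single NaN, as an overflowing float16 decode might produce
broken = ok.copy()
broken[0, 0, 0, 0] = np.nan
try:
    decoded_to_uint8(broken)
except ValueError as err:
    print(err)  # decoded batch contains 1 non-finite value(s)
```

With a real pipeline, the equivalent quick check inside the callback would be something like torch.isnan(image).any() right after decoding.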

Has anyone experienced a similar issue or can provide insight into why this might be happening with the SDXL model?


There are 3 best solutions below

Answer 1

This code works with a different model (Stable Diffusion 1.5). With StableDiffusionXLPipeline, you may need to change the factor used to rescale the latents before decoding them back to an image.
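The factor in question is the VAE's latent scaling factor. 0.18215 is AutoencoderKL's default and the value SD 1.5 uses, while the VAE config of stabilityai/stable-diffusion-xl-base-1.0 sets scaling_factor to 0.13025. A hard-coded 1 / 0.18215 therefore hands the SDXL decoder latents that are roughly 28% too small. On its own that would more likely distort contrast and colors than produce pure black, but reading the value from pipe.vae.config.scaling_factor avoids the mismatch either way. The pure-Python sketch below uses a plain dict as a hypothetical stand-in for pipe.vae.config.

```python
# SD 1.5 factor (also AutoencoderKL's default) and the SDXL base VAE's configured factor
SD15_SCALING_FACTOR = 0.18215
SDXL_SCALING_FACTOR = 0.13025


def unscale_latents(latents, scaling_factor):
    """Undo the pipeline's latent scaling so the decoder sees latents at its expected scale."""
    return [x / scaling_factor for x in latents]


# Hypothetical stand-in for pipe.vae.config on an SDXL pipeline
vae_config = {"scaling_factor": SDXL_SCALING_FACTOR}

latents = [1.0, -0.5]
correct = unscale_latents(latents, vae_config["scaling_factor"])
hardcoded = unscale_latents(latents, SD15_SCALING_FACTOR)

# How large the decoder input is with the hard-coded SD 1.5 factor, relative to the SDXL one
print(round(hardcoded[0] / correct[0], 3))  # 0.715
```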

from diffusers import StableDiffusionPipeline
import matplotlib.pyplot as plt
import torch

model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16)
model = model.to("cuda")

# Legacy callback signature: (step, timestep, latents)
def callback(step, t, latents):
    with torch.no_grad():
        # 0.18215 is the SD 1.5 VAE scaling factor
        latents = 1 / 0.18215 * latents
        image = model.vae.decode(latents).sample

        # Map the decoder output from [-1, 1] to [0, 1]
        image = (image / 2 + 0.5).clamp(0, 1)

        image = image.cpu().permute(0, 2, 3, 1).float().numpy()

        image = model.numpy_to_pil(image)
        plt.figure()
        plt.imshow(image[0])
        plt.show()

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# Call the preview callback every 5 steps
image = model(prompt, callback=callback, callback_steps=5)
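This answer uses the older callback/callback_steps arguments, where the pipeline itself skips steps. As far as I know, callback_on_step_end (used in the question) has no callback_steps equivalent and is called after every step, so a preview callback pays for a full VAE decode each time. A small wrapper can restore the "every N steps" behavior. every_n_steps and preview_fn are hypothetical names; the loop at the bottom only simulates the pipeline calling the callback.

```python
from typing import Any, Callable, Dict


def every_n_steps(preview_fn: Callable[[Any, int, Any], None], n: int):
    """Wrap preview_fn(pipe, step_index, latents) so it only runs every n steps.

    The returned function has the callback_on_step_end signature and always
    returns callback_kwargs unchanged.
    """
    def callback(pipe, step_index, timestep, callback_kwargs: Dict[str, Any]) -> Dict[str, Any]:
        if step_index % n == 0:
            preview_fn(pipe, step_index, callback_kwargs["latents"])
        return callback_kwargs
    return callback


# Simulated 12-step run with a dummy preview function that just records the step
calls = []
cb = every_n_steps(lambda pipe, step, latents: calls.append(step), n=5)
for step in range(12):
    cb(None, step, 999 - step, {"latents": None})
print(calls)  # [0, 5, 10]
```

With a real pipeline this would be passed as something like pipe(prompt=prompt, callback_on_step_end=every_n_steps(my_preview, 5)), where my_preview is your own decode-and-save function.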


Answer 2

The normalization process is slightly different in SDXL than in SD 1.5; you can compare the source code of both pipelines in the diffusers GitHub repo. I was able to get it to work with the following code. You can also try a VAE that works in float16, such as https://huggingface.co/madebyollin/sdxl-vae-fp16-fix

from diffusers import StableDiffusionXLPipeline
import torch
import matplotlib.pyplot as plt

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

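def callback(pipe, step_index, timestep, callback_kwargs):
    latents = callback_kwargs.get("latents")

    with torch.no_grad():
        # Upcast the VAE (mostly) to float32; the default SDXL VAE can overflow in float16
        pipe.upcast_vae()
        # Cast the latents to whatever dtype post_quant_conv ended up with after upcasting
        latents = latents.to(next(iter(pipe.vae.post_quant_conv.parameters())).dtype)
        # Use the VAE's own scaling factor instead of the hard-coded SD 1.5 value
        images = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
        # Denormalize and convert to PIL the same way the pipeline does for its final output
        images = pipe.image_processor.postprocess(images, output_type='pil')

        plt.figure()
        plt.imshow(images[0])
        plt.show()

    # Return the kwargs unchanged so the pipeline keeps its latents
    return callback_kwargs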

pipe(prompt=prompt, callback_on_step_end=callback)


Answer 3

There's a known issue with the default SDXL VAE in float16: it can produce NaN values, which show up as black images. Try loading this fixed VAE instead.

from diffusers import StableDiffusionXLPipeline, AutoencoderKL
import torch

# SDXL VAE finetuned so that it can run in float16 without overflowing
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", vae=vae, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

# Same as before...
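To see why float16 can turn a decode into NaNs, here is a NumPy illustration with made-up numbers, not values taken from the SDXL VAE. float16 tops out at 65504, so squaring an activation of 300 overflows to inf, and later arithmetic such as inf - inf produces NaN, which then propagates into the image. toy_norm is a hypothetical stand-in, loosely like the variance computation inside a normalization layer. The sdxl-vae-fp16-fix model card describes its fix as finetuning the VAE so its internal activations stay smaller; upcasting the VAE to float32, as in the previous answer, sidesteps the overflow instead.

```python
import numpy as np

# Largest finite float16 value
print(np.finfo(np.float16).max)  # 65504.0

# Illustrative activations that are fine in float32 but whose squares exceed the float16 range
activation32 = np.array([300.0, 1.0], dtype=np.float32)


def toy_norm(x):
    """Hypothetical toy computation with a squared term, loosely like a norm layer's variance."""
    sq = x * x  # 300**2 = 90000 > 65504, so this overflows to inf in float16
    return sq - sq.mean()


# Suppress the overflow/invalid warnings that the float16 version triggers
with np.errstate(over="ignore", invalid="ignore"):
    out16 = toy_norm(activation32.astype(np.float16))
out32 = toy_norm(activation32)

print(out16)                     # contains nan and -inf
print(np.isfinite(out32).all())  # True
```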