I'm using the Stable Diffusion Pipeline from Hugging Face and have been trying to start the diffusion process from a custom latent via the latents parameter, but the result is not what I expected.
I took the output (a PIL image) of a first Stable Diffusion Pipeline and used the pil_to_latents() function (shown at the end) to get its latent representation, then called a second Stable Diffusion Pipeline with the latents parameter as follows:
pipe(
    latents=pil_to_latents(result_image_from_first_pipe)[0],
    # ... other pipe arguments
)
But in the end I'm getting a weird, blurry output. If I run the pipeline without this argument, the results look normal. Does anybody have an idea why this happens? Thanks!
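For context, this is roughly the full flow (the model ID and prompts are just placeholders for what I actually use; pil_to_latents is the helper shown below under "Supporting code", where self.vae is the same VAE as pipe.vae):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# First pass: generate an image normally
result_image_from_first_pipe = pipe("a photo of a castle on a hill").images[0]

# Second pass: try to start the diffusion from that image's latent representation
second_image = pipe(
    "a photo of a castle on a hill at sunset",
    latents=pil_to_latents(result_image_from_first_pipe)[0],
).images[0]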
Supporting code:
import torch
from PIL import Image
from torchvision import transforms as tfms

def pil_to_latents(self, image):
    '''
    Function to convert a PIL image to latents
    '''
    # Scale the image from [0, 1] to [-1, 1] and add a batch dimension
    init_image = tfms.ToTensor()(image).unsqueeze(0) * 2.0 - 1.0
    init_image = init_image.to(device="cuda", dtype=torch.float16)
    # Encode with the VAE and apply the SD scaling factor
    init_latent_dist = self.vae.encode(init_image).latent_dist.sample() * 0.18215
    return init_latent_dist
def latents_to_pil(self, latents):
    '''
    Function to convert latents to images
    '''
    latents = (1 / 0.18215) * latents
    with torch.no_grad():
        image = self.vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
    images = (image * 255).round().astype("uint8")
    pil_images = [Image.fromarray(image) for image in images]
    return pil_images
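And this is roughly how I invoke the two helpers (the wrapper object is just a stand-in for the class these methods actually live on; it only needs a .vae attribute pointing at the pipeline's VAE):

from types import SimpleNamespace

# Minimal stand-in for the class that owns these methods
wrapper = SimpleNamespace(vae=pipe.vae)

latents = pil_to_latents(wrapper, result_image_from_first_pipe)  # (1, 4, 64, 64) for a 512x512 input
decoded = latents_to_pil(wrapper, latents)[0]                    # decodes back to a PIL image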