In a nutshell: Attempting to pass an image into StableCascadeCombinedPipeline raises a runtime error complaining that not all tensors are on cuda. The app works perfectly if I comment out the image argument so that the pipeline relies on the text prompt alone, i.e. runs as a plain text-to-image generator.
A gist of the app code (60 lines of Python), with the image input commented out, is visible here.
The docs for the pipeline define the optional images argument as:
images (torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image], optional) — The images to guide the image generation for the prior.
I'm passing a PIL.Image.Image by way of a Gradio Image Component.
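For context, the input component is declared roughly like this; type="pil" is the setting that makes Gradio hand the callback a PIL.Image.Image rather than a numpy array (the label is incidental):

import gradio as gr

# Image input; type="pil" means the callback receives a PIL.Image.Image
image_input = gr.Image(type="pil", label="Guide image")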
Since the app runs fine when no image is passed, it seems I somehow need to ensure that the image ends up on cuda before the pipeline uses it, but so far I haven't found any instructions for how to do that.
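One idea I've been toying with is converting the PIL image to a tensor and moving it onto the GPU myself before calling the pipeline, along the lines of the sketch below. The to_cuda_image helper is just something I made up, and I'm guessing at the shape, value range, and dtype the prior expects:

import torch
from torchvision.transforms.functional import to_tensor

def to_cuda_image(pil_image):
    # to_tensor returns a CHW float tensor in [0, 1]; add a batch dimension and
    # move it to cuda with the same dtype as the pipeline (bfloat16). Whether
    # this is what the prior actually wants is exactly what I'm unsure about.
    return to_tensor(pil_image).unsqueeze(0).to("cuda", dtype=torch.bfloat16)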
Here's the part of the code that sets up the pipeline and defines the generate function:
import random

import spaces
import torch
from diffusers import StableCascadeCombinedPipeline

# Constants
repo = "stabilityai/stable-cascade"

# Ensure model and scheduler are initialized in GPU-enabled function
if torch.cuda.is_available():
    pipe = StableCascadeCombinedPipeline.from_pretrained(repo, variant="bf16", torch_dtype=torch.bfloat16)
    pipe.to("cuda")

# The generate function
@spaces.GPU(enable_queue=True)
def generate_image(prompt):
#def generate_image(prompt, images):
    seed = random.randint(-100000, 100000)
    results = pipe(
        prompt=prompt,
        #images=[images],
        height=1024,
        width=1024,
        num_inference_steps=20,
        generator=torch.Generator(device="cuda").manual_seed(seed)
    )
    return results.images[0]
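For completeness, this is roughly how I expected to be able to call the pipeline once the image input is re-enabled, using the helper sketched above; the images=[...] usage is just my reading of the docstring, not something I've gotten to work:

@spaces.GPU(enable_queue=True)
def generate_image(prompt, image):
    seed = random.randint(-100000, 100000)
    results = pipe(
        prompt=prompt,
        images=[to_cuda_image(image)],  # image pre-moved to cuda as bfloat16
        height=1024,
        width=1024,
        num_inference_steps=20,
        generator=torch.Generator(device="cuda").manual_seed(seed)
    )
    return results.images[0]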