Running Z-Image-Turbo on CPU

Table of contents

The purpose
Build environment
1. Move to the venv environment (Optional)
2. Install library
Execute
1. Execute Time
Impressions
Freebie

The purpose

To try out Alibaba’s Z-Image-Turbo image generation model in a non-CUDA environment.

Build environment

Create a folder for the work.

Move to the venv environment (Optional)

If necessary, run the following commands in the Command Prompt to create and activate a virtual environment (venv).

python -m venv venv
venv\Scripts\activate.bat
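
To confirm that the virtual environment is really active before installing anything, a quick check like the one below can help. This is only a minimal sketch: the helper name in_venv is made up for illustration, and it relies on the standard behaviour that sys.prefix differs from sys.base_prefix inside a venv.

```python
import sys


def in_venv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_venv())
    print("interpreter:", sys.executable)
```

If it reports that the venv is not active, the pip commands in the next step would install into the system-wide Python instead.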

Install library

Run the following commands to install the required libraries.

pip install git+https://github.com/huggingface/diffusers
pip install torch torchvision
pip install transformers
pip install accelerate
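
After installing, it may be worth checking that all of the packages can be found by the interpreter you are about to use. The sketch below uses only the standard library; the helper name missing_packages and the REQUIRED list are illustrative, taken from the pip commands above.

```python
from importlib.util import find_spec

# Top-level package names installed by the pip commands above
REQUIRED = ["diffusers", "torch", "torchvision", "transformers", "accelerate"]


def missing_packages(names):
    """Return the names from `names` that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing:", missing if missing else "none")
```

Note that this only checks that each package is importable. It does not confirm that the installed diffusers build actually includes ZImagePipeline.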

Save the following content as run.py.

import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cpu")

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# Generate the image
image = pipe(
    prompt=prompt,
    height=256,
    width=256,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator().manual_seed(42),
).images[0]

image.save("example.png")

This code is based on the example from the model card linked below, modified to run on the CPU. I also reduced the size of the generated image for testing purposes.

Tongyi-MAI/Z-Image-Turbo · Hugging Face


Execute

Run the following command to execute the script. An ‘example.png’ file will be created in the folder where you ran it.

python run.py

Execute Time

The first run will be slow because it has to download the model. Depending on your environment, this can add 1 to 2 hours, since more than 20 GB of data is downloaded.
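
For a rough idea of how the download time depends on your connection, the arithmetic can be sketched as below. The function name and the line speeds are hypothetical examples; only the figure of roughly 20 GB comes from my run.

```python
def download_hours(size_gb: float, mbit_per_s: float) -> float:
    """Estimate the download time in hours for size_gb (decimal) gigabytes."""
    bits = size_gb * 1e9 * 8             # gigabytes to bits
    seconds = bits / (mbit_per_s * 1e6)  # line speed in bits per second
    return seconds / 3600


if __name__ == "__main__":
    for speed in (25, 50, 100):  # hypothetical sustained speeds in Mbit/s
        print(f"{speed} Mbit/s: {download_hours(20, speed):.1f} h")
```

At a sustained 25 Mbit/s, 20 GB works out to a little under 2 hours, which is in line with the range above.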

Subsequent runs should take about 20 minutes each.

However, the script keeps running for quite a while after the progress bar reaches 100%. (I haven’t timed it exactly, but maybe an extra hour or so?)
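
Since I did not time the individual stages, one way to find out where the time goes would be to wrap each stage in a small timer. The following is a minimal sketch using only the standard library; the names timed and timings are made up for illustration, and the sleep call is just a stand-in workload.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock seconds spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload. In run.py, wrap from_pretrained(), the pipe(...)
    # call, and image.save() in separate timed(...) blocks instead.
    with timed("demo"):
        time.sleep(0.1)
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.1f} s")
```

Timing the load, the generation, and the save separately would show whether the wait after 100% falls inside the pipe(...) call or somewhere else.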

Impressions

To be honest, Flux.1 took 20 minutes to generate an image, which felt quite slow.

However, with other models, I’ve never been able to get a decent result when trying to generate 256×256 images. In contrast, Z-Image-Turbo produced a high-quality image just like the samples.

Freebie

A record of my failed attempt to run Z-Image-Turbo with DirectML

import torch
from diffusers import ZImagePipeline
import torch_directml
dml = torch_directml.device()

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float,
    low_cpu_mem_usage=False,
)
pipe.to(dml)

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# Generate the image
image = pipe(
    prompt=prompt,
    height=256,
    width=256,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator().manual_seed(42),
).images[0]

image.save("example.png")

Error

The weights are loaded as float and moved as float, so no dtype conversion should be needed… is it a memory issue?

Traceback (most recent call last):
  File "F:\projects\python\Qwen-Image\zi.py", line 11, in <module>
    pipe.to(dml)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 545, in to
    module.to(device, dtype)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\transformers\modeling_utils.py", line 4343, in to
    return super().to(*args, **kwargs)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
RuntimeError
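
On the memory question raised above, a back-of-the-envelope estimate is possible. The sketch below assumes that the roughly 20 GB download is stored mostly as 16-bit weights, which is an assumption I have not verified. Loading with torch_dtype=torch.float would then expand each weight to 4 bytes. The helper name is hypothetical.

```python
def expanded_size_gb(checkpoint_gb: float,
                     stored_bytes_per_weight: int = 2,
                     loaded_bytes_per_weight: int = 4) -> float:
    """Rough in-memory size after loading the weights at a wider dtype."""
    return checkpoint_gb * loaded_bytes_per_weight / stored_bytes_per_weight


if __name__ == "__main__":
    # 20 GB is the approximate download size; 2 bytes per stored weight
    # is an assumption, not a measured value.
    print(f"float32 estimate: {expanded_size_gb(20):.0f} GB")
```

Under these assumptions the float32 weights alone would need about 40 GB, more than most consumer GPUs provide. Running out of device memory is therefore a plausible explanation, although the traceback does not include an error message to confirm it.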