Purpose
In the previous article, I tried creating audio with Stable Audio Open 1.0, but its performance was quite borderline.
Now that the lighter Stable Audio Open Small has been released, I’ll give it a try.
I tried various methods, but I couldn’t get Stable Audio Open Small to run with DirectML. (The PyTorch version required by stable-audio-tools conflicts with the one required by torch-directml.)
About the License
Please refer to the following link for the model’s license.
It’s free for non-commercial use.
Environment Setup
Create a working folder.
Create a venv Environment (Optional)
If needed, run the following commands in the Command Prompt to create and activate a venv environment.
Using a venv is recommended this time, since you may need to modify an installed library.
python -m venv venv
venv\scripts\activate.bat
Install Libraries
Run the following commands to install the necessary libraries.
pip install stable-audio-tools
ERROR: AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
Depending on the environment, the following error was output. I was able to avoid it by changing the Python version.
Version where the issue occurred: 3.10.6
Version where the issue did not occur: 3.12.8
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
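Since the workaround above is switching the Python version, it is worth confirming which interpreter the venv actually resolves to before installing. Upgrading the build tooling first is also a commonly suggested mitigation for this kind of pkgutil build error (an additional suggestion on my part, not something verified in the setup above):

```shell
# Confirm which Python the venv resolves to
python --version

# Commonly suggested mitigation: make sure the build tooling is current
python -m pip install --upgrade pip setuptools wheel
```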
Create the Script
Save the following content as a file named run.py.
The same script also works in environments where CUDA is available (it switches devices automatically).
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
    "seconds_total": 11
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=8,
    conditioning=conditioning,
    sample_size=sample_size,
    sampler_type="pingpong",
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
Execute
Run the script with the following command. An output.wav file will be created in the folder where you run it.
python run.py
Execution Time
As shown in the capture below, 1 iteration (it) finishes in about 1 second, so even 8 iterations will complete in about 10 seconds.

However, after the progress shown above reaches 100%, it takes a considerable amount of time to actually complete. (I didn’t measure it precisely, but it wasn’t done within 3 hours; it eventually finished after waiting overnight.)
By the way
The slow part is the following decoding step inside stable-audio-tools:
sampled = model.pretransform.decode(sampled)
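To confirm where the time goes, you can wrap individual calls in a small timing helper. This is a generic sketch (the helper name timed_call is mine, not part of stable-audio-tools); you would wrap the slow decode call shown above with it:

```python
import time

def timed_call(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.1f} s")
    return result

# Hypothetical usage inside generation.py, replacing the slow line:
# sampled = timed_call("pretransform.decode", model.pretransform.decode, sampled)
```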
If Model Download Fails
Please refer to the article below.
Error (ValueError: high is out of bounds for int32)
Execution failed with the following error (possibly environment dependent):
File "F:\projects\python\StableAL\venv\lib\site-packages\stable_audio_tools\inference\generation.py", line 138, in generate_diffusion_cond
seed = seed if seed != -1 else np.random.randint(0, 2**32 - 1)
File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1336, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32
To work around this, modify line 138 of venv\Lib\site-packages\stable_audio_tools\inference\generation.py as follows. (The line number may vary depending on the version of stable_audio_tools.)
Before:
seed = seed if seed != -1 else np.random.randint(0, 2**32 - 1)
After:
seed = seed if seed != -1 else np.random.randint(0, 2**31 - 1, dtype=np.uint32)
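To see why lowering the bound helps: on builds where NumPy’s default integer type is 32-bit (typical on Windows), randint rejects any upper bound above the int32 maximum, and 2**31 - 1 is exactly that maximum. A minimal demonstration, using an explicit int32 dtype so it reproduces on any platform:

```python
import numpy as np

# 2**32 - 1 exceeds the int32 range, so this raises
# "ValueError: high is out of bounds for int32":
try:
    np.random.randint(0, 2**32 - 1, dtype=np.int32)
except ValueError as e:
    print(e)

# 2**31 - 1 is the int32 maximum, so this bound is accepted:
seed = np.random.randint(0, 2**31 - 1)
print(seed)
```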
Result
Although I was able to generate audio with Stable Audio Open Small on a CPU, the very long time it takes means its practicality for real use is questionable.