Purpose
In the previous article, I tried creating audio with Stable Audio Open 1.0, but its performance was quite borderline.
Now that the lighter Stable Audio Open Small has been released, I’ll give it a try.
I tried various methods, but I couldn’t get Stable Audio Open Small to run with DirectML. (The PyTorch version required by stable-audio-tools conflicts with the one required by torch-directml.)
About the License
Please refer to the following link for the model’s license.
It’s free for non-commercial use.
Environment Setup
Create a working folder.
Create a venv Environment (Optional)
If needed, run the following commands in the Command Prompt to create and activate a venv environment.
Using a venv is recommended this time, since you may need to modify an installed library.
python -m venv venv
venv\scripts\activate.bat
Install Libraries
Run the following commands to install the necessary libraries.
pip install stable-audio-tools
ERROR: AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
Depending on the environment, the following error was output. I was able to avoid it by changing the Python version.
Version where the issue occurred: 3.10.6
Version where the issue did not occur: 3.12.8
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
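Since the workaround above is switching the Python version, it is worth confirming which interpreter the venv actually resolves to before installing. Upgrading the build tooling first is also a commonly suggested mitigation for this kind of pkgutil build error (an additional suggestion on my part, not something verified in the setup above):

```shell
# Confirm which Python the venv resolves to
python --version

# Commonly suggested mitigation: make sure the build tooling is current
python -m pip install --upgrade pip setuptools wheel
```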
Create the Script
Save the following content as a file named run.py.
The same script also works in environments where CUDA is available (it switches devices automatically).
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
    "seconds_total": 11
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=8,
    conditioning=conditioning,
    sample_size=sample_size,
    sampler_type="pingpong",
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
Execute
Run the script with the following command. An output.wav file will be created in the folder where you run it.
python run.py
Execution Time
As shown in the capture below, 1 iteration (it) finishes in about 1 second, so even 8 iterations will complete in about 10 seconds.

However, after the progress shown above reaches 100%, it takes a considerable amount of time to actually complete. (I didn’t measure it precisely, but it wasn’t done within 3 hours; it eventually finished after waiting overnight.)
By the way
The slow part is the following decoding step inside stable-audio-tools:
sampled = model.pretransform.decode(sampled)
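To confirm where the time goes, you can wrap individual calls in a small timing helper. This is a generic sketch (the helper name timed_call is mine, not part of stable-audio-tools); you would wrap the slow decode call shown above with it:

```python
import time

def timed_call(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.1f} s")
    return result

# Hypothetical usage inside generation.py, replacing the slow line:
# sampled = timed_call("pretransform.decode", model.pretransform.decode, sampled)
```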
If Model Download Fails
Please refer to the article below.
Error (ValueError: high is out of bounds for int32)
Execution failed with the following error (possibly environment dependent):
File "F:\projects\python\StableAL\venv\lib\site-packages\stable_audio_tools\inference\generation.py", line 138, in generate_diffusion_cond
seed = seed if seed != -1 else np.random.randint(0, 2**32 - 1)
File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1336, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32
To work around this, modify line 138 of venv\Lib\site-packages\stable_audio_tools\inference\generation.py as follows. (The line number may vary depending on the version of stable_audio_tools.)
Before:
seed = seed if seed != -1 else np.random.randint(0, 2**32 - 1)
After:
seed = seed if seed != -1 else np.random.randint(0, 2**31 - 1, dtype=np.uint32)
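To see why lowering the bound helps: on builds where NumPy’s default integer type is 32-bit (typical on Windows), randint rejects any upper bound above the int32 maximum, and 2**31 - 1 is exactly that maximum. A minimal demonstration, using an explicit int32 dtype so it reproduces on any platform:

```python
import numpy as np

# 2**32 - 1 exceeds the int32 range, so this raises
# "ValueError: high is out of bounds for int32":
try:
    np.random.randint(0, 2**32 - 1, dtype=np.int32)
except ValueError as e:
    print(e)

# 2**31 - 1 is the int32 maximum, so this bound is accepted:
seed = np.random.randint(0, 2**31 - 1)
print(seed)
```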
Result
Although I was able to generate audio with Stable Audio Open Small on a CPU, the very long time it takes means its practicality for real use is questionable.