Purpose
I will try generating audio with Stable Audio Open 1.0 on an AMD GPU, using DirectML as the backend.
Environment
Create a working folder.
Create a venv environment (optional)
If needed, run the following commands in Command Prompt to create and activate a venv environment.
python -m venv venv
venv\scripts\activate.bat
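To confirm that the venv is actually active before installing anything, a small check like the following can help. This is a minimal sketch using only the standard library; the function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the first line prints False, the activate script has probably not been run in the current Command Prompt window.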
Install libraries
Run the following command to install the necessary libraries.
pip install scipy
pip install torch torchvision torchaudio
pip install torch-directml
pip install soundfile
pip install diffusers
pip install transformers
pip install torchsde
pip install accelerate
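After the installs finish, it can be useful to verify that every package is importable before running the real script. The sketch below only looks the modules up and does not import them, so it runs quickly. The helper name missing_modules is hypothetical. The list uses import names, which differ from pip names in one case (torch-directml is imported as torch_directml).

```python
from importlib.util import find_spec

# Import names of the packages installed above.
REQUIRED = [
    "scipy",
    "torch",
    "torch_directml",
    "soundfile",
    "diffusers",
    "transformers",
    "torchsde",
    "accelerate",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages were found.")
```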
Create the script
Save the following content as run.py.
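import random

import soundfile as sf
import torch
import torch_directml
from diffusers import StableAudioPipeline

# Use the DirectML device so the pipeline runs on the AMD GPU.
dml = torch_directml.device()

# Download the model into the local "model" folder (or reuse it) and load it in half precision.
repo_id = "stabilityai/stable-audio-open-1.0"
pipe = StableAudioPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, cache_dir="model")
pipe = pipe.to(dml)

prompt = "sound of heart beat"

# Seed the generator with a random value so that each run gives a different result.
generator = torch.Generator()
generator.manual_seed(random.randint(1, 65535))

audio = pipe(
    prompt,
    num_inference_steps=2,  # number of denoising steps
    audio_end_in_s=0.5,  # length of the clip in seconds
    num_waveforms_per_prompt=1,
    generator=generator,
).audios

# Convert the first waveform to a float32 NumPy array of shape (samples, channels) and save it.
output = audio[0].T.float().cpu().numpy()
sf.write("output.wav", output, pipe.vae.sampling_rate)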
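The script draws a random seed but never shows it, so a good result cannot be reproduced later. One possible approach is to choose the seed in a small helper and print it. This is only a sketch: pick_seed is a hypothetical name, and the range 1 to 65535 is simply copied from run.py.

```python
import random


def pick_seed(seed=None):
    """Return the given seed, or draw one from the same range as run.py."""
    if seed is None:
        seed = random.randint(1, 65535)
    return seed


seed = pick_seed()
print(f"seed = {seed}")

# In run.py, this value would then be passed to the generator:
# generator.manual_seed(seed)
```

To repeat a run, pass the printed value back in, for example pick_seed(12345).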
Execute
Run the following command to execute the script. An output.wav file will be created in the folder where the script was run.
python run.py
If the model download fails
Please refer to the article below.
About Execution Environment
My environment (CPU: AMD Ryzen 7 7735HS / RAM: 32 GB / GPU: graphics integrated into the CPU) can barely run it, even after adjusting browser and editor settings. It sometimes succeeds and sometimes fails with an out-of-memory error.
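Because the failure is intermittent, one way to cope is to retry the generation a few times automatically. The following is a rough sketch, not something from the original script: run_with_retries is a hypothetical helper, and it assumes the out-of-memory error surfaces as a RuntimeError, which I have not confirmed for DirectML.

```python
import gc
import time


def run_with_retries(task, attempts=3, wait_s=2.0):
    """Call task() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except RuntimeError as err:  # assumption: out of memory is raised as RuntimeError
            print(f"attempt {attempt} failed: {err}")
            if attempt == attempts:
                raise  # give up and show the last error
            gc.collect()  # release unreferenced Python objects before retrying
            time.sleep(wait_s)
```

In run.py, the pipe(...) call could be wrapped in a small function and passed in as task.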
Modify the script
Modify the prompt
Change the "sound of heart beat" part of prompt = "sound of heart beat".
Change the length of the sound file
Change the 0.5 in audio_end_in_s=0.5. (The unit is seconds.)
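To get a feel for how the length in seconds relates to the size of the output, the number of samples per channel can be estimated as seconds times the sampling rate. The sketch below assumes a sampling rate of 44100 Hz; the real value should be read from pipe.vae.sampling_rate. The function name expected_samples is made up for this example.

```python
def expected_samples(audio_end_in_s, sampling_rate=44100, audio_start_in_s=0.0):
    """Estimate the number of samples per channel in the generated clip."""
    # 44100 is an assumed default; use pipe.vae.sampling_rate in the real script.
    start = int(audio_start_in_s * sampling_rate)
    end = int(audio_end_in_s * sampling_rate)
    return end - start


print(expected_samples(0.5))   # 22050
print(expected_samples(10.0))  # 441000
```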
Change the quality
Change the 2 in num_inference_steps=2.
The default value for this parameter is 100. I set it to 2 so that the script would run at all in my underpowered environment.
On a machine with a dedicated AMD graphics card or similar, feel free to raise this value.
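Instead of editing run.py each time, the three values above could be read from the command line. This is a minimal sketch using argparse from the standard library. The option names --prompt, --seconds, and --steps are my own choice, and the defaults mirror the values in run.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the prompt, clip length, and step count from the command line."""
    parser = argparse.ArgumentParser(description="Generate audio with Stable Audio Open 1.0")
    parser.add_argument("--prompt", default="sound of heart beat", help="text prompt")
    parser.add_argument("--seconds", type=float, default=0.5, help="clip length in seconds")
    parser.add_argument("--steps", type=int, default=2, help="number of inference steps")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.prompt, args.seconds, args.steps)
    # In run.py, these would feed the pipe(...) call:
    # pipe(args.prompt, num_inference_steps=args.steps, audio_end_in_s=args.seconds, ...)
```

With this in place, a run would look like: python run.py --prompt "rain on a window" --seconds 3 --steps 50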
Result
I was able to generate audio using an AMD GPU.