Generating Audio with Stable Audio Open 1.0 on AMD GPUs

Table of contents

The purpose
Environment
Execute
1. If the model download fails
2. About Execution Environment
Modify Script
Result
Reference

The purpose

I’m going to try generating audio with Stable Audio Open 1.0 using an AMD GPU. I’ll be using DirectML.

Environment

Create a working folder.

Create a venv environment (optional)

If needed, run the following commands in Command Prompt to create and activate your venv environment.

python -m venv venv
venv\scripts\activate.bat

Install libraries

Run the following command to install the necessary libraries.

pip install scipy
pip install torch torchvision torchaudio
pip install torch-directml
pip install soundfile
pip install diffusers
pip install transformers
pip install torchsde
pip install accelerate
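
Before writing the script, it can help to confirm that the installation went through. The following is a minimal sketch that uses only the standard library; the helper name missing_packages is made up for this example and is not part of any of the libraries above. It looks up each import name without actually importing it, so it stays quick even for large packages such as torch.

```python
import importlib.util

# Import names of the packages installed above.
# Note: the pip package torch-directml is imported as torch_directml.
REQUIRED = [
    "scipy", "torch", "torchvision", "torchaudio", "torch_directml",
    "soundfile", "diffusers", "transformers", "torchsde", "accelerate",
]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If anything is listed as missing, rerun the corresponding pip install command inside the same venv.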

Create the script

Save the following content as run.py.

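import random

import soundfile as sf
import torch
import torch_directml
from diffusers import StableAudioPipeline

# Select the default DirectML device.
dml = torch_directml.device()

# Download the model into ./model on the first run, then load it in half precision.
repo_id = "stabilityai/stable-audio-open-1.0"
pipe = StableAudioPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, cache_dir="model")
pipe = pipe.to(dml)

prompt = "sound of heart beat"

# Use a random seed so that each run produces a different sound.
generator = torch.Generator()
generator.manual_seed(random.randint(1, 65535))

audio = pipe(
    prompt,
    num_inference_steps=2,  # quality; kept very low for weak hardware
    audio_end_in_s=0.5,  # length of the audio in seconds
    num_waveforms_per_prompt=1,
    generator=generator,
).audios

# Convert to a float32 NumPy array of shape (samples, channels) and save it.
output = audio[0].T.float().cpu().numpy()
sf.write("output.wav", output, pipe.vae.sampling_rate)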

Execute

Run the following command to execute the script.

python run.py

An output.wav file will be created in the folder where the script was run.
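
To check that output.wav has the expected length, a short inspection script can help. This is a sketch that uses Python's standard wave module and a hypothetical helper named describe_wav. It assumes the file was written as 16-bit PCM, which I believe is soundfile's default for .wav files; the wave module raises wave.Error for formats it cannot read.

```python
import os
import wave

def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate  # frames per channel / frames per second
    return channels, rate, duration

if __name__ == "__main__":
    if os.path.exists("output.wav"):
        channels, rate, duration = describe_wav("output.wav")
        print(f"{channels} channel(s), {rate} Hz, {duration:.2f} s")
    else:
        print("output.wav not found; run run.py first.")
```

The reported duration should be close to the audio_end_in_s value used in run.py.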

If the model download fails

Please refer to the article below.

About Execution Environment

My environment (CPU: AMD Ryzen 7 7735HS / RAM: 32 GB / GPU: integrated graphics) can barely run it, even after adjusting browser and editor settings. It sometimes succeeds and sometimes fails with an out-of-memory error.
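
Because the failure is intermittent, one option is simply to try again. The following is a rough sketch with a hypothetical helper named run_with_retry. It assumes the out-of-memory failure surfaces as a Python RuntimeError, which is an assumption and not something confirmed for DirectML; adjust the exception type to whatever your traceback actually shows.

```python
import gc
import time

def run_with_retry(fn, attempts=3, wait_s=1.0):
    """Call fn() until it succeeds, retrying when it raises RuntimeError."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RuntimeError as err:  # assumed exception type for out of memory
            if attempt == attempts:
                raise  # give up after the last attempt
            print(f"Attempt {attempt} failed ({err}); retrying...")
            gc.collect()  # release unreferenced Python objects first
            time.sleep(wait_s)  # give the system a moment to free memory
```

In run.py, the pipeline call could then be wrapped as run_with_retry(lambda: pipe(prompt, generator=generator).audios), with the other arguments passed as before.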

Modify Script

Modify prompt

Please change the string in prompt = "sound of heart beat" to describe the sound you want.

Change length of sound file

Please change the 0.5 in audio_end_in_s=0.5. (The unit is seconds.)

Change quality

Please change the 2 in num_inference_steps=2.

The default value for this parameter is 100. I have set it to 2 to force it to run on my underpowered environment.

If you have a discrete AMD graphics card or similarly capable hardware, feel free to raise this value.
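
Instead of editing the file each time, the three settings above could be read from the command line. This is a minimal sketch using the standard argparse module; the function name parse_args and the flag names --prompt, --seconds, --steps and --seed are choices made for this example, not part of diffusers.

```python
import argparse
import random

def parse_args(argv=None):
    """Parse the three settings discussed above, plus an optional seed."""
    parser = argparse.ArgumentParser(description="Generate audio from a text prompt.")
    parser.add_argument("--prompt", default="sound of heart beat")
    parser.add_argument("--seconds", type=float, default=0.5,
                        help="value to pass as audio_end_in_s")
    parser.add_argument("--steps", type=int, default=2,
                        help="value to pass as num_inference_steps")
    parser.add_argument("--seed", type=int, default=None,
                        help="fixed seed; chosen at random when omitted")
    args = parser.parse_args(argv)
    if args.seed is None:
        # Same range as the original script's random seed.
        args.seed = random.randint(1, 65535)
    return args

# Example with an explicit argument list (pass nothing to read sys.argv instead).
args = parse_args(["--prompt", "rain on a tin roof", "--seconds", "1.5", "--steps", "10"])
print(args.prompt, args.seconds, args.steps, args.seed)
```

In run.py you would call parse_args() with no arguments and then use args.prompt, audio_end_in_s=args.seconds, num_inference_steps=args.steps and generator.manual_seed(args.seed). Printing the seed makes it possible to pass the same value again with --seed later.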

Result

I was able to generate audio using an AMD GPU.

Reference

stabilityai/stable-audio-open-1.0 · Hugging Face
