Running an LLM Locally with AMD GPU (DirectML) – Experimental Setup

This article can be read in about 5 minutes.

Purpose

I’ll try running an LLM using an AMD GPU.

I’ll be using DirectML and its sample code.

I’ll also be modifying the sample code, so please proceed at your own risk.

Note that even on the following environment it only barely runs (it’s unstable and gives strange answers):

CPU: AMD Ryzen 7 7735HS
Memory: 32GB
Storage: External HDD (system disk is SSD)
GPU: AMD Radeon 680M (CPU-integrated)

Set up the environment

Create a working folder.

Then, clone the following repository:

DirectML/PyTorch/llm at master · microsoft/DirectML
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning.
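The sample lives inside the main DirectML repository, so cloning the whole repository works. For example, with Git (the command is my own; any clone method is fine):

git clone https://github.com/microsoft/DirectML.git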

Create and activate a venv environment (optional)

If needed, run the following commands in Command Prompt to create and activate your venv environment:

python -m venv venv
venv\Scripts\activate.bat

Move to the working folder

Run the following command to move to the LLM sample folder. (The cloned repository is a collection of DirectML sample code, so you need to navigate into the folder of the specific sample you want to use.)

cd PyTorch\llm

Install the libraries

Run the following commands to install the necessary libraries.

pip install -r requirements.txt
pip install torch_directml
pip install huggingface_hub
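As an optional sanity check (my own addition, not part of the sample), you can confirm that PyTorch can see the DirectML device before launching the app:

import torch
import torch_directml

dml = torch_directml.device()            # default DirectML device
print(torch_directml.device_name(0))     # should print your GPU's name
print(torch.ones(2, 2, device=dml) + 1)  # tiny tensor op executed on the GPU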

Modify the code

Delete or comment out the following line in app.py:

from huggingface_hub.utils._errors import RepositoryNotFoundError

Then edit the corresponding except block as follows, or delete it entirely:

Before:

    except RepositoryNotFoundError as e:

After:

    except:

The error seems to be caused by a huggingface_hub version upgrade (the import targets a private module that newer releases no longer ship).

I’m currently able to run it by disabling the problematic parts.

However, it’s possible that the model download error handling isn’t working correctly.
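As an alternative I haven’t verified on this setup: recent huggingface_hub releases re-export the same exception from the public huggingface_hub.utils module, so changing the import instead may let you keep the original except block (and avoid the bare except:, which catches too much and would fail anyway if the handler body references e):

# Assumption: your installed huggingface_hub re-exports this publicly
from huggingface_hub.utils import RepositoryNotFoundError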


Run

Run the following command to launch the app. (The model will be downloaded automatically, so the first run will take some time.)

python app.py
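If you’d rather fetch the model ahead of time, huggingface_hub can pre-populate the local cache. This is my own sketch, assuming the default microsoft/Phi-3-mini-4k-instruct model mentioned below; whether app.py actually reads from this cache depends on how the sample downloads its files:

from huggingface_hub import snapshot_download

# Download the model files into the local Hugging Face cache
snapshot_download("microsoft/Phi-3-mini-4k-instruct")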

If the following is displayed in the command prompt, open the displayed URL in your browser.

Running on local URL: http://127.0.0.1:7860

If a chat screen appears, you’ve succeeded. (Enter your prompt at the bottom of the screen, and the answer will appear at the top.)


Result

I was able to run an LLM on AMD’s CPU-integrated GPU. The response speed is practical, too. (It might even be faster than Gemini and similar services.)

However, neither of the two models I tried worked properly, as described below.

(Since they don’t give exactly the same answers, I believe it’s either a model issue or a specification/hardware issue.)

  • microsoft/Phi-3-mini-4k-instruct (default)
    • Gives similar answers no matter what I ask (though the answers aren’t exactly identical, so it seems to be working to some extent).
  • microsoft/phi-2
    • Errors out after a few exchanges.
    • Japanese output comes back garbled.
