Purpose
I’ll try running an LLM on an AMD GPU.
I’ll be using DirectML and its sample code.
I modify the sample code along the way, so please proceed at your own risk.
For reference, on the following environment it just barely runs (it’s unstable and gives strange answers):
| Component | Spec |
| --- | --- |
| CPU | AMD Ryzen 7 7735HS |
| Memory | 32GB |
| Storage | External HDD (system disk is an SSD) |
| GPU | AMD Radeon 680M (integrated in the CPU) |
Environment setup
Create a working folder.
Then, clone the following repository:
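Assuming the sample lives in Microsoft’s DirectML samples repository (an assumption on my part, based on the PyTorch\llm path used below):

```
git clone https://github.com/microsoft/DirectML.git
cd DirectML
```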
Create and activate a venv environment (optional)
If needed, run the following commands in Command Prompt to create and activate your venv environment:
```
python -m venv venv
venv\Scripts\activate.bat
```
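If you use PowerShell instead of Command Prompt, activate with venv\Scripts\Activate.ps1 instead of the .bat script.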
Move to the working folder
Run the following command to move to the LLM sample code. (The cloned repository is a collection of DirectML samples, so you need to navigate to the folder of the specific sample you want to use.)
```
cd PyTorch\llm
```
Install libraries
Run the following commands to install the necessary libraries:
```
pip install -r requirements.txt
pip install torch_directml
pip install huggingface_hub
```
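Before moving on, you can optionally check that torch-directml actually sees the integrated GPU. This is a minimal sketch using the torch_directml helper functions; the adapter name printed depends on your machine:

```python
# Minimal check that a DirectML device is visible to PyTorch.
import torch
import torch_directml

print(torch_directml.is_available())   # True if a DirectML adapter was found
print(torch_directml.device_name(0))   # adapter name (e.g. the Radeon 680M)

dml = torch_directml.device()          # default DirectML device
x = torch.ones(2, 2, device=dml)       # allocate a tiny tensor on the GPU
print(x * 2)                           # run a trivial computation on it
```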
Modify the code
Delete or comment out the following line in app.py:
```python
from huggingface_hub.utils._errors import RepositoryNotFoundError
```
Then edit the `except` block as follows (or delete it entirely):

Before:

```python
except RepositoryNotFoundError as e:
```

After:

```python
except:
```
The error seems to be caused by a huggingface_hub version upgrade, presumably because the private module huggingface_hub.utils._errors no longer exists in newer releases.
I’m currently able to run the sample by disabling the problematic parts.
However, it’s possible that the error handling for model downloads no longer works correctly.
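If you’d rather keep the original error handling instead of a bare `except:`, an alternative is to import the exception from its public location. This is a sketch assuming a recent huggingface_hub release, which re-exports the class from huggingface_hub.utils:

```python
# app.py: replace the private-module import with the public re-export
# (assumption: recent huggingface_hub versions still expose it here).
from huggingface_hub.utils import RepositoryNotFoundError
```

With this import in place, the original `except RepositoryNotFoundError as e:` block can stay unchanged.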
Run
Run the following command. (The model is downloaded automatically, so the first run takes some time.)
```
python app.py
```
When the following appears in the command prompt, open the URL in your browser:

```
Running on local URL: http://127.0.0.1:7860
```
If a screen like the one below appears, you’ve succeeded. (Enter your prompt at the bottom of the screen, and the answer will appear at the top.)

Result
I was able to run an LLM on the GPU integrated into an AMD CPU, and the response speed is practical. (It might even be faster than Gemini and similar services.)
However, neither of the two models I tried worked properly, as described below.
(Since they don’t give exactly the same answers, I believe it’s either a model issue or a spec/hardware issue.)
- microsoft/Phi-3-mini-4k-instruct (default)
  - It gives similar answers no matter what I ask (though the answers aren’t exactly identical, so it seems to be working to some extent).
- microsoft/phi-2
  - Errors out after a few exchanges.
  - Japanese output is garbled.
