Running a Local LLM with Image Input Support (AMD GPU / CPU Compatible)

This article can be read in about 4 minutes.

The purpose

llama.cppを使用してローカルで画像入力ありのLLM(チャットAI)を実行します。

この記事ではgoogleのローカル向けのモデルであるQwen2.5-VLを使用します。

AMDの GPUでもGPUのない環境( CPU)でも実行可能です。

Gammaの起動は以下のページを参照してください。

Build environment

llama.cpp

Download the zip file that matches your environment from the page below.

If you want to run it on Windows with an AMD GPU (or without a GPU), it will work with the package for Vulkan.

If you are using an Nvidia GPU, it will work with the package for CUDA.

If it does not work with the versions above, use the package for  CPU.

Releases · ggml-org/llama.cpp
LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.

Once you extract the downloaded file into the folder of your choice, the preparation is complete.

Model

Download a total of two files from the page: one of the Qwen2.5-VL-3B-Instruct-XXXXXXX.gguf files and one of the mmproj-Qwen2.5-VL-3B-Instruct-XXXXXXX.gguf files.

ggml-org/Qwen2.5-VL-3B-Instruct-GGUF at main
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Execute

Executing as server

Run the following command in the Command Prompt.

Note: Please replace model_path and mmproj_model_path with the actual paths to the models you downloaded.

llama-server -m model_path --mmproj mmproj_model_path --port 8080

Once the model finishes loading, you will see a message like this:

main: model loaded
main: server is listening on http://127.0.0.1:8080
main: starting the main loop...
srv  update_slots: all slots are idle
When you see the message above, open a browser (such as Google Chrome) and go to
404 Not Found
(http://127.0.0.1:8080/)
.

The interface will look like this, and you can start chatting.

You can drag and drop image files onto the shown page.

Troubleshooting

An AMD bug report appeared and the system crashed when I inputted an image.

I resolved the issue by doing the following two things (though I am not sure which one was the actual cause):

1. Updated the driver from the following page:

プロセッサ/グラフィックスのドライバーとサポート
AMD 製品のドライバーとソフトウェアをダウンロード — Windows および Linux のサポート、自動検出ツール、インストールの詳細ガイドもご利用いただけます。

2. Launch AMD Software (Adrenalin Edition) and change “Memory Optimizer” under the “Performance” → “Tuning” tab to “Gaming”. (This increased the dedicated GPU memory from 2GB to 4GB.)

Reference

【備忘録】llama.cppで、マルチモーダルがサポートされたので使ってみた。|猫又
個人用の備忘録です。 llama.cppは以下を使用 ・llama-b5342-bin-win-cuda12.4-x64 モデルは以下からダウンロードして使用 ・Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf ・mmproj-Qwen2.5-VL-3B-Instruct-f16...

コメント

Copied title and URL