The purpose
llama.cppを使用してローカルで画像入力ありのLLM(チャットAI)を実行します。
この記事ではgoogleのローカル向けのモデルであるQwen2.5-VLを使用します。
AMDの GPUでもGPUのない環境( CPU)でも実行可能です。
Gammaの起動は以下のページを参照してください。
Build environment
llama.cpp
Download the zip file that matches your environment from the page below.
If you want to run it on Windows with an AMD GPU (or without a GPU), it will work with the package for Vulkan.
If you are using an Nvidia GPU, it will work with the package for CUDA.
If it does not work with the versions above, use the package for CPU.
Once you extract the downloaded file into the folder of your choice, the preparation is complete.
Model
Download a total of two files from the page: one of the Qwen2.5-VL-3B-Instruct-XXXXXXX.gguf files and one of the mmproj-Qwen2.5-VL-3B-Instruct-XXXXXXX.gguf files.

Execute
Executing as server
Run the following command in the Command Prompt.
Note: Please replace model_path and mmproj_model_path with the actual paths to the models you downloaded.
llama-server -m model_path --mmproj mmproj_model_path --port 8080
Once the model finishes loading, you will see a message like this:
main: model loaded
main: server is listening on http://127.0.0.1:8080
main: starting the main loop...
srv update_slots: all slots are idle

404 Not Found(http://127.0.0.1:8080/).The interface will look like this, and you can start chatting.

You can drag and drop image files onto the shown page.
Troubleshooting
An AMD bug report appeared and the system crashed when I inputted an image.
I resolved the issue by doing the following two things (though I am not sure which one was the actual cause):
1. Updated the driver from the following page:

2. Launch AMD Software (Adrenalin Edition) and change “Memory Optimizer” under the “Performance” → “Tuning” tab to “Gaming”. (This increased the dedicated GPU memory from 2GB to 4GB.)
Reference


コメント