Replies: 2 comments
- Hey, in addition: windows 10
- The parameter of the current version of JamePeng's fork is named
📸 Test Feedback: Omni-Modal / Multimodal Models (Vision / Audio / Video)
Hi all,
I am collecting real-world test results and edge-case reports for our newly overhauled Multimodal architecture (covering Vision, Audio, Omni models, and future Video support).
Whether you are running multi-turn interactive chats or stateless single-turn inferences (like ComfyUI nodes), your feedback on KV cache alignment, memory shifts, and M-RoPE position folding is extremely valuable.
To help trace the underlying C++ chunk evaluations and temporal position (`n_tokens`, `n_pos`) math, you MUST set `verbose=True` in both your `Llama` and `ChatHandler` initializations. (Logs without verbose output lack the C++ backend traces needed to debug multimodal issues.)

📝 What to include in your reply:
If you are testing any Multimodal models (e.g., Qwen-VL, LLaVA, MiniCPM-V, etc.), please reply with:
- Models & Repos: the main LLM path (e.g., `Qwen3.5-VL-7B.gguf`) and the multimodal projector path (e.g., `mmproj-BF16.gguf`).
- Environment: OS, hardware setup, VRAM capacity, and multi-GPU config (if any).
- Version: your `llama-cpp-python` commit hash or branch.
- Media Details: what are you passing to the model? (e.g., a single 1080p image, 3-12 concurrent images, a 5-second MP3/WAV audio file, or video (future), etc.)
- Important Flags & Configs: context size (`n_ctx`), batch size (`n_batch`), and checkpoint settings (`ctx_checkpoints`).
- Performance: prompt eval time (crucial for heavy images), generation tokens/s, and VRAM usage.
- The Verbose Logs: paste the full runtime logs (especially lines starting with `Llama.generate:` and `MTMDChatHandler(__call__):`).
- Feedback: any crashes (like negative token errors), infinite loops, unexpected OOMs, quality degradation, or surprisingly good results!
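For reference, here is a minimal launch sketch showing where the two `verbose=True` flags go. The file paths are placeholders, and `Llava15ChatHandler` is just one upstream handler class used for illustration; pick the handler matching your model (e.g., an MTMD-based handler in forks that provide one).

```python
# Sketch only: placeholder paths, illustrative handler class.
import os

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

MODEL_PATH = "Qwen3.5-VL-7B.gguf"   # placeholder: main LLM gguf
MMPROJ_PATH = "mmproj-BF16.gguf"    # placeholder: multimodal projector gguf

def make_vision_llama(model_path: str, mmproj_path: str,
                      n_ctx: int = 8192, n_batch: int = 512) -> Llama:
    """Build a Llama with verbose=True on BOTH the chat handler and the
    model, so the C++ backend traces (chunk evals, n_tokens/n_pos math)
    actually reach the logs."""
    handler = Llava15ChatHandler(clip_model_path=mmproj_path, verbose=True)
    return Llama(
        model_path=model_path,
        chat_handler=handler,
        n_ctx=n_ctx,      # image token chunks need generous context
        n_batch=n_batch,
        verbose=True,     # emits the Llama.generate: trace lines
    )

if os.path.exists(MODEL_PATH) and os.path.exists(MMPROJ_PATH):
    llm = make_vision_llama(MODEL_PATH, MMPROJ_PATH)
    out = llm.create_chat_completion(messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "file:///path/to/image.png"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }])
    print(out["choices"][0]["message"]["content"])
```

Running this with your own models should produce the verbose trace lines requested above; paste them in full, not just the final output.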
Even a single successful log or a minor observation helps us bulletproof this architecture. Thank you! 🚀