Replies: 2 comments
- Hey, in addition: windows 10
- The parameter of the current version of JamePeng's fork is named
📸 Test Feedback: Omni-Modal / Multimodal Models (Vision / Audio / Video)
Hi all,
I am collecting real-world test results and edge-case reports for our newly overhauled Multimodal architecture (covering Vision, Audio, Omni models, and future Video support).
Whether you are running multi-turn interactive chats or stateless single-turn inferences (like ComfyUI nodes), your feedback on KV cache alignment, memory shifts, and M-RoPE position folding is extremely valuable.
To help trace the underlying C++ chunk evaluations and temporal position (`n_tokens`, `n_pos`) math, you MUST set `verbose=True` in both your `Llama` and `ChatHandler` initializations. (Logs without verbose output lack the C++ backend traces needed to debug multimodal issues.)

📝 What to include in your reply:
If you are testing any Multimodal models (e.g., Qwen-VL, LLaVA, MiniCPM-V, etc.), please reply with:
- Models & Repos: the main LLM path (e.g., `Qwen3.5-VL-7B.gguf`) and the multimodal projector path (e.g., `mmproj-BF16.gguf`).
- Environment: OS, hardware setup, VRAM capacity, and multi-GPU config (if any).
- Version: your `llama-cpp-python` commit hash or branch.
- Media Details: what are you passing to the model? (e.g., a single 1080p image, 3-12 concurrent images, a 5-second MP3/WAV audio file, or video (future), etc.)
- Important Flags & Configs: context size (`n_ctx`), batch size (`n_batch`), and checkpoint settings (`ctx_checkpoints`).
- Performance: prompt eval time (crucial for heavy images), generation tokens/s, and VRAM usage.
- The Verbose Logs: paste the full runtime logs (especially lines starting with `Llama.generate:` and `MTMDChatHandler(__call__):`).
- Feedback: any crashes (like negative token errors), infinite loops, unexpected OOMs, quality degradation, or surprisingly good results!
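For reference, here is a minimal launch sketch showing where the two `verbose=True` flags go. The file paths are placeholders, and `Llava15ChatHandler` is just one upstream handler class used for illustration; pick the handler matching your model (e.g., an MTMD-based handler in forks that provide one).

```python
# Sketch only: placeholder paths, illustrative handler class.
import os

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

MODEL_PATH = "Qwen3.5-VL-7B.gguf"   # placeholder: main LLM gguf
MMPROJ_PATH = "mmproj-BF16.gguf"    # placeholder: multimodal projector gguf

def make_vision_llama(model_path: str, mmproj_path: str,
                      n_ctx: int = 8192, n_batch: int = 512) -> Llama:
    """Build a Llama with verbose=True on BOTH the chat handler and the
    model, so the C++ backend traces (chunk evals, n_tokens/n_pos math)
    actually reach the logs."""
    handler = Llava15ChatHandler(clip_model_path=mmproj_path, verbose=True)
    return Llama(
        model_path=model_path,
        chat_handler=handler,
        n_ctx=n_ctx,      # image token chunks need generous context
        n_batch=n_batch,
        verbose=True,     # emits the Llama.generate: trace lines
    )

if os.path.exists(MODEL_PATH) and os.path.exists(MMPROJ_PATH):
    llm = make_vision_llama(MODEL_PATH, MMPROJ_PATH)
    out = llm.create_chat_completion(messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "file:///path/to/image.png"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }])
    print(out["choices"][0]["message"]["content"])
```

Running this with your own models should produce the verbose trace lines requested above; paste them in full, not just the final output.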
Even a single successful log or a minor observation helps us bulletproof this architecture. Thank you! 🚀