Open source C/C++ engine for high-performance local LLM inference and on-device AI.
Updated Mar 14, 2026 - C++
GPT-OSS 20B Local Execution. Lightweight local environment for running GPT-OSS 20B with Python 3.12 and CUDA acceleration (see the sketch below):
- Run GPT-OSS 20B entirely offline
- Optimize text generation with the GPU
- Enable fast, secure inference on consumer hardware
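The listing does not show the repository's actual tooling, so the following is only a minimal sketch of offline, GPU-accelerated inference, assuming the Hugging Face `transformers` + `torch` stack and a hypothetical local weights directory; the project's real entry point, model path, and quantization setup may differ.

```python
# Minimal sketch: offline GPT-OSS 20B text generation on a CUDA GPU.
# Assumes transformers + torch are installed and the model weights were
# downloaded beforehand to MODEL_DIR (hypothetical path).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "./gpt-oss-20b"  # hypothetical local directory with pre-downloaded weights

# Load tokenizer and model strictly from local files, so no network access is needed.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.bfloat16,  # lower-precision weights to fit consumer GPUs
    device_map="auto",           # place layers on the available CUDA device(s)
    local_files_only=True,
)

# Generate text entirely on-device.
prompt = "Explain local LLM inference in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```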