Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
The Platform for Self-Improving Code. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizable code.
Custom Linux kernels purpose-built for Apple Mac hardware
Forge: Swarm Agents That Turn Slow PyTorch Into Fast CUDA/Triton Kernels
Extended TileLang as a unified DSL to enable high-performance kernel development for Near-Memory Computing, Distributed Memory AI Accelerators, and Networked Accelerators.
A collection of high-performance CUDA kernels and experiments for learning and optimizing GPU compute primitives.
Optimized Ubuntu Touch for Lenovo Tab M8 HD (TB-8505F) - Kernel improvements, performance tuning, boot experience, and system optimizations for the MediaTek Helio A22 tablet
CUDA HPC Kernel Optimization Textbook: the complete optimization path from naive kernels to Tensor Cores, covering GEMM, FlashAttention, and quantization.
The Ultimate Kernel Orchestration Suite for Windows. Optimized for low-latency development and high-priority workloads.
Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests, and optimization patterns).
CUDA kernel library for LLM inference acceleration: FlashAttention, HGEMM, and Tensor Core GEMM, with pybind11 Python bindings.