Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
cuda.core
Is your feature request related to a problem? Please describe.
cuda.core currently pays the JIT compilation cost for a kernel on every run, which makes cold-start latency much higher than necessary and hurts iterative workflows. CuPy has shown that a persistent JIT cache can reduce this overhead by caching compiled kernels for reuse across runs.
Describe the solution you'd like
Add a persistent JIT cache to cuda.core that stores compiled artifacts and any expensive intermediate results (e.g. header processing) produced during runtime compilation. The cache should:
- Key entries on every input that affects correctness, such as source code, compilation options, target architecture, toolkit/runtime version, and the content of relevant dependencies/headers, so a cached artifact is reused only when it is still valid.
- Provide logging/telemetry so users can tell whether they are hitting or missing the cache.
- Let users configure the cache location and cache size.
Describe alternatives you've considered
No response
Additional context
No response