Relational Time Engine (RTE): runtime density regulation for compute-efficient transformer inference. Demonstrates up to 75% layer reduction with improved latency and throughput.
python benchmark machine-learning deep-learning transformer event-driven energy-efficiency ai-systems inference-optimization early-exit transformer-architecture green-ai ai-optimization runtime-systems efficient-ai energy-efficient-ai transformer-inference-overhead runtime-gating relational-time
-
Updated
Mar 12, 2026 - Python