Pull requests: NVIDIA/TransformerEngine
#2753: [Pytorch] Add QuantizedTensor support in FusedAdam.step for MXFP8BlockScaling and Float8BlockScaling quantized model init. (opened Mar 11, 2026 by jomitchellnv)
#2752: [JAX] Change dtype of intermediate result aval of fused_topk_and_score_function_fwd to fp32 (opened Mar 10, 2026 by tdophung)
#2751: Support configurable number of philox rounds for stochastic rounding (milestone: 2.14.0; opened Mar 10, 2026 by ksivaman)
#2750: [PyTorch] Fix fuser so it releases tensors properly (opened Mar 10, 2026 by kainzhong)
#2749: [JAX] Grouped GEMM Refactor to use first_dims and last_dims (draft; opened Mar 10, 2026 by jberchtold-nvidia)
#2748: [Core] MXFP8 grouped GEMM + tensor-scaled FP8 fixes (opened Mar 9, 2026 by jberchtold-nvidia)
#2744: [JAX] Add bias support for v2 grouped GEMM path (opened Mar 6, 2026 by jberchtold-nvidia)
#2743: [Common] Persistent Grouped NVFP4 quantization kernel (opened Mar 6, 2026 by Oleg-Goncharov)
#2741: Add guard at lowest JAX version that still supports triton kernel calling (opened Mar 6, 2026 by tdophung)
#2740: [JAX] Collective GEMM with FP8 and MXFP8 support (opened Mar 5, 2026 by phu0ngng)
#2738: [Common] Persistent Grouped MXFP8 quantization kernel (labels: enhancement, MoE; opened Mar 5, 2026 by Oleg-Goncharov)
#2737: Feat/cp nvshmem enhanced (label: community-contribution; opened Mar 5, 2026 by Knight-of-Thunder)
#2732: Feature/unswizzle (label: community-contribution; opened Mar 4, 2026 by int-smart)
#2728: fix: scope get_full_cu_seqlens cache key by device and inference mode (opened Mar 3, 2026 by DmCarpe93)
#2722: [Common, PyTorch] Grouped MXFP8 dequantize support (opened Mar 2, 2026 by ptrendx)
#2713: Add DCP compatibility for FSDP2-TP sharding in TransformerEngine. (opened Feb 26, 2026 by cspades)
#2693: Enable sm120 support for fused attn if cuDNN is 9.18.1+ (draft; opened Feb 20, 2026 by KshitijLakhani)