Pull requests: NVIDIA/TransformerEngine
#2753: [Pytorch] Add QuantizedTensor support in FusedAdam.step for MXFP8BlockScaling and Float8BlockScaling quantized model init. (opened Mar 11, 2026 by jomitchellnv)
#2752: [JAX] Change dtype of intermediate result aval of fused_topk_and_score_function_fwd to fp32 (opened Mar 10, 2026 by tdophung)
#2751: Support configurable number of philox rounds for stochastic rounding (milestone: 2.14.0; opened Mar 10, 2026 by ksivaman)
#2750: [PyTorch] Fix fuser so it releases tensors properly (opened Mar 10, 2026 by kainzhong)
#2749: [JAX] Grouped GEMM Refactor to use first_dims and last_dims (draft; opened Mar 10, 2026 by jberchtold-nvidia)
#2748: [Core] MXFP8 grouped GEMM + tensor-scaled FP8 fixes (opened Mar 9, 2026 by jberchtold-nvidia)
#2744: [JAX] Add bias support for v2 grouped GEMM path (opened Mar 6, 2026 by jberchtold-nvidia)
#2743: [Common] Persistent Grouped NVFP4 quantization kernel (opened Mar 6, 2026 by Oleg-Goncharov)
#2741: Add guard at lowest JAX version that still supports triton kernel calling (opened Mar 6, 2026 by tdophung)
#2740: [JAX] Collective GEMM with FP8 and MXFP8 support (opened Mar 5, 2026 by phu0ngng)
#2738: [Common] Persistent Grouped MXFP8 quantization kernel (labels: enhancement, MoE; opened Mar 5, 2026 by Oleg-Goncharov)
#2737: Feat/cp nvshmem enhanced (label: community-contribution; opened Mar 5, 2026 by Knight-of-Thunder)
#2732: Feature/unswizzle (label: community-contribution; opened Mar 4, 2026 by int-smart)
#2728: fix: scope get_full_cu_seqlens cache key by device and inference mode (opened Mar 3, 2026 by DmCarpe93)
#2722: [Common, PyTorch] Grouped MXFP8 dequantize support (opened Mar 2, 2026 by ptrendx)
#2713: Add DCP compatibility for FSDP2-TP sharding in TransformerEngine. (opened Feb 26, 2026 by cspades)
#2693: Enable sm120 support for fused attn if cuDNN is 9.18.1+ (draft; opened Feb 20, 2026 by KshitijLakhani)