Conversation
Re-opening of #929 which was reverted. Add --exclusive flag to MI355X salloc (single-node) and sbatch (multi-node) to prevent node sharing during benchmarks, matching NVIDIA runner behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
|
@seungrokj had to re open |
|
|
||
| set -x | ||
| salloc --partition=$PARTITION --gres=gpu:$TP --cpus-per-task=128 --time=180 --no-shell --job-name="$RUNNER_NAME" | ||
| salloc --partition=$PARTITION --gres=gpu:$TP --exclusive --cpus-per-task=128 --time=180 --no-shell --job-name="$RUNNER_NAME" |
There was a problem hiding this comment.
🔴 This PR fixes --exclusive for MI355X salloc but leaves MI325X (runners/launch_mi325x-amd.sh) and MI300X (runners/launch_mi300x-amds.sh) without the flag, making the fix incomplete relative to the PR title "[AMD] Fix slurm command for AMD devices". The already-merged perf-changelog from PR #931 explicitly documents that --exclusive on MI325X salloc prevents node sharing, but the corresponding code change was never applied, creating a contradiction between documented behavior and actual behavior.
Extended reasoning...
What the bug is and how it manifests
This PR re-adds --exclusive to runners/launch_mi355x-amds.sh line 162 and benchmarks/multi_node/amd_utils/submit.sh, but the analogous fix is absent from runners/launch_mi325x-amd.sh and runners/launch_mi300x-amds.sh. Both of those files still contain salloc invocations without --exclusive, meaning benchmarks on MI325X and MI300X can share nodes with other jobs, leading to the same performance regression the PR title claims to fix.
The specific code path that triggers it
For MI325X single-node benchmarks, the salloc on line 12 of runners/launch_mi325x-amd.sh runs without --exclusive. The SLURM scheduler can therefore co-schedule other workloads on the same node during the benchmark run, introducing unpredictable contention for GPUs, memory bandwidth, and NIC resources. MI300X is in the same situation. Neither runner has an IS_MULTINODE branch that would receive the sbatch --exclusive fix from submit.sh.
Why existing code does not prevent it
The git history makes the gap clear: PR #929 added --exclusive to all AMD runners; PR #933 reverted it entirely. PR #931 (already merged and not reverted) added a perf-changelog entry stating "Document --exclusive flag on MI325X salloc prevents node sharing during benchmarks" and "May yield performance improvements" - but that PR was a chore commit that only touched perf-changelog.yaml, never the runner scripts. This PR (#934) re-adds the MI355X piece but omits MI325X and MI300X, so the gap between documentation and code persists.
Addressing the refutation
One verifier argues the omission is intentional, citing a commit from PR #929 that said "scope PR to MI355X only, remove MI325X changes. MI325X now has its own PR (#931)." However, PR #931 never added --exclusive to launch_mi325x-amd.sh - it was a documentation-only chore PR. The intent to fix MI325X in a dedicated PR was stated, but the implementation was never delivered. As a result, the merged perf-changelog now describes behavior that does not exist in the codebase, and this PR (which is explicitly a re-open of #929 with the goal of fixing AMD slurm clusters) is a natural place to complete the work.
Impact
The PR description shows benchmark screenshots demonstrating measurable performance regression without --exclusive. The perf-changelog for PR #931 documents "May yield performance improvements" specifically for gptoss-fp4-mi325x-vllm and minimaxm2.5-fp8-mi325x-vllm. Leaving MI325X and MI300X without --exclusive means those benchmarks may continue to run under node-sharing conditions, producing results that are neither reproducible nor comparable to baselines.
Step-by-step proof
- PR chore: add perf-changelog for MI325X --exclusive flag impact #931 merged commit 7468fb3:
perf-changelog.yamlgains entry claiming--exclusiveon MI325X salloc prevents node sharing. - The actual
launch_mi325x-amd.shline 12 still reads:salloc --partition=$PARTITION --gres=gpu:$TP --cpus-per-task=256 --time=480 --no-shell --job-name="$RUNNER_NAME"- no--exclusive. - This PR adds
--exclusivetolaunch_mi355x-amds.shline 162 (visible in the diff) but does not touchlaunch_mi325x-amd.shorlaunch_mi300x-amds.sh. - Any MI325X single-node benchmark run via
launch_mi325x-amd.shis still susceptible to SLURM node sharing, contradicting both the PR title and the merged perf-changelog.
How to fix it
Add --exclusive to the salloc invocation in runners/launch_mi325x-amd.sh and runners/launch_mi300x-amds.sh, mirroring the change already made in runners/launch_mi355x-amds.sh.
| - gptoss-fp4-mi355x-vllm | ||
| - minimaxm2.5-fp8-mi355x-vllm | ||
| description: | ||
| - "Add --exclusive flag to MI355X single-node salloc and multi-node sbatch to prevent node sharing during benchmarks" |
There was a problem hiding this comment.
🟡 The perf-changelog description "Only non-TP8 configs listed; TP8 already uses all GPUs on the node" is factually inaccurate — several listed configs are TP=8 (e.g., dsr1-fp4-mi355x-atom was explicitly consolidated to TP=8-only in PR #699). The prior PR #931 entry correctly said "Only non-TP8 configs are affected", but changing "affected" to "listed" falsely implies TP8 configs were excluded from the list.
Extended reasoning...
What the Bug Is
The new perf-changelog entry describes the --exclusive flag change with: "Only non-TP8 configs listed; TP8 already uses all GPUs on the node." This differs meaningfully from the PR #931 entry it was based on, which read: "Only non-TP8 configs are affected; TP8 already uses all GPUs on the node."
The Specific Issue
The difference is one word: "affected" (PR #931) vs "listed" (this PR). In PR #931, "affected" correctly meant the --exclusive flag only has a performance impact for non-TP8 sub-configurations, since TP8 already occupies all 8 GPUs making --exclusive redundant for them. "Listed" instead implies only non-TP8 config-keys appear in this entry, which is false.
Why the Description Is Inaccurate
The listed config-keys clearly include TP=8 configurations:
dsr1-fp4-mi355x-atomanddsr1-fp4-mi355x-atom-mtp: PR [AMD] Update search-space for ATOM DSR1 configs #699 perf-changelog entry explicitly states "Comment out TP=4 configs, consolidate to TP=8 only" — this config has no non-TP8 variants whatsoever.dsr1-fp8-mi355x-sglang: Search space includes tp:8 entries for 1k1k and 1k8k.dsr1-fp4-mi355x-sglang: Has both tp:4 and tp:8 entries.gptoss-fp4-mi355x-atom,gptoss-fp4-mi355x-vllm: Include tp:8 in their search spaces.
One refutation argued that all listed configs have non-TP8 variants and the description is just imprecisely worded. However, this fails for dsr1-fp4-mi355x-atom which was explicitly consolidated to TP=8 only with no non-TP8 variants remaining. Furthermore, "Only non-TP8 configs listed" in plain English means "the listed configs are non-TP8," not "configs that happen to have some non-TP8 sub-configurations."
Step-by-Step Proof
- PR chore: add perf-changelog for MI325X --exclusive flag impact #931 uses: "Only non-TP8 configs are affected" — correct, --exclusive helps only non-TP8 runs.
- This PR copies that language but changes "affected" to "listed," making it read: "Only non-TP8 configs [are] listed."
- Check
dsr1-fp4-mi355x-atom: PR [AMD] Update search-space for ATOM DSR1 configs #699 entry says "Comment out TP=4 configs, consolidate to TP=8 only" — TP=8-only, no non-TP8 variants. dsr1-fp4-mi355x-atomappears in the new entry's config-keys list.- Conclusion: a TP=8-only config is listed, directly contradicting "only non-TP8 configs listed."
Impact
Documentation-only inaccuracy with no functional impact on the correct --exclusive flag change. However, readers consulting the perf-changelog to understand which configurations benefit from --exclusive would be misled into thinking TP=8 configs were intentionally excluded.
How to Fix
Change "Only non-TP8 configs listed" back to "Only non-TP8 configs are affected" (matching PR #931 intent), making clear this notes that TP8 does not benefit from --exclusive — not that TP8 configs are absent from the list.
Hi @cquil11 @functionstackx
Motivation is AMD slurm clusters are missing --exclusive option and they showed regressed performance. NV slurm clusters are already using --exclusive option.
without --exclusive

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23433485252
with --exclusive

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23439928217
Regards,
Seungrok