chore(vllm_batch): bump batch image to vLLM 0.17.1 for NVFP4 TP>1 fix #785
Open
Conversation
…d compat
vLLM 0.17.0 removed `--disable-log-requests` from its arg parser, but model-engine hardcodes this flag when launching vllm_server. Accept the flag as a no-op when it's not present in the vLLM parser to avoid startup failures with newer vLLM images.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
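A minimal sketch of the shim this commit describes, assuming vLLM's public `make_arg_parser` and `FlexibleArgumentParser` helpers; the `parser._actions` guard mirrors what the review below reports, but the surrounding structure is illustrative rather than the exact `vllm_server.py` code:

```python
from vllm.entrypoints.openai.cli_args import make_arg_parser
from vllm.utils import FlexibleArgumentParser

def parse_args():
    # Build vLLM's own OpenAI-server argument parser.
    parser = make_arg_parser(FlexibleArgumentParser())
    # vLLM 0.17 removed --disable-log-requests, but model-engine still
    # passes it when launching vllm_server. Re-register it as a no-op
    # only when the installed vLLM no longer defines it, so the flag is
    # accepted without double-registering on older vLLM versions.
    if not any(
        "--disable-log-requests" in action.option_strings
        for action in parser._actions
    ):
        parser.add_argument(
            "--disable-log-requests",
            action="store_true",
            help="No-op; kept for backward compatibility.",
        )
    return parser.parse_args()
```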
vLLM 0.17.1 adds explicit Nemotron-3-Super NVFP4 support and fixes accuracy with tensor parallelism > 1 (required for 4-GPU batch jobs). Image 0.17.1-batch built from vllm/vllm-openai:v0.17.1 and pushed to 692474966980.dkr.ecr.us-west-2.amazonaws.com/llm-engine/batch-infer-vllm:0.17.1-batch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
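For context on the TP>1 fix, a hedged sketch of the shape of configuration this bump targets, using vLLM's offline `LLM` API: an NVFP4 checkpoint sharded across 4 GPUs. The model id is a placeholder, not the exact checkpoint used by the batch jobs:

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size=4 matches the 4-GPU batch jobs (num_shards=4);
# this is the TP>1 path whose NVFP4 accuracy vLLM 0.17.1 fixes.
llm = LLM(
    model="nvidia/nemotron-3-super-nvfp4",  # placeholder checkpoint id
    tensor_parallel_size=4,
)
outputs = llm.generate(
    ["Summarize NVFP4 quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```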
Summary
- Bump `vllm_batch` and `vllm_batch_v2` image tags from `0.17.0-rc1-batch` → `0.17.1-batch`
- Picks up vLLM 0.17.1's NVFP4 accuracy fix for tensor parallelism > 1 (required for 4-GPU batch jobs, `num_shards=4`)
- Image pushed to `692474966980.dkr.ecr.us-west-2.amazonaws.com/llm-engine/batch-infer-vllm:0.17.1-batch`

Build command
```bash
cd llm-engine
./model-engine/model_engine_server/inference/vllm/build_and_upload_image.sh \
  batch vllm_batch \
  --vllm-version=0.17.1 \
  --vllm-base-image=vllm/vllm-openai:v0.17.1 \
  --account=692474966980
```

Test plan
- `0.17.1-batch` image built and pushed
- Run `test_nemotron_super_120b_batch.py` end-to-end

🤖 Generated with Claude Code
Greptile Summary
This PR bumps the `vllm_batch` and `vllm_batch_v2` Helm values from `0.17.0-rc1-batch` to `0.17.1-batch` to pick up vLLM's official NVFP4 tensor-parallelism fix, and pairs the image bump with a small backward-compatibility shim in `vllm_server.py` that re-adds `--disable-log-requests` as a silent no-op whenever the flag is absent from vLLM's own arg parser (it was removed in v0.17).

- `charts/model-engine/values.yaml`: both `vllm_batch` and `vllm_batch_v2` tags updated from `0.17.0-rc1-batch` → `0.17.1-batch`.
- `model-engine/model_engine_server/inference/vllm/vllm_server.py`: a defensive `if not any(...)` guard checks `parser._actions` before adding the deprecated flag, preventing a double-registration conflict if vLLM ever re-introduces the argument in a future version.

Confidence Score: 5/5
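A self-contained demo of why that `parser._actions` guard matters, using plain `argparse` in place of vLLM's parser (function and variable names here are illustrative): without the guard, registering the flag a second time raises `argparse.ArgumentError`.

```python
import argparse

def add_compat_flag(parser: argparse.ArgumentParser) -> None:
    # Skip registration when an existing action already owns the flag,
    # mirroring the defensive guard described for vllm_server.py.
    if not any(
        "--disable-log-requests" in action.option_strings
        for action in parser._actions
    ):
        parser.add_argument("--disable-log-requests", action="store_true")

parser = argparse.ArgumentParser()
add_compat_flag(parser)  # flag absent -> registered as a no-op
add_compat_flag(parser)  # flag present -> skipped, no ArgumentError
args = parser.parse_args(["--disable-log-requests"])
assert args.disable_log_requests is True
```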
Important Files Changed

- charts/model-engine/values.yaml
- model-engine/model_engine_server/inference/vllm/vllm_server.py
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[parse_args called] --> B[make_arg_parser builds vLLM parser]
    B --> C{Is '--disable-log-requests'\nalready in parser._actions?}
    C -- "Yes\n(vLLM re-added it)" --> E[Skip — no duplicate registration]
    C -- "No\n(vLLM v0.17+ removed it)" --> D[Add '--disable-log-requests'\nas no-op action=store_true]
    D --> F[parser.parse_args]
    E --> F
    F --> G[Return args to caller]
```

Reviews (1): Last reviewed commit: "chore(vllm_batch): bump batch image to v..."