chore(vllm_batch): bump batch image to vLLM 0.17.1 for NVFP4 TP>1 fix #785

Open
lilyz-ai wants to merge 2 commits into main from lilyzhu/nemotron-vllm-0.17.1-batch

Conversation


@lilyz-ai lilyz-ai commented Mar 24, 2026

Summary

  • Bumps vllm_batch and vllm_batch_v2 image tags from 0.17.0-rc1-batch → 0.17.1-batch
  • vLLM 0.17.1 adds explicit Nemotron-3-Super NVFP4 support and fixes accuracy with tensor parallelism > 1 (required since batch jobs use num_shards=4)
  • Image built and pushed: 692474966980.dkr.ecr.us-west-2.amazonaws.com/llm-engine/batch-infer-vllm:0.17.1-batch

Build command

```shell
cd llm-engine
./model-engine/model_engine_server/inference/vllm/build_and_upload_image.sh \
    batch vllm_batch \
    --vllm-version=0.17.1 \
    --vllm-base-image=vllm/vllm-openai:v0.17.1 \
    --account=692474966980
```

Test plan

  • After merge + Helm redeploy, verify the configmap shows 0.17.1-batch
  • Run test_nemotron_super_120b_batch.py end-to-end

🤖 Generated with Claude Code

Greptile Summary

This PR bumps the vllm_batch and vllm_batch_v2 Helm values from 0.17.0-rc1-batch to 0.17.1-batch to pick up vLLM's official NVFP4 tensor-parallelism fix. It pairs the image bump with a small backward-compatibility shim in vllm_server.py that re-adds --disable-log-requests as a silent no-op whenever the flag is absent from vLLM's own arg parser (the flag was removed in v0.17).

  • charts/model-engine/values.yaml: both vllm_batch and vllm_batch_v2 tags updated from 0.17.0-rc1-batch → 0.17.1-batch.
  • model-engine/model_engine_server/inference/vllm/vllm_server.py: defensive if not any(...) guard checks parser._actions before adding the deprecated flag, preventing a double-registration conflict if vLLM ever re-introduces the argument in a future version.
  • The shim is well-commented and follows existing patterns in the file; no functional logic is changed.
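The guard described above can be sketched as follows. This is a hedged reconstruction, not the actual vllm_server.py code: the real module builds its parser via vLLM's make_arg_parser, which is modeled here with a plain argparse stand-in, and the surrounding function names are assumptions.

```python
# Sketch of the backward-compat shim described in this PR. Assumptions:
# build_parser() stands in for vLLM's make_arg_parser(); in vLLM v0.17+
# the real parser no longer defines --disable-log-requests.
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Stand-in for vLLM's parser; intentionally does NOT define
    # --disable-log-requests, mimicking vLLM v0.17+.
    return argparse.ArgumentParser()


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    # Re-register --disable-log-requests as a no-op only when vLLM's own
    # parser no longer defines it. The any(...) check over parser._actions
    # prevents a double-registration error if a future vLLM version
    # re-introduces the flag.
    if not any(
        "--disable-log-requests" in getattr(action, "option_strings", [])
        for action in parser._actions
    ):
        parser.add_argument(
            "--disable-log-requests",
            action="store_true",
            help="No-op; retained for backward compatibility with callers "
                 "that pass this flag (removed from vLLM in v0.17).",
        )
    return parser.parse_args(argv)


# Existing callers that hardcode the flag keep working instead of
# failing with "unrecognized arguments":
args = parse_args(["--disable-log-requests"])
print(args.disable_log_requests)  # → True
```

Because the added argument is `store_true` and nothing downstream reads it, passing the flag changes no behavior; omitting it also works, so callers on either side of the vLLM 0.17 boundary are unaffected.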

Confidence Score: 5/5

  • This PR is safe to merge — it is a targeted version bump with a minimal, well-guarded backward-compat shim and no logic regressions.
  • Both changes are minimal and low-risk: the values.yaml change is a single-line tag bump, and the vllm_server.py change is a defensive no-op shim that only activates when the flag is absent from the parser. The guard prevents double-registration, the help text is accurate, and the comment clearly explains the rationale. No existing behaviour is altered for any caller that does not pass the deprecated flag.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| charts/model-engine/values.yaml | Bumps vllm_batch and vllm_batch_v2 image tags from 0.17.0-rc1-batch to 0.17.1-batch. Straightforward version pin change with no logic impact. |
| model-engine/model_engine_server/inference/vllm/vllm_server.py | Adds a backward-compatibility shim that re-registers --disable-log-requests as a no-op if vLLM v0.17+ dropped it, preventing arg-parse failures in existing callers. The guard against double-registration and use of getattr are correct defensive patterns. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[parse_args called] --> B[make_arg_parser builds vLLM parser]
    B --> C{Is '--disable-log-requests'\nalready in parser._actions?}
    C -- Yes\n(vLLM re-added it) --> E[Skip — no duplicate registration]
    C -- No\n(vLLM v0.17+ removed it) --> D[Add '--disable-log-requests'\nas no-op action=store_true]
    D --> F[parser.parse_args]
    E --> F
    F --> G[Return args to caller]
```


lilyz-ai and others added 2 commits March 24, 2026 22:43
…d compat

vLLM 0.17.0 removed --disable-log-requests from its arg parser, but
model-engine hardcodes this flag when launching vllm_server. Accept
the flag as a no-op when it's not present in the vLLM parser to avoid
startup failures with newer vLLM images.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vLLM 0.17.1 adds explicit Nemotron-3-Super NVFP4 support and fixes
accuracy with tensor parallelism > 1 (required for 4-GPU batch jobs).

Image 0.17.1-batch built from vllm/vllm-openai:v0.17.1 and pushed to
692474966980.dkr.ecr.us-west-2.amazonaws.com/llm-engine/batch-infer-vllm:0.17.1-batch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lilyz-ai lilyz-ai enabled auto-merge (squash) March 24, 2026 23:23
