fix: Remove evaluation metric key from schema which failed on some LLMs #105
jsonbailey wants to merge 6 commits into main
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
```python
# Bedrock requires the foundation provider (e.g. Bedrock:Anthropic) passed in
# parameters separately from model_provider, which is used for LangChain routing.
if mapped_provider == 'bedrock_converse' and 'provider' not in parameters:
    parameters['provider'] = provider
```
Bedrock provider parameter passes wrong format to LangChain
High Severity
The provider variable holds the raw LaunchDarkly provider name (e.g., "Bedrock:Anthropic" or "Bedrock"), which gets passed directly as parameters['provider'] to init_chat_model / ChatBedrockConverse. However, ChatBedrockConverse expects the provider parameter to be just the model family name in lowercase (e.g., "anthropic"), not the full LD-formatted name. Passing "Bedrock:Anthropic" will cause incorrect provider inference and likely break Bedrock model initialization.
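A minimal sketch of the normalization the reviewer is asking for (the helper name is hypothetical, not from the PR): strip the LaunchDarkly `Bedrock:` prefix down to the lowercase model-family name that `ChatBedrockConverse` expects.

```python
def to_bedrock_model_family(ld_provider: str) -> str:
    """Map an LD provider name to the lowercase family name for ChatBedrockConverse.

    "Bedrock:Anthropic" -> "anthropic"; a bare "Bedrock" yields "" so the
    caller can fall back to letting LangChain infer the provider itself.
    """
    _, _, family = ld_provider.partition(":")
    return family.strip().lower()
```

With this in place, the guarded assignment would pass `to_bedrock_model_family(provider)` rather than the raw `provider` string.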
```python
        usage=TokenUsage(total=0, input=0, output=0),
    ),
)
return structured_response
```
Exception handler may return success=True after partial mutation
Low Severity
The except handler returns the shared mutable structured_response without resetting metrics.success. After line 110, get_ai_metrics_from_response replaces the metrics with success=True. If any exception occurs between that point and the explicit returns, the handler returns a response indicating success despite the failure. The previous code defensively created a fresh StructuredResponse with success=False in the handler.
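The defensive pattern the reviewer describes can be sketched as follows; the class and field names here are simplified stand-ins for the SDK's actual types. The key point is that the handler constructs a fresh failure response instead of returning the shared, partially mutated one.

```python
from dataclasses import dataclass, field


@dataclass
class TokenUsage:
    total: int = 0
    input: int = 0
    output: int = 0


@dataclass
class StructuredResponse:
    success: bool = True
    usage: TokenUsage = field(default_factory=TokenUsage)
    data: object = None


def invoke_structured(call):
    response = StructuredResponse()
    try:
        response.data = call()
        response.success = True  # metrics extraction may also set this
        return response
    except Exception:
        # Fresh object with success=False: never return the shared response,
        # which may have been mutated to success=True before the exception.
        return StructuredResponse(success=False, usage=TokenUsage())
```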


fix: Improve metric token collection for Judge evaluations when using LangChain
fix: Include raw response when performing Judge evaluations
Note
Medium Risk
Changes the structured-output contract for Judge evaluations and modifies LangChain structured invocation/metrics extraction, which could affect evaluation parsing and reported token usage across providers.
Overview
Judge structured evaluation output is simplified from a dynamic `evaluations[{metricKey}]` shape to a fixed `evaluation { score, reasoning }` schema, and parsing/validation is updated to key results by `evaluation_metric_key` at runtime (with tests adjusted accordingly).

LangChain structured invocations now request `include_raw=True`, propagate the raw model message and token usage into `StructuredResponse`, treat parsing errors as failures, and improve provider handling by mapping Bedrock `bedrock:*` to `bedrock_converse` (including passing the original provider string via parameters when needed) and reading token usage from `usage_metadata` when available.

Written by Cursor Bugbot for commit 1ed23cf.
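Assuming the SDK uses LangChain's `with_structured_output(..., include_raw=True)`, the invocation returns a dict with `raw`, `parsed`, and `parsing_error` keys. A hedged sketch of the extraction logic described above (the function name and return shape are illustrative, not the SDK's API):

```python
def parse_structured_result(result: dict) -> dict:
    # With include_raw=True, LangChain returns:
    #   {"raw": <AIMessage>, "parsed": <object or None>,
    #    "parsing_error": <Exception or None>}
    raw = result.get("raw")
    if result.get("parsing_error") is not None or result.get("parsed") is None:
        # Treat a parsing failure as a failure, not a silent success.
        return {"success": False, "raw": raw, "usage": None}
    # Recent LangChain AIMessage objects expose token counts via usage_metadata.
    usage = getattr(raw, "usage_metadata", None)
    return {"success": True, "parsed": result["parsed"], "raw": raw, "usage": usage}
```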
include_raw=True, propagate the raw model message + token usage intoStructuredResponse, treat parsing errors as failures, and improve provider handling by mapping Bedrockbedrock:*tobedrock_converse(including passing the original provider string via parameters when needed) and reading token usage fromusage_metadatawhen available.Written by Cursor Bugbot for commit 1ed23cf. This will update automatically on new commits. Configure here.