feat: Add Oracle Cloud Infrastructure (OCI) Generative AI client support #718

Open — fede-kamel wants to merge 29 commits into cohere-ai:main from fede-kamel:feat/oci-client
Conversation

@fede-kamel commented Jan 26, 2026

Overview

I noticed that the Cohere Python SDK has excellent integration with AWS Bedrock through the BedrockClient implementation. I wanted to contribute a similar integration for Oracle Cloud Infrastructure (OCI) Generative AI service to provide our customers with the same seamless experience.

Motivation

Oracle Cloud Infrastructure offers Cohere's models through our Generative AI service, and many of our enterprise customers use both platforms. This integration follows the same architectural pattern as the existing Bedrock client, ensuring consistency and maintainability.

Implementation

This PR adds comprehensive OCI support with:

Features

  • OciClient (V1 API) and OciClientV2 (V2 API) classes
  • Full authentication support:
    • Config file (default ~/.oci/config)
    • Custom profiles
    • Direct credentials
    • Instance principal (for OCI compute instances)
    • Resource principal
  • Complete API coverage:
    • Embed (all models: english-v3.0, light-v3.0, multilingual-v3.0)
    • Chat with streaming support (Command R and Command A models)
    • V2 API support with Command A models (command-a-03-2025)
  • Region-independent: Uses display names instead of region-specific OCIDs
  • Automatic V1/V2 API detection and transformation

Architecture

  • Follows the proven BedrockClient pattern with httpx event hooks
  • Request/response transformation between Cohere and OCI formats
  • Lazy loading of OCI SDK as optional dependency
  • Connection pooling for optimal performance

Testing

  • 14 comprehensive integration tests (100% passing)
  • Tests cover: authentication, embed, chat, chat_stream, error handling
  • Multiple model variants tested

Documentation

  • README section with usage examples
  • All authentication methods documented
  • Installation instructions for optional OCI dependency

Files Changed

  • src/cohere/oci_client.py (910 lines) - Main OCI client implementation
  • src/cohere/manually_maintained/lazy_oci_deps.py (30 lines) - Lazy OCI SDK loading
  • tests/test_oci_client.py (393 lines) - Comprehensive integration tests
  • README.md - OCI usage documentation
  • pyproject.toml - Optional OCI dependency
  • src/cohere/__init__.py - Export OciClient and OciClientV2

Test Results

14 passed, 8 skipped, 0 failed

Skipped tests are for OCI service limitations (base models not callable via on-demand inference).

Breaking Changes

None. This is a purely additive feature.

Checklist

  • Code follows repository style (ruff passing)
  • Tests added and passing
  • Documentation updated
  • No breaking changes

Note

Medium Risk
Adds a sizable new OCI transport layer with custom request signing and streaming/event transformations, which is moderately risky due to the complexity of protocol mapping and auth edge cases. Changes are largely additive/optional and should not affect existing non-OCI clients unless imported/used.

Overview
Adds optional Oracle Cloud Infrastructure (OCI) Generative AI support via new OciClient (v1) and OciClientV2 (v2), exposed from cohere.__init__.

Implements OCI request signing and endpoint mapping using httpx event hooks, including request/response body translation for embed and chat plus streaming translation to Cohere’s V1/V2 stream event formats (including V2 thinking blocks and usage extraction).

Introduces an optional oci dependency (cohere[oci]) with lazy importing, factors out a shared Streamer utility used by both AWS and OCI clients, and adds OCI-focused tests plus README documentation covering install, auth methods, supported APIs, and OCI limitations.

Written by Cursor Bugbot for commit 14b5c6e.

@fede-kamel commented Jan 26, 2026

@walterbm-cohere @daniel-cohere @billytrend-cohere

Hey maintainers,

Friendly bump on this PR - would appreciate your feedback when you have a chance. Happy to address any concerns or make changes as needed.

Thanks.

@fede-kamel commented

Addressed Bugbot feedback:

  1. V2 streaming ends with wrong event (High) - Now emits message-end event before returning on [DONE]

  2. Direct OCI credentials can crash (Medium) - Added validation: when oci_user_id is provided, oci_fingerprint and oci_tenancy_id are now required with a clear error message

@fede-kamel commented

@billytrend-cohere @mkozakov @sanderland @abdullahkady — would appreciate a review on this when you have a moment.

This PR has been rebased on the latest main (no conflicts). Created a corresponding feature request: #735.

What this adds: Oracle Cloud Infrastructure (OCI) Generative AI client support, following the same architectural pattern as the existing BedrockClient. Adds OciClient and OciClientV2 classes with full authentication support, embed, chat, and streaming capabilities.

Testing: 14 integration tests passing against the OCI Generative AI service. Tested with Command R, Command A, and all embed v3 models. oci SDK is an optional dependency — no impact on existing users.

This would bring OCI to parity with the existing Bedrock integration and benefit enterprise customers running Cohere models on Oracle Cloud. Happy to address any feedback.

Thank you.

@fede-kamel commented

@sanderland Thanks for the approvals on this PR and the others (#717, #698, #697)! What are the next steps to get these merged?

@fede-kamel commented

@sanderland quick ping on this one since you approved earlier - could you please take a final look when you have a moment? Thanks!

@fede-kamel commented

@sanderland @billytrend-cohere quick follow-up on this OCI PR. It currently has approval and is rebased on latest main as of March 3, 2026. Could you share what remaining blocker(s) or required changes are needed to merge? If it helps, I can split this into smaller PRs (auth/client scaffolding first, then chat/embed/streaming) to speed up review.

@fede-kamel commented

@sanderland quick follow-up on this OCI PR.

This PR is important for Oracle + Cohere users because it adds first-class support for running Cohere models through Oracle Cloud Infrastructure (OCI) Generative AI while keeping the same cohere-python developer experience.

Why this is relevant:

  • It enables Cohere SDK users in Oracle environments to use Cohere models without building custom OCI adapters.
  • It aligns with the existing provider-extension pattern in this repo (similar to Bedrock), so it fits the SDK architecture rather than adding a one-off path.
  • It covers core enterprise workflows: auth options, embed, chat, and streaming.
  • It keeps oci as an optional dependency, so there is no impact on non-OCI users.
  • It is directly relevant to Oracle deployments where teams standardize on OCI but still want official Cohere SDK ergonomics and compatibility.

Current status:

  • Approved
  • Rebased on latest main as of March 3, 2026
  • Integration-tested against OCI Generative AI service

Could you share the specific remaining blocker(s) or required changes to merge? If review scope is the issue, I can split this into smaller PRs (auth/client scaffolding first, then chat/embed/streaming) to accelerate.

@fede-kamel commented

@billytrend-cohere looping you in as well on the OCI PR context above. If there are specific blockers or changes you want before merge, I can address them quickly or split this into smaller PRs to make review easier.

@cursor bot: Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor bot: Cursor Bugbot has reviewed your changes and found 2 potential issues.

@cursor bot: Cursor Bugbot has reviewed your changes and found 2 potential issues.

…issues

- Fix OCI pip extras installation by moving from poetry groups to extras
  - Changed [tool.poetry.group.oci] to [tool.poetry.extras]
  - This enables 'pip install cohere[oci]' to work correctly

- Fix streaming to stop properly after [DONE] signal
  - Changed 'break' to 'return' in transform_oci_stream_wrapper
  - Prevents continued chunk processing after stream completion
- Add support for OCI profiles using security_token_file
- Load private key properly using oci.signer.load_private_key_from_file
- Use SecurityTokenSigner for session-based authentication
- This enables use of OCI CLI session tokens for authentication
This commit addresses all copilot feedback and fixes V2 API support:

1. Fixed V2 embed response format
   - V2 expects embeddings as dict with type keys (float, int8, etc.)
   - Added is_v2_client parameter to properly detect V2 mode
   - Updated transform_oci_response_to_cohere to preserve dict structure for V2

2. Fixed V2 streaming format
   - V2 SDK expects SSE format with "data: " prefix and double newline
   - Fixed text extraction from OCI V2 events (nested in message.content[0].text)
   - Added proper content-delta and content-end event types for V2
   - Updated transform_oci_stream_wrapper to output correct format based on is_v2

3. Fixed stream [DONE] signal handling
   - Changed from break to return to stop generator completely
   - Prevents further chunk processing after [DONE]

4. Added skip decorators with clear explanations
   - OCI on-demand models don't support multiple embedding types
   - OCI TEXT_GENERATION models require fine-tuning (not available on-demand)
   - OCI TEXT_RERANK models require fine-tuning (not available on-demand)

5. Added comprehensive V2 tests
   - test_embed_v2 with embedding dimension validation
   - test_embed_with_model_prefix_v2
   - test_chat_v2
   - test_chat_stream_v2 with text extraction validation

All 17 tests now pass with 7 properly documented skips.
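The break-vs-return distinction in item 3 matters because the wrapper iterates chunks in an outer loop; a toy version with hypothetical SSE chunk shapes makes it concrete:

```python
def transform_stream(chunks):
    """Toy version of the [DONE] handling described in item 3: 'return'
    ends the generator outright, while a 'break' in the inner loop would
    let later chunks keep being processed."""
    for chunk in chunks:
        for line in chunk.splitlines():
            if line == "data: [DONE]":
                return  # stop the generator completely
            if line.startswith("data: "):
                yield line[len("data: "):]

events = list(transform_stream(["data: a\ndata: [DONE]", "data: b"]))
print(events)  # ['a'] — 'b' is never emitted
```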
- Add comprehensive limitations section to README explaining what's available
  on OCI on-demand inference vs. what requires fine-tuning
- Improve OciClient and OciClientV2 docstrings with:
  - Clear list of supported APIs
  - Notes about generate/rerank limitations
  - V2-specific examples showing dict-based embedding responses
- Add checkmarks and clear categorization of available vs. unavailable features
- Link to official OCI Generative AI documentation for latest model info
…sion

This commit fixes two issues identified in PR review:

1. V2 response detection overriding passed parameter
   - Previously: transform_oci_response_to_cohere() would re-detect V2 from
     OCI response apiFormat field, overriding the is_v2 parameter
   - Now: Uses the is_v2 parameter passed in (determined from client type)
   - Why: The client type (OciClient vs OciClientV2) already determines the
     API version, and re-detecting can cause inconsistency

2. Security token file path not expanded before opening
   - Previously: Paths like ~/.oci/token would fail because Python's open()
     doesn't expand tilde (~) characters
   - Now: Uses os.path.expanduser() to expand ~ to user's home directory
   - Why: OCI config files commonly use ~ notation for paths

Both fixes maintain backward compatibility and all 17 tests continue to pass.
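Fix 2 boils down to expanding the tilde before calling open(); a minimal illustration (the helper name is made up):

```python
import os

def resolve_key_file(path: str) -> str:
    # OCI config files commonly use ~ notation, but open() does not expand
    # the tilde, so expand it before touching the filesystem.
    return os.path.expanduser(path)
```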
- Fix authentication priority to prefer API key auth over session-based
- Transform V2 content list items type field to uppercase for OCI format
- Remove debug logging statements

All tests passing (17 passed, 7 skipped as expected)
Support the thinking/reasoning feature for command-a-reasoning-08-2025
on OCI. Transforms Cohere's thinking parameter (type, token_budget) to
OCI format and handles thinking content in both non-streaming and
streaming responses.
- Remove unused response_mapping and stream_response_mapping dicts
- Remove unused transform_oci_stream_response function
- Remove unused imports (EmbedResponse, Generation, etc.)
- Fix crash when thinking parameter is explicitly None
- Fix V2 chat response role not lowercased (ASSISTANT -> assistant)
- Fix V2 finish_reason incorrectly lowercased (should stay uppercase)
- Add unit tests for thinking=None, role lowercase, and finish_reason
- Fix thinking token_budget → tokenBudget (camelCase for OCI API)
- Add V2 response toolCalls → tool_calls conversion for SDK compatibility
- Update test for tokenBudget casing
- Add test for tool_calls conversion
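A sketch of the snake_case-to-camelCase mapping described in these commits, folding in the earlier thinking=None crash fix; the function name and exact shape are illustrative:

```python
def transform_thinking(thinking):
    # Illustrative only: maps Cohere's thinking parameter to the camelCase
    # key OCI expects, and tolerates an explicit None (earlier crash fix).
    if thinking is None:
        return None
    out = dict(thinking)
    if "token_budget" in out:
        out["tokenBudget"] = out.pop("token_budget")
    return out
```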
OCI doesn't provide a generation ID in responses. Previously used modelId
which is the model name (e.g. 'cohere.command-r-08-2024'), not a unique
generation identifier. Now generates a proper UUID.
- Add validation for direct credentials (user_id requires fingerprint and tenancy_id)
- Emit message-end event for V2 streaming before [DONE]
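The generation-ID change above amounts to minting a fresh UUID per response instead of echoing modelId; sketched:

```python
import uuid

def generation_id() -> str:
    # OCI responses carry no generation ID; reusing modelId (a model name,
    # not a unique identifier) was wrong, so mint a UUID per response.
    return str(uuid.uuid4())
```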
@fede-kamel commented Mar 15, 2026

Validation on current PR head 687ef1e8 is complete.

OCI test results from this branch:

  • local OCI suite: PYTHONPATH=src python -m pytest tests/test_oci_client.py -> 29 passed, 26 skipped
  • live OCI in us-chicago-1: 46 passed, 9 skipped
  • live OCI in eu-frankfurt-1: 45 passed, 10 skipped

Models exercised in the passing live runs:

  • embed-english-v3.0
  • embed-multilingual-v3.0
  • embed-english-light-v3.0 in us-chicago-1
  • command-r-08-2024
  • command-a-03-2025

Expected skips in live OCI remain limited to service/model availability constraints:

  • command-a-reasoning-08-2025 availability depends on region/service support
  • rerank on OCI on-demand is not available in the way these tests expect
  • embed-english-light-v3.0 is not available in eu-frankfurt-1 for the tested configuration

So on this PR head, all runnable OCI tests pass, and the remaining skips are expected OCI capability or region-availability gaps rather than code failures.

@cursor bot: Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor bot: Cursor Bugbot has reviewed your changes and found 1 potential issue.

@fede-kamel commented

Validation update from /Users/federico.kamelhar/Projects/cohere-python:

  • PYTHONPATH=src python -m pytest tests/test_oci_client.py -q -> 31 passed, 16 skipped
  • TEST_OCI=1 ... PYTHONPATH=src python -m pytest tests/test_oci_client.py -q -> 47 passed in one OCI region
  • TEST_OCI=1 ... PYTHONPATH=src python -m pytest tests/test_oci_client.py -q -> 47 passed in a second OCI region

The remaining 16 skipped in the local run are only the live OCI classes gated on TEST_OCI; once enabled, the supported OCI suite is fully green.

Live models exercised in the passing runs:

  • embed-english-v3.0
  • embed-multilingual-v3.0
  • command-r-08-2024
  • command-a-03-2025

The OCI test file was also trimmed to supported scenarios only, so the live runs no longer depend on permanently skipped coverage for unsupported on-demand generation/rerank or region-specific model availability.

@cursor bot: Cursor Bugbot has reviewed your changes and found 2 potential issues.

Code context:

        "embeddings": embeddings,
        "texts": [],  # OCI doesn't return texts
        "meta": meta,
    }

Missing response_type discriminant in embed response

Medium Severity

The embed response dict is missing the response_type field required by the EmbedResponse discriminated union. EmbedResponse is an Annotated union with UnionMetadata(discriminant="response_type"), discriminating between EmbeddingsFloatsEmbedResponse (expects "embeddings_floats") and EmbeddingsByTypeEmbedResponse (expects "embeddings_by_type"). Without it, discriminated union resolution in _convert_union_type fails and falls through to undiscriminated matching, which is fragile and could break if the SDK's internal resolution logic changes.
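One way to address this, sketched under the assumption (from the PR description) that dict-shaped embeddings correspond to the by-type variant; the helper name is hypothetical:

```python
def build_embed_response(embeddings, meta):
    # Attach the response_type discriminant so the SDK's discriminated-union
    # resolution is explicit rather than falling through to undiscriminated
    # matching.
    response = {
        "embeddings": embeddings,
        "texts": [],  # OCI doesn't return texts
        "meta": meta,
    }
    response["response_type"] = (
        "embeddings_by_type" if isinstance(embeddings, dict) else "embeddings_floats"
    )
    return response
```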


Code context:

            transformed_content.append(item)
        oci_msg["content"] = transformed_content
    else:
        oci_msg["content"] = msg.get("content", [])

Content None not defaulting to empty array

Low Severity

When a V2 chat message has content explicitly set to None (e.g., an assistant message with only tool calls), msg.get("content", []) returns None rather than [] because dict.get only uses the default when the key is absent, not when its value is None. This sends "content": null to OCI instead of an empty array, which may be rejected by the OCI API.
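The usual fix is `msg.get("content") or []`; a tiny demonstration:

```python
def content_or_empty(msg: dict) -> list:
    # dict.get only applies the default when the key is absent, not when
    # the stored value is None, so guard with 'or' to coerce None to [].
    return msg.get("content") or []
```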

