Conversation
Pull request overview
This PR adds an SSL-failure recovery path to the LLM client by refreshing TEE registry metadata/TLS configuration and retrying the request once when SSL-related errors occur.
Changes:
- Refactors TEE/TLS initialization into refreshable helpers and introduces a one-time SSL retry mechanism for non-streaming requests.
- Adds SSL-aware retry behavior for streaming chat setup (retry only if the stream hasn’t yielded any chunks yet).
- Updates example scripts to request a larger default OPG approval amount.
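The stream-setup rule in the second bullet (retry only before the first chunk reaches the consumer) can be sketched as a standalone helper. The names below are hypothetical illustrations, not the PR's actual code:

```python
import asyncio
import ssl


async def stream_with_ssl_retry(make_stream, refresh):
    """Hypothetical helper: retry stream setup once on an SSL error,
    but only when no chunk has been yielded yet."""
    yielded = False
    try:
        async for chunk in make_stream():
            yielded = True
            yield chunk
    except ssl.SSLError:
        if yielded:
            # Chunks were already delivered; retrying would duplicate output.
            raise
        await refresh()  # stand-in for re-fetching TEE metadata / TLS config
        async for chunk in make_stream():
            yield chunk


# Demo: the first setup attempt fails with an SSL error, the retry succeeds.
state = {"attempts": 0, "refreshed": False}


async def fake_stream():
    state["attempts"] += 1
    if state["attempts"] == 1:
        raise ssl.SSLError("handshake failed")
    yield "chunk-1"
    yield "chunk-2"


async def fake_refresh():
    state["refreshed"] = True


async def demo():
    return [c async for c in stream_with_ssl_retry(fake_stream, fake_refresh)]


chunks = asyncio.run(demo())
```

The `yielded` flag is the whole guard: once anything has been emitted, a retry would replay chunks the caller already consumed, so the error must propagate instead.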
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/opengradient/client/llm.py | Adds TEE/TLS refresh + HTTP client rebuild and uses it to retry once on SSL-related failures for chat/completion (and stream setup). |
| examples/llm_chat_streaming.py | Increases example Permit2 approval amount. |
| examples/llm_chat.py | Increases example Permit2 approval amount. |
examples/llm_chat_streaming.py (Outdated)

```diff
 async def main():
     llm = og.LLM(private_key=os.environ.get("OG_PRIVATE_KEY"))
-    llm.ensure_opg_approval(opg_amount=0.1)
+    llm.ensure_opg_approval(opg_amount=1)
```
This example increases the default approval amount from 0.1 to 1 OPG, which may be an unexpected on-chain spend for someone running the example. Consider keeping the approval amount minimal (or explaining the rationale in a comment) to avoid surprising users.
examples/llm_chat.py (Outdated)

```diff
 async def main():
     llm = og.LLM(private_key=os.environ.get("OG_PRIVATE_KEY"))
-    llm.ensure_opg_approval(opg_amount=0.1)
+    llm.ensure_opg_approval(opg_amount=1)
```
This example increases the default approval amount from 0.1 to 1 OPG, which may be an unexpected on-chain spend for someone running the example. Consider keeping the approval amount minimal (or explaining the rationale in a comment) to avoid surprising users.
src/opengradient/client/llm.py (Outdated)

```python
self._tee_registry_address = tee_registry_address
self._llm_server_url = llm_server_url
self._reset_lock = threading.Lock()

self._refresh_tee_config()
self._init_x402_stack()
```
Using a blocking threading.Lock inside async methods can stall the event loop if contention occurs (e.g., multiple tasks hit the SSL-retry path). Consider using an asyncio-compatible lock (asyncio.Lock/anyio.Lock) or otherwise ensuring the critical section cannot block the loop.
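A minimal sketch of the asyncio-friendly alternative the comment suggests, using a hypothetical `TeeRefresher` stand-in rather than the real client class:

```python
import asyncio


class TeeRefresher:
    """Hypothetical sketch: serialize TEE refreshes with an asyncio.Lock so a
    contended refresh suspends its coroutine instead of blocking the event loop."""

    def __init__(self):
        self._reset_lock = asyncio.Lock()
        self.refresh_count = 0

    async def refresh_tee_and_reset(self):
        async with self._reset_lock:  # awaiting here yields control to the loop
            await asyncio.sleep(0.01)  # stand-in for registry fetch + client rebuild
            self.refresh_count += 1


async def main():
    refresher = TeeRefresher()
    # Two tasks hit the retry path at once; the lock serializes the refreshes
    # while other tasks on the loop keep running.
    await asyncio.gather(
        refresher.refresh_tee_and_reset(),
        refresher.refresh_tee_and_reset(),
    )
    return refresher.refresh_count


count = asyncio.run(main())
```

A further refinement (not shown) would be to have the second waiter skip its own refresh if one completed while it was waiting, so back-to-back SSL failures trigger only one registry fetch.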
src/opengradient/client/llm.py (Outdated)

```python
old_http_client = self._http_client
self._refresh_tee_config()
self._init_x402_stack()

try:
    await old_http_client.aclose()
except Exception:
    logger.debug("Failed to close previous HTTP client during TEE refresh.", exc_info=True)
```
_refresh_tee_and_reset() swaps out self._http_client and then closes the previous client. If other coroutines are concurrently using the old client (especially streaming), closing it here can terminate in-flight requests unexpectedly. Consider avoiding closing a shared client while it may be in use (e.g., serialize all requests during reset, use per-request clients, or defer cleanup until you can guarantee no active users).
Suggested change:

```diff
-old_http_client = self._http_client
-self._refresh_tee_config()
-self._init_x402_stack()
-try:
-    await old_http_client.aclose()
-except Exception:
-    logger.debug("Failed to close previous HTTP client during TEE refresh.", exc_info=True)
+self._refresh_tee_config()
+self._init_x402_stack()
```
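One way to avoid closing a client that is still in use is to snapshot the client per request and defer the close until in-flight requests drain. A hedged sketch with hypothetical names (the real client is an `x402HttpxClient`, stubbed here with a fake):

```python
import asyncio


class SwappableClient:
    """Hypothetical sketch: swap in a new client immediately, but defer closing
    the old one until its in-flight requests have finished."""

    def __init__(self, client):
        self._client = client
        self._inflight = 0
        self._idle = asyncio.Event()
        self._idle.set()

    def acquire(self):
        """Take a snapshot of the client for one request."""
        self._inflight += 1
        self._idle.clear()
        return self._client

    def release(self):
        """Mark one request as finished."""
        self._inflight -= 1
        if self._inflight == 0:
            self._idle.set()

    async def close_when_idle(self):
        await self._idle.wait()  # wait for in-flight requests to drain
        await self._client.aclose()


class FakeClient:
    def __init__(self, name):
        self.name = name
        self.closed = False

    async def aclose(self):
        self.closed = True


async def demo():
    old = SwappableClient(FakeClient("old"))
    client = old.acquire()          # a request is in flight on the old client
    new = SwappableClient(FakeClient("new"))  # refresh swaps in a new client
    closer = asyncio.create_task(old.close_when_idle())
    await asyncio.sleep(0)          # old client must survive while still in use
    still_open = not old._client.closed
    old.release()                   # request finishes; old client may now close
    await closer
    return still_open, old._client.closed, new._client.name


result = asyncio.run(demo())
```

Per-request clients, as the comment also suggests, sidestep the bookkeeping entirely at the cost of losing connection pooling across requests.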
src/opengradient/client/llm.py (Outdated)

```python
async def _call_with_ssl_retry(
    self,
    operation_name: str,
    call: Callable[[], Awaitable[T]],
) -> T:
    """Retry once with fresh TEE/TLS state when the failure is SSL-related."""
    try:
        return await call()
    except Exception as exc:
        if not self._has_ssl_cause(exc):
            raise
        logger.warning(
            "SSL failure during %s; refreshing TEE and retrying once: %s",
            operation_name,
            exc,
        )
        await self._refresh_tee_and_reset()
        return await call()
```
The new SSL-retry/reset behavior isn’t covered by unit tests. Adding tests that simulate an SSL-related exception on the first request and assert that the TEE config is refreshed and the call is retried (including the streaming path) would help prevent regressions.
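A sketch of the kind of test the comment asks for, using a minimal harness that mirrors the `_call_with_ssl_retry` logic shown above (the harness and its attribute names are hypothetical stand-ins for the real client):

```python
import asyncio
import ssl


class SslRetryHarness:
    """Minimal stand-in for the client so the retry contract can be
    asserted in isolation."""

    def __init__(self):
        self.refreshed = 0

    def _has_ssl_cause(self, exc):
        # Walk the exception chain looking for an SSL error.
        while exc is not None:
            if isinstance(exc, ssl.SSLError):
                return True
            exc = exc.__cause__ or exc.__context__
        return False

    async def _refresh_tee_and_reset(self):
        self.refreshed += 1

    async def _call_with_ssl_retry(self, operation_name, call):
        try:
            return await call()
        except Exception as exc:
            if not self._has_ssl_cause(exc):
                raise
            await self._refresh_tee_and_reset()
            return await call()


async def test_retries_once_on_ssl_failure():
    harness = SslRetryHarness()
    attempts = {"n": 0}

    async def flaky_call():
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise ssl.SSLError("certificate verify failed")
        return "ok"

    result = await harness._call_with_ssl_retry("chat", flaky_call)
    return result, attempts["n"], harness.refreshed


outcome = asyncio.run(test_retries_once_on_ssl_failure())
```

Companion cases worth adding: a non-SSL exception must propagate without any refresh, and a second consecutive SSL failure must not trigger a second retry.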
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
src/opengradient/client/llm.py (Outdated)

```python
old_http_client = self._http_client
self._refresh_tee_config()
self._http_client = x402HttpxClient(self._x402_client, verify=self._tls_verify)

try:
    await old_http_client.aclose()
except Exception:
    logger.debug("Failed to close previous HTTP client during TEE refresh.", exc_info=True)
```
_refresh_tee_and_reset() swaps the shared HTTP client and then closes the previous client. If the same LLM instance is used concurrently, other in-flight requests may still be using the old client instance and can fail when it gets closed. Consider preventing refresh while requests are in flight (e.g., by coordinating refresh with request execution) or delaying closure until it’s safe.
Suggested change:

```diff
-old_http_client = self._http_client
-self._refresh_tee_config()
-self._http_client = x402HttpxClient(self._x402_client, verify=self._tls_verify)
-try:
-    await old_http_client.aclose()
-except Exception:
-    logger.debug("Failed to close previous HTTP client during TEE refresh.", exc_info=True)
+self._refresh_tee_config()
+self._http_client = x402HttpxClient(self._x402_client, verify=self._tls_verify)
```
```diff
 llm = og.LLM(private_key=os.environ.get("OG_PRIVATE_KEY"))
-llm.ensure_opg_approval(opg_amount=0.1)
+llm.ensure_opg_approval(opg_amount=1)
```
This example increases opg_amount from 0.1 to 1, which is unrelated to the PR’s stated goal (“ssl retry”) and materially changes the on-chain approval users will be prompted to grant. If this isn’t intentional for the SSL retry work, consider reverting or explaining the rationale in the PR description.
```diff
 llm = og.LLM(private_key=os.environ.get("OG_PRIVATE_KEY"))
-llm.ensure_opg_approval(opg_amount=0.1)
+llm.ensure_opg_approval(opg_amount=1)
```
This example increases opg_amount from 0.1 to 1, which is unrelated to the PR’s stated goal (“ssl retry”) and materially changes the on-chain approval users will be prompted to grant. If this isn’t intentional for the SSL retry work, consider reverting or explaining the rationale in the PR description.
```python
self._rpc_url = rpc_url
self._tee_registry_address = tee_registry_address
self._llm_server_url = llm_server_url
self._reset_lock = threading.Lock()
```
_refresh_tee_and_reset() uses a threading.Lock() inside async code. If two coroutines hit the refresh path concurrently, the second will block the event loop while waiting for the lock, potentially stalling unrelated tasks. Consider switching to an async-friendly lock (e.g., an asyncio.Lock) or using a non-blocking/lazily-created async lock so contention yields control back to the loop.
this is used when we refetch from tee registry
examples/llm_chat.py (Outdated)

```diff
 async def main():
     llm = og.LLM(private_key=os.environ.get("OG_PRIVATE_KEY"))
-    llm.ensure_opg_approval(opg_amount=0.1)
+    llm.ensure_opg_approval(opg_amount=1)
```
this will break most testers who can only get 0.1 OPG, do we need to bump it?