Skip to content

fix: eliminate test port allocation race by running uvicorn in-thread#2263

Draft
maxisbey wants to merge 1 commit intomainfrom
max-158-port-allocation
Draft

fix: eliminate test port allocation race by running uvicorn in-thread#2263
maxisbey wants to merge 1 commit intomainfrom
max-158-port-allocation

Conversation

@maxisbey
Copy link
Contributor

Summary

Replaces the subprocess + port-race test fixture pattern with an in-thread uvicorn helper that uses port=0, eliminating the TOCTOU race that caused intermittent CI failures under pytest-xdist.

The Problem

Nine locations across seven test files used this pattern:

@pytest.fixture
def server_port() -> int:
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]  # ← Port released! Race window opens

@pytest.fixture
def server(server_port: int):
    proc = multiprocessing.Process(target=run_server, kwargs={"port": server_port})
    proc.start()
    wait_for_server(server_port)  # ← May see WRONG server
    ...

Between getsockname() returning and the subprocess rebinding, another pytest-xdist worker can claim the port. This caused:

  • httpx.ConnectError: All connection attempts failed (job 63508538456)
  • 404 Not Found when connecting to the wrong server (#1573, #1776)
  • 403 WS rejection when the port was bound by a non-WS server (job 63414508256)

PR #1528 attempted worker-specific port ranges but was closed as only reducing (not eliminating) collision probability.

The Fix

tests/test_helpers.py now provides run_uvicorn_in_thread(app) — a context manager that:

  1. Starts uvicorn in a background thread with port=0
  2. Waits for server.started=True (set at the end of Server.startup(), after sockets are bound and lifespan has started)
  3. Reads the actual port from server.servers[0].sockets[0].getsockname()
  4. Yields the URL; on exit, sets should_exit=True and joins the thread

This is race-free: the OS atomically assigns the port at bind time and the server holds it from that moment until shutdown. No TOCTOU window.

All seven test files were migrated to this pattern. The *_port fixtures, run_server() subprocess entrypoints, and multiprocessing.Process + proc.kill() boilerplate are gone. wait_for_server() now has zero callers.

Bonus: SSE Stream Leak

Running the server in-process surfaced a pre-existing resource leak in SseServerTransport.connect_sse()sse_stream_reader was never closed because sse_starlette.EventSourceResponse only closes its body iterator on SendTimeout, not on normal completion or cancellation. Previously invisible because proc.kill() never runs destructors.

Fixed by moving the cleanup into a finally block and adding an explicit sse_stream_reader.aclose().

Results

Metric Value
Tests 1136 pass, 100% coverage
Pyright 0 errors, 0 warnings
LOC −407 net
test_sse.py runtime ~8s → 2.6s

AI Disclaimer

The previous pattern picked a free port via socket.bind(0), released it,
then started a uvicorn subprocess hoping to rebind — a TOCTOU race that
caused intermittent CI failures when pytest-xdist workers stole the port
between release and rebind (connection errors, 404s against wrong server,
WS 403s).

Replaced with a context manager that runs uvicorn in a background thread
with port=0 and reads the actual bound port back from the server's socket
after startup. The OS atomically assigns the port at bind time and the
server holds it until shutdown — no race window.

Also fixed a latent stream leak in SseServerTransport.connect_sse() that
the in-process server surfaced: sse_stream_reader was never closed on
normal completion or cancellation (sse_starlette only closes it on
SendTimeout). Cleanup now runs in a finally block.

Side benefits: no more subprocess spawn overhead, faster startup, shared
process state when needed, and wait_for_server() is no longer used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant