A hands-on exploration of real-time communication in Python — starting from scratch with HTTP polling, identifying its costs, and migrating step-by-step to WebSockets.
The goal was simple: understand WebSockets not as an abstract concept, but by feeling the problem they solve. Rather than jumping straight to a WebSocket implementation, we built a working chat system using HTTP polling first, instrumented it to measure its behaviour under load, then migrated it to WebSockets and compared the results directly.
By the end we had two functionally identical chat systems — same clients, same server role, same message workload — and concrete numbers showing what changed between them.
HTTP is a request/response protocol. A client opens a connection, sends a request, the server sends a response, and the connection closes. The server has no way to reach out to a client unprompted — it can only respond to requests that come in.
WebSockets change that model fundamentally. A WebSocket connection starts as an HTTP
request — the client sends an Upgrade: websocket header — but once the server
agrees, the protocol switches. What was a short-lived HTTP exchange becomes a
persistent, full-duplex TCP connection: both sides can send data at any time,
independently, without waiting for the other.
HTTP (request/response) WebSocket (persistent, full-duplex)
Client Server Client Server
|--GET------>| |--Upgrade----->|
|<--200------| |<--101---------| (handshake)
| (closed) | | |
|--GET------>| |<==data========| server pushes
|<--200------| |===data=======>| client sends
| (closed) | |<==data========| server pushes again
| (open indefinitely)
This distinction matters because it changes who drives communication. In HTTP, the client always initiates. In WebSockets, either side can. That's the capability that makes real-time applications practical.
Without WebSockets, the standard approach to "near real-time" updates is polling: the client asks the server on a fixed interval — "anything new?" — and the server answers. To simulate a chat room, each client:
- Sends a `GET /messages?since=<last_id>` every 100ms
- Records the highest message ID it has seen
- Posts a new message via `POST /messages` every ~2 seconds
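The client loop above can be sketched with the stdlib alone. This is a minimal illustration, not the original `client.py`; the JSON shape (`id`, `author`, `text`) and the helper names are assumptions:

```python
import json
import time
import urllib.request

POLL_INTERVAL = 0.1  # seconds between GETs

def advance_since(messages, since):
    """Track the highest message id seen so far."""
    return max((m["id"] for m in messages), default=since)

def poll_loop(base_url, max_polls=None):
    """GET /messages?since=<last_id> on a fixed interval and print anything new."""
    since, polls = 0, 0
    while max_polls is None or polls < max_polls:
        with urllib.request.urlopen(f"{base_url}/messages?since={since}") as resp:
            messages = json.loads(resp.read())
        for m in messages:
            print(f'{m["author"]}: {m["text"]}')
        since = advance_since(messages, since)  # remember where we left off
        polls += 1
        time.sleep(POLL_INTERVAL)
```

Note that every iteration pays for a full request even when `messages` comes back empty.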
The server maintains a flat list of messages in memory. Every GET returns any messages
newer than the `since` ID. Every POST appends to the list.
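A server along these lines can be sketched with the stdlib `ThreadingHTTPServer`. The routes match the description above, but field names, status codes, and the locking scheme are illustrative, not the original `server.py`:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse, parse_qs

MESSAGES = []           # flat in-memory list of {"id": int, "author": str, "text": str}
LOCK = threading.Lock() # one thread per request, so guard the shared list

def messages_since(messages, since):
    """Everything with an id strictly greater than the client's last seen id."""
    return [m for m in messages if m["id"] > since]

class ChatHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        since = int(parse_qs(urlparse(self.path).query).get("since", ["0"])[0])
        with LOCK:
            body = json.dumps(messages_since(MESSAGES, since)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        msg = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        with LOCK:
            msg["id"] = len(MESSAGES) + 1  # monotonically increasing ids
            MESSAGES.append(msg)
        self.send_response(201)
        self.end_headers()

# ThreadingHTTPServer(("", 8000), ChatHandler).serve_forever()
```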
polling/
├── server.py — ThreadingHTTPServer, message list, stats reporter
└── client.py — polling loop, scripted chat, latency tracking
Two clients chatting via the polling server look like this in the server log:
[server] Alice: Hey, how's it going?
[server] Bob: Did you see the game last night?
[server] Alice: This polling thing feels a bit wasteful, right?
...
[stats] --- 5s window ---
[stats] Total connections opened : 102
[stats] Connection rate : 20.4/s
[stats] Polls (GET requests) : 100
[stats] Polls that were empty : 96 (96% wasted)
[stats] Messages posted : 2
[stats] Message rate : 0.40 msg/s
Every poll is a new TCP connection. The stdlib ThreadingHTTPServer speaks HTTP/1.0 by default, so there is no keep-alive: each GET pays for a TCP handshake, HTTP headers sent and received, a response written, and a connection torn down. With two clients polling at 100ms, that's ~20 new connections per second before a single message has been sent.
Most polls find nothing. Messages arrive infrequently relative to how often
clients poll. In a quiet room, 90–96% of polls return an empty []. The client and
server both did real work — CPU, memory, a syscall, a file descriptor — for no
informational value.
Latency is bounded by the interval. A message posted at t=0ms won't be seen by another client until their next poll fires. With a 100ms interval, the average wait is 50ms and the worst case is 100ms. Reducing the interval reduces latency but multiplies the connection rate.
These costs compound. Halving the interval doubles the connection rate.
Doubling the number of clients doubles it again. The server's work grows with
clients × (1 / poll_interval) regardless of how much actual communication
is happening.
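That growth can be written down directly. A tiny helper (hypothetical, not part of the benchmark code) makes the arithmetic concrete:

```python
def polling_conn_rate(clients, poll_interval_s):
    """New TCP connections per second the server absorbs, before any messages."""
    return clients / poll_interval_s

# Two clients at a 100ms interval already cost 20 connections per second;
# halve the interval or double the clients and the rate doubles with it.
```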
The WebSocket server accepts a persistent connection from each client. Once connected:
- The server holds a registry of all connected clients (`clients = set()`)
- When any client sends a message, the server broadcasts it to everyone instantly
- No client ever asks "anything new?" — the server just pushes
Each client runs two coroutines concurrently over the same connection:
```python
await asyncio.gather(
    receive_messages(websocket, name),  # blocks until server pushes something
    send_messages(websocket, name),     # sends on a timer
)
```

The `async for raw in websocket` in `receive_messages` is the key change. Instead of
waking up every 100ms and making a network request, the coroutine simply suspends
itself. The asyncio event loop runs other work. When the server pushes a frame, the
OS delivers it, asyncio wakes the coroutine, and it processes the message — all
without the client having asked for it.
websockets/
├── server.py — websockets.serve(), client registry, asyncio stats reporter
└── client.py — persistent connection, recv + send coroutines, reconnect backoff
| | HTTP Polling | WebSockets |
|---|---|---|
| Connection model | New connection per request | One connection per client, kept open |
| Who initiates delivery | Client (pull) | Server (push) |
| Latency | Up to `POLL_INTERVAL` | ~RTT (message delivered immediately) |
| Wasted work | ~90–95% of requests return nothing | None — server only acts on real events |
| Concurrency model | `threading` (one thread per request) | `asyncio` (one event loop, many coroutines) |
| Send + receive | Two separate HTTP endpoints | Both directions on the same socket |
The server also became simpler in one respect: it no longer needs to store messages. In polling, the message list existed so clients could catch up on what they missed between polls. With push delivery, a message is broadcast the instant it arrives — there's nothing to catch up on. (Message history for late joiners is a separate concern, handled by a database layer, not the transport.)
Persistent connections introduce a lifecycle the polling model doesn't have: a connection can drop at any time and must be handled explicitly.
The server distinguishes two cases:
- `ConnectionClosedOK` — the client sent a proper WebSocket close frame (clean shutdown, e.g. Ctrl+C). Expected.
- `ConnectionClosedError` — the connection dropped without warning (process killed, network loss). Something went wrong.
The client uses exponential backoff to reconnect:
Server unreachable (OSError): 1s → 2s → 4s → 8s → ... → 30s (cap)
Connection dropped (ClosedError): reset to 1s — server exists, retry quickly
Both benchmarks run entirely in-process — the server starts in a background thread (polling) or asyncio task (WebSockets), then N clients run concurrently for 10 seconds. Each client sends one message every ~2 seconds. Three tiers were tested: 5, 50, and 100 clients.
Latency is measured end-to-end: a sent_at timestamp is embedded in every outgoing
message and compared against the wall clock when the message is received.
| Clients | Connections | Conn/s | Wasted % | Avg latency |
|---|---|---|---|---|
| 5 | 494 | 49.4 | 83% | 64.11 ms |
| 50 | 4,823 | 482.3 | 24% | 59.61 ms |
| 100 | 9,008 | 900.8 | 18% | 63.58 ms |
A few things stand out:
Connections grow linearly with clients. 100 clients at 100ms produces ~900 new TCP connections per second. This number grows without bound as you add clients or reduce the poll interval.
Wasted% drops as clients increase. With 5 clients sending every 2 seconds, messages are rare — 83% of polls find nothing. With 100 clients, messages arrive more frequently so fewer polls are empty. The waste is traffic-dependent, not fixed.
Latency clusters around the theoretical average. With a 100ms poll interval the expected average wait is 50ms (a message posted at a random moment waits on average half a cycle). The observed ~60–64ms is consistent with that, plus a small amount of HTTP overhead. The floor is fixed by the interval regardless of load.
| Clients | Connections | Conn/s | Avg latency |
|---|---|---|---|
| 5 | 5 | 0.50 | 0.86 ms |
| 50 | 50 | 5.00 | 4.73 ms |
| 100 | 100 | 10.00 | 9.55 ms |
Connections equal the client count and never grow. After the initial handshake,
conn/s drops to zero. The server does no connection-related work for the remaining
9.9 seconds of the benchmark.
Latency is 7–74× lower. At 100 clients: 9.55ms vs 63.58ms. At 5 clients the gap is widest — 0.86ms vs 64ms — because polling's floor is pinned to the interval while WebSocket latency at low fan-out is essentially just loopback RTT. Over a real network the polling number would be worse (each poll adds a full HTTP round-trip on top of the interval) while the WebSocket number would grow modestly with RTT.
WebSocket latency grows with client count. 0.86ms → 9.55ms as clients increase
from 5 to 100. This is broadcast overhead: websockets.broadcast() must write a
frame to every connected socket in a single event loop pass. In production this is
managed by scoping broadcasts to rooms or channels rather than the full client set.
| | Polling 5 | Polling 50 | Polling 100 | WebSocket 5 | WebSocket 50 | WebSocket 100 |
|---|---|---|---|---|---|---|
| Connections | 494 | 4,823 | 9,008 | 5 | 50 | 100 |
| Conn/s | 49.4 | 482.3 | 900.8 | 0.50* | 5.00* | 10.00* |
| Wasted % | 83% | 24% | 18% | 0% | 0% | 0% |
| Avg latency | 64 ms | 60 ms | 64 ms | 0.9 ms | 4.7 ms | 9.6 ms |

\* WebSocket conn/s reflects only the initial handshake — after startup it is 0.
Each benchmark produces one PNG per tier. Each chart has three panels:
- Total Connections — polling climbs linearly for the full duration; WebSocket reaches N at second 1 and flatlines.
- Conn/s — polling holds a constant rate throughout; WebSocket shows a single spike at startup then drops to zero.
- Avg Latency — the y-axis scales tell the story. Polling sits in the 60–65ms range; WebSocket stays well under 10ms. The lines look visually similar — both roughly flat — but you're comparing different axes.
Polling works. It's just expensive. For very low client counts and high poll intervals it's entirely reasonable. Its simplicity — standard HTTP, no special libraries, works with any client — is a genuine advantage.
WebSockets eliminate connection churn. The single most impactful change is not latency but the complete removal of repeated connection overhead. A server handling 1,000 polling clients at 100ms is fielding 10,000 HTTP requests per second before any real work is done. The same workload over WebSockets is 1,000 persistent connections and nothing else.
Push inverts the cost model. In polling, the server's work scales with
clients × poll_rate. In WebSockets, it scales with message_rate × clients_per_room.
Quiet rooms are essentially free. The server only works when something actually happens.
WebSockets are a transport, not a full solution. They solve delivery. Message history, authentication, presence, rooms, and reconnect state are all separate concerns that sit above the WebSocket layer.