Jonnymcc/websockets_tutorial

WebSockets: From Polling to Push

A hands-on exploration of real-time communication in Python — starting from scratch with HTTP polling, identifying its costs, and migrating step-by-step to WebSockets.


What We Wanted to Learn

The goal was simple: understand WebSockets not as an abstract concept, but by feeling the problem they solve. Rather than jumping straight to a WebSocket implementation, we built a working chat system using HTTP polling first, instrumented it to measure its behaviour under load, then migrated it to WebSockets and compared the results directly.

By the end we had two functionally identical chat systems — same clients, same server role, same message workload — and concrete numbers showing what changed between them.


WebSockets: A Different Protocol

HTTP is a request/response protocol. A client opens a connection, sends a request, the server sends a response, and the connection closes. The server has no way to reach out to a client unprompted — it can only respond to requests that come in.

WebSockets change that model fundamentally. A WebSocket connection starts as an HTTP request — the client sends an Upgrade: websocket header — but once the server agrees, the protocol switches. What was a short-lived HTTP exchange becomes a persistent, full-duplex TCP connection: both sides can send data at any time, independently, without waiting for the other.

HTTP (request/response)          WebSocket (persistent, full-duplex)

Client      Server               Client         Server
  |--GET------>|                   |--Upgrade----->|
  |<--200------|                   |<--101---------|   (handshake)
  |  (closed)  |                   |               |
  |--GET------>|                   |<==data========|   server pushes
  |<--200------|                   |===data=======>|   client sends
  |  (closed)  |                   |<==data========|   server pushes again
                                   |    (open indefinitely)

This distinction matters because it changes who drives communication. In HTTP, the client always initiates. In WebSockets, either side can. That's the capability that makes real-time applications practical.
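The 101 handshake in the diagram includes a verification step worth seeing concretely. The client sends a random Sec-WebSocket-Key header; the server proves it actually speaks WebSocket by concatenating that key with a fixed GUID defined in RFC 6455, hashing it, and returning the result in Sec-WebSocket-Accept. A minimal sketch of the computation (the example key/accept pair is the one given in RFC 6455 itself):

```python
import base64
import hashlib

# Fixed GUID from RFC 6455 -- every conforming server uses this exact string
GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Value the server must send back in Sec-WebSocket-Accept."""
    digest = hashlib.sha1((sec_websocket_key + GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Sample handshake from RFC 6455, section 1.3:
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Libraries like websockets do this for you; it exists so a client can detect a server (or intermediary) that answered 101 without understanding the protocol.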


Part 1: HTTP Polling

How It Works

Without WebSockets, the standard approach to "near real-time" updates is polling: the client asks the server on a fixed interval — "anything new?" — and the server answers. To simulate a chat room, each client:

  1. Sends a GET /messages?since=<last_id> every 100ms
  2. Records the highest message ID it has seen
  3. Posts a new message via POST /messages every ~2 seconds

The server maintains a flat list of messages in memory. Every GET returns any messages newer than the since ID. Every POST appends to the list.
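A minimal sketch of that server using the standard library's ThreadingHTTPServer. The endpoint shapes follow the description above; the repo's server.py adds stats reporting and may differ in detail:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

MESSAGES = []            # flat in-memory list of {"id": int, "text": str}
LOCK = threading.Lock()  # ThreadingHTTPServer runs each request on its own thread

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /messages?since=<last_id> -> messages newer than since
        query = parse_qs(urlparse(self.path).query)
        since = int(query.get("since", ["0"])[0])
        with LOCK:
            new = [m for m in MESSAGES if m["id"] > since]
        body = json.dumps(new).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # POST /messages appends the request body as a new message
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode()
        with LOCK:
            MESSAGES.append({"id": len(MESSAGES) + 1, "text": text})
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Note that the server keeps every message forever so that any client, no matter how stale its since cursor, can catch up; this storage requirement disappears in the WebSocket version.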

The Code

polling/
├── server.py   — ThreadingHTTPServer, message list, stats reporter
└── client.py   — polling loop, scripted chat, latency tracking

Two clients chatting via the polling server look like this in the server log:

[server] Alice: Hey, how's it going?
[server] Bob: Did you see the game last night?
[server] Alice: This polling thing feels a bit wasteful, right?
...

[stats] --- 5s window ---
[stats]  Total connections opened : 102
[stats]  Connection rate          : 20.4/s
[stats]  Polls (GET requests)     : 100
[stats]  Polls that were empty    : 96 (96% wasted)
[stats]  Messages posted          : 2
[stats]  Message rate             : 0.40 msg/s

The Problems

Every poll is a new TCP connection. HTTP is stateless by design, so each GET request requires its own connection: TCP handshake, HTTP headers sent and received, response written, connection torn down. With two clients polling at 100ms, that's ~20 new connections per second before a single message has been sent.

Most polls find nothing. Messages arrive infrequently relative to how often clients poll. In a quiet room, 90–96% of polls return an empty []. The client and server both did real work — CPU, memory, a syscall, a file descriptor — for no informational value.

Latency is bounded by the interval. A message posted at t=0ms won't be seen by another client until their next poll fires. With a 100ms interval, the average wait is 50ms and the worst case is 100ms. Reducing the interval reduces latency but multiplies the connection rate.

These costs compound. Halving the interval doubles the connection rate. Doubling the number of clients doubles it again. The server's work grows with clients × (1 / poll_interval) regardless of how much actual communication is happening.
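That scaling law is simple enough to write down directly (the function name here is illustrative, not from the repo):

```python
def baseline_poll_rate(clients: int, poll_interval_s: float) -> float:
    """GET requests per second the server fields before any message is sent."""
    return clients * (1 / poll_interval_s)

# Two clients at a 100ms interval: the ~20 conn/s from the stats window above
rate = baseline_poll_rate(2, 0.1)
```

Halving the interval or doubling the client count each doubles this baseline, independent of how many messages actually flow.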


Part 2: WebSockets

How It Works

The WebSocket server accepts a persistent connection from each client. Once connected:

  • The server holds a registry of all connected clients (clients = set())
  • When any client sends a message, the server broadcasts it to everyone instantly
  • No client ever asks "anything new?" — the server just pushes

Each client runs two coroutines concurrently over the same connection:

await asyncio.gather(
    receive_messages(websocket, name),  # blocks until server pushes something
    send_messages(websocket, name),     # sends on a timer
)

The async for raw in websocket in receive_messages is the key change. Instead of waking up every 100ms and making a network request, the coroutine simply suspends itself. The asyncio event loop runs other work. When the server pushes a frame, the OS delivers it, asyncio wakes the coroutine, and it processes the message — all without the client having asked for it.

The Code

websockets/
├── server.py   — websockets.serve(), client registry, asyncio stats reporter
└── client.py   — persistent connection, recv + send coroutines, reconnect backoff

What Changed and Why

                        HTTP Polling                          WebSockets
Connection model        New connection per request            One connection per client, kept open
Who initiates delivery  Client (pull)                         Server (push)
Latency                 Up to POLL_INTERVAL                   ~RTT (message delivered immediately)
Wasted work             ~90–95% of requests return nothing    None — server only acts on real events
Concurrency model       threading (one thread per request)    asyncio (one event loop, many coroutines)
Send + receive          Two separate HTTP endpoints           Both directions on the same socket

The server also became simpler in one respect: it no longer needs to store messages. In polling, the message list existed so clients could catch up on what they missed between polls. With push delivery, a message is broadcast the instant it arrives — there's nothing to catch up on. (Message history for late joiners is a separate concern, handled by a database layer, not the transport.)

Disconnect Handling

Persistent connections introduce a lifecycle the polling model doesn't have: a connection can drop at any time and must be handled explicitly.

The server distinguishes two cases:

  • ConnectionClosedOK — the client sent a proper WebSocket close frame (clean shutdown, e.g. Ctrl+C). Expected.
  • ConnectionClosedError — the connection dropped without warning (process killed, network loss). Something went wrong.

The client uses exponential backoff to reconnect:

Server unreachable (OSError):     1s → 2s → 4s → 8s → ... → 30s (cap)
Connection dropped (ClosedError): reset to 1s — server exists, retry quickly

Part 3: Benchmark Results

Methodology

Both benchmarks run entirely in-process — the server starts in a background thread (polling) or asyncio task (WebSockets), then N clients run concurrently for 10 seconds. Each client sends one message every ~2 seconds. Three tiers were tested: 5, 50, and 100 clients.

Latency is measured end-to-end: a sent_at timestamp is embedded in every outgoing message and compared against the wall clock when the message is received.
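That measurement reduces to a few lines (the sent_at field matches the description; the helper names are illustrative):

```python
import time

def make_message(sender: str, text: str) -> dict:
    # Stamp the message at send time with the sender's wall clock
    return {"from": sender, "text": text, "sent_at": time.time()}

def latency_ms(message: dict) -> float:
    # End-to-end latency: receive-time wall clock minus the embedded sent_at
    return (time.time() - message["sent_at"]) * 1000.0
```

Because both benchmarks run in-process, sender and receiver share one clock, so the comparison is valid without any clock synchronization; across real machines this naive scheme would need synchronized clocks or a round-trip measurement instead.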

Polling Results

Clients    Connections    Conn/s    Wasted%    Avg Latency
5                  494      49.4        83%         64.11ms
50               4,823     482.3        24%         59.61ms
100              9,008     900.8        18%         63.58ms

A few things stand out:

Connections grow linearly with clients. 100 clients at 100ms produces ~900 new TCP connections per second. This number grows without bound as you add clients or reduce the poll interval.

Wasted% drops as clients increase. With 5 clients sending every 2 seconds, messages are rare — 83% of polls find nothing. With 100 clients, messages arrive more frequently so fewer polls are empty. The waste is traffic-dependent, not fixed.

Latency clusters around the theoretical average. With a 100ms poll interval the expected average wait is 50ms (a message posted at a random moment waits on average half a cycle). The observed ~60–64ms is consistent with that, plus a small amount of HTTP overhead. The floor is fixed by the interval regardless of load.

WebSocket Results

Clients    Connections    Conn/s    Avg Latency
5                    5      0.50        0.86ms
50                  50      5.00        4.73ms
100                100     10.00        9.55ms

Connections equal the client count and never grow. After the initial handshake, conn/s drops to zero. The server does no connection-related work for the remaining 9.9 seconds of the benchmark.

Latency is 7–74× lower. At 100 clients: 9.55ms vs 63.58ms. At 5 clients the gap is widest — 0.86ms vs 64ms — because polling's floor is pinned to the interval while WebSocket latency at low fan-out is essentially just loopback RTT. Over a real network the polling number would be worse (each poll adds a full HTTP round-trip on top of the interval) while the WebSocket number would grow modestly with RTT.

WebSocket latency grows with client count. 0.86ms → 9.55ms as clients increase from 5 to 100. This is broadcast overhead: websockets.broadcast() must write a frame to every connected socket in a single event loop pass. In production this is managed by scoping broadcasts to rooms or channels rather than the full client set.

Side-by-Side

                    POLLING                  WEBSOCKET
               5     50    100          5     50    100
Connections  494   4823   9008          5     50    100
Conn/s      49.4  482.3  900.8        0.50   5.00  10.00  *
Wasted%      83%   24%    18%          0%     0%     0%
Avg Latency 64ms   60ms   64ms       0.9ms  4.7ms  9.6ms

* WebSocket conn/s reflects only the initial handshake — after startup it is 0.

Reading the Charts

Each benchmark produces one PNG per tier. Each chart has three panels:

  • Total Connections — polling climbs linearly for the full duration; WebSocket reaches N at second 1 and flatlines.
  • Conn/s — polling holds a constant rate throughout; WebSocket shows a single spike at startup then drops to zero.
  • Avg Latency — the y-axis scales tell the story. Polling sits in the 60–65ms range; WebSocket stays well under 10ms. The lines look visually similar — both roughly flat — but you're comparing different axes.

Key Takeaways

Polling works. It's just expensive. For very low client counts and high poll intervals it's entirely reasonable. Its simplicity — standard HTTP, no special libraries, works with any client — is a genuine advantage.

WebSockets eliminate connection churn. The single most impactful change is not latency but the complete removal of repeated connection overhead. A server handling 1,000 polling clients at 100ms is fielding 10,000 HTTP requests per second before any real work is done. The same workload over WebSockets is 1,000 persistent connections and nothing else.

Push inverts the cost model. In polling, the server's work scales with clients × poll_rate. In WebSockets, it scales with message_rate × clients_per_room. Quiet rooms are essentially free. The server only works when something actually happens.

WebSockets are a transport, not a full solution. They solve delivery. Message history, authentication, presence, rooms, and reconnect state are all separate concerns that sit above the WebSocket layer.

About

You will learn websockets after reading this.
