A hands-on tutorial using a single Python "guinea pig" program to learn three debugging tools. ~15 minutes per tool.
A multithreaded Python HTTP server that:

- Serves JSON on port 8080 (`/`, `/fib`, `/crash`, `/info`)
- Runs a background worker thread (fibonacci + log writes every 5s)
- Writes to `app.log`
- Has an intentional bug on the `/crash` endpoint (division by zero)
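For orientation, here is a hypothetical sketch of what app.py might look like, reconstructed from the names this tutorial references (`DebugHandler`, `compute_ratio`, the worker thread). The real file will differ in details, and the server startup wiring is omitted:

```python
import json
import logging
import threading
import time
from http.server import BaseHTTPRequestHandler
from socketserver import TCPServer, ThreadingMixIn


def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def compute_ratio(a, b):
    # Intentional bug: the divisor is zero whenever a == b
    return a / (b - a)


class ThreadedTCPServer(ThreadingMixIn, TCPServer):
    daemon_threads = True


class DebugHandler(BaseHTTPRequestHandler):
    request_count = 0

    def do_GET(self):
        DebugHandler.request_count += 1
        count = DebugHandler.request_count
        if self.path == "/":
            body = {"status": "ok", "requests": count}
        elif self.path == "/fib":
            body = {"fib": fibonacci(30)}
        elif self.path == "/crash":
            body = {"ratio": compute_ratio(42, 42)}  # ZeroDivisionError
        else:
            body = {"path": self.path}
        payload = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def worker():
    # Background thread: fibonacci + a log line every 5 seconds
    tick = 0
    while True:
        tick += 1
        logging.info("worker tick=%d fib=%d", tick, fibonacci(25))
        time.sleep(5)
```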
```bash
# Run locally (for pdb section)
python3 app.py

# Run in Docker (for strace/gdb sections)
docker compose up --build

# Test it
curl http://localhost:8080/
curl http://localhost:8080/info
```

Run app.py locally for this section. No Docker needed.
Add a breakpoint to the request handler. Edit app.py and add this line inside `do_GET`, right after `count = ...`:

```python
def do_GET(self):
    DebugHandler.request_count += 1
    count = DebugHandler.request_count
    breakpoint()  # <-- add this
```

Now run:

```bash
python3 app.py
```

In another terminal:

```bash
curl http://localhost:8080/
```

The server will pause and drop you into the pdb prompt in the first terminal.
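An aside on the mechanics: `breakpoint()` (Python 3.7+) is not pdb-specific; it dispatches through `sys.breakpointhook`, which is also what the `PYTHONBREAKPOINT` environment variable controls. A small sketch:

```python
import sys

hits = []
# Swap the hook: breakpoint() now records a call instead of entering pdb.
# PYTHONBREAKPOINT=0 uses this same mechanism to disable breakpoints entirely.
sys.breakpointhook = lambda *args, **kwargs: hits.append(args)

breakpoint()     # calls our replacement hook, does not pause
print(len(hits)) # → 1
```

This is why you can leave `breakpoint()` calls in code and still run it undebugged with `PYTHONBREAKPOINT=0 python3 app.py`.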
Use `w` (where) to see the call stack:

```
  ~/.pyenv/versions/3.11.4/lib/python3.11/socketserver.py(361)finish_request()
-> self.RequestHandlerClass(request, client_address, self)
  ~/.pyenv/versions/3.11.4/lib/python3.11/socketserver.py(755)__init__()
-> self.handle()
  ~/.pyenv/versions/3.11.4/lib/python3.11/http/server.py(436)handle()
-> self.handle_one_request()
  ~/.pyenv/versions/3.11.4/lib/python3.11/http/server.py(424)handle_one_request()
-> method()
> ~/strace_debug_tutorial/app.py(80)do_GET()
-> if self.path == "/":
```
Reading top to bottom (pdb shows the oldest call first, newest last):

- `finish_request()` creates a new `DebugHandler` instance for each request
- The handler's `__init__()` immediately calls `self.handle()` (all work happens inside the constructor)
- `handle()` calls `handle_one_request()`, which parses the HTTP method
- `handle_one_request()` calls `method()`, which resolves to your `do_GET()`
- You're now paused at the breakpoint inside `do_GET()`
Once at the (Pdb) prompt, try:
| Command | What it does |
|---|---|
| `l` | List source code around current line |
| `n` | Execute next line (step over) |
| `s` | Step into function call |
| `c` | Continue execution |
| `w` | Show call stack (where) |
| `u` / `d` | Move up/down the call stack |
Try it: Step through the request handling with n until you see the response being built.
| Command | What it does |
|---|---|
| `p expr` | Print expression |
| `pp expr` | Pretty-print expression |
| `a` | Print args of current function |
| `pp locals()` | Print all local variables |
| `!var = val` | Modify a variable live |
Try it:
```
(Pdb) p self.path
'/'
(Pdb) p count
1
(Pdb) pp locals()
(Pdb) pp vars(self)
{'client_address': ('127.0.0.1', 64457),
 'close_connection': True,
 'command': 'GET',
 'connection': <socket.socket fd=5, family=2, type=1, proto=0, laddr=('127.0.0.1', 8080), raddr=('127.0.0.1', 64457)>,
 'headers': <http.client.HTTPMessage object at 0x10849d390>,
 'path': '/',
 'raw_requestline': b'GET / HTTP/1.1\r\n',
 'request': <socket.socket fd=5, family=2, type=1, proto=0, laddr=('127.0.0.1', 8080), raddr=('127.0.0.1', 64457)>,
 'request_version': 'HTTP/1.1',
 'requestline': 'GET / HTTP/1.1',
 'rfile': <_io.BufferedReader name=5>,
 'server': <__main__.ThreadedTCPServer object at 0x103638950>,
 'wfile': <socketserver._SocketWriter object at 0x108542b00>}
(Pdb) pp self.headers.items()
(Pdb) p threading.active_count()
```
Remove the breakpoint() line and instead run with `-m pdb`:

```bash
python3 -m pdb app.py
```

At the pdb prompt, set a conditional breakpoint:

```
(Pdb) b app.py:76, self.path == '/crash'
(Pdb) c
```

Now requests to / and /fib will pass through, but /crash will pause:

```bash
curl http://localhost:8080/       # passes through
curl http://localhost:8080/crash  # triggers breakpoint
```

The /crash endpoint has a division-by-zero bug. Use pdb to find it:
- Set a breakpoint on the `/crash` path (line 76)
- Step into the `compute_ratio()` function with `s`
- Inspect the arguments with `a`
- See why `b - a` equals zero
- Fix: change the call or the function to handle `a == b`
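One possible shape of the fix, assuming `compute_ratio()` divides by `b - a` as the exercise implies (the real signature may differ):

```python
def compute_ratio(a, b):
    # Guard the degenerate case instead of letting the division raise
    if a == b:
        raise ValueError("a and b must differ")
    return a / (b - a)
```

Whether to raise, return a sentinel, or fix the caller's arguments is a design choice; the point is that `a == b` must be handled somewhere.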
Instead of setting breakpoints, let the crash happen and debug after:

```bash
python3 -c "
import app
# Simulate the bug
try:
    app.compute_ratio(42, 42)
except ZeroDivisionError:
    import pdb; pdb.post_mortem()
"
```

At the pdb prompt you'll be inside compute_ratio at the moment of the crash. Inspect `a`, `b`, and `b - a`.
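Post-mortem debugging can also be wired in globally, so any uncaught exception drops into pdb. This is a sketch of a common pattern, not something app.py does:

```python
import pdb
import sys
import traceback


def debug_hook(exc_type, exc_value, exc_tb):
    # Print the traceback as usual, then open pdb at the crashing frame
    traceback.print_exception(exc_type, exc_value, exc_tb)
    pdb.post_mortem(exc_tb)


sys.excepthook = debug_hook  # applies to all uncaught exceptions from here on
```

Only do this in interactive development; a production process should not block waiting for a debugger.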
strace traces system calls: the interface between your program and the Linux kernel. Run this section inside Docker.

```bash
docker compose up --build -d
docker compose exec app bash
```

You're now inside the Linux container. The app is running as PID 1.

```bash
# In a second exec session (open another terminal)
docker compose exec app bash

# Trace PID 1 for a few seconds (Ctrl+C to stop)
strace -p 1
```

You'll see a stream of syscalls, mostly epoll_wait (the server waiting for connections) and futex (thread synchronization).
Important: Since our app is multithreaded, always use -f to follow all threads. Without it, strace only traces the main thread — you won't see activity from HTTP handler threads or the background worker.
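The `[pid XXXX]` tags strace prints with `-f` are kernel thread IDs. From inside Python you can see the same IDs via `threading.get_native_id()` (Python 3.8+), which is handy for matching strace output to specific threads. A quick sketch:

```python
import threading

tids = []

def report():
    # get_native_id() returns the kernel TID that strace -f shows as [pid XXXX]
    tids.append((threading.current_thread().name, threading.get_native_id()))

threads = [threading.Thread(target=report, name=f"handler-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(tids)  # three threads, three distinct TIDs
```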
Filter to only see what you care about:

```bash
# Network syscalls only
strace -f -p 1 -e trace=network

# File operations only
strace -f -p 1 -e trace=file

# Just write() calls (log writes + HTTP responses)
strace -f -p 1 -e trace=write

# Just read() calls
strace -f -p 1 -e trace=read
```

While strace is running, trigger activity from your host:

```bash
curl http://localhost:8080/
curl http://localhost:8080/fib
```

The -f flag follows all threads (important for our multithreaded app):

```bash
strace -f -p 1 -e trace=write
```

You'll see writes from both the HTTP handler threads and the background worker thread, each tagged with their thread ID ([pid XXXX]).
```bash
# Show time spent in each syscall
strace -T -p 1

# Summary: count and time per syscall type (Ctrl+C to stop and print)
strace -f -c -p 1
```

The -c output shows a table like:

```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.00    0.000450          10        45           epoll_wait
 30.00    0.000300           5        60           write
...
```
- Start tracing network calls: `strace -f -p 1 -e trace=network`
- From your host: `curl http://localhost:8080/`
- Observe the sequence: `accept4` → `recvfrom` (incoming HTTP request) → `sendto` (HTTP response) → `close`
Note: Linux sockets use recvfrom/sendto (network syscalls), not read/write. So to see both sides of an HTTP request, trace network syscalls rather than read+write:

```bash
strace -f -p 1 -e trace=network
```

You'll see the raw HTTP request bytes in recvfrom() and the JSON response in sendto().
To see file writes (like log output) alongside network activity:

```bash
strace -f -p 1 -e trace=network,write
```

The FileHandler opens app.log once at startup and keeps the fd open, so you won't see repeated openat calls. Instead, first find which fd belongs to the log file:

```bash
ls -l /proc/1/fd | grep app.log
```

Note the fd number (e.g. 3). Now trace writes and filter for that fd:

```bash
strace -f -p 1 -e trace=write 2>&1 | grep 'write(3,'
```

Wait for the background worker to tick. You'll see the log messages being written: `write(3, "2026-... worker tick=...")`.
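The one-fd-for-the-process-lifetime behavior is easy to confirm from Python itself. A sketch using a throwaway demo.log as a stand-in for the tutorial's app.log:

```python
import logging

# FileHandler opens its file once (at handler creation) and keeps the fd;
# every log record then becomes one write() on that same fd, which is why
# strace shows repeated write()s but no repeated openat().
handler = logging.FileHandler("demo.log", mode="w")
logger = logging.getLogger("fd-demo")
logger.addHandler(handler)

logger.warning("worker tick=1")
logger.warning("worker tick=2")

print(handler.stream.fileno())  # the fd you'd grep for in strace output
```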
```bash
# Idle: run for ~15 seconds, then Ctrl+C to see the summary
strace -f -c -p 1

# Under load: in one terminal start strace, in another generate traffic
strace -f -c -p 1

# (in another terminal)
for i in $(seq 1 50); do curl -s http://localhost:8080/fib > /dev/null & done

# Wait a bit, then Ctrl+C strace to see the summary
```

Compare the syscall counts and time distribution between idle and load.
gdb lets you inspect and control a running process at the C level. Since Python is interpreted by CPython, we can use gdb to look at both C frames and Python frames.
The Dockerfile installs gdb and python3-dbg which provides Python-aware gdb extensions (py-bt, py-list, py-locals).
```bash
docker compose exec app bash
gdb -p 1
```

gdb will pause the process. You'll see something like:

```
(gdb)
```

Important: The app is paused while gdb is attached. Use c (continue) to resume it, or detach to leave.
```
(gdb) bt              # C-level backtrace
(gdb) info threads    # List all threads
(gdb) thread 2        # Switch to thread 2
(gdb) bt              # Backtrace of thread 2
```
You'll see CPython interpreter frames like _PyEval_EvalFrameDefault, PyObject_Call, etc.
These commands show you the Python level, not the C level:
```
(gdb) py-bt           # Python backtrace (much more readable)
(gdb) py-list         # Show Python source at current position
(gdb) py-locals       # Show Python local variables
```
Try it:
```
(gdb) info threads
(gdb) thread 1        # Main thread (HTTP server)
(gdb) py-bt
(gdb) thread 2        # Background worker
(gdb) py-bt
(gdb) py-locals
```
You should see the worker's tick variable and time.sleep() in the backtrace.
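For comparison, a pure-Python analog of `py-bt` is available from inside the process: `sys._current_frames()` (a CPython implementation detail) returns the current frame of every thread, which you can format into backtraces:

```python
import sys
import threading
import time
import traceback

stop = threading.Event()

def worker():
    stop.wait()  # parked here, like the app's worker inside time.sleep()

threading.Thread(target=worker, name="worker", daemon=True).start()
time.sleep(0.1)  # give the worker time to reach wait()

# One Python-level backtrace per thread, keyed by thread id
stacks = {
    ident: "".join(traceback.format_stack(frame))
    for ident, frame in sys._current_frames().items()
}
print("worker" in "".join(stacks.values()))  # → True
stop.set()
```

The difference: this requires cooperation from the process (you must have planned for it), while gdb's py-bt works on any running CPython, even one that is hung.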
```
(gdb) break write     # Break on the write() syscall
(gdb) c               # Continue; will break on the next log write

# When it breaks:
(gdb) bt              # See what triggered the write
(gdb) py-bt           # See the Python context
(gdb) c               # Continue again
```

To remove the breakpoint:

```
(gdb) info breakpoints
(gdb) delete 1        # Delete breakpoint number 1
```
- Attach: `gdb -p 1`
- `info threads`: identify the main thread and worker thread
- Switch to each thread and run `py-bt`
- For the worker thread, run `py-locals` to see the current tick count
- `detach` to release the process
Questions to answer:
- Which thread is the main HTTP server loop?
- Which thread is the background worker?
- What is the worker currently doing (sleeping? computing fibonacci?)
- Attach: `gdb -p 1`
- `break write`, then `c` (continue)
- When it breaks, run `py-bt` to see what Python code triggered the write
- Is it a log write? HTTP response? Something else?
- `delete 1`, then `c` to clean up
When you run bt after breaking on write(), you'll see something like:
```
#0  write () from /lib/aarch64-linux-gnu/libc.so.6
#1  _Py_write_impl (fd=3, buf=0xaaaac3f53c60, count=71, gil_held=1) at Python/fileutils.c:1836
#2  _Py_write (fd=3, buf=0xaaaac3f53c60, count=<optimized out>) at Python/fileutils.c:1896
#3  _io_FileIO_write_impl (b=..., self=...) at ./Modules/_io/fileio.c:863
#4  _io_FileIO_write (self=..., arg=<memoryview at remote 0xffffa6d2bc40>) at ./Modules/_io/clinic/fileio.c.h:304
#5  method_vectorcall_O (func=<method_descriptor ...>) at Objects/descrobject.c:481
```
Read from bottom to top; that's the call order:

| Frame | What's happening |
|---|---|
| `#5 method_vectorcall_O` | CPython's calling convention: dispatching a built-in method call with one argument |
| `#4 _io_FileIO_write` | Auto-generated argument parsing for FileIO.write() (from CPython's clinic tool) |
| `#3 _io_FileIO_write_impl` | The real implementation of Python's FileIO.write(); you can see the file object and fd here |
| `#2 _Py_write` | CPython's internal write wrapper; handles errors, the GIL, and retry-on-EINTR |
| `#1 _Py_write_impl` | The inner implementation: fd=3 (our app.log), count=71 (bytes to write), gil_held=1 (GIL is held) |
| `#0 write` | The glibc write() syscall, crossing from userspace into the kernel (writes to page cache, not disk) |
The full chain: Python logging → FileHandler.emit() → C FileIO.write() → CPython _Py_write → glibc write(3, buf, 71) → kernel page cache.
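The Python end of that chain can be made visible without gdb. A sketch with a spy stream standing in for the real log file, assuming nothing about app.py beyond standard logging:

```python
import io
import logging

class SpyStream(io.StringIO):
    # Record every .write() the handler performs; in the real app this call
    # lands in C-level FileIO.write() and finally the write(2) syscall.
    def __init__(self):
        super().__init__()
        self.writes = []

    def write(self, s):
        self.writes.append(s)
        return super().write(s)

stream = SpyStream()
logger = logging.getLogger("chain-demo")
logger.addHandler(logging.StreamHandler(stream))

logger.warning("worker tick=1")
print(stream.writes)  # → ['worker tick=1\n']
```

One log record becomes exactly one stream write (message plus terminator), which matches the single write(3, ...) per tick seen in strace.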
Take a snapshot of the entire process state without killing it:

```
(gdb) generate-core-file
```

This creates a core.<pid> file. You can later analyze it offline:

```bash
gdb python3 core.<pid>
```

This is useful for capturing the state of a misbehaving production process.
Note: Core files are large — even for our simple app, expect ~90 MB. The dump includes the entire process memory: the Python interpreter, all loaded shared libraries (libc, libpython, etc.), and every Python object on the heap. For real production services this can easily be gigabytes.
Always detach cleanly when done:
```
(gdb) detach
(gdb) quit
```
| Tool | Scope | Key Use |
|---|---|---|
| pdb | Python source level | Breakpoints, step through logic, inspect variables |
| strace | Syscall boundary | See file/network/IPC activity, measure syscall timing |
| gdb | Machine/C level | Inspect threads, memory, attach to running processes |
```bash
docker compose down
rm -f app.log
```