Jonnymcc/strace_debug_tutorial
Debugging Tutorial: pdb, strace, gdb

A hands-on tutorial using a single Python "guinea pig" program to learn three debugging tools. ~15 minutes per tool.

The Guinea Pig: app.py

A multithreaded Python HTTP server that:

  • Serves JSON on port 8080 (/, /fib, /crash, /info)
  • Runs a background worker thread (fibonacci + log writes every 5s)
  • Writes to app.log
  • Has an intentional bug on the /crash endpoint (division by zero)
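For orientation, a minimal sketch consistent with the description above. The real app.py is the source of truth; details such as fib and the exact compute_ratio formula are assumptions:

```python
import json
from http.server import BaseHTTPRequestHandler
from socketserver import ThreadingMixIn, TCPServer


def fib(n):
    # Naive recursive fibonacci: cheap CPU load for the worker thread.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def compute_ratio(a, b):
    # The intentional /crash bug: b - a is zero whenever a == b.
    # (Exact formula is an assumption; only the b - a denominator is documented.)
    return a / (b - a)


class DebugHandler(BaseHTTPRequestHandler):
    request_count = 0

    def do_GET(self):
        DebugHandler.request_count += 1
        count = DebugHandler.request_count
        if self.path == "/crash":
            compute_ratio(42, 42)  # raises ZeroDivisionError on purpose
        body = json.dumps({"path": self.path, "count": count}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


class ThreadedTCPServer(ThreadingMixIn, TCPServer):
    daemon_threads = True


def worker(stop):
    # Background thread: fibonacci + a log line every 5 seconds.
    # (The real app writes to app.log via a logging FileHandler.)
    tick = 0
    while not stop.wait(5):
        tick += 1
        print(f"worker tick={tick} fib(20)={fib(20)}")

# app.py's __main__ block would start the worker thread and call
# ThreadedTCPServer(("", 8080), DebugHandler).serve_forever(); omitted here.
```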

Setup

# Run locally (for pdb section)
python3 app.py

# Run in Docker (for strace/gdb sections)
docker compose up --build

# Test it
curl http://localhost:8080/
curl http://localhost:8080/info

Part 1: Python Debugger (pdb) — ~15 min

Run app.py locally for this section. No Docker needed.

1.1 Setting Breakpoints

Add a breakpoint to the request handler. Edit app.py and add this line inside do_GET, right after count = ...:

def do_GET(self):
    DebugHandler.request_count += 1
    count = DebugHandler.request_count
    breakpoint()  # <-- add this

Now run:

python3 app.py

In another terminal:

curl http://localhost:8080/

The server will pause and drop you into the pdb prompt in the first terminal.

Use w (where) to see the call stack:

  ~/.pyenv/versions/3.11.4/lib/python3.11/socketserver.py(361)finish_request()
-> self.RequestHandlerClass(request, client_address, self)
  ~/.pyenv/versions/3.11.4/lib/python3.11/socketserver.py(755)__init__()
-> self.handle()
  ~/.pyenv/versions/3.11.4/lib/python3.11/http/server.py(436)handle()
-> self.handle_one_request()
  ~/.pyenv/versions/3.11.4/lib/python3.11/http/server.py(424)handle_one_request()
-> method()
> ~/strace_debug_tutorial/app.py(80)do_GET()
-> if self.path == "/":

Reading top to bottom (pdb shows oldest call first, newest last):

  1. finish_request() creates a new DebugHandler instance for each request
  2. The handler's __init__() immediately calls self.handle() (all work happens inside the constructor)
  3. handle() calls handle_one_request() which parses the HTTP method
  4. handle_one_request() calls method() — which resolves to your do_GET()
  5. You're now paused at the breakpoint inside do_GET()

1.2 Navigation Commands

Once at the (Pdb) prompt, try:

Command  What it does
l        List source code around current line
n        Execute next line (step over)
s        Step into function call
c        Continue execution
w        Show call stack (where)
u / d    Move up/down the call stack

Try it: Step through the request handling with n until you see the response being built.

1.3 Inspection

Command      What it does
p expr       Print expression
pp expr      Pretty-print expression
a            Print args of current function
pp locals()  Print all local variables
!var = val   Modify a variable live

Try it:

(Pdb) p self.path
'/'
(Pdb) p count
1
(Pdb) pp locals()
(Pdb) pp vars(self)
{'client_address': ('127.0.0.1', 64457),
 'close_connection': True,
 'command': 'GET',
 'connection': <socket.socket fd=5, family=2, type=1, proto=0, laddr=('127.0.0.1', 8080), raddr=('127.0.0.1', 64457)>,
 'headers': <http.client.HTTPMessage object at 0x10849d390>,
 'path': '/',
 'raw_requestline': b'GET / HTTP/1.1\r\n',
 'request': <socket.socket fd=5, family=2, type=1, proto=0, laddr=('127.0.0.1', 8080), raddr=('127.0.0.1', 64457)>,
 'request_version': 'HTTP/1.1',
 'requestline': 'GET / HTTP/1.1',
 'rfile': <_io.BufferedReader name=5>,
 'server': <__main__.ThreadedTCPServer object at 0x103638950>,
 'wfile': <socketserver._SocketWriter object at 0x108542b00>}
(Pdb) pp self.headers.items()
(Pdb) p threading.active_count()

1.4 Conditional Breakpoints

Remove the breakpoint() line and instead run with -m pdb:

python3 -m pdb app.py

At the pdb prompt, set a conditional breakpoint:

(Pdb) b app.py:76, self.path == '/crash'
(Pdb) c

Now requests to / and /fib will pass through, but /crash will pause:

curl http://localhost:8080/       # passes through
curl http://localhost:8080/crash  # triggers breakpoint

1.5 Exercise: Find the Bug

The /crash endpoint has a division-by-zero bug. Use pdb to find it:

  1. Set a breakpoint on the /crash path (line 76)
  2. Step into the compute_ratio() function with s
  3. Inspect the arguments with a
  4. See why b - a equals zero
  5. Fix: Change the call or the function to handle a == b
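One possible fix for step 5, assuming a compute_ratio along these lines (the real signature and formula live in app.py):

```python
def compute_ratio(a, b):
    # Guard the degenerate case instead of dividing by b - a == 0.
    if a == b:
        return 0.0
    return a / (b - a)
```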

1.6 Bonus: Post-Mortem Debugging

Instead of setting breakpoints, let the crash happen and debug after:

python3 -c "
import app
# Simulate the bug
try:
    app.compute_ratio(42, 42)
except ZeroDivisionError:
    import pdb; pdb.post_mortem()
"

At the pdb prompt you'll be inside compute_ratio at the moment of the crash. Inspect a, b, and b - a.
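Under the hood, pdb.post_mortem() walks the saved traceback to the frame where the exception was raised. You can do the same by hand to see exactly what it exposes; a stand-in compute_ratio keeps the snippet self-contained:

```python
import sys


def compute_ratio(a, b):  # stand-in for app.compute_ratio
    return a / (b - a)


try:
    compute_ratio(42, 42)
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    while tb.tb_next:          # walk to the frame where the error was raised
        tb = tb.tb_next
    crash_locals = tb.tb_frame.f_locals
    print(crash_locals)        # {'a': 42, 'b': 42}
```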


Part 2: strace — ~15 min

strace traces system calls — the interface between your program and the Linux kernel. Run this section inside Docker.

2.0 Start the Container

docker compose up --build -d
docker compose exec app bash

You're now inside the Linux container. The app is running as PID 1.

2.1 Attach to the Running Process

# In a second exec session (open another terminal)
docker compose exec app bash

# Trace PID 1 for a few seconds (Ctrl+C to stop)
strace -p 1

You'll see a stream of syscalls — mostly epoll_wait (the server waiting for connections) and futex (thread synchronization).

2.2 Trace Specific Syscalls

Important: Since our app is multithreaded, always use -f to follow all threads. Without it, strace only traces the main thread — you won't see activity from HTTP handler threads or the background worker.

Filter to only see what you care about:

# Network syscalls only
strace -f -p 1 -e trace=network

# File operations only
strace -f -p 1 -e trace=file

# Just write() calls (log writes + HTTP responses)
strace -f -p 1 -e trace=write

# Just read() calls
strace -f -p 1 -e trace=read

While strace is running, trigger activity from your host:

curl http://localhost:8080/
curl http://localhost:8080/fib

2.3 Follow Threads

The -f flag follows all threads (important for our multithreaded app):

strace -f -p 1 -e trace=write

You'll see writes from both the HTTP handler threads and the background worker thread, each tagged with their thread ID ([pid XXXX]).

2.4 Timing

# Show time spent in each syscall
strace -T -p 1

# Summary: count and time per syscall type (Ctrl+C to stop and print)
strace -f -c -p 1

The -c output shows a table like:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.00    0.000450          10        45           epoll_wait
 30.00    0.000300           5        60           write
...

2.5 Exercise: Trace an HTTP Request

  1. Start tracing network calls: strace -f -p 1 -e trace=network
  2. From your host: curl http://localhost:8080/
  3. Observe the sequence: accept4 → recvfrom (incoming HTTP request) → sendto (HTTP response) → close

Note: Linux sockets use recvfrom/sendto (network syscalls), not read/write. So to see both sides of an HTTP request, trace network syscalls — not read+write:

strace -f -p 1 -e trace=network

You'll see the raw HTTP request bytes in recvfrom() and the JSON response in sendto().

To see file writes (like log output) alongside network activity:

strace -f -p 1 -e trace=network,write

2.6 Exercise: Find File Writes

The FileHandler opens app.log once at startup and keeps the fd open, so you won't see repeated openat calls. Instead, first find which fd belongs to the log file:

ls -l /proc/1/fd | grep app.log

Note the fd number (e.g. 3). Now trace writes and filter for that fd:

strace -f -p 1 -e trace=write 2>&1 | grep 'write(3,'

Wait for the background worker to tick. You'll see the log messages being written: write(3, "2026-... worker tick=...").
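The fd lookup can also be scripted from Python. This Linux-only snippet (it relies on /proc/self/fd) opens a throwaway app.log and finds its fd number the same way:

```python
import os
import tempfile

# Open a stand-in log file, then find its fd via /proc, mirroring
# `ls -l /proc/1/fd | grep app.log` for our own process.
path = os.path.join(tempfile.mkdtemp(), "app.log")
log = open(path, "a")
target = os.path.realpath(path)

log_fd = None
for fd in os.listdir("/proc/self/fd"):
    try:
        if os.readlink(f"/proc/self/fd/{fd}") == target:
            log_fd = int(fd)
    except OSError:
        pass  # some fds vanish between listdir and readlink

print(f"app.log is fd {log_fd}")  # the N to grep for in write(N, ...)
```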

2.7 Bonus: Load vs Idle Comparison

# Idle: run for ~15 seconds, then Ctrl+C to see the summary
strace -f -c -p 1

# Under load: in one terminal start strace, in another generate traffic
strace -f -c -p 1
# (in another terminal)
for i in $(seq 1 50); do curl -s http://localhost:8080/fib > /dev/null & done
# Wait a bit, then Ctrl+C strace to see the summary

Compare the syscall counts and time distribution between idle and load.
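The shell loop above can also be expressed in Python. This self-contained sketch spins up a throwaway local server (so the snippet runs anywhere, without the tutorial container) and fires 50 concurrent GETs at it with a thread pool:

```python
import http.server
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from socketserver import ThreadingMixIn


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging


class Srv(ThreadingMixIn, http.server.HTTPServer):
    pass


srv = Srv(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=srv.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{srv.server_address[1]}/"

# 50 concurrent GETs, like the `for i in $(seq 1 50)` shell loop
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda _: urllib.request.urlopen(url).status, range(50)))

print(results.count(200))  # 50
srv.shutdown()
```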


Part 3: gdb — ~15 min

gdb lets you inspect and control a running process at the C level. Because CPython, the Python interpreter, is itself a C program, gdb can show both C frames and (with the right extensions) Python frames.

3.0 Prerequisites (already in the Docker image)

The Dockerfile installs gdb and python3-dbg which provides Python-aware gdb extensions (py-bt, py-list, py-locals).

3.1 Attach to the Running Process

docker compose exec app bash

gdb -p 1

gdb will pause the process. You'll see something like:

(gdb)

Important: The app is paused while gdb is attached. Use c (continue) to resume it, or detach to leave.

3.2 Basics

(gdb) bt                   # C-level backtrace
(gdb) info threads         # List all threads
(gdb) thread 2             # Switch to thread 2
(gdb) bt                   # Backtrace of thread 2

You'll see CPython interpreter frames like _PyEval_EvalFrameDefault, PyObject_Call, etc.

3.3 Python-Aware Debugging

These commands show you the Python level, not the C level:

(gdb) py-bt               # Python backtrace (much more readable)
(gdb) py-list             # Show Python source at current position
(gdb) py-locals           # Show Python local variables

Try it:

(gdb) info threads
(gdb) thread 1            # Main thread (HTTP server)
(gdb) py-bt
(gdb) thread 2            # Background worker
(gdb) py-bt
(gdb) py-locals

You should see the worker's tick variable and time.sleep() in the backtrace.

3.4 Breakpoints

(gdb) break write          # Break on glibc's write() wrapper
(gdb) c                    # Continue — will break on next log write

# When it breaks:
(gdb) bt                   # See what triggered the write
(gdb) py-bt               # See the Python context
(gdb) c                    # Continue again

To remove the breakpoint:

(gdb) info breakpoints
(gdb) delete 1             # Delete breakpoint number 1

3.5 Exercise: Inspect Thread States

  1. Attach: gdb -p 1
  2. info threads — identify the main thread and worker thread
  3. Switch to each thread and run py-bt
  4. For the worker thread, run py-locals to see the current tick count
  5. detach to release the process

Questions to answer:

  • Which thread is the main HTTP server loop?
  • Which thread is the background worker?
  • What is the worker currently doing (sleeping? computing fibonacci?)
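As a cross-check, the threads gdb lists are the same ones Python can enumerate from the inside. A self-contained sketch with a stand-in worker thread:

```python
import threading
import time


def worker(stop):
    # Stand-in for the app's background worker loop (5s ticks in the real app).
    tick = 0
    while not stop.wait(0.01):
        tick += 1


stop = threading.Event()
t = threading.Thread(target=worker, args=(stop,), name="worker", daemon=True)
t.start()
time.sleep(0.05)

# gdb's `info threads` shows these same threads at the C level.
names = sorted(th.name for th in threading.enumerate())
print(names)  # includes 'MainThread' and 'worker'
stop.set()
t.join()
```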

3.6 Exercise: Break on write()

  1. Attach: gdb -p 1
  2. break write
  3. c (continue)
  4. When it breaks, run py-bt to see what Python code triggered the write
  5. Is it a log write? HTTP response? Something else?
  6. delete 1 then c to clean up

Reading a C backtrace

When you run bt after breaking on write(), you'll see something like:

#0  write () from /lib/aarch64-linux-gnu/libc.so.6
#1  _Py_write_impl (fd=3, buf=0xaaaac3f53c60, count=71, gil_held=1) at Python/fileutils.c:1836
#2  _Py_write (fd=3, buf=0xaaaac3f53c60, count=<optimized out>) at Python/fileutils.c:1896
#3  _io_FileIO_write_impl (b=..., self=...) at ./Modules/_io/fileio.c:863
#4  _io_FileIO_write (self=..., arg=<memoryview at remote 0xffffa6d2bc40>) at ./Modules/_io/clinic/fileio.c.h:304
#5  method_vectorcall_O (func=<method_descriptor ...>) at Objects/descrobject.c:481

Read from bottom to top — that's the call order:

Frame                     What's happening
#5 method_vectorcall_O    CPython's calling convention — dispatching a built-in method call with one argument
#4 _io_FileIO_write       Auto-generated argument parsing for FileIO.write() (from CPython's clinic tool)
#3 _io_FileIO_write_impl  The real implementation of Python's FileIO.write() — you can see the file object and fd here
#2 _Py_write              CPython's internal write wrapper — handles errors, GIL, and retry-on-EINTR
#1 _Py_write_impl         The inner implementation: fd=3 (our app.log), count=71 (bytes to write), gil_held=1 (GIL is held)
#0 write                  The glibc write() syscall — crossing from userspace into the kernel (writes to page cache, not disk)

The full chain: Python logging FileHandler.emit() → C FileIO.write() → CPython _Py_write → glibc write(3, buf, 71) → kernel page cache.

3.7 Bonus: Core File Snapshot

Take a snapshot of the entire process state without killing it:

(gdb) generate-core-file

This creates a core.<pid> file. You can later analyze it offline:

gdb python3 core.<pid>

This is useful for capturing the state of a misbehaving production process.

Note: Core files are large — even for our simple app, expect ~90 MB. The dump includes the entire process memory: the Python interpreter, all loaded shared libraries (libc, libpython, etc.), and every Python object on the heap. For real production services this can easily be gigabytes.

3.8 Detach

Always detach cleanly when done:

(gdb) detach
(gdb) quit

Quick Reference

Tool    Scope                Key Use
pdb     Python source level  Breakpoints, step through logic, inspect variables
strace  Syscall boundary     See file/network/IPC activity, measure syscall timing
gdb     Machine/C level      Inspect threads, memory, attach to running processes

Cleanup

docker compose down
rm -f app.log
