feat(etc): Add Witness Diff Utility by refcell · Pull Request #1231 · base/base

refcell · 2026-03-11T14:58:59Z

Summary

Ports the witness-diff utility from the regression testing branch into etc/. The tool compares two execution witnesses — one fetched live from a reference L2 node via debug_executionWitness and one read from a local JSON file — to identify exactly which trie nodes, bytecodes, and key preimages diverge. The core analysis decodes RLP-encoded trie leaf nodes from both witnesses and finds pairs that share the same path but carry different values, then further decodes those as TrieAccount structs to produce a human-readable account-level diff with nonce, balance, code hash, and storage root side-by-side. This is the primary tool for diagnosing state root mismatches without manually comparing multi-megabyte witness blobs.

cb-heimdall · 2026-03-11T14:59:03Z

🟡 Heimdall Review Status

Requirement Status More Info

Reviews

🟡 0/1

Denominator calculation

Show calculation

1 if user is bot 0

1 if user is external 0

2 if repo is sensitive 0

From .codeflow.yml 1

Additional review requirements

Show calculation

Max	0

0

From CODEOWNERS 0

Global minimum 0

Max 1

1

1 if commit is unverified 0

Sum 1

etc/witness-diff/src/main.rs

etc/tools/witness-diff/src/main.rs

etc/tools/witness-diff/Cargo.toml

etc/tools/witness-diff/src/main.rs

refcell · 2026-03-11T16:12:04Z

TODO @refcell move this into an etc/tools/ subdir

etc/tools/witness-diff/src/main.rs

etc/tools/witness-diff/src/diff.rs

Ports the witness-diff utility into etc/tools/. The tool compares two execution witnesses — one fetched live from a reference L2 node via debug_executionWitness and one read from a local JSON file — to identify exactly which trie nodes, bytecodes, and key preimages diverge. The core analysis decodes RLP-encoded trie leaf nodes from both witnesses and finds pairs that share the same path but carry different values, then further decodes those as TrieAccount structs to produce a human-readable account-level diff with nonce, balance, code hash, and storage root side-by-side. This is the primary tool for diagnosing state root mismatches without manually comparing multi-megabyte witness blobs.

- Use HashSet<&Nibbles> instead of HashMap<&Nibbles, &Bytes> for the prefix-only membership sets; the values were never read - Add an O(1) fast path to resolve_address for the common case where the leaf prefix covers the full 64-nibble path; fall back to the O(n) suffix scan only for partial paths

github-actions · 2026-03-12T13:34:07Z

etc/tools/witness-diff/src/diff.rs

+/// Fetches the reference witness from the given RPC URL and compares it against
+/// the local witness file, printing a human-readable diff of diverging trie nodes,
+/// bytecodes, and account state.
+pub async fn run(local: PathBuf, rpc_url: String, block: u64) -> Result<()> {


fs::read_to_string (line 163) performs synchronous/blocking I/O inside an async fn running on the tokio runtime. For potentially large witness files (multi-megabyte per the README), this blocks the entire runtime thread.

Use tokio::fs::read_to_string instead:

let local_json = tokio::fs::read_to_string(&local).await?;

This requires adding tokio to the library's dependencies (it's already a workspace dep). Alternatively, since this is a single-threaded CLI tool, you could use tokio::task::spawn_blocking — but swapping to tokio::fs is simpler.

github-actions · 2026-03-12T13:34:19Z

etc/tools/witness-diff/src/diff.rs

+/// Deserializes a vector while treating a JSON `null` as an empty vector.
+pub fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error>
+where
+    D: Deserializer<'de>,
+    T: Deserialize<'de>,
+{
+    Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default)


deserialize_null_vec is exported as pub and re-exported via pub use diff::* in lib.rs, making it part of the crate's public API. This is a serde implementation detail that external consumers should never call.

Consider either:

Making it pub(crate) or private (used only via the #[serde(deserialize_with = ...)] attribute, which doesn't require pub visibility when the function is in the same module)

Or, since the Witness struct fields already use #[serde(default)] alongside the custom deserializer, you could simplify by removing this function entirely and using a #[serde(default, deserialize_with = "...")] with a standard approach. In fact, #[serde(default)] alone already handles missing fields by producing an empty Vec. The only case deserialize_null_vec adds is handling an explicit JSON null — if that's a real concern, keep the function but make it non-pub.

Suggested change

/// Deserializes a vector while treating a JSON `null` as an empty vector.

pub fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error>

where

D: Deserializer<'de>,

T: Deserialize<'de>,

{

Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default)

fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error>

where

D: Deserializer<'de>,

T: Deserialize<'de>,

{

Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default)

}

github-actions · 2026-03-12T13:34:30Z

etc/tools/witness-diff/src/diff.rs

+/// Run the witness diff tool.
+///
+/// Fetches the reference witness from the given RPC URL and compares it against
+/// the local witness file, printing a human-readable diff of diverging trie nodes,
+/// bytecodes, and account state.
+pub async fn run(local: PathBuf, rpc_url: String, block: u64) -> Result<()> {


Per project conventions: "Prefer placing functions as methods on a type (even a unit struct) rather than as bare functions, so the public API exports types, not loose functions."

The public API of this crate is 9 bare functions (run, index_by_hash, build_address_lookup, build_nibble_address_lookup, resolve_address, extract_leaves, try_decode_account, find_differing_leaf_pairs, compare_sets) plus 2 structs (Witness, DecodedLeaf). Most of these are internal implementation details of run and shouldn't be individually pub.

Consider grouping the comparison logic behind a WitnessDiff type (or similar) that holds the two witnesses and exposes a single run/compare method. The helper functions become private methods or module-private functions. This keeps the public API surface to the minimum needed by main.rs.

github-actions · 2026-03-12T13:34:37Z

etc/tools/witness-diff/src/diff.rs

+pub fn find_differing_leaf_pairs(
+    local_leaves: &[DecodedLeaf],
+    rpc_leaves: &[DecodedLeaf],
+) -> Vec<(Nibbles, Bytes, Bytes)> {
+    let rpc_by_prefix: HashMap<&Nibbles, &Bytes> =
+        rpc_leaves.iter().map(|l| (&l.prefix, &l.value)).collect();
+
+    let mut pairs = Vec::new();
+    for local_leaf in local_leaves {
+        if let Some(rpc_value) = rpc_by_prefix.get(&local_leaf.prefix)
+            && local_leaf.value != **rpc_value
+        {
+            pairs.push((local_leaf.prefix, local_leaf.value.clone(), (*rpc_value).clone()));


This function returns Vec<(Nibbles, Bytes, Bytes)> — a tuple with unnamed fields. Since this is a diagnostic tool where clarity matters, consider returning a named struct:

pub struct LeafDiff { pub prefix: Nibbles, pub local_value: Bytes, pub rpc_value: Bytes, }

This would make the destructuring at line 217 (for (prefix, local_value, rpc_value) in &leaf_pairs) self-documenting and prevent accidental field swaps between local_value and rpc_value.

github-actions · 2026-03-12T13:34:43Z

etc/tools/witness-diff/src/diff.rs

+/// Compare two sets of bytes and print a compact diff summary.
+///
+/// Returns the total number of entries that differ (only-local + only-rpc).
+pub fn compare_sets(label: &str, local_items: &[Bytes], rpc_items: &[Bytes]) -> usize {
+    let local_map = index_by_hash(local_items);
+    let rpc_map = index_by_hash(rpc_items);
+
+    let only_local = local_map.keys().filter(|h| !rpc_map.contains_key(*h)).count();
+    let only_rpc = rpc_map.keys().filter(|h| !local_map.contains_key(*h)).count();
+    let common = local_map.keys().filter(|h| rpc_map.contains_key(*h)).count();
+
+    println!("  {label}: {common} common, {only_local} only-local, {only_rpc} only-rpc");
+
+    only_local + only_rpc


compare_sets mixes computation with I/O (println!), which makes it impossible to test or reuse without producing console output. Consider returning a result struct and letting the caller handle printing:

pub struct SetComparison { pub common: usize, pub only_local: usize, pub only_rpc: usize, }

Minor point for a diagnostic CLI, but it would also simplify the function signature by removing the label parameter.

github-actions · 2026-03-12T13:35:58Z

Review Summary

The witness-diff utility is well-structured and addresses a real debugging need. The code has been significantly improved from the earlier revision (logic extracted to a library module, HashSet used correctly, total_diffs accounting fixed, iterator-based address lookup). A few items remain:

Issues

Blocking I/O in async context (diff.rs:163) — fs::read_to_string blocks the tokio runtime thread. Use tokio::fs::read_to_string instead, especially since witness files can be multi-megabyte.
Overly broad public API (diff.rs) — Per project conventions, functions should be methods on types rather than bare fns, and the public API should be intentional. Currently 9 bare functions and a serde helper are all pub and re-exported via pub use diff::*. Most are implementation details of run() and should be private or grouped behind a type.
deserialize_null_vec leaked as public API (diff.rs:18-24) — This serde helper is re-exported from lib.rs but is an implementation detail. Should be module-private.

Suggestions (non-blocking)

Unnamed tuple in find_differing_leaf_pairs return type (diff.rs:121-133) — Vec of (Nibbles, Bytes, Bytes) would benefit from a named struct to prevent accidental field swaps between local and rpc values.
compare_sets mixes computation with I/O (diff.rs:139-152) — Returning a struct instead of printing directly would improve testability.

refcell added K-feature Kind: New feature A-utilities Area: utility crates labels Mar 11, 2026

refcell self-assigned this Mar 11, 2026

refcell requested review from danyalprout and meyer9 March 11, 2026 14:59