Skip to content

feat(etc): Add Witness Diff Utility#1231

Open
refcell wants to merge 2 commits intomainfrom
refcell/witness-diff
Open

feat(etc): Add Witness Diff Utility#1231
refcell wants to merge 2 commits intomainfrom
refcell/witness-diff

Conversation

@refcell
Copy link
Contributor

@refcell refcell commented Mar 11, 2026

Summary

Ports the witness-diff utility from the regression testing branch into etc/. The tool compares two execution witnesses — one fetched live from a reference L2 node via debug_executionWitness and one read from a local JSON file — to identify exactly which trie nodes, bytecodes, and key preimages diverge. The core analysis decodes RLP-encoded trie leaf nodes from both witnesses and finds pairs that share the same path but carry different values, then further decodes those as TrieAccount structs to produce a human-readable account-level diff with nonce, balance, code hash, and storage root side-by-side. This is the primary tool for diagnosing state root mismatches without manually comparing multi-megabyte witness blobs.

@refcell refcell added K-feature Kind: New feature A-utilities Area: utility crates labels Mar 11, 2026
@refcell refcell self-assigned this Mar 11, 2026
@cb-heimdall
Copy link
Collaborator

cb-heimdall commented Mar 11, 2026

🟡 Heimdall Review Status

Requirement Status More Info
Reviews 🟡 0/1
Denominator calculation
Show calculation
1 if user is bot 0
1 if user is external 0
2 if repo is sensitive 0
From .codeflow.yml 1
Additional review requirements
Show calculation
Max 0
0
From CODEOWNERS 0
Global minimum 0
Max 1
1
1 if commit is unverified 0
Sum 1

@refcell refcell requested review from danyalprout and meyer9 March 11, 2026 14:59
@refcell
Copy link
Contributor Author

refcell commented Mar 11, 2026

TODO @refcell move this into an etc/tools/ subdir

@refcell refcell force-pushed the refcell/witness-diff branch from fe3ede0 to 6321af8 Compare March 12, 2026 13:22
Ports the witness-diff utility into etc/tools/. The tool compares two
execution witnesses — one fetched live from a reference L2 node via
debug_executionWitness and one read from a local JSON file — to identify
exactly which trie nodes, bytecodes, and key preimages diverge.

The core analysis decodes RLP-encoded trie leaf nodes from both witnesses
and finds pairs that share the same path but carry different values, then
further decodes those as TrieAccount structs to produce a human-readable
account-level diff with nonce, balance, code hash, and storage root
side-by-side. This is the primary tool for diagnosing state root
mismatches without manually comparing multi-megabyte witness blobs.
@refcell refcell force-pushed the refcell/witness-diff branch from 72c64c5 to 8e8303c Compare March 12, 2026 13:26
- Use HashSet<&Nibbles> instead of HashMap<&Nibbles, &Bytes> for the
  prefix-only membership sets; the values were never read
- Add an O(1) fast path to resolve_address for the common case where
  the leaf prefix covers the full 64-nibble path; fall back to the
  O(n) suffix scan only for partial paths
@refcell refcell changed the title feat(etc): add witness-diff utility feat(etc): Add Witness Diff Utility Mar 12, 2026
/// Fetches the reference witness from the given RPC URL and compares it against
/// the local witness file, printing a human-readable diff of diverging trie nodes,
/// bytecodes, and account state.
pub async fn run(local: PathBuf, rpc_url: String, block: u64) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fs::read_to_string (line 163) performs synchronous/blocking I/O inside an async fn running on the tokio runtime. For potentially large witness files (multi-megabyte per the README), this blocks the entire runtime thread.

Use tokio::fs::read_to_string instead:

let local_json = tokio::fs::read_to_string(&local).await?;

This requires adding tokio to the library's dependencies (it's already a workspace dep). Alternatively, since this is a single-threaded CLI tool, you could use tokio::task::spawn_blocking — but swapping to tokio::fs is simpler.

Comment on lines +18 to +24
/// Deserializes a vector while treating a JSON `null` as an empty vector.
pub fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deserialize_null_vec is exported as pub and re-exported via pub use diff::* in lib.rs, making it part of the crate's public API. This is a serde implementation detail that external consumers should never call.

Consider either:

  • Making it pub(crate) or private (used only via the #[serde(deserialize_with = ...)] attribute, which doesn't require pub visibility when the function is in the same module)
  • Or, since the Witness struct fields already use #[serde(default)] alongside the custom deserializer, you could simplify by removing this function entirely and using a #[serde(default, deserialize_with = "...")] with a standard approach. In fact, #[serde(default)] alone already handles missing fields by producing an empty Vec. The only case deserialize_null_vec adds is handling an explicit JSON null — if that's a real concern, keep the function but make it non-pub.
Suggested change
/// Deserializes a vector while treating a JSON `null` as an empty vector.
pub fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default)
fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default)
}

Comment on lines +155 to +160
/// Run the witness diff tool.
///
/// Fetches the reference witness from the given RPC URL and compares it against
/// the local witness file, printing a human-readable diff of diverging trie nodes,
/// bytecodes, and account state.
pub async fn run(local: PathBuf, rpc_url: String, block: u64) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per project conventions: "Prefer placing functions as methods on a type (even a unit struct) rather than as bare functions, so the public API exports types, not loose functions."

The public API of this crate is 9 bare functions (run, index_by_hash, build_address_lookup, build_nibble_address_lookup, resolve_address, extract_leaves, try_decode_account, find_differing_leaf_pairs, compare_sets) plus 2 structs (Witness, DecodedLeaf). Most of these are internal implementation details of run and shouldn't be individually pub.

Consider grouping the comparison logic behind a WitnessDiff type (or similar) that holds the two witnesses and exposes a single run/compare method. The helper functions become private methods or module-private functions. This keeps the public API surface to the minimum needed by main.rs.

Comment on lines +121 to +133
pub fn find_differing_leaf_pairs(
local_leaves: &[DecodedLeaf],
rpc_leaves: &[DecodedLeaf],
) -> Vec<(Nibbles, Bytes, Bytes)> {
let rpc_by_prefix: HashMap<&Nibbles, &Bytes> =
rpc_leaves.iter().map(|l| (&l.prefix, &l.value)).collect();

let mut pairs = Vec::new();
for local_leaf in local_leaves {
if let Some(rpc_value) = rpc_by_prefix.get(&local_leaf.prefix)
&& local_leaf.value != **rpc_value
{
pairs.push((local_leaf.prefix, local_leaf.value.clone(), (*rpc_value).clone()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function returns Vec<(Nibbles, Bytes, Bytes)> — a tuple with unnamed fields. Since this is a diagnostic tool where clarity matters, consider returning a named struct:

pub struct LeafDiff {
    pub prefix: Nibbles,
    pub local_value: Bytes,
    pub rpc_value: Bytes,
}

This would make the destructuring at line 217 (for (prefix, local_value, rpc_value) in &leaf_pairs) self-documenting and prevent accidental field swaps between local_value and rpc_value.

Comment on lines +139 to +152
/// Compare two sets of bytes and print a compact diff summary.
///
/// Returns the total number of entries that differ (only-local + only-rpc).
pub fn compare_sets(label: &str, local_items: &[Bytes], rpc_items: &[Bytes]) -> usize {
let local_map = index_by_hash(local_items);
let rpc_map = index_by_hash(rpc_items);

let only_local = local_map.keys().filter(|h| !rpc_map.contains_key(*h)).count();
let only_rpc = rpc_map.keys().filter(|h| !local_map.contains_key(*h)).count();
let common = local_map.keys().filter(|h| rpc_map.contains_key(*h)).count();

println!(" {label}: {common} common, {only_local} only-local, {only_rpc} only-rpc");

only_local + only_rpc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compare_sets mixes computation with I/O (println!), which makes it impossible to test or reuse without producing console output. Consider returning a result struct and letting the caller handle printing:

pub struct SetComparison {
    pub common: usize,
    pub only_local: usize,
    pub only_rpc: usize,
}

Minor point for a diagnostic CLI, but it would also simplify the function signature by removing the label parameter.

@github-actions
Copy link
Contributor

Review Summary

The witness-diff utility is well-structured and addresses a real debugging need. The code has been significantly improved from the earlier revision (logic extracted to a library module, HashSet used correctly, total_diffs accounting fixed, iterator-based address lookup). A few items remain:

Issues

  1. Blocking I/O in async context (diff.rs:163) — fs::read_to_string blocks the tokio runtime thread. Use tokio::fs::read_to_string instead, especially since witness files can be multi-megabyte.

  2. Overly broad public API (diff.rs) — Per project conventions, functions should be methods on types rather than bare fns, and the public API should be intentional. Currently 9 bare functions and a serde helper are all pub and re-exported via pub use diff::*. Most are implementation details of run() and should be private or grouped behind a type.

  3. deserialize_null_vec leaked as public API (diff.rs:18-24) — This serde helper is re-exported from lib.rs but is an implementation detail. Should be module-private.

Suggestions (non-blocking)

  1. Unnamed tuple in find_differing_leaf_pairs return type (diff.rs:121-133) — Vec of (Nibbles, Bytes, Bytes) would benefit from a named struct to prevent accidental field swaps between local and rpc values.

  2. compare_sets mixes computation with I/O (diff.rs:139-152) — Returning a struct instead of printing directly would improve testability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-utilities Area: utility crates K-feature Kind: New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants