Conversation
🟡 Heimdall Review Status
|
|
TODO @refcell move this into an |
fe3ede0 to
6321af8
Compare
Ports the witness-diff utility into etc/tools/. The tool compares two execution witnesses — one fetched live from a reference L2 node via debug_executionWitness and one read from a local JSON file — to identify exactly which trie nodes, bytecodes, and key preimages diverge. The core analysis decodes RLP-encoded trie leaf nodes from both witnesses and finds pairs that share the same path but carry different values, then further decodes those as TrieAccount structs to produce a human-readable account-level diff with nonce, balance, code hash, and storage root side-by-side. This is the primary tool for diagnosing state root mismatches without manually comparing multi-megabyte witness blobs.
72c64c5 to
8e8303c
Compare
- Use HashSet<&Nibbles> instead of HashMap<&Nibbles, &Bytes> for the prefix-only membership sets; the values were never read - Add an O(1) fast path to resolve_address for the common case where the leaf prefix covers the full 64-nibble path; fall back to the O(n) suffix scan only for partial paths
| /// Fetches the reference witness from the given RPC URL and compares it against | ||
| /// the local witness file, printing a human-readable diff of diverging trie nodes, | ||
| /// bytecodes, and account state. | ||
| pub async fn run(local: PathBuf, rpc_url: String, block: u64) -> Result<()> { |
There was a problem hiding this comment.
fs::read_to_string (line 163) performs synchronous/blocking I/O inside an async fn running on the tokio runtime. For potentially large witness files (multi-megabyte per the README), this blocks the entire runtime thread.
Use tokio::fs::read_to_string instead:
let local_json = tokio::fs::read_to_string(&local).await?;This requires adding tokio to the library's dependencies (it's already a workspace dep). Alternatively, since this is a single-threaded CLI tool, you could use tokio::task::spawn_blocking — but swapping to tokio::fs is simpler.
| /// Deserializes a vector while treating a JSON `null` as an empty vector. | ||
| pub fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error> | ||
| where | ||
| D: Deserializer<'de>, | ||
| T: Deserialize<'de>, | ||
| { | ||
| Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default) |
There was a problem hiding this comment.
deserialize_null_vec is exported as pub and re-exported via pub use diff::* in lib.rs, making it part of the crate's public API. This is a serde implementation detail that external consumers should never call.
Consider either:
- Making it
pub(crate)or private (used only via the#[serde(deserialize_with = ...)]attribute, which doesn't requirepubvisibility when the function is in the same module) - Or, since the
Witnessstruct fields already use#[serde(default)]alongside the custom deserializer, you could simplify by removing this function entirely and using a#[serde(default, deserialize_with = "...")]with a standard approach. In fact,#[serde(default)]alone already handles missing fields by producing an emptyVec. The only casedeserialize_null_vecadds is handling an explicit JSONnull— if that's a real concern, keep the function but make it non-pub.
| /// Deserializes a vector while treating a JSON `null` as an empty vector. | |
| pub fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error> | |
| where | |
| D: Deserializer<'de>, | |
| T: Deserialize<'de>, | |
| { | |
| Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default) | |
| fn deserialize_null_vec<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error> | |
| where | |
| D: Deserializer<'de>, | |
| T: Deserialize<'de>, | |
| { | |
| Option::<Vec<T>>::deserialize(deserializer).map(Option::unwrap_or_default) | |
| } |
| /// Run the witness diff tool. | ||
| /// | ||
| /// Fetches the reference witness from the given RPC URL and compares it against | ||
| /// the local witness file, printing a human-readable diff of diverging trie nodes, | ||
| /// bytecodes, and account state. | ||
| pub async fn run(local: PathBuf, rpc_url: String, block: u64) -> Result<()> { |
There was a problem hiding this comment.
Per project conventions: "Prefer placing functions as methods on a type (even a unit struct) rather than as bare functions, so the public API exports types, not loose functions."
The public API of this crate is 9 bare functions (run, index_by_hash, build_address_lookup, build_nibble_address_lookup, resolve_address, extract_leaves, try_decode_account, find_differing_leaf_pairs, compare_sets) plus 2 structs (Witness, DecodedLeaf). Most of these are internal implementation details of run and shouldn't be individually pub.
Consider grouping the comparison logic behind a WitnessDiff type (or similar) that holds the two witnesses and exposes a single run/compare method. The helper functions become private methods or module-private functions. This keeps the public API surface to the minimum needed by main.rs.
| pub fn find_differing_leaf_pairs( | ||
| local_leaves: &[DecodedLeaf], | ||
| rpc_leaves: &[DecodedLeaf], | ||
| ) -> Vec<(Nibbles, Bytes, Bytes)> { | ||
| let rpc_by_prefix: HashMap<&Nibbles, &Bytes> = | ||
| rpc_leaves.iter().map(|l| (&l.prefix, &l.value)).collect(); | ||
|
|
||
| let mut pairs = Vec::new(); | ||
| for local_leaf in local_leaves { | ||
| if let Some(rpc_value) = rpc_by_prefix.get(&local_leaf.prefix) | ||
| && local_leaf.value != **rpc_value | ||
| { | ||
| pairs.push((local_leaf.prefix, local_leaf.value.clone(), (*rpc_value).clone())); |
There was a problem hiding this comment.
This function returns Vec<(Nibbles, Bytes, Bytes)> — a tuple with unnamed fields. Since this is a diagnostic tool where clarity matters, consider returning a named struct:
pub struct LeafDiff {
pub prefix: Nibbles,
pub local_value: Bytes,
pub rpc_value: Bytes,
}This would make the destructuring at line 217 (for (prefix, local_value, rpc_value) in &leaf_pairs) self-documenting and prevent accidental field swaps between local_value and rpc_value.
| /// Compare two sets of bytes and print a compact diff summary. | ||
| /// | ||
| /// Returns the total number of entries that differ (only-local + only-rpc). | ||
| pub fn compare_sets(label: &str, local_items: &[Bytes], rpc_items: &[Bytes]) -> usize { | ||
| let local_map = index_by_hash(local_items); | ||
| let rpc_map = index_by_hash(rpc_items); | ||
|
|
||
| let only_local = local_map.keys().filter(|h| !rpc_map.contains_key(*h)).count(); | ||
| let only_rpc = rpc_map.keys().filter(|h| !local_map.contains_key(*h)).count(); | ||
| let common = local_map.keys().filter(|h| rpc_map.contains_key(*h)).count(); | ||
|
|
||
| println!(" {label}: {common} common, {only_local} only-local, {only_rpc} only-rpc"); | ||
|
|
||
| only_local + only_rpc |
There was a problem hiding this comment.
compare_sets mixes computation with I/O (println!), which makes it impossible to test or reuse without producing console output. Consider returning a result struct and letting the caller handle printing:
pub struct SetComparison {
pub common: usize,
pub only_local: usize,
pub only_rpc: usize,
}Minor point for a diagnostic CLI, but it would also simplify the function signature by removing the label parameter.
Review SummaryThe witness-diff utility is well-structured and addresses a real debugging need. The code has been significantly improved from the earlier revision (logic extracted to a library module, HashSet used correctly, total_diffs accounting fixed, iterator-based address lookup). A few items remain: Issues
Suggestions (non-blocking)
|
Summary
Ports the
witness-diffutility from the regression testing branch intoetc/. The tool compares two execution witnesses — one fetched live from a reference L2 node viadebug_executionWitnessand one read from a local JSON file — to identify exactly which trie nodes, bytecodes, and key preimages diverge. The core analysis decodes RLP-encoded trie leaf nodes from both witnesses and finds pairs that share the same path but carry different values, then further decodes those asTrieAccountstructs to produce a human-readable account-level diff with nonce, balance, code hash, and storage root side-by-side. This is the primary tool for diagnosing state root mismatches without manually comparing multi-megabyte witness blobs.