diff --git a/.claude/commands/dev-checklists.md b/.claude/commands/dev-checklists.md index 342118a..a4dc87e 100644 --- a/.claude/commands/dev-checklists.md +++ b/.claude/commands/dev-checklists.md @@ -46,9 +46,13 @@ When implementing or modifying code that affects statistical methodology (estima - [ ] For edge cases: either match reference OR document deviation 3. **When deviating from reference implementations**: - - [ ] Add a **Note** in the Methodology Registry explaining the deviation + - [ ] Add entry in `docs/methodology/REGISTRY.md` using a reviewer-recognized label: + `**Note:**`, `**Deviation from R:**`, or `**Note (deviation from R):**` + (see CLAUDE.md "Documenting Deviations" for full format reference) - [ ] Include rationale (e.g., "defensive enhancement", "R errors here") - [ ] Ensure the deviation is an improvement, not a bug + - [ ] If deferring P2/P3 work: add row to `TODO.md` table under "Tech Debt from Code + Reviews" with columns `Issue | Location | PR | Priority` 4. **Testing methodology-aligned behavior**: - [ ] Test that edge cases produce documented behavior (NaN, warning, etc.) diff --git a/.claude/commands/pre-merge-check.md b/.claude/commands/pre-merge-check.md index fca6e07..0454d83 100644 --- a/.claude/commands/pre-merge-check.md +++ b/.claude/commands/pre-merge-check.md @@ -107,6 +107,20 @@ git diff HEAD -- | grep "^+.*def " | head -10 For each changed function, flag: "Verify docstring Parameters section matches updated signature for: ``" +#### 2.5 Methodology Documentation Check + +If any methodology files changed, check whether `docs/methodology/REGISTRY.md` was also +modified in the changed file set (from Section 1). + +If methodology files changed but REGISTRY.md was NOT modified, flag: +"Methodology files changed but `docs/methodology/REGISTRY.md` was not updated. If your +changes deviate from reference implementations, document them using a reviewer-recognized +label (`**Note:**`, `**Deviation from R:**`, or `**Note (deviation from R):**`) — +undocumented deviations are flagged as P1 by the AI reviewer and cannot be mitigated +by TODO.md." + +This is a WARNING, not a blocker — not every methodology change involves a deviation. + ### 3. Display Context-Specific Checklist Based on what changed, display the appropriate checklist items: @@ -134,6 +148,11 @@ Based on your changes to: - [ ] Control group composition verified for new code paths - [ ] "Not-yet-treated" excludes the treatment cohort itself - [ ] Parameter interactions tested with all aggregation methods + +### Methodology Deviation Documentation +- [ ] If deviating from reference implementation: added a reviewer-recognized label + (`**Note:**`, `**Deviation from R:**`, or `**Note (deviation from R):**`) in REGISTRY.md +- [ ] No undocumented methodology deviations (AI reviewer flags these as P1) ``` #### If Documentation Files Changed @@ -141,7 +160,6 @@ Based on your changes to: ### Documentation Sync - [ ] Docstrings updated for changed function signatures - [ ] README updated if user-facing behavior changes -- [ ] REGISTRY.md updated if methodology edge cases change ``` #### If This Appears to Be a Bug Fix diff --git a/CLAUDE.md b/CLAUDE.md index 4e0db93..c66c721 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -91,6 +91,29 @@ pytest tests/test_rust_backend.py -v Standalone estimators must be updated individually. 8. **Dependencies**: numpy, pandas, and scipy ONLY. No statsmodels. +## Documenting Deviations (AI Review Compatibility) + +The AI PR reviewer recognizes deviations as documented (and downgrades them to P3) ONLY +when they use specific label patterns in `docs/methodology/REGISTRY.md`. Using different +wording will cause a P1 finding ("undocumented methodology deviation"). + +**Recognized REGISTRY.md labels** — use one of these in the relevant estimator section: + +| Label | When to use | Example | +|-------|------------|---------| +| `- **Note:** ` | Defensive enhancements, implementation choices | `- **Note:** Defensive enhancement matching CallawaySantAnna NaN convention` | +| `- **Deviation from R:** ` | Intentional differences from R packages | `- **Deviation from R:** R's fixest uses t-distribution at all levels` | +| `**Note (deviation from R):** ` | Combined form, inline within edge case bullets | See SyntheticDiD section in REGISTRY.md | + +**TODO.md format** — for deferring P2/P3 items only (P0/P1 cannot be deferred): + +Add a row to the table in `TODO.md` under "Tech Debt from Code Reviews" in the appropriate +category (`Methodology/Correctness`, `Performance`, or `Testing/Docs`): + +| Issue | Location | PR | Priority | +|-------|----------|----|----------| +| Description of deferred item | `file.py` | #NNN | Medium/Low | + ## Testing Conventions - **`ci_params` fixture** (session-scoped in `conftest.py`): Use `ci_params.bootstrap(n)` and