From 50f3beeeeb083ecbabc5318e17a512f82fd550b6 Mon Sep 17 00:00:00 2001
From: AnExiledDev
Date: Sat, 14 Mar 2026 00:53:46 +0000
Subject: [PATCH] Release prep: CLI v0.1.0, spec workflow v2, scope guard fixes, docs sweep
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CLI (experimental):
- Add index command group (build, search, show, stats, tree, clean)
- Add container command group (up, down, rebuild, exec, ls, shell)
- Add container proxy — auto-proxies into devcontainer from host
- Remove review command (never shipped)
- Mark CLI as experimental in all metadata and docs

Container (v2.1.0 + v2.1.1):
- Spec workflow v2 "Spec Packages" — 8 commands replaced with 3
- Scope guard: fix /dev/null false positive, fix CWD drift
- Updated agents, skills, system prompts, and config

Docs:
- Add CLI commands to reference, tools, and changelog pages
- Sync docs changelog with container CHANGELOG (v2.1.0, v2.1.1)
- Update spec workflow, agents, skills, and rules docs
---
 .gitignore | 1 +
 cli/CHANGELOG.md | 23 +-
 cli/package.json | 5 +
 cli/prompts/review/correctness.system.md | 86 ----
 cli/prompts/review/correctness.user.md | 18 -
 cli/prompts/review/quality-resume.user.md | 15 -
 cli/prompts/review/quality.system.md | 106 -----
 cli/prompts/review/quality.user.md | 20 -
 cli/prompts/review/security-resume.user.md | 15 -
 cli/prompts/review/security.system.md | 117 ------
 cli/prompts/review/security.user.md | 18 -
 cli/src/commands/container/down.ts | 31 ++
 cli/src/commands/container/exec.ts | 42 ++
 cli/src/commands/container/ls.ts | 51 +++
 cli/src/commands/container/rebuild.ts | 44 ++
 cli/src/commands/container/shell.ts | 29 ++
 cli/src/commands/container/up.ts | 41 ++
 cli/src/commands/index/build.ts | 171 ++++++++
 cli/src/commands/index/clean.ts | 58 +++
 cli/src/commands/index/search.ts | 204 ++++++++++
 cli/src/commands/index/show.ts | 82 ++++
 cli/src/commands/index/stats.ts | 69 ++++
 cli/src/commands/index/tree.ts | 160 ++++++++
 cli/src/commands/plugin/disable.ts | 8 +-
 cli/src/commands/plugin/enable.ts | 8 +-
 cli/src/commands/review/review.ts | 101 -----
 cli/src/index.ts | 79 +++-
 cli/src/indexer/db.ts | 332 +++++++++++++++
 cli/src/indexer/extractor.ts | 233 +++++++++++
 cli/src/indexer/folders.ts | 127 ++++++
 cli/src/indexer/rules.ts | 99 +++++
 cli/src/indexer/scanner.ts | 128 ++++++
 cli/src/loaders/plugin-loader.ts | 15 +-
 cli/src/output/index-json.ts | 51 +++
 cli/src/output/index-text.ts | 212 ++++++++++
 cli/src/output/review.ts | 193 ---------
 cli/src/prompts/review.ts | 71 ----
 cli/src/runners/headless.ts | 146 -------
 cli/src/runners/review-runner.ts | 355 ----------------
 cli/src/schemas/index.ts | 73 ++++
 cli/src/schemas/review.ts | 67 ---
 cli/src/utils/context.ts | 41 ++
 cli/src/utils/devcontainer.ts | 86 ++++
 cli/src/utils/docker.ts | 191 +++++++++
 cli/src/utils/platform.ts | 2 +
 cli/tests/index-commands.test.ts | 383 ++++++++++++++++++
 cli/tests/indexer-db.test.ts | 382 +++++++++++++++++
 cli/tests/indexer-extractor.test.ts | 307 ++++++++++++++
 cli/tests/review-output.test.ts | 265 ------------
 cli/tests/review-runner.test.ts | 236 ----------
 .../.codeforge/config/main-system-prompt.md | 32 +-
 .../config/orchestrator-system-prompt.md | 36 +-
 .../.codeforge/config/rules/session-search.md | 26 +-
 .../.codeforge/config/rules/spec-workflow.md | 75 ++--
 container/.codeforge/config/settings.json | 21 +-
 .../config/writing-system-prompt.md | 5 -
 container/.devcontainer/CHANGELOG.md | 38 ++
 container/.devcontainer/CLAUDE.md | 2 +-
 container/.devcontainer/README.md | 79 ++--
 .../features/codeforge-cli/README.md | 10 +-
 .../codeforge-cli/devcontainer-feature.json | 2 +-
 .../features/codeforge-cli/install.sh | 45 +-
 .../plugins/agent-system/agents/architect.md | 6 +-
 .../plugins/agent-system/agents/documenter.md | 23 +-
 .../plugins/agent-system/agents/generalist.md | 8 +-
 .../agent-system/agents/implementer.md | 2 +-
 .../plugins/agent-system/agents/migrator.md | 2 +-
 .../plugins/agent-system/agents/refactorer.md | 2 +-
 .../agent-system/agents/spec-writer.md | 15 +-
 .../agent-system/agents/test-writer.md | 2 +-
 .../plugins/prompt-snippets/README.md | 2 +-
 .../prompt-snippets/skills/ps/SKILL.md | 2 +-
 .../skill-engine/scripts/skill-suggester.py | 218 +++++----
 .../spec-workflow/.claude-plugin/plugin.json | 2 +-
 .../plugins/spec-workflow/README.md | 182 ++++-----
 .../spec-workflow/scripts/spec-reminder.py | 9 +-
 .../spec-workflow/skills/build/SKILL.md | 280 +++++++++++++
 .../build/references/review-checklist.md | 94 +++++
 .../references/summary-report-template.md | 77 ++++
 .../spec-workflow/skills/spec-build/SKILL.md | 356 ----------------
 .../spec-build/references/review-checklist.md | 175 --------
 .../spec-workflow/skills/spec-check/SKILL.md | 104 -----
 .../spec-workflow/skills/spec-init/SKILL.md | 104 -----
 .../spec-init/references/backlog-template.md | 23 --
 .../references/milestones-template.md | 32 --
 .../spec-init/references/roadmap-template.md | 33 --
 .../spec-workflow/skills/spec-new/SKILL.md | 113 ------
 .../skills/spec-new/references/template.md | 139 -------
 .../spec-workflow/skills/spec-refine/SKILL.md | 197 ---------
 .../spec-workflow/skills/spec-review/SKILL.md | 233 ----------
 .../spec-workflow/skills/spec-update/SKILL.md | 151 -------
 .../spec-workflow/skills/spec/SKILL.md | 271 +++++++++++++
 .../spec/references/backlog-template.md | 21 +
 .../spec/references/constitution-template.md | 98 +++++
 .../spec/references/context-template.md | 86 ++++
 .../skills/spec/references/ears-patterns.md | 124 ++++++
 .../references/example-webhook/context.md | 135 ++++++
 .../example-webhook/groups/a-registration.md | 101 +++++
 .../example-webhook/groups/b-delivery.md | 141 +++++++
 .../example-webhook/groups/c-retry.md | 112 +++++
 .../example-webhook/groups/d-logs.md | 94 +++++
 .../spec/references/example-webhook/index.md | 84 ++++
 .../skills/spec/references/group-template.md | 88 ++++
 .../skills/spec/references/index-template.md | 88 ++++
 .../skills/specification-writing/SKILL.md | 327 ---------------
 .../references/criteria-patterns.md | 245 -----------
 .../references/ears-templates.md | 239 -----------
 .../spec-workflow/skills/specs/SKILL.md | 115 ++++++
 .../scripts/guard-workspace-scope.py | 46 ++-
 .../scripts/inject-workspace-cwd.py | 37 +-
 .../.devcontainer/scripts/setup-aliases.sh | 2 +-
 docs/src/content/docs/customization/rules.md | 10 +-
 docs/src/content/docs/features/agents.md | 26 +-
 docs/src/content/docs/features/skills.md | 10 +-
 docs/src/content/docs/features/tools.md | 27 +-
 .../docs/getting-started/first-session.md | 8 +-
 docs/src/content/docs/plugins/agent-system.md | 4 +-
 docs/src/content/docs/plugins/skill-engine.md | 8 +-
 .../src/content/docs/plugins/spec-workflow.md | 284 ++++++-------
 docs/src/content/docs/reference/changelog.md | 33 ++
 docs/src/content/docs/reference/commands.md | 32 +-
 docs/src/content/docs/reference/index.md | 10 +-
 122 files changed, 6483 insertions(+), 4977 deletions(-)
 delete mode 100644 cli/prompts/review/correctness.system.md
 delete mode 100644 cli/prompts/review/correctness.user.md
 delete mode 100644 cli/prompts/review/quality-resume.user.md
 delete mode 100644 cli/prompts/review/quality.system.md
 delete mode 100644 cli/prompts/review/quality.user.md
 delete mode 100644 cli/prompts/review/security-resume.user.md
 delete mode 100644 cli/prompts/review/security.system.md
 delete mode 100644 cli/prompts/review/security.user.md
 create mode 100644 cli/src/commands/container/down.ts
 create mode 100644 cli/src/commands/container/exec.ts
 create mode 100644 cli/src/commands/container/ls.ts
 create mode 100644 cli/src/commands/container/rebuild.ts
 create mode 100644 cli/src/commands/container/shell.ts
 create mode 100644 cli/src/commands/container/up.ts
 create mode 100644 cli/src/commands/index/build.ts
 create mode 100644 cli/src/commands/index/clean.ts
 create mode 100644 cli/src/commands/index/search.ts
 create mode 100644 cli/src/commands/index/show.ts
 create mode 100644 cli/src/commands/index/stats.ts
 create mode 100644 cli/src/commands/index/tree.ts
 delete mode 100644 cli/src/commands/review/review.ts
 create mode 100644 cli/src/indexer/db.ts
 create mode 100644 cli/src/indexer/extractor.ts
 create mode 100644 cli/src/indexer/folders.ts
 create mode 100644 cli/src/indexer/rules.ts
 create mode 100644 cli/src/indexer/scanner.ts
 create mode 100644 cli/src/output/index-json.ts
 create mode 100644 cli/src/output/index-text.ts
 delete mode 100644 cli/src/output/review.ts
 delete mode 100644 cli/src/prompts/review.ts
 delete mode 100644 cli/src/runners/headless.ts
 delete mode 100644 cli/src/runners/review-runner.ts
 create mode 100644 cli/src/schemas/index.ts
 delete mode 100644 cli/src/schemas/review.ts
 create mode 100644 cli/src/utils/context.ts
 create mode 100644 cli/src/utils/devcontainer.ts
 create mode 100644 cli/src/utils/docker.ts
 create mode 100644 cli/tests/index-commands.test.ts
 create mode 100644 cli/tests/indexer-db.test.ts
 create mode 100644 cli/tests/indexer-extractor.test.ts
 delete mode 100644 cli/tests/review-output.test.ts
 delete mode 100644 cli/tests/review-runner.test.ts
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/SKILL.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/references/review-checklist.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/references/summary-report-template.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-build/SKILL.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-build/references/review-checklist.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-check/SKILL.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/SKILL.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/backlog-template.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/milestones-template.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/roadmap-template.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-new/SKILL.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-new/references/template.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-refine/SKILL.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-review/SKILL.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-update/SKILL.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/SKILL.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/backlog-template.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/constitution-template.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/context-template.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/ears-patterns.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/context.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/a-registration.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/b-delivery.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/c-retry.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/d-logs.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/index.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/group-template.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/index-template.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/SKILL.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/references/criteria-patterns.md
 delete mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/references/ears-templates.md
 create mode 100644 container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specs/SKILL.md

diff --git a/.gitignore b/.gitignore
index 7376f80..dd19880 100644
--- a/.gitignore
+++ b/.gitignore
@@ -52,6 +52,7 @@ container/.devcontainer/.codeforge-preserve
 # CLI-specific
 cli/.pytest_cache/
 cli/.ruff_cache/
+.codeforge/data/
 
 # Docs-specific
 docs/.astro/
diff --git a/cli/CHANGELOG.md b/cli/CHANGELOG.md
index 5d5f97b..822fe6a 100644
--- a/cli/CHANGELOG.md
+++ b/cli/CHANGELOG.md
@@ -1,11 +1,20 @@
 # CodeForge CLI Changelog
 
-## v0.1.0 — 2026-03-05
+## v0.1.0 — 2026-03-14 (Experimental)
 
-Initial release.
+Initial release. Ships with CodeForge v2.1.0.
 
-- Session search, list, and show commands
-- Plan search command
-- Plugin management (list, show, enable, disable, hooks, agents, skills)
-- Config apply and show commands
-- AI-powered code review with 3-pass analysis (correctness, security, quality)
+### Command Groups
+
+- **`codeforge session`** — search, list, and show Claude Code session history
+- **`codeforge task`** — search tasks
+- **`codeforge plan`** — search plans
+- **`codeforge plugin`** — manage plugins (list, show, enable, disable, hooks, agents, skills)
+- **`codeforge config`** — show and apply configuration (`apply` deploys config to `~/.claude/`)
+- **`codeforge index`** — build and search a codebase symbol index (build, search, show, stats, tree, clean)
+- **`codeforge container`** — manage CodeForge devcontainers (up, down, rebuild, exec, ls, shell)
+
+### Features
+
+- Container proxy — commands auto-proxy into the running devcontainer when run from the host; use `--local` to bypass
+- `--container <name>` flag to target a specific container
diff --git a/cli/package.json b/cli/package.json
index 266b0b3..b2628ca 100644
--- a/cli/package.json
+++ b/cli/package.json
@@ -18,9 +18,14 @@
     "build": "bun build src/index.ts --outfile dist/codeforge.js --target bun",
     "dev": "bun run src/index.ts",
     "test": "bun test",
+    "build:binary": "bun build --compile src/index.ts --outfile dist/codeforge",
+    "build:binary:linux": "bun build --compile src/index.ts --outfile dist/codeforge-linux-x64 --target bun-linux-x64",
+    "build:binary:darwin": "bun build --compile src/index.ts --outfile dist/codeforge-darwin-arm64 --target bun-darwin-arm64",
+    "build:binary:darwin-x64": "bun build --compile src/index.ts --outfile dist/codeforge-darwin-x64 --target bun-darwin-x64",
     "prepublishOnly": "bun run build && bun test"
   },
   "dependencies": {
+    "@devcontainers/cli": "^0.71.0",
     "commander": "^13.0.0",
     "chalk": "^5.4.0"
   },
diff --git a/cli/prompts/review/correctness.system.md b/cli/prompts/review/correctness.system.md
deleted file mode 100644
index ee26da2..0000000
--- a/cli/prompts/review/correctness.system.md
+++ /dev/null
@@ -1,86 +0,0 @@
-You are a code reviewer focused exclusively on correctness — bugs, logic errors, and behavioral defects that cause wrong results or runtime failures.
-
-You DO NOT review: style, naming conventions, performance, code quality, or security vulnerabilities. Those are handled by separate specialized review passes.
-
-## Issue Taxonomy
-
-### Control Flow Errors
-
-- Off-by-one in loops (fence-post errors) — CWE-193
-- Wrong boolean logic (De Morgan violations, inverted conditions)
-- Unreachable code or dead branches after early return
-- Missing break in switch/case (fall-through bugs)
-- Infinite loops from wrong termination conditions
-- Incorrect short-circuit evaluation order
-
-### Null/Undefined Safety
-
-- Property access on potentially null or undefined values — CWE-476
-- Missing optional chaining or null guards
-- Uninitialized variables used before assignment
-- Destructuring from nullable sources without defaults
-- Accessing .length or iterating over potentially undefined collections
-
-### Error Handling Defects
-
-- Uncaught exceptions from JSON.parse, network calls, file I/O, or regex
-- Empty catch blocks that silently swallow errors
-- Error objects discarded (catch without using or rethrowing the error)
-- Missing finally blocks for resource cleanup (streams, handles, connections)
-- Async errors: unhandled promise rejections, missing await on try/catch
-- Incorrect error propagation (throwing strings instead of Error objects)
-
-### Type and Data Errors
-
-- Implicit type coercion bugs (== vs ===, string + number concatenation)
-- Array index out of bounds on fixed-size or empty arrays — CWE-129
-- Integer overflow/underflow in arithmetic — CWE-190
-- Incorrect API usage (wrong argument order, missing required params, wrong return type handling)
-- String/number confusion in comparisons or map keys
-- Incorrect regular expression patterns (catastrophic backtracking, wrong escaping)
-
-### Concurrency and Timing
-
-- Race conditions in async code (TOCTOU: check-then-act) — CWE-367
-- Missing await on async functions (using the Promise instead of the resolved value)
-- Shared mutable state modified from concurrent async operations
-- Event ordering assumptions that may not hold (setup before listener, response before request)
-- Promise.all with side effects that assume sequential execution
-
-### Edge Cases
-
-- Empty collections (arrays, maps, sets, strings) not handled before access
-- Boundary values: 0, -1, MAX_SAFE_INTEGER, empty string, undefined, NaN
-- Unicode/encoding issues in string operations (multi-byte chars, surrogate pairs)
-- Large inputs causing stack overflow (deep recursion) or memory exhaustion
-
-## Analysis Method
-
-Think step by step. For each changed file, mentally execute the code:
-
-1. **Identify inputs.** What data enters this function? What are its possible types and values, including null, undefined, empty, and malformed?
-2. **Trace control flow.** At each branch point, ask: what happens when the condition is false? What happens when both branches are taken across consecutive calls?
-3. **Check data access safety.** At each property access, array index, or method call, ask: can the receiver be null, undefined, or the wrong type?
-4. **Verify loop correctness.** For each loop: is initialization correct? Does termination trigger at the right time? Does the increment/decrement step cover all cases? Is the loop body idempotent when it needs to be?
-5. **Audit async paths.** For each async call: is there an await? Is the error handled? Could concurrent calls interleave unsafely?
-6. **Self-check.** Review your findings. Remove any that lack concrete evidence from the actual code. If you cannot point to a specific line and explain exactly how the bug manifests, do not report it.
-
-## Severity Calibration
-
-- **critical**: Will crash, corrupt data, or produce wrong results in normal usage — not just edge cases. High confidence required.
-- **high**: Will fail under realistic but less common conditions (specific input patterns, certain timing).
-- **medium**: Edge case that requires specific inputs or unusual conditions to trigger, but is a real bug.
-- **low**: Defensive improvement; unlikely to manifest in practice but worth fixing for robustness.
-- **info**: Observation or suggestion, not a concrete bug.
-
-Only report issues you can point to in the actual code with a specific line number. Do not invent hypothetical scenarios unsupported by the diff. If you're uncertain whether something is a real bug, err on the side of not reporting it.
-
-## Output Quality
-
-- Every finding MUST include the exact file path and line number.
-- Every finding MUST include a concrete, actionable fix suggestion.
-- Descriptions must explain WHY it's a problem (what goes wrong), not just WHAT the issue is (what the code does).
-- **category**: Use the taxonomy headers from this prompt (e.g., "Control Flow Errors", "Null/Undefined Safety", "Error Handling Defects", "Type and Data Errors", "Concurrency and Timing", "Edge Cases").
-- **title**: Concise and specific, under 80 characters. "Missing null check on user.profile" — not "Potential issue with data handling."
-- After drafting all findings, re-read each one and ask: "Is this a real bug with evidence, or am I speculating?" Remove speculative findings.
-- If you find no issues, that is a valid and expected outcome. Do not manufacture findings to appear thorough.
diff --git a/cli/prompts/review/correctness.user.md b/cli/prompts/review/correctness.user.md
deleted file mode 100644
index cfe9262..0000000
--- a/cli/prompts/review/correctness.user.md
+++ /dev/null
@@ -1,18 +0,0 @@
-Review this git diff for correctness issues ONLY.
-
-Apply your analysis method systematically to each changed file:
-
-1. **Read beyond the diff.** Use the surrounding context to understand function signatures, types, and data flow. If a changed line references a variable defined outside the diff, consider what that variable could be.
-2. **Trace inputs through the changes.** Identify every input to the changed code (function parameters, external data, return values from calls) and consider their full range of possible values — including null, undefined, empty, and error cases.
-3. **Walk each execution path.** For every branch, loop, and error handler in the changed code, mentally execute both the happy path and the failure path. Ask: what state is the program in after each path?
-4. **Apply the issue taxonomy.** Systematically check each category: control flow errors, null/undefined safety, error handling defects, type/data errors, concurrency issues, and edge cases.
-5. **Calibrate severity.** Use the severity definitions from your instructions. A bug that only triggers with empty input on a function that always receives validated data is low, not critical.
-6. **Self-check before reporting.** For each potential finding, verify: Can I point to the exact line? Can I describe how it fails? If not, discard it.
-
-Do NOT flag: style issues, naming choices, performance concerns, or security vulnerabilities. Those are handled by separate review passes.
-
-Only report issues with concrete evidence from the code. Do not speculate.
-
-{{DIFF}}
-
diff --git a/cli/prompts/review/quality-resume.user.md b/cli/prompts/review/quality-resume.user.md
deleted file mode 100644
index b7a4a0f..0000000
--- a/cli/prompts/review/quality-resume.user.md
+++ /dev/null
@@ -1,15 +0,0 @@
-You previously reviewed this diff for correctness and security issues. Now review it for CODE QUALITY issues only.
-
-Apply your analysis method systematically:
-
-1. **Readability** — is the intent clear to a newcomer? Are names specific? Is the abstraction level consistent?
-2. **Complexity** — identify input sizes for loops, count nesting levels and responsibilities per function.
-3. **Duplication** — scan for repeated patterns (5+ lines or 3+ occurrences). Do not flag trivial similarity.
-4. **Error handling** — do messages include context? Are patterns consistent within each module?
-5. **API design** — are signatures consistent? Do public functions have clear contracts?
-6. **Calibrate** — apply the "real burden vs style preference" test. Remove subjective findings.
-
-Do NOT re-report correctness or security findings from previous passes — they are already captured.
-Prioritize findings that will create real maintenance burden over cosmetic suggestions.
-
-If a finding seems to overlap with a previous pass (e.g., poor error handling that is both a quality issue and a correctness bug), only report the quality-specific aspects: the maintenance burden, the readability impact, and the improvement suggestion. Do not duplicate the correctness or security perspective.
diff --git a/cli/prompts/review/quality.system.md b/cli/prompts/review/quality.system.md
deleted file mode 100644
index 21945a4..0000000
--- a/cli/prompts/review/quality.system.md
+++ /dev/null
@@ -1,106 +0,0 @@
-You are a code quality reviewer focused on maintainability. You review code exclusively for issues that increase technical debt, slow down future development, or cause performance problems under real-world usage.
-
-You DO NOT review: correctness bugs or security vulnerabilities. Those are handled by separate specialized review passes.
-
-## Issue Taxonomy
-
-### Performance
-
-- O(n^2) or worse algorithms where O(n) or O(n log n) is straightforward
-- Unnecessary allocations inside loops (creating objects, arrays, or closures per iteration when they could be hoisted)
-- Redundant computation (calculating the same value multiple times in the same scope)
-- Missing early returns or short-circuit evaluation that would avoid expensive work
-- Synchronous blocking operations in async contexts (fs.readFileSync in a request handler)
-- Memory leaks: event listeners not removed, closures retaining large scopes, timers not cleared
-- Unbounded data structures (arrays, maps, caches) that grow without limits or eviction
-- N+1 query patterns (database call inside a loop)
-
-### Complexity
-
-- Functions exceeding ~30 lines or 3+ levels of nesting
-- Cyclomatic complexity > 10 (many branches, early returns, and conditions in one function)
-- God functions: doing multiple unrelated things that should be separate functions
-- Complex boolean expressions that should be extracted into named variables or functions
-- Deeply nested callbacks or promise chains that should use async/await
-- Control flow obscured by exceptions used for non-exceptional conditions
-
-### Duplication
-
-- Copy-pasted logic (5+ lines or repeated 3+ times) that should be extracted into a shared function
-- Repeated patterns across files (same structure with different data) that could be parameterized
-- Near-duplicates: same logic with minor variations that could be unified with a parameter
-- NOTE: 2-3 similar lines are NOT duplication. Do not flag trivial repetition. Look for substantial repeated logic.
-
-### Naming and Clarity
-
-- Misleading names: variable or function name suggests a different type, purpose, or behavior than what it actually does
-- Abbreviations that are not universally understood in the project's domain
-- Boolean variables or functions not named as predicates (is/has/should/can)
-- Generic names (data, result, temp, item, handler) in non-trivial contexts where a specific name would aid comprehension
-- Inconsistent naming conventions within the same module (camelCase mixed with snake_case, plural vs singular for collections)
-
-### Error Handling Quality
-
-- Error messages without actionable context (what operation failed, why, what the caller should do)
-- "Something went wrong" or equivalent messages that provide no diagnostic value
-- Missing error propagation context (not wrapping with additional info when rethrowing)
-- Inconsistent error handling patterns within the same module (some functions throw, others return null, others return Result)
-
-### API Design
-
-- Inconsistent interfaces: similar functions with different parameter signatures or return types
-- Breaking changes to public APIs without versioning or migration path
-- Functions with too many parameters (>4 without an options object)
-- Boolean parameters that control branching (should be separate functions or an enum/options)
-- Missing return type annotations on public functions
-- Functions that return different types depending on input (union returns that callers must narrow)
-
-## Analysis Method
-
-Think step by step. For each changed function or module:
-
-1. **Assess readability.** Read the code as if you are a new team member. Can you understand what it does and why in under 2 minutes? If not, identify what makes it hard: naming, nesting, abstraction level, missing context.
-2. **Check algorithmic complexity.** For each loop, what is the expected input size? Is the algorithm appropriate for that size? An O(n^2) sort on a 10-element array is fine; on a user-provided list is not.
-3. **Look for duplication.** Scan the diff for patterns that appear multiple times. Could they be unified into a shared function with parameters?
-4. **Assess naming.** Does each identifier clearly convey its purpose? Would a reader misunderstand what a variable holds or what a function does based on its name alone?
-5. **Check error paths.** Do error messages include enough context to diagnose the problem without a debugger? Do they tell the caller what to do?
-6. **Self-check: real burden vs style preference.** For each finding, ask: would fixing this measurably improve maintainability for the next developer who touches this code? If the answer is "marginally" or "it's a matter of taste," remove the finding.
-
-## Calibration: Real Burden vs Style Preference
-
-REPORT these (real maintenance burden):
-- Algorithm is O(n^2) and n is unbounded or user-controlled
-- Function is 50+ lines with deeply nested logic and multiple responsibilities
-- Same 10-line block copy-pasted in 3+ places
-- Variable named `data` holds a user authentication token
-- Error message is "Something went wrong" with no context
-- Function takes 6 positional parameters of the same type
-- Boolean parameter that inverts the entire function behavior
-
-DO NOT REPORT these (style preferences — not actionable quality issues):
-- "Could use a ternary instead of if/else"
-- "Consider using const instead of let" (unless actually mutated incorrectly)
-- "This function could be shorter" (if it's clear and under 30 lines)
-- "Consider renaming X to Y" when both names are reasonable and clear
-- Minor formatting inconsistencies (handled by linters, not reviewers)
-- "Could extract this into a separate file" when the module is cohesive and under 300 lines
-- Preferring one iteration method over another (for-of vs forEach vs map) when both are clear
-
-## Severity Calibration
-
-- **critical**: Algorithmic issue causing degradation at production scale (O(n^2) on unbounded input), or memory leak that will crash the process.
-- **high**: Significant complexity or duplication that actively impedes modification — changing one copy without the others will introduce bugs.
-- **medium**: Meaningful readability or maintainability issue that a new team member would struggle with, but won't cause incidents.
-- **low**: Minor improvement that would help but isn't blocking anyone.
-- **info**: Observation or style-adjacent suggestion with minimal impact.
-
-## Output Quality
-
-- Every finding MUST include the exact file path and line number.
-- Every finding MUST include a concrete, actionable suggestion for improvement — not just "this is complex."
-- Descriptions must explain WHY the issue creates maintenance burden, not just WHAT the code does.
-- **category**: Use the taxonomy headers from this prompt (e.g., "Performance", "Complexity", "Duplication", "Naming and Clarity", "Error Handling Quality", "API Design").
-- **title**: Concise and specific, under 80 characters. "O(n^2) user lookup in request handler" — not "Performance could be improved."
-- Severity reflects actual impact on the codebase, not theoretical ideals about clean code.
-- After drafting all findings, re-read each one and ask: "Is this a real maintenance burden, or am I enforcing a personal style preference?" Remove style preferences.
-- If you find no issues, that is a valid and expected outcome. Do not manufacture findings to appear thorough.
diff --git a/cli/prompts/review/quality.user.md b/cli/prompts/review/quality.user.md
deleted file mode 100644
index 96aea1b..0000000
--- a/cli/prompts/review/quality.user.md
+++ /dev/null
@@ -1,20 +0,0 @@
-Review this git diff for CODE QUALITY issues only.
-
-Apply your analysis method systematically to each changed file:
-
-1. **Readability check.** Read each changed function as a newcomer. Is the intent clear? Are names specific enough? Is the abstraction level consistent within the function?
-2. **Complexity check.** For each loop, identify the input size and algorithm. For each function, count nesting levels and responsibilities. Flag functions that do multiple unrelated things.
-3. **Duplication check.** Scan the entire diff for repeated patterns — 5+ lines appearing in multiple places, or the same structure with different data. Only flag substantial repetition, not 2-3 similar lines.
-4. **Error handling check.** Do error messages include context (what failed, why, what to do)? Are error patterns consistent within each module?
-5. **API design check.** Are function signatures consistent? Do public functions have clear contracts (parameter types, return types, error behavior)?
-6. **Calibrate against real impact.** For each potential finding, apply the "real burden vs style preference" test from your instructions. Remove findings that are subjective preferences or marginal improvements.
-
-Do NOT flag correctness bugs or security vulnerabilities. Those are handled by separate review passes.
-
-Prioritize findings that will create real maintenance burden over cosmetic suggestions.
-
-Only report issues with concrete evidence of quality impact. Do not flag style preferences.
-
-{{DIFF}}
-
diff --git a/cli/prompts/review/security-resume.user.md b/cli/prompts/review/security-resume.user.md
deleted file mode 100644
index 9cc199c..0000000
--- a/cli/prompts/review/security-resume.user.md
+++ /dev/null
@@ -1,15 +0,0 @@
-You previously reviewed this diff for correctness issues. Now review it for SECURITY issues only.
-
-Apply taint analysis systematically to each changed file:
-
-1. **Identify all sources of external input** in the changed code — function parameters from HTTP handlers, environment variables, file reads, CLI arguments, database results, parsed config.
-2. **Trace tainted data** through assignments, function calls, and transformations to security-sensitive sinks (SQL queries, shell commands, file paths, HTML output, eval, redirects, HTTP headers).
-3. **Check for sanitization** between each source and sink. Is it appropriate for the sink type?
-4. **Check trust boundaries.** Does data cross from untrusted to trusted context without validation?
-5. **Apply the full taxonomy** — hardcoded secrets, weak crypto, missing auth, overly permissive config, sensitive data in logs, unsafe deserialization, prototype pollution.
-6. **Verify each finding** — articulate the concrete attack vector. If you cannot describe who attacks, how, and what they gain, discard it.
-
-Do NOT re-report correctness findings from the previous pass — they are already captured.
-Do NOT flag style or performance issues. Those are handled by separate review passes.
-
-If a finding seems to overlap with the correctness pass (e.g., an error handling issue that is both a bug and a security concern), only report the security-specific aspects: the attack vector, the exploitability, and the security impact. Do not duplicate the correctness perspective.
diff --git a/cli/prompts/review/security.system.md b/cli/prompts/review/security.system.md
deleted file mode 100644
index e9d1d8a..0000000
--- a/cli/prompts/review/security.system.md
+++ /dev/null
@@ -1,117 +0,0 @@
-You are a security-focused code reviewer. You review code exclusively for vulnerabilities — weaknesses that could be exploited by an attacker to compromise confidentiality, integrity, or availability.
-
-You DO NOT review: correctness bugs, style issues, code quality, or performance concerns. Those are handled by separate specialized review passes.
- -## Issue Taxonomy (OWASP Top 10:2025 + CWE Top 25:2024) - -### A01: Broken Access Control - -- Missing authorization checks on sensitive operations — CWE-862 -- Direct object reference without ownership validation (IDOR) — CWE-639 -- Path traversal via unsanitized file paths — CWE-22 -- CORS misconfiguration allowing unauthorized origins — CWE-346 -- Privilege escalation through parameter manipulation — CWE-269 -- Server-side request forgery (SSRF) via user-controlled URLs — CWE-918 -- Missing function-level access control on API endpoints - -### A02: Security Misconfiguration - -- Debug mode or verbose errors exposed in production -- Default credentials or insecure default settings — CWE-1188 -- Unnecessary features, services, or ports enabled -- Missing security headers (CSP, HSTS, X-Frame-Options, X-Content-Type-Options) -- Overly permissive file or directory permissions — CWE-732 -- HTTPS not enforced or mixed content allowed - -### A03: Software Supply Chain Failures - -- Unpinned dependency versions allowing silent upgrades -- No integrity verification (checksums, signatures) for downloaded artifacts -- Use of deprecated or known-vulnerable packages -- Importing from untrusted or typosquattable sources - -### A04: Cryptographic Failures - -- Weak algorithms: MD5, SHA1 for security purposes, DES, RC4 — CWE-327 -- Hardcoded keys, salts, or initialization vectors — CWE-321 -- Missing encryption for sensitive data in transit or at rest — CWE-311 -- Insufficient key length or improper key management -- Use of Math.random() or other non-CSPRNG for security-sensitive operations — CWE-338 -- Missing or improper certificate validation - -### A05: Injection - -- SQL injection via string concatenation or template literals — CWE-89 -- OS command injection via shell execution with user input — CWE-78 -- Template injection (server-side or client-side) — CWE-94 -- Cross-site scripting (XSS) via unsanitized output in HTML/DOM — CWE-79 -- LDAP, XML external entity (XXE), 
or header injection — CWE-611 -- Regular expression denial of service (ReDoS) — CWE-1333 -- Code injection via eval(), new Function(), or vm.runInContext with untrusted input — CWE-95 - -### A06: Insecure Design - -- Business logic flaws allowing unintended workflows -- Missing rate limiting on authentication or sensitive operations -- Lack of defense-in-depth (single layer of validation) -- Enumeration vectors (user existence, valid IDs via timing or error differences) - -### A07: Authentication Failures - -- Weak password policies or missing credential validation -- Session fixation or improper session invalidation — CWE-384 -- Missing multi-factor authentication for privileged operations -- Insecure token storage (localStorage for auth tokens, tokens in URLs) -- Timing attacks on authentication comparisons (non-constant-time compare) — CWE-208 -- JWT vulnerabilities (none algorithm, missing expiry, weak signing) - -### A08: Software and Data Integrity Failures - -- Unsafe deserialization of untrusted data — CWE-502 -- Missing signature verification on updates, webhooks, or data imports -- Prototype pollution in JavaScript — CWE-1321 -- Mass assignment / over-posting without allowlists - -### A09: Security Logging and Alerting Failures - -- Sensitive data written to logs (passwords, tokens, PII, credit cards) — CWE-532 -- Missing audit logging for authentication and authorization events -- Log injection via unsanitized user input in log messages — CWE-117 - -### A10: Mishandling of Exceptional Conditions (new in 2025) - -- Error responses revealing internal system details (stack traces, paths, versions) -- Failing open: granting access when an error occurs instead of denying — CWE-636 -- Uncaught exceptions that bypass security controls (auth, validation, rate limiting) -- Resource exhaustion from unhandled edge cases (unbounded allocations, infinite loops) - -## Analysis Method (Taint Analysis Framework) - -Think step by step. 
For each code change, perform source-sink-sanitizer analysis: - -1. **Identify sources.** Where does external or user-controlled input enter? Look for: HTTP request parameters, headers, and body; environment variables; file reads; database query results; CLI arguments; message queue payloads; URL parameters; cookie values. -2. **Trace flow.** Follow each source through variable assignments, function calls, transformations, and returns. Track whether the taint is preserved or eliminated at each step. Pay special attention to data that crosses function or module boundaries. -3. **Identify sinks.** Where is the data consumed in a security-sensitive way? Look for: SQL queries, shell commands, HTML/DOM output, file system paths, eval/Function constructors, HTTP redirects, response headers, deserialization calls, crypto operations. -4. **Check sanitizers.** Is the data validated, escaped, or transformed before reaching the sink? Is the sanitization appropriate for the specific sink type? (HTML escaping doesn't prevent SQL injection; URL encoding doesn't prevent command injection.) -5. **Check trust boundaries.** Does data cross from untrusted to trusted context without validation? Common trust boundaries: client→server, user input→database query, external API→internal processing, config file→runtime behavior. -6. **Self-check.** For each finding, describe the specific attack vector: who is the attacker, what input do they control, what is the exploit, and what is the impact? If you cannot articulate a concrete attack, do not report the finding. - -## Severity Calibration - -- **critical**: Exploitable by an unauthenticated external attacker. Impact: remote code execution, full data breach, complete authentication bypass, or privilege escalation to admin. -- **high**: Exploitable with some preconditions (authenticated user, specific configuration). Impact: significant data exposure, horizontal privilege escalation, or persistent XSS. 
-- **medium**: Requires authenticated access, specific configuration, or uncommon conditions. Impact: limited data exposure, information disclosure, or denial of service. -- **low**: Defense-in-depth improvement. No direct exploit path from the code alone, but weakens the security posture. -- **info**: Security best practice suggestion. Not a vulnerability. - -Do NOT flag theoretical vulnerabilities without a concrete attack path supported by the code. "This could be insecure" is not a finding — you must explain who attacks, how, and what they gain. - -## Output Quality - -- Every finding MUST include the exact file path and line number. -- Every finding MUST describe the attack vector: what input does the attacker control, how does it reach the sink, and what is the impact? -- Every finding MUST include a concrete remediation (parameterized query, escaping function, validation check — not just "sanitize the input"). -- **category**: Use the taxonomy headers from this prompt (e.g., "A01: Broken Access Control", "A05: Injection", "A04: Cryptographic Failures"). -- **title**: Concise and specific, under 80 characters. "SQL injection in getUserById query parameter" — not "Possible security concern." -- After drafting all findings, re-read each one and ask: "Could I write a proof-of-concept exploit based on this description?" If not, strengthen the evidence or remove the finding. -- If you find no vulnerabilities, that is a valid and expected outcome. Do not manufacture findings to appear thorough. diff --git a/cli/prompts/review/security.user.md b/cli/prompts/review/security.user.md deleted file mode 100644 index f0d1773..0000000 --- a/cli/prompts/review/security.user.md +++ /dev/null @@ -1,18 +0,0 @@ -Review this git diff for SECURITY issues only. - -Apply taint analysis systematically to each changed file: - -1. 
**Identify all sources of external input.** In the changed code, find every place where user-controlled or external data enters: function parameters from HTTP handlers, environment variables, file reads, CLI arguments, database results, parsed config. Mark each as a taint source. -2. **Trace tainted data through the diff.** Follow each source through assignments, function calls, string operations, and returns. Does it reach a security-sensitive sink (SQL query, shell command, file path, HTML output, eval, redirect, HTTP header)? -3. **Check for sanitization.** Between each source and sink, is the data validated, escaped, or constrained? Is the sanitization appropriate for the sink type? -4. **Check trust boundaries.** Does data cross from an untrusted to a trusted context (client→server, user→database, external→internal) without validation? -5. **Apply the full taxonomy.** Beyond taint analysis, check for: hardcoded secrets, weak crypto, missing auth checks, overly permissive configurations, sensitive data in logs, unsafe deserialization, prototype pollution. -6. **Verify each finding.** For every potential issue, articulate the concrete attack: who is the attacker, what do they control, how do they exploit it, and what do they gain? If you cannot answer all four, discard the finding. - -Do NOT flag correctness bugs, style issues, or performance concerns. Those are handled by separate review passes. - -Only report vulnerabilities with a concrete attack path. Do not speculate. 
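The source-sink-sanitizer method described in the steps above can be sketched in a few lines of TypeScript. This is an illustrative aside, not part of the patch: `buildQuery` and `buildParameterized` are hypothetical helpers showing the vulnerable pattern and its remediation for a SQL sink.

```typescript
// Source: attacker-controlled input, e.g. an HTTP query parameter.
const userId = "1 OR 1=1"; // tainted

// Sink with no sanitizer: the tainted value is concatenated straight
// into the SQL text, so the taint reaches the sink (step 2 of the method).
function buildQuery(id: string): string {
  return `SELECT * FROM users WHERE id = ${id}`;
}

// Remediation: a parameterized query. The SQL text stays constant and the
// tainted value travels out-of-band as a bound parameter (step 3/4).
function buildParameterized(id: string): { sql: string; params: string[] } {
  return { sql: "SELECT * FROM users WHERE id = ?", params: [id] };
}

console.log(buildQuery(userId)); // tainted text embedded in the query
console.log(buildParameterized(userId).sql); // query text unaffected by input
```

Parameterization is the sink-appropriate sanitizer here; per step 3 of the method, an HTML-escaping or URL-encoding step between this source and a SQL sink would not count.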
- - -{{DIFF}} - diff --git a/cli/src/commands/container/down.ts b/cli/src/commands/container/down.ts new file mode 100644 index 0000000..64da092 --- /dev/null +++ b/cli/src/commands/container/down.ts @@ -0,0 +1,31 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { isInsideContainer } from "../../utils/context.js"; +import { dockerStop, resolveContainer } from "../../utils/docker.js"; + +export function registerContainerDownCommand(parent: Command): void { + parent + .command("down [name]") + .description("Stop a running CodeForge devcontainer") + .action(async (name?: string) => { + if (isInsideContainer()) { + console.error( + "Already inside a container. This command runs on the host.", + ); + process.exit(1); + } + + try { + const container = await resolveContainer(name); + console.log( + `${chalk.blue("▶")} Stopping container ${container.name}...`, + ); + await dockerStop(container.id); + console.log(`${chalk.green("✓")} Stopped ${container.name}`); + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + console.error(`${chalk.red("✗")} ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/container/exec.ts b/cli/src/commands/container/exec.ts new file mode 100644 index 0000000..ae8ab66 --- /dev/null +++ b/cli/src/commands/container/exec.ts @@ -0,0 +1,42 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { isInsideContainer } from "../../utils/context.js"; +import { dockerExec, resolveContainer } from "../../utils/docker.js"; + +export function registerContainerExecCommand(parent: Command): void { + parent + .command("exec [name]") + .description("Execute a command inside a running devcontainer") + .allowUnknownOption(true) + .allowExcessArguments(true) + .action( + async ( + name: string | undefined, + _options: unknown, + _command: Command, + ) => { + if (isInsideContainer()) { + console.error("Already inside a container."); + process.exit(1); + } + + const dashDashIndex = process.argv.indexOf("--"); + if (dashDashIndex === -1 || dashDashIndex === process.argv.length - 1) { + console.error( + "Usage: codeforge container exec [name] -- <command>", + ); + process.exit(1); + } + const cmd = process.argv.slice(dashDashIndex + 1); + + try { + const container = await resolveContainer(name); + await dockerExec(container.id, cmd); + } catch (err) { + const message = err instanceof Error ?
err.message : String(err); + console.error(`${chalk.red("✗")} ${message}`); + process.exit(1); + } + }, + ); +} diff --git a/cli/src/commands/container/ls.ts b/cli/src/commands/container/ls.ts new file mode 100644 index 0000000..3cf4a09 --- /dev/null +++ b/cli/src/commands/container/ls.ts @@ -0,0 +1,51 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { basename } from "path"; +import { isDockerAvailable, listDevcontainers } from "../../utils/docker.js"; + +export function registerContainerLsCommand(parent: Command): void { + parent + .command("ls") + .description("List running CodeForge devcontainers") + .action(async () => { + if (!isDockerAvailable()) { + console.error( + "Docker is not available. Install Docker Desktop to manage containers.", + ); + process.exit(1); + } + + try { + const containers = await listDevcontainers(); + if (containers.length === 0) { + console.log("No running CodeForge devcontainers found."); + return; + } + + console.log( + chalk.bold("NAME".padEnd(25)) + + chalk.bold("STATUS".padEnd(15)) + + chalk.bold("WORKSPACE".padEnd(40)) + + chalk.bold("PORTS"), + ); + console.log("─".repeat(90)); + + for (const c of containers) { + const name = basename(c.workspacePath); + const statusColor = c.status.includes("Up") + ? chalk.green + : chalk.yellow; + console.log( + name.padEnd(25) + + statusColor(c.status.padEnd(15)) + + c.workspacePath.padEnd(40) + + (c.ports || "—"), + ); + } + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + console.error(`${chalk.red("✗")} ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/container/rebuild.ts b/cli/src/commands/container/rebuild.ts new file mode 100644 index 0000000..fa24ccb --- /dev/null +++ b/cli/src/commands/container/rebuild.ts @@ -0,0 +1,44 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { isInsideContainer } from "../../utils/context.js"; +import { + devcontainerRebuild, + findWorkspacePath, +} from "../../utils/devcontainer.js"; + +export function registerContainerRebuildCommand(parent: Command): void { + parent + .command("rebuild [workspace-path]") + .description("Rebuild a CodeForge devcontainer") + .action(async (workspacePath?: string) => { + if (isInsideContainer()) { + console.error( + "Already inside a container. This command runs on the host.", + ); + process.exit(1); + } + + const resolved = workspacePath || findWorkspacePath(); + if (!resolved) { + console.error( + "Could not find a .devcontainer/devcontainer.json in the current directory tree.", + ); + console.error( + "Provide a workspace path: codeforge container rebuild <path>", + ); + process.exit(1); + } + + try { + console.log( + `${chalk.blue("▶")} Rebuilding devcontainer at ${resolved}...`, + ); + await devcontainerRebuild(resolved); + console.log(`${chalk.green("✓")} Devcontainer rebuilt`); + } catch (err) { + const message = err instanceof Error ?
err.message : String(err); + console.error(`${chalk.red("✗")} Failed to rebuild: ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/container/shell.ts b/cli/src/commands/container/shell.ts new file mode 100644 index 0000000..3111ac7 --- /dev/null +++ b/cli/src/commands/container/shell.ts @@ -0,0 +1,29 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { isInsideContainer } from "../../utils/context.js"; +import { dockerExec, resolveContainer } from "../../utils/docker.js"; + +export function registerContainerShellCommand(parent: Command): void { + parent + .command("shell [name]") + .description("Open an interactive shell in a running devcontainer") + .action(async (name?: string) => { + if (isInsideContainer()) { + console.error("Already inside a container."); + process.exit(1); + } + + try { + const container = await resolveContainer(name); + try { + await dockerExec(container.id, ["/bin/zsh"], { interactive: true }); + } catch { + await dockerExec(container.id, ["/bin/bash"], { interactive: true }); + } + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + console.error(`${chalk.red("✗")} ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/container/up.ts b/cli/src/commands/container/up.ts new file mode 100644 index 0000000..762825e --- /dev/null +++ b/cli/src/commands/container/up.ts @@ -0,0 +1,41 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { isInsideContainer } from "../../utils/context.js"; +import { devcontainerUp, findWorkspacePath } from "../../utils/devcontainer.js"; + +export function registerContainerUpCommand(parent: Command): void { + parent + .command("up [workspace-path]") + .description("Start a CodeForge devcontainer") + .action(async (workspacePath?: string) => { + if (isInsideContainer()) { + console.error( + "Already inside a container. 
This command runs on the host.", + ); + process.exit(1); + } + + const resolved = workspacePath || findWorkspacePath(); + if (!resolved) { + console.error( + "Could not find a .devcontainer/devcontainer.json in the current directory tree.", + ); + console.error( + "Provide a workspace path: codeforge container up <path>", + ); + process.exit(1); + } + + try { + console.log( + `${chalk.blue("▶")} Starting devcontainer at ${resolved}...`, + ); + await devcontainerUp(resolved); + console.log(`${chalk.green("✓")} Devcontainer started`); + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + console.error(`${chalk.red("✗")} Failed to start: ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/index/build.ts b/cli/src/commands/index/build.ts new file mode 100644 index 0000000..161d7f8 --- /dev/null +++ b/cli/src/commands/index/build.ts @@ -0,0 +1,171 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { existsSync, mkdirSync } from "fs"; +import { relative, resolve } from "path"; +import { + closeDatabase, + deleteFileAndSymbols, + insertFiles, + insertSymbols, + openDatabase, + rebuildFts, + upsertFolders, +} from "../../indexer/db.js"; +import { checkSgInstalled, extractSymbols } from "../../indexer/extractor.js"; +import { extractFolderDocs } from "../../indexer/folders.js"; +import { + collectDirectories, + getLanguageForExtension, + hashFileContent, + scanDirectory, +} from "../../indexer/scanner.js"; +import { formatBuildJson } from "../../output/index-json.js"; +import { formatBuildSummary } from "../../output/index-text.js"; +import type { IndexedFile } from "../../schemas/index.js"; + +interface BuildCommandOptions { + format: string; + color?: boolean; +} + +function findWorkspaceRoot(): string | null { + let dir = process.cwd(); + while (true) { + if (existsSync(resolve(dir, ".codeforge"))) return dir; + const parent = resolve(dir, ".."); + if (parent === dir) return null; + dir =
parent; + } +} + +export function registerIndexBuildCommand(parent: Command): void { + parent + .command("build") + .description("Build or incrementally update the codebase symbol index") + .argument("[path]", "Target directory (defaults to workspace root)") + .option("-f, --format <format>", "Output format: text|json", "text") + .option("--no-color", "Disable colored output") + .action(async (path: string | undefined, options: BuildCommandOptions) => { + try { + if (!options.color) chalk.level = 0; + + const start = Date.now(); + const workspaceRoot = findWorkspaceRoot(); + if (!workspaceRoot) { + console.error( + "Error: No .codeforge directory found. Are you in a CodeForge workspace?", + ); + process.exit(1); + } + + const targetPath = path ? resolve(process.cwd(), path) : workspaceRoot; + const dataDir = resolve(workspaceRoot, ".codeforge", "data"); + mkdirSync(dataDir, { recursive: true }); + + const dbPath = resolve(dataDir, "code-index.db"); + + console.error(chalk.dim("Checking ast-grep installation...")); + const sgInstalled = await checkSgInstalled(); + if (!sgInstalled) { + console.error( + "Error: ast-grep (sg) is not installed. Install it with: npm i -g @ast-grep/cli", + ); + process.exit(1); + } + + console.error(chalk.dim("Scanning files...")); + const db = openDatabase(dbPath); + const scanned = await scanDirectory(targetPath, db, workspaceRoot); + + const filesToProcess = [...scanned.newFiles, ...scanned.changedFiles]; + let totalSymbols = 0; + + if (filesToProcess.length > 0) { + // Group files by language + const byLang = new Map<string, string[]>(); + for (const relPath of filesToProcess) { + const ext = "." + (relPath.split(".").pop() ?? ""); + const lang = getLanguageForExtension(ext); + if (lang) { + const group = byLang.get(lang) ??
[]; + group.push(relPath); + byLang.set(lang, group); + } + } + + // Delete old data for changed + deleted files + for (const file of [ + ...scanned.changedFiles, + ...scanned.deletedFiles, + ]) { + deleteFileAndSymbols(db, file); + } + + // Insert file records first (symbols have FK to files) + const fileRecords: IndexedFile[] = []; + for (const relPath of filesToProcess) { + const absPath = resolve(workspaceRoot, relPath); + const ext = "." + (relPath.split(".").pop() ?? ""); + const lang = getLanguageForExtension(ext) ?? "unknown"; + const hash = await hashFileContent(absPath); + const content = await Bun.file(absPath).text(); + const lineCount = content.split("\n").length; + const size = Buffer.byteLength(content, "utf-8"); + fileRecords.push({ + path: relPath, + hash, + size, + language: lang, + lineCount, + lastIndexed: new Date() + .toISOString() + .replace("T", " ") + .substring(0, 19), + }); + } + insertFiles(db, fileRecords); + + console.error(chalk.dim("Extracting symbols...")); + for (const [lang, relPaths] of byLang) { + const absPaths = relPaths.map((r) => resolve(workspaceRoot, r)); + const symbols = await extractSymbols(absPaths, lang); + if (symbols.length > 0) { + const remapped = symbols.map((s: (typeof symbols)[number]) => ({ + ...s, + filePath: relative(workspaceRoot, s.filePath), + })); + insertSymbols(db, remapped); + totalSymbols += symbols.length; + } + } + } else { + // Still handle deletions + for (const file of scanned.deletedFiles) { + deleteFileAndSymbols(db, file); + } + } + + console.error(chalk.dim("Updating folder index...")); + const directories = await collectDirectories(targetPath, workspaceRoot); + const folderDocs = await extractFolderDocs(directories, workspaceRoot); + upsertFolders(db, folderDocs); + + console.error(chalk.dim("Rebuilding search index...")); + rebuildFts(db); + closeDatabase(db); + + const durationMs = Date.now() - start; + const buildResult = { scanned, symbolCount: totalSymbols, durationMs }; + + if 
(options.format === "json") { + console.log(formatBuildJson(buildResult)); + } else { + console.log(formatBuildSummary(buildResult)); + } + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + console.error(`Error: ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/index/clean.ts b/cli/src/commands/index/clean.ts new file mode 100644 index 0000000..b270f20 --- /dev/null +++ b/cli/src/commands/index/clean.ts @@ -0,0 +1,58 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { existsSync, unlinkSync } from "fs"; +import { resolve } from "path"; + +function findWorkspaceRoot(): string | null { + let dir = process.cwd(); + while (true) { + if (existsSync(resolve(dir, ".codeforge"))) return dir; + const parent = resolve(dir, ".."); + if (parent === dir) return null; + dir = parent; + } +} + +export function registerIndexCleanCommand(parent: Command): void { + parent + .command("clean") + .description("Remove the codebase index database") + .option("--no-color", "Disable colored output") + .action(async (options: { color?: boolean }) => { + try { + if (!options.color) chalk.level = 0; + + const workspaceRoot = findWorkspaceRoot(); + if (!workspaceRoot) { + console.error( + "Error: No .codeforge directory found. Are you in a CodeForge workspace?", + ); + process.exit(1); + } + + const dbPath = resolve( + workspaceRoot, + ".codeforge", + "data", + "code-index.db", + ); + if (!existsSync(dbPath)) { + console.log("No index database found."); + return; + } + + // Remove main DB and WAL/SHM files + unlinkSync(dbPath); + const walPath = dbPath + "-wal"; + const shmPath = dbPath + "-shm"; + if (existsSync(walPath)) unlinkSync(walPath); + if (existsSync(shmPath)) unlinkSync(shmPath); + + console.log("Index database removed."); + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + console.error(`Error: ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/index/search.ts b/cli/src/commands/index/search.ts new file mode 100644 index 0000000..58436af --- /dev/null +++ b/cli/src/commands/index/search.ts @@ -0,0 +1,204 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { existsSync, mkdirSync } from "fs"; +import { relative, resolve } from "path"; +import { createInterface } from "readline"; +import { + closeDatabase, + deleteFileAndSymbols, + insertFiles, + insertSymbols, + openDatabase, + rebuildFts, + searchSymbols, + upsertFolders, +} from "../../indexer/db.js"; +import { checkSgInstalled, extractSymbols } from "../../indexer/extractor.js"; +import { extractFolderDocs } from "../../indexer/folders.js"; +import { + collectDirectories, + getLanguageForExtension, + hashFileContent, + scanDirectory, +} from "../../indexer/scanner.js"; +import { formatSearchJson } from "../../output/index-json.js"; +import { + formatBuildSummary, + formatSearchText, +} from "../../output/index-text.js"; +import type { + IndexedFile, + SearchHit, + SymbolKind, +} from "../../schemas/index.js"; + +interface SearchCommandOptions { + format: string; + color?: boolean; + limit: string; + kind?: string; +} + +function findWorkspaceRoot(): string | null { + let dir = process.cwd(); + while (true) { + if (existsSync(resolve(dir, ".codeforge"))) return dir; + const parent = resolve(dir, ".."); + if (parent === dir) return null; + dir = parent; + } +} + +async function autoBuild(workspaceRoot: string, dbPath: string): Promise<void> { + const dataDir = resolve(workspaceRoot, ".codeforge", "data"); + mkdirSync(dataDir, { recursive: true }); + + const sgInstalled = await checkSgInstalled(); + if (!sgInstalled) { + console.error( + "Error: ast-grep (sg) is not installed.
Install it with: npm i -g @ast-grep/cli", + ); + process.exit(1); + } + + console.error(chalk.dim("Building index...")); + const start = Date.now(); + const db = openDatabase(dbPath); + const scanned = await scanDirectory(workspaceRoot, db); + + const filesToProcess = [...scanned.newFiles, ...scanned.changedFiles]; + let totalSymbols = 0; + + for (const file of [...scanned.changedFiles, ...scanned.deletedFiles]) { + deleteFileAndSymbols(db, file); + } + + // Insert file records first (symbols have FK to files) + const fileRecords: IndexedFile[] = []; + for (const relPath of filesToProcess) { + const absPath = resolve(workspaceRoot, relPath); + const ext = "." + (relPath.split(".").pop() ?? ""); + const lang = getLanguageForExtension(ext) ?? "unknown"; + const hash = await hashFileContent(absPath); + const content = await Bun.file(absPath).text(); + const lineCount = content.split("\n").length; + const size = Buffer.byteLength(content, "utf-8"); + fileRecords.push({ + path: relPath, + hash, + size, + language: lang, + lineCount, + lastIndexed: new Date().toISOString().replace("T", " ").substring(0, 19), + }); + } + insertFiles(db, fileRecords); + + // Group by language and extract symbols + const byLang = new Map<string, string[]>(); + for (const relPath of filesToProcess) { + const ext = "." + (relPath.split(".").pop() ?? ""); + const lang = getLanguageForExtension(ext); + if (lang) { + const group = byLang.get(lang) ??
[]; + group.push(relPath); + byLang.set(lang, group); + } + } + + for (const [lang, relPaths] of byLang) { + const absPaths = relPaths.map((r) => resolve(workspaceRoot, r)); + const symbols = await extractSymbols(absPaths, lang); + if (symbols.length > 0) { + const remapped = symbols.map((s: (typeof symbols)[number]) => ({ + ...s, + filePath: relative(workspaceRoot, s.filePath), + })); + insertSymbols(db, remapped); + totalSymbols += symbols.length; + } + } + + const directories = await collectDirectories(workspaceRoot); + const folderDocs = await extractFolderDocs(directories, workspaceRoot); + upsertFolders(db, folderDocs); + rebuildFts(db); + closeDatabase(db); + + const durationMs = Date.now() - start; + console.error( + formatBuildSummary({ scanned, symbolCount: totalSymbols, durationMs }), + ); +} + +export function registerIndexSearchCommand(parent: Command): void { + parent + .command("search") + .description("Search for symbols in the codebase index") + .argument("<query>", "Search query (FTS5 syntax)") + .option("-f, --format <format>", "Output format: text|json", "text") + .option("--no-color", "Disable colored output") + .option("-n, --limit <n>", "Maximum number of results", "50") + .option("-k, --kind <kind>", "Filter by symbol kind") + .action(async (query: string, options: SearchCommandOptions) => { + try { + if (!options.color) chalk.level = 0; + + const workspaceRoot = findWorkspaceRoot(); + if (!workspaceRoot) { + console.error( + "Error: No .codeforge directory found. Are you in a CodeForge workspace?", + ); + process.exit(1); + } + + const dbPath = resolve( + workspaceRoot, + ".codeforge", + "data", + "code-index.db", + ); + + if (!existsSync(dbPath)) { + const rl = createInterface({ + input: process.stdin, + output: process.stderr, + }); + const answer = await new Promise<string>((resolve) => + rl.question("No index found. Build one now?
(y/n) ", resolve), + ); + rl.close(); + + if (answer.toLowerCase() === "y") { + await autoBuild(workspaceRoot, dbPath); + } else { + console.error( + "Run `codeforge index build` to create an index first.", + ); + process.exit(0); + } + } + + const db = openDatabase(dbPath); + const limit = parseInt(options.limit, 10); + let hits: SearchHit[] = searchSymbols(db, query, limit); + + if (options.kind) { + const kind = options.kind as SymbolKind; + hits = hits.filter((h) => h.symbol.kind === kind); + } + + if (options.format === "json") { + console.log(formatSearchJson(hits)); + } else { + console.log(formatSearchText(hits, { noColor: !options.color })); + } + + closeDatabase(db); + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + console.error(`Error: ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/index/show.ts b/cli/src/commands/index/show.ts new file mode 100644 index 0000000..4dcffd9 --- /dev/null +++ b/cli/src/commands/index/show.ts @@ -0,0 +1,82 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { existsSync } from "fs"; +import { relative, resolve } from "path"; +import { + closeDatabase, + getFileSymbols, + openDatabase, +} from "../../indexer/db.js"; +import { formatShowJson } from "../../output/index-json.js"; +import { formatShowText } from "../../output/index-text.js"; + +interface ShowCommandOptions { + format: string; + color?: boolean; +} + +function findWorkspaceRoot(): string | null { + let dir = process.cwd(); + while (true) { + if (existsSync(resolve(dir, ".codeforge"))) return dir; + const parent = resolve(dir, ".."); + if (parent === dir) return null; + dir = parent; + } +} + +export function registerIndexShowCommand(parent: Command): void { + parent + .command("show") + .description("Show all symbols in a specific file") + .argument("", "File path to inspect") + .option("-f, --format ", "Output format: text|json", "text") + .option("--no-color", 
"Disable colored output") + .action(async (file: string, options: ShowCommandOptions) => { + try { + if (!options.color) chalk.level = 0; + + const workspaceRoot = findWorkspaceRoot(); + if (!workspaceRoot) { + console.error( + "Error: No .codeforge directory found. Are you in a CodeForge workspace?", + ); + process.exit(1); + } + + const dbPath = resolve( + workspaceRoot, + ".codeforge", + "data", + "code-index.db", + ); + if (!existsSync(dbPath)) { + console.error("No index found. Run `codeforge index build` first."); + process.exit(1); + } + + // Resolve file path relative to workspace root + const absoluteFile = resolve(process.cwd(), file); + const relativePath = relative(workspaceRoot, absoluteFile); + + const db = openDatabase(dbPath); + const symbols = getFileSymbols(db, relativePath); + + if (symbols.length === 0) { + console.log(`No symbols found for ${relativePath}`); + } else if (options.format === "json") { + console.log(formatShowJson(relativePath, symbols)); + } else { + console.log( + formatShowText(relativePath, symbols, { noColor: !options.color }), + ); + } + + closeDatabase(db); + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + console.error(`Error: ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/index/stats.ts b/cli/src/commands/index/stats.ts new file mode 100644 index 0000000..9fb338e --- /dev/null +++ b/cli/src/commands/index/stats.ts @@ -0,0 +1,69 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { existsSync } from "fs"; +import { resolve } from "path"; +import { closeDatabase, getStats, openDatabase } from "../../indexer/db.js"; +import { formatStatsJson } from "../../output/index-json.js"; +import { formatStatsText } from "../../output/index-text.js"; + +interface StatsCommandOptions { + format: string; + color?: boolean; +} + +function findWorkspaceRoot(): string | null { + let dir = process.cwd(); + while (true) { + if (existsSync(resolve(dir, ".codeforge"))) return dir; + const parent = resolve(dir, ".."); + if (parent === dir) return null; + dir = parent; + } +} + +export function registerIndexStatsCommand(parent: Command): void { + parent + .command("stats") + .description("Show codebase index statistics") + .option("-f, --format ", "Output format: text|json", "text") + .option("--no-color", "Disable colored output") + .action(async (options: StatsCommandOptions) => { + try { + if (!options.color) chalk.level = 0; + + const workspaceRoot = findWorkspaceRoot(); + if (!workspaceRoot) { + console.error( + "Error: No .codeforge directory found. Are you in a CodeForge workspace?", + ); + process.exit(1); + } + + const dbPath = resolve( + workspaceRoot, + ".codeforge", + "data", + "code-index.db", + ); + if (!existsSync(dbPath)) { + console.error("No index found. 
Run `codeforge index build` first."); + process.exit(1); + } + + const db = openDatabase(dbPath); + const stats = getStats(db, dbPath); + + if (options.format === "json") { + console.log(formatStatsJson(stats)); + } else { + console.log(formatStatsText(stats, { noColor: !options.color })); + } + + closeDatabase(db); + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + console.error(`Error: ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/index/tree.ts b/cli/src/commands/index/tree.ts new file mode 100644 index 0000000..3c090fa --- /dev/null +++ b/cli/src/commands/index/tree.ts @@ -0,0 +1,160 @@ +import chalk from "chalk"; +import type { Command } from "commander"; +import { existsSync } from "fs"; +import { resolve } from "path"; +import { + closeDatabase, + getAllFolders, + openDatabase, +} from "../../indexer/db.js"; +import { formatTreeJson } from "../../output/index-json.js"; +import { formatTreeText } from "../../output/index-text.js"; +import type { IndexedFolder, TreeEntry } from "../../schemas/index.js"; + +interface TreeCommandOptions { + format: string; + color?: boolean; + depth?: string; +} + +function findWorkspaceRoot(): string | null { + let dir = process.cwd(); + while (true) { + if (existsSync(resolve(dir, ".codeforge"))) return dir; + const parent = resolve(dir, ".."); + if (parent === dir) return null; + dir = parent; + } +} + +function buildTree( + folders: IndexedFolder[], + symbolCounts: Map, + pathFilter?: string, + maxDepth?: number, +): TreeEntry[] { + // Filter folders by path prefix if specified + let filtered = folders; + if (pathFilter) { + filtered = folders.filter( + (f) => f.path === pathFilter || f.path.startsWith(pathFilter + "/"), + ); + } + + // Build nested tree from flat folder list + const root: TreeEntry[] = []; + const nodeMap = new Map(); + + // Sort folders so parents come before children + const sorted = [...filtered].sort((a, b) => 
a.path.localeCompare(b.path)); + + for (const folder of sorted) { + const entry: TreeEntry = { + path: folder.path.split("/").pop() ?? folder.path, + type: "folder", + description: folder.description ?? undefined, + symbolCount: symbolCounts.get(folder.path) ?? 0, + children: [], + }; + + nodeMap.set(folder.path, entry); + + // Find parent + const parts = folder.path.split("/"); + if (parts.length > 1) { + const parentPath = parts.slice(0, -1).join("/"); + const parent = nodeMap.get(parentPath); + if (parent) { + parent.children!.push(entry); + continue; + } + } + + root.push(entry); + } + + // Apply depth limit + if (maxDepth !== undefined) { + pruneDepth(root, 0, maxDepth); + } + + return root; +} + +function pruneDepth(entries: TreeEntry[], current: number, max: number): void { + for (const entry of entries) { + if (current >= max) { + entry.children = undefined; + } else if (entry.children) { + pruneDepth(entry.children, current + 1, max); + } + } +} + +export function registerIndexTreeCommand(parent: Command): void { + parent + .command("tree") + .description("Show directory tree with symbol counts") + .argument("[path]", "Subtree path to display") + .option("-f, --format <format>", "Output format: text|json", "text") + .option("--no-color", "Disable colored output") + .option("-d, --depth <n>", "Maximum tree depth") + .action(async (path: string | undefined, options: TreeCommandOptions) => { + try { + if (!options.color) chalk.level = 0; + + const workspaceRoot = findWorkspaceRoot(); + if (!workspaceRoot) { + console.error( + "Error: No .codeforge directory found. Are you in a CodeForge workspace?", + ); + process.exit(1); + } + + const dbPath = resolve( + workspaceRoot, + ".codeforge", + "data", + "code-index.db", + ); + if (!existsSync(dbPath)) { + console.error("No index found.
Run `codeforge index build` first."); + process.exit(1); + } + + const db = openDatabase(dbPath); + const folders = getAllFolders(db); + + // Count symbols per folder by prefix-matching file paths + const symbolCounts = new Map(); + for (const folder of folders) { + const prefix = folder.path.endsWith("/") + ? folder.path + : folder.path + "/"; + const rows = db + .prepare( + "SELECT COUNT(*) as cnt FROM symbols WHERE file_path LIKE ? || '%'", + ) + .get(prefix) as { cnt: number }; + symbolCounts.set(folder.path, rows.cnt); + } + + const maxDepth = options.depth + ? parseInt(options.depth, 10) + : undefined; + const tree = buildTree(folders, symbolCounts, path, maxDepth); + + if (options.format === "json") { + console.log(formatTreeJson(tree)); + } else { + console.log(formatTreeText(tree, { noColor: !options.color })); + } + + closeDatabase(db); + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + console.error(`Error: ${message}`); + process.exit(1); + } + }); +} diff --git a/cli/src/commands/plugin/disable.ts b/cli/src/commands/plugin/disable.ts index 0528bf7..0b086ae 100644 --- a/cli/src/commands/plugin/disable.ts +++ b/cli/src/commands/plugin/disable.ts @@ -1,6 +1,9 @@ import chalk from "chalk"; import type { Command } from "commander"; -import { loadInstalledPlugins } from "../../loaders/plugin-loader.js"; +import { + findSettingsPaths, + loadInstalledPlugins, +} from "../../loaders/plugin-loader.js"; import { setPluginEnabled } from "../../loaders/settings-writer.js"; interface PluginDisableOptions { @@ -28,6 +31,7 @@ export function registerPluginDisableCommand(parent: Command): void { process.exit(1); } + const paths = findSettingsPaths(); const result = await setPluginEnabled(plugin.qualifiedName, false); console.log(`${chalk.red("✓")} Disabled ${plugin.qualifiedName}`); @@ -35,7 +39,7 @@ export function registerPluginDisableCommand(parent: Command): void { console.log(" Updated: ~/.claude/settings.json"); } if 
(result.source) { - console.log(" Updated: /workspaces/.codeforge/config/settings.json"); + console.log(` Updated: ${paths.source}`); } else { console.log(" Source settings.json not found — deployed copy only"); } diff --git a/cli/src/commands/plugin/enable.ts b/cli/src/commands/plugin/enable.ts index c16a675..e8e41d9 100644 --- a/cli/src/commands/plugin/enable.ts +++ b/cli/src/commands/plugin/enable.ts @@ -1,6 +1,9 @@ import chalk from "chalk"; import type { Command } from "commander"; -import { loadInstalledPlugins } from "../../loaders/plugin-loader.js"; +import { + findSettingsPaths, + loadInstalledPlugins, +} from "../../loaders/plugin-loader.js"; import { setPluginEnabled } from "../../loaders/settings-writer.js"; interface PluginEnableOptions { @@ -28,6 +31,7 @@ export function registerPluginEnableCommand(parent: Command): void { process.exit(1); } + const paths = findSettingsPaths(); const result = await setPluginEnabled(plugin.qualifiedName, true); console.log(`${chalk.green("✓")} Enabled ${plugin.qualifiedName}`); @@ -35,7 +39,7 @@ export function registerPluginEnableCommand(parent: Command): void { console.log(" Updated: ~/.claude/settings.json"); } if (result.source) { - console.log(" Updated: /workspaces/.codeforge/config/settings.json"); + console.log(` Updated: ${paths.source}`); } else { console.log(" Source settings.json not found — deployed copy only"); } diff --git a/cli/src/commands/review/review.ts b/cli/src/commands/review/review.ts deleted file mode 100644 index a4991ae..0000000 --- a/cli/src/commands/review/review.ts +++ /dev/null @@ -1,101 +0,0 @@ -import chalk from "chalk"; -import type { Command } from "commander"; -import { formatReviewJson, formatReviewText } from "../../output/review.js"; -import { detectBaseBranch, runReview } from "../../runners/review-runner.js"; -import type { ReviewScope } from "../../schemas/review.js"; - -interface ReviewCommandOptions { - scope: string; - base?: string; - include?: string; - format: string; - 
color?: boolean; - parallel?: boolean; - model: string; - maxCost?: string; - failBelow?: string; - passes: string; - verbose?: boolean; -} - -export function registerReviewCommand(parent: Command): void { - parent - .command("review") - .description("Multi-pass AI code review of branch changes") - .option("-s, --scope ", "Review scope: diff|staged|full", "diff") - .option("-b, --base ", "Base branch for diff scope") - .option("-i, --include ", "Filter files by glob pattern") - .option("-f, --format ", "Output format: text|json", "text") - .option("--no-color", "Disable colored output") - .option( - "--parallel", - "Run passes concurrently (~3x cost, faster, more diverse)", - ) - .option("-m, --model ", "Model for review passes", "sonnet") - .option("--max-cost ", "Maximum total USD across all passes") - .option( - "--fail-below ", - "Exit code 1 if score below threshold (1-10)", - ) - .option("--passes ", "Number of passes: 1|2|3", "3") - .option("-v, --verbose", "Show per-pass progress to stderr") - .action(async (options: ReviewCommandOptions) => { - try { - if (!options.color) chalk.level = 0; - - const scope = options.scope as ReviewScope; - if (!["diff", "staged", "full"].includes(scope)) { - console.error("Error: --scope must be diff, staged, or full"); - process.exit(1); - } - - const passes = parseInt(options.passes, 10) as 1 | 2 | 3; - if (![1, 2, 3].includes(passes)) { - console.error("Error: --passes must be 1, 2, or 3"); - process.exit(1); - } - - const base = options.base || (await detectBaseBranch()); - const maxCost = options.maxCost - ? parseFloat(options.maxCost) - : undefined; - const failBelow = options.failBelow - ? parseInt(options.failBelow, 10) - : undefined; - - if (failBelow !== undefined && (failBelow < 1 || failBelow > 10)) { - console.error("Error: --fail-below must be between 1 and 10"); - process.exit(1); - } - - const result = await runReview({ - scope, - base, - include: options.include, - parallel: options.parallel ?? 
false, - model: options.model, - maxCost, - passes, - verbose: options.verbose ?? false, - }); - - if (options.format === "json") { - console.log(formatReviewJson(result)); - } else { - console.log( - formatReviewText(result, { - noColor: !options.color, - }), - ); - } - - if (failBelow !== undefined && result.score < failBelow) { - process.exit(1); - } - } catch (err) { - const message = err instanceof Error ? err.message : String(err); - console.error(`Error: ${message}`); - process.exit(1); - } - }); -} diff --git a/cli/src/index.ts b/cli/src/index.ts index af0f15b..18c0e6d 100644 --- a/cli/src/index.ts +++ b/cli/src/index.ts @@ -3,6 +3,18 @@ import { Command } from "commander"; import { registerConfigApplyCommand } from "./commands/config/apply.js"; import { registerConfigShowCommand } from "./commands/config/show.js"; +import { registerContainerDownCommand } from "./commands/container/down.js"; +import { registerContainerExecCommand } from "./commands/container/exec.js"; +import { registerContainerLsCommand } from "./commands/container/ls.js"; +import { registerContainerRebuildCommand } from "./commands/container/rebuild.js"; +import { registerContainerShellCommand } from "./commands/container/shell.js"; +import { registerContainerUpCommand } from "./commands/container/up.js"; +import { registerIndexBuildCommand } from "./commands/index/build.js"; +import { registerIndexCleanCommand } from "./commands/index/clean.js"; +import { registerIndexSearchCommand } from "./commands/index/search.js"; +import { registerIndexShowCommand } from "./commands/index/show.js"; +import { registerIndexStatsCommand } from "./commands/index/stats.js"; +import { registerIndexTreeCommand } from "./commands/index/tree.js"; import { registerPlanSearchCommand } from "./commands/plan/search.js"; import { registerPluginAgentsCommand } from "./commands/plugin/agents.js"; import { registerPluginDisableCommand } from "./commands/plugin/disable.js"; @@ -11,18 +23,21 @@ import { 
registerPluginHooksCommand } from "./commands/plugin/hooks.js"; import { registerPluginListCommand } from "./commands/plugin/list.js"; import { registerPluginShowCommand } from "./commands/plugin/show.js"; import { registerPluginSkillsCommand } from "./commands/plugin/skills.js"; -import { registerReviewCommand } from "./commands/review/review.js"; import { registerListCommand } from "./commands/session/list.js"; import { registerSearchCommand } from "./commands/session/search.js"; import { registerShowCommand } from "./commands/session/show.js"; import { registerTaskSearchCommand } from "./commands/task/search.js"; +import { isInsideContainer, proxyCommand } from "./utils/context.js"; +import { resolveContainer } from "./utils/docker.js"; const program = new Command(); program .name("codeforge") - .description("CLI for CodeForge development workflows") - .version("0.1.0"); + .description("CLI for CodeForge development workflows (experimental)") + .version("0.1.0") + .option("--local", "Run against local host filesystem (skip container proxy)") + .option("--container ", "Target a specific container by name"); const session = program .command("session") @@ -59,6 +74,62 @@ const config = program registerConfigShowCommand(config); registerConfigApplyCommand(config); -registerReviewCommand(program); +const index = program + .command("index") + .description("Build and search a codebase symbol index"); + +registerIndexBuildCommand(index); +registerIndexSearchCommand(index); +registerIndexShowCommand(index); +registerIndexStatsCommand(index); +registerIndexTreeCommand(index); +registerIndexCleanCommand(index); + +const container = program + .command("container") + .description("Manage CodeForge devcontainers"); + +registerContainerUpCommand(container); +registerContainerDownCommand(container); +registerContainerRebuildCommand(container); +registerContainerExecCommand(container); +registerContainerLsCommand(container); +registerContainerShellCommand(container); + +// Proxy 
middleware: when outside container and not --local, proxy existing commands into container +program.hook("preAction", async (_thisCommand, actionCommand) => { + const opts = program.opts(); + + // Skip proxy if inside container or --local is set + if (isInsideContainer() || opts.local) return; + + // Skip proxy for container commands (they run on host) + let cmd = actionCommand; + while (cmd.parent && cmd.parent !== program) { + cmd = cmd.parent; + } + if (cmd.name() === "container") return; + + // Proxy into running container + try { + const target = await resolveContainer(opts.container); + // Build args: strip --local and --container flags from original args + const args = process.argv.slice(2).filter((arg, i, arr) => { + if (arg === "--local") return false; + if (arg === "--container") return false; + // Skip the value after --container + if (i > 0 && arr[i - 1] === "--container") return false; + return true; + }); + await proxyCommand(target.id, args); + // proxyCommand calls process.exit, but just in case: + process.exit(0); + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + console.error(`Proxy error: ${message}`); + console.error("Use --local to run against the host filesystem directly."); + process.exit(1); + } +}); program.parse(); diff --git a/cli/src/indexer/db.ts b/cli/src/indexer/db.ts new file mode 100644 index 0000000..7c908bc --- /dev/null +++ b/cli/src/indexer/db.ts @@ -0,0 +1,332 @@ +import { Database } from "bun:sqlite"; +import { statSync } from "fs"; +import type { + IndexedFile, + IndexedFolder, + IndexedSymbol, + IndexStats, + SearchHit, + SymbolKind, +} from "../schemas/index.js"; + +const CREATE_TABLES_SQL = ` +CREATE TABLE IF NOT EXISTS files ( + path TEXT PRIMARY KEY, + hash TEXT NOT NULL, + size INTEGER, + language TEXT, + line_count INTEGER, + last_indexed TEXT DEFAULT (datetime('now')) +); + +CREATE TABLE IF NOT EXISTS folders ( + path TEXT PRIMARY KEY, + description TEXT, + file_count INTEGER DEFAULT 0, + last_indexed TEXT DEFAULT (datetime('now')) +); + +CREATE TABLE IF NOT EXISTS symbols ( + id INTEGER PRIMARY KEY, + name TEXT NOT NULL, + kind TEXT NOT NULL, + file_path TEXT NOT NULL, + line_start INTEGER, + line_end INTEGER, + signature TEXT, + docstring TEXT, + parent_name TEXT, + exported INTEGER DEFAULT 0, + language TEXT NOT NULL, + FOREIGN KEY (file_path) REFERENCES files(path) ON DELETE CASCADE +); +CREATE INDEX IF NOT EXISTS idx_symbols_file ON symbols(file_path); +CREATE INDEX IF NOT EXISTS idx_symbols_kind ON symbols(kind); +CREATE INDEX IF NOT EXISTS idx_symbols_name ON symbols(name); + +CREATE VIRTUAL TABLE IF NOT EXISTS symbols_fts USING fts5( + name, signature, docstring, file_path, + content=symbols, content_rowid=id +); + +CREATE TRIGGER IF NOT EXISTS symbols_ai AFTER INSERT ON symbols BEGIN + INSERT INTO symbols_fts(rowid, name, signature, docstring, file_path) + VALUES (new.id, new.name, new.signature, new.docstring, new.file_path); +END; +CREATE TRIGGER IF NOT EXISTS symbols_ad AFTER DELETE ON symbols BEGIN + INSERT INTO symbols_fts(symbols_fts, rowid, name, 
signature, docstring, file_path) + VALUES('delete', old.id, old.name, old.signature, old.docstring, old.file_path); +END; +CREATE TRIGGER IF NOT EXISTS symbols_au AFTER UPDATE ON symbols BEGIN + INSERT INTO symbols_fts(symbols_fts, rowid, name, signature, docstring, file_path) + VALUES('delete', old.id, old.name, old.signature, old.docstring, old.file_path); + INSERT INTO symbols_fts(rowid, name, signature, docstring, file_path) + VALUES (new.id, new.name, new.signature, new.docstring, new.file_path); +END; +`; + +export function openDatabase(dbPath: string): Database { + const db = new Database(dbPath, { create: true }); + db.exec("PRAGMA journal_mode = WAL;"); + db.exec("PRAGMA foreign_keys = ON;"); + db.exec(CREATE_TABLES_SQL); + return db; +} + +export function closeDatabase(db: Database): void { + db.close(); +} + +export function insertFiles(db: Database, files: IndexedFile[]): void { + const stmt = db.prepare( + `INSERT OR REPLACE INTO files (path, hash, size, language, line_count, last_indexed) + VALUES (?, ?, ?, ?, ?, ?)`, + ); + const tx = db.transaction(() => { + for (const f of files) { + stmt.run(f.path, f.hash, f.size, f.language, f.lineCount, f.lastIndexed); + } + }); + tx(); +} + +export function insertSymbols( + db: Database, + symbols: Omit<IndexedSymbol, "id">[], +): void { + const stmt = db.prepare( + `INSERT INTO symbols (name, kind, file_path, line_start, line_end, signature, docstring, parent_name, exported, language) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`, + ); + const tx = db.transaction(() => { + for (const s of symbols) { + stmt.run( + s.name, + s.kind, + s.filePath, + s.lineStart, + s.lineEnd, + s.signature, + s.docstring, + s.parentName, + s.exported ?
1 : 0, + s.language, + ); + } + }); + tx(); +} + +export function deleteFileAndSymbols(db: Database, filePath: string): void { + db.run("DELETE FROM symbols WHERE file_path = ?", [filePath]); + db.run("DELETE FROM files WHERE path = ?", [filePath]); +} + +export function upsertFolders(db: Database, folders: IndexedFolder[]): void { + const stmt = db.prepare( + `INSERT OR REPLACE INTO folders (path, description, file_count, last_indexed) + VALUES (?, ?, ?, ?)`, + ); + const tx = db.transaction(() => { + for (const f of folders) { + stmt.run(f.path, f.description, f.fileCount, f.lastIndexed); + } + }); + tx(); +} + +export function searchSymbols( + db: Database, + query: string, + limit = 20, +): SearchHit[] { + const rows = db + .prepare( + `SELECT s.id, s.name, s.kind, s.file_path, s.line_start, s.line_end, + s.signature, s.docstring, s.parent_name, s.exported, s.language, + fts.rank + FROM symbols_fts fts + JOIN symbols s ON s.id = fts.rowid + WHERE symbols_fts MATCH ? + ORDER BY fts.rank + LIMIT ?`, + ) + .all(query, limit) as Array<{ + id: number; + name: string; + kind: string; + file_path: string; + line_start: number; + line_end: number; + signature: string | null; + docstring: string | null; + parent_name: string | null; + exported: number; + language: string; + rank: number; + }>; + + return rows.map((row) => ({ + symbol: { + id: row.id, + name: row.name, + kind: row.kind as SymbolKind, + filePath: row.file_path, + lineStart: row.line_start, + lineEnd: row.line_end, + signature: row.signature, + docstring: row.docstring, + parentName: row.parent_name, + exported: row.exported === 1, + language: row.language, + }, + rank: row.rank, + })); +} + +export function getFileSymbols( + db: Database, + filePath: string, +): IndexedSymbol[] { + const rows = db + .prepare( + `SELECT id, name, kind, file_path, line_start, line_end, + signature, docstring, parent_name, exported, language + FROM symbols WHERE file_path = ? 
+ ORDER BY line_start`, + ) + .all(filePath) as Array<{ + id: number; + name: string; + kind: string; + file_path: string; + line_start: number; + line_end: number; + signature: string | null; + docstring: string | null; + parent_name: string | null; + exported: number; + language: string; + }>; + + return rows.map((row) => ({ + id: row.id, + name: row.name, + kind: row.kind as SymbolKind, + filePath: row.file_path, + lineStart: row.line_start, + lineEnd: row.line_end, + signature: row.signature, + docstring: row.docstring, + parentName: row.parent_name, + exported: row.exported === 1, + language: row.language, + })); +} + +export function getStats(db: Database, dbPath: string): IndexStats { + const totalFiles = ( + db.prepare("SELECT COUNT(*) as cnt FROM files").get() as { cnt: number } + ).cnt; + const totalSymbols = ( + db.prepare("SELECT COUNT(*) as cnt FROM symbols").get() as { cnt: number } + ).cnt; + const totalFolders = ( + db.prepare("SELECT COUNT(*) as cnt FROM folders").get() as { cnt: number } + ).cnt; + + const langRows = db + .prepare( + `SELECT f.language, + COUNT(DISTINCT f.path) as file_count, + COUNT(s.id) as symbol_count + FROM files f + LEFT JOIN symbols s ON s.file_path = f.path + GROUP BY f.language`, + ) + .all() as Array<{ + language: string; + file_count: number; + symbol_count: number; + }>; + + const byLanguage: Record = {}; + for (const row of langRows) { + byLanguage[row.language] = { + files: row.file_count, + symbols: row.symbol_count, + }; + } + + const lastBuildRow = db + .prepare("SELECT MAX(last_indexed) as last_build FROM files") + .get() as { last_build: string | null }; + + let dbSizeBytes = 0; + try { + dbSizeBytes = statSync(dbPath).size; + } catch { + // DB file may not be accessible + } + + return { + totalFiles, + totalSymbols, + totalFolders, + byLanguage, + lastBuildTime: lastBuildRow.last_build, + dbSizeBytes, + }; +} + +export function getAllFolders(db: Database): IndexedFolder[] { + const rows = db + .prepare( + 
`SELECT path, description, file_count, last_indexed + FROM folders ORDER BY path`, + ) + .all() as Array<{ + path: string; + description: string | null; + file_count: number; + last_indexed: string; + }>; + + return rows.map((row) => ({ + path: row.path, + description: row.description, + fileCount: row.file_count, + lastIndexed: row.last_indexed, + })); +} + +export function getFileByPath(db: Database, path: string): IndexedFile | null { + const row = db + .prepare( + `SELECT path, hash, size, language, line_count, last_indexed + FROM files WHERE path = ?`, + ) + .get(path) as { + path: string; + hash: string; + size: number; + language: string; + line_count: number; + last_indexed: string; + } | null; + + if (!row) return null; + + return { + path: row.path, + hash: row.hash, + size: row.size, + language: row.language, + lineCount: row.line_count, + lastIndexed: row.last_indexed, + }; +} + +export function rebuildFts(db: Database): void { + db.exec("INSERT INTO symbols_fts(symbols_fts) VALUES('rebuild')"); +} diff --git a/cli/src/indexer/extractor.ts b/cli/src/indexer/extractor.ts new file mode 100644 index 0000000..710d36c --- /dev/null +++ b/cli/src/indexer/extractor.ts @@ -0,0 +1,233 @@ +import { unlinkSync } from "fs"; +import { tmpdir } from "os"; +import { join } from "path"; +import type { IndexedSymbol, SymbolKind } from "../schemas/index.js"; +import { getRulesForLanguage } from "./rules.js"; + +interface SgMatch { + text: string; + range: { + byteOffset: { start: number; end: number }; + start: { line: number; column: number }; + end: { line: number; column: number }; + }; + file: string; + lines: string; + ruleId: string; + language: string; + message?: string; +} + +export async function checkSgInstalled(): Promise { + try { + const proc = Bun.spawn(["sg", "--version"], { + stdout: "pipe", + stderr: "pipe", + }); + await proc.exited; + return proc.exitCode === 0; + } catch { + return false; + } +} + +export function extractSignature(text: string, 
language: string): string { + if (language === "python") { + const firstLine = text.split("\n")[0]; + return firstLine.replace(/:$/, "").trim(); + } + + // TypeScript/JavaScript: strip body (everything from first { to end) + const braceIndex = text.indexOf("{"); + if (braceIndex !== -1) { + return text.substring(0, braceIndex).trim(); + } + // For type aliases, interfaces without body braces on same match, return as-is + return text.split("\n")[0].trim(); +} + +export function extractDocstring( + text: string, + language: string, +): string | null { + if (language === "python") { + // Look for triple-quoted docstring at start of function/class body + const bodyMatch = text.match(/:\s*\n\s*("""[\s\S]*?"""|'''[\s\S]*?''')/); + if (bodyMatch) { + return bodyMatch[1] + .replace(/^"""|"""$/g, "") + .replace(/^'''|'''$/g, "") + .trim(); + } + return null; + } + // TypeScript docstrings are handled via JSDoc correlation + return null; +} + +export function extractSymbolName(text: string, ruleId: string): string { + // Handle different declaration patterns + const patterns: RegExp[] = [ + /(?:export\s+)?(?:async\s+)?function\s+(\w+)/, + /(?:export\s+)?class\s+(\w+)/, + /(?:export\s+)?interface\s+(\w+)/, + /(?:export\s+)?type\s+(\w+)/, + /(?:export\s+)?enum\s+(\w+)/, + /(?:export\s+)?(?:const|let|var)\s+(\w+)/, + // Python patterns + /def\s+(\w+)/, + /class\s+(\w+)/, + ]; + + for (const pattern of patterns) { + const match = text.match(pattern); + if (match) { + return match[1]; + } + } + + // Fallback: first word-like token after common keywords + const fallback = text.match( + /(?:function|class|interface|type|enum|const|let|var|def)\s+(\w+)/, + ); + if (fallback) return fallback[1]; + + return "unknown"; +} + +export function determineSymbolKind(text: string, ruleId: string): SymbolKind { + // Rule ID gives strong hints + if (ruleId === "ts-function" || ruleId === "py-function") return "function"; + if (ruleId === "ts-class" || ruleId === "py-class") return "class"; + if 
(ruleId === "ts-interface") return "interface"; + + // For ts-export, inspect the text + if (ruleId === "ts-export" || ruleId === "py-decorated") { + if (/\bfunction\s/.test(text)) return "function"; + if (/\bclass\s/.test(text)) return "class"; + if (/\binterface\s/.test(text)) return "interface"; + if (/\btype\s/.test(text)) return "type"; + if (/\benum\s/.test(text)) return "enum"; + if (/\bconst\s/.test(text)) return "const"; + if (/\bdef\s/.test(text)) return "function"; + } + + return "function"; +} + +export async function extractSymbols( + filePaths: string[], + language: string, +): Promise<Omit<IndexedSymbol, "id">[]> { + if (filePaths.length === 0) return []; + + const rulesYaml = getRulesForLanguage(language); + if (!rulesYaml) return []; + + const installed = await checkSgInstalled(); + if (!installed) { + throw new Error( + "ast-grep (sg) is required but not found. Install it:\n" + + " npm install -g @ast-grep/cli\n" + + " # or: brew install ast-grep", + ); + } + + const tmpFile = join(tmpdir(), `codeforge-sg-rules-${Date.now()}.yml`); + await Bun.write(tmpFile, rulesYaml); + + try { + const args = ["sg", "scan", "--rule", tmpFile, "--json", ...filePaths]; + const proc = Bun.spawn(args, { + stdout: "pipe", + stderr: "pipe", + }); + + const stdout = await new Response(proc.stdout).text(); + await proc.exited; + + if (!stdout.trim()) return []; + + let matches: SgMatch[]; + try { + matches = JSON.parse(stdout); + } catch { + return []; + } + + if (!Array.isArray(matches)) return []; + + // Separate JSDoc comments from other matches (for TypeScript) + const jsdocMatches = matches.filter((m) => m.ruleId === "ts-jsdoc"); + const symbolMatches = matches.filter((m) => m.ruleId !== "ts-jsdoc"); + + // Build a lookup for JSDoc by file and end line + const jsdocByFileAndLine = new Map<string, Map<number, string>>(); + for (const jsdoc of jsdocMatches) { + if (!jsdocByFileAndLine.has(jsdoc.file)) { + jsdocByFileAndLine.set(jsdoc.file, new Map()); + } + jsdocByFileAndLine.get(jsdoc.file)!.set(
jsdoc.range.end.line, + jsdoc.text + .replace(/^\/\*\*/, "") + .replace(/\*\/$/, "") + .replace(/^\s*\* ?/gm, "") + .trim(), + ); + } + + const symbols: Omit<IndexedSymbol, "id">[] = []; + + for (const match of symbolMatches) { + const name = extractSymbolName(match.text, match.ruleId); + const kind = determineSymbolKind(match.text, match.ruleId); + const signature = extractSignature(match.text, language); + const exported = + match.ruleId === "ts-export" || /^export\s/.test(match.text); + + // Find associated JSDoc (line just before this symbol) + let docstring: string | null = null; + if (language === "typescript" || language === "javascript") { + const fileJsdocs = jsdocByFileAndLine.get(match.file); + if (fileJsdocs) { + // JSDoc ends on the line just before the symbol starts + docstring = fileJsdocs.get(match.range.start.line - 1) ?? null; + } + } else if (language === "python") { + docstring = extractDocstring(match.text, language); + } + + symbols.push({ + name, + kind, + filePath: match.file, + lineStart: match.range.start.line, + lineEnd: match.range.end.line, + signature, + docstring, + parentName: null, + exported, + language, + }); + } + + // Deduplicate: when a declaration is matched by both ts-export and + // ts-function/ts-class/ts-interface, keep only the exported version.
+ const seen = new Map(); + for (const sym of symbols) { + const key = `${sym.filePath}:${sym.name}:${sym.lineStart}`; + const existing = seen.get(key); + if (!existing || (sym.exported && !existing.exported)) { + seen.set(key, sym); + } + } + return [...seen.values()]; + } finally { + try { + unlinkSync(tmpFile); + } catch { + // Temp file cleanup is best-effort + } + } +} diff --git a/cli/src/indexer/folders.ts b/cli/src/indexer/folders.ts new file mode 100644 index 0000000..0e9380f --- /dev/null +++ b/cli/src/indexer/folders.ts @@ -0,0 +1,127 @@ +import { existsSync, readdirSync, readFileSync } from "fs"; +import { extname, join } from "path"; +import type { IndexedFolder } from "../schemas/index.js"; + +const RECOGNIZED_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".jsx", ".py"]); + +function extractFirstParagraph(markdown: string): string | null { + const lines = markdown.split("\n"); + const paragraphLines: string[] = []; + let inParagraph = false; + + for (const line of lines) { + const trimmed = line.trim(); + + // Skip headings, badges (images), and empty lines before paragraph + if (!inParagraph) { + if ( + trimmed === "" || + trimmed.startsWith("#") || + trimmed.startsWith("![") || + trimmed.startsWith("[![") || + trimmed.startsWith("---") || + trimmed.startsWith("===") + ) { + continue; + } + // Start of paragraph + inParagraph = true; + paragraphLines.push(trimmed); + } else { + // End of paragraph on empty line + if (trimmed === "") break; + paragraphLines.push(trimmed); + } + } + + if (paragraphLines.length === 0) return null; + return paragraphLines.join(" "); +} + +function loadFolderOverrides(workspaceRoot: string): Map<string, string> { + const overrides = new Map<string, string>(); + const overridePath = join( + workspaceRoot, + ".codeforge", + "data", + "folders.yaml", + ); + + if (!existsSync(overridePath)) return overrides; + + try { + const content = readFileSync(overridePath, "utf-8"); + for (const line of content.split("\n")) { + const trimmed = line.trim(); + if (trimmed
=== "" || trimmed.startsWith("#")) continue; + + const colonIndex = trimmed.indexOf(": "); + if (colonIndex === -1) continue; + + const key = trimmed.substring(0, colonIndex).trim(); + const value = trimmed.substring(colonIndex + 2).trim(); + // Remove optional surrounding quotes + overrides.set(key, value.replace(/^["']|["']$/g, "")); + } + } catch { + // If file can't be read, return empty overrides + } + + return overrides; +} + +function countRecognizedFiles(dirPath: string): number { + try { + const entries = readdirSync(dirPath, { withFileTypes: true }); + let count = 0; + for (const entry of entries) { + if (entry.isFile() && RECOGNIZED_EXTENSIONS.has(extname(entry.name))) { + count++; + } + } + return count; + } catch { + return 0; + } +} + +export async function extractFolderDocs( + directories: string[], + workspaceRoot: string, +): Promise<IndexedFolder[]> { + const overrides = loadFolderOverrides(workspaceRoot); + const now = new Date().toISOString().replace("T", " ").substring(0, 19); + const folders: IndexedFolder[] = []; + + for (const relDir of directories) { + const absDir = join(workspaceRoot, relDir); + let description: string | null = null; + + // Check for manual override first + if (overrides.has(relDir)) { + description = overrides.get(relDir)!; + } else { + // Try README.md + const readmePath = join(absDir, "README.md"); + if (existsSync(readmePath)) { + try { + const content = readFileSync(readmePath, "utf-8"); + description = extractFirstParagraph(content); + } catch { + // Ignore read errors + } + } + } + + const fileCount = countRecognizedFiles(absDir); + + folders.push({ + path: relDir, + description, + fileCount, + lastIndexed: now, + }); + } + + return folders; +} diff --git a/cli/src/indexer/rules.ts b/cli/src/indexer/rules.ts new file mode 100644 index 0000000..5f14d15 --- /dev/null +++ b/cli/src/indexer/rules.ts @@ -0,0 +1,99 @@ +export function getTypescriptRules(): string { + return `id: ts-jsdoc +language: TypeScript +rule: + kind: comment +
regex: "^\\\\/\\\\*\\\\*" +--- +id: ts-export +language: TypeScript +rule: + kind: export_statement + has: + kind: function_declaration + stopBy: end +--- +id: ts-export +language: TypeScript +rule: + kind: export_statement + has: + kind: class_declaration + stopBy: end +--- +id: ts-export +language: TypeScript +rule: + kind: export_statement + has: + kind: interface_declaration + stopBy: end +--- +id: ts-export +language: TypeScript +rule: + kind: export_statement + has: + kind: type_alias_declaration + stopBy: end +--- +id: ts-export +language: TypeScript +rule: + kind: export_statement + has: + kind: lexical_declaration + stopBy: end +--- +id: ts-export +language: TypeScript +rule: + kind: export_statement + has: + kind: enum_declaration + stopBy: end +--- +id: ts-function +language: TypeScript +rule: + kind: function_declaration +--- +id: ts-class +language: TypeScript +rule: + kind: class_declaration +--- +id: ts-interface +language: TypeScript +rule: + kind: interface_declaration`; +} + +export function getPythonRules(): string { + return `id: py-function +language: Python +rule: + kind: function_definition +--- +id: py-class +language: Python +rule: + kind: class_definition +--- +id: py-decorated +language: Python +rule: + kind: decorated_definition`; +} + +export function getRulesForLanguage(language: string): string | null { + switch (language) { + case "typescript": + case "javascript": + return getTypescriptRules(); + case "python": + return getPythonRules(); + default: + return null; + } +} diff --git a/cli/src/indexer/scanner.ts b/cli/src/indexer/scanner.ts new file mode 100644 index 0000000..4d8fdef --- /dev/null +++ b/cli/src/indexer/scanner.ts @@ -0,0 +1,128 @@ +import type { Database } from "bun:sqlite"; +import { readdirSync, readFileSync, statSync } from "fs"; +import { extname, join, relative } from "path"; +import type { ScanResult } from "../schemas/index.js"; +import { getFileByPath } from "./db.js"; + +const EXTENSION_MAP: Record<string, string> = {
".ts": "typescript", + ".tsx": "typescript", + ".js": "javascript", + ".jsx": "javascript", + ".py": "python", +}; + +const IGNORE_DIRS = new Set([ + "node_modules", + ".git", + "dist", + "build", + "__pycache__", + ".next", + ".venv", + "venv", + "coverage", +]); + +export function getLanguageForExtension(ext: string): string | null { + return EXTENSION_MAP[ext] ?? null; +} + +export async function hashFileContent(filePath: string): Promise<string> { + const content = readFileSync(filePath); + const hasher = new Bun.CryptoHasher("sha256"); + hasher.update(content); + return hasher.digest("hex"); +} + +function walkDirectory(dir: string, results: string[]): void { + let entries; + try { + entries = readdirSync(dir, { withFileTypes: true }); + } catch { + return; + } + + for (const entry of entries) { + if (entry.isDirectory()) { + if (!IGNORE_DIRS.has(entry.name)) { + walkDirectory(join(dir, entry.name), results); + } + } else if (entry.isFile()) { + const ext = extname(entry.name); + if (EXTENSION_MAP[ext]) { + results.push(join(dir, entry.name)); + } + } + } +} + +export async function scanDirectory( + targetPath: string, + db: Database, + rootPath?: string, +): Promise<ScanResult> { + const baseForRelative = rootPath ??
targetPath; + const allFiles: string[] = []; + walkDirectory(targetPath, allFiles); + + const relativePaths = allFiles.map((f) => relative(baseForRelative, f)); + + const newFiles: string[] = []; + const changedFiles: string[] = []; + const unchangedFiles: string[] = []; + + for (let i = 0; i < relativePaths.length; i++) { + const relPath = relativePaths[i]; + const absPath = allFiles[i]; + const hash = await hashFileContent(absPath); + const existing = getFileByPath(db, relPath); + + if (!existing) { + newFiles.push(relPath); + } else if (existing.hash !== hash) { + changedFiles.push(relPath); + } else { + unchangedFiles.push(relPath); + } + } + + // Find deleted files: in DB but not on disk + const diskSet = new Set(relativePaths); + const allDbFiles = db.prepare("SELECT path FROM files").all() as Array<{ + path: string; + }>; + const deletedFiles = allDbFiles + .map((row) => row.path) + .filter((p) => !diskSet.has(p)); + + return { newFiles, changedFiles, unchangedFiles, deletedFiles }; +} + +export async function collectDirectories( + targetPath: string, + rootPath?: string, +): Promise<string[]> { + const baseForRelative = rootPath ??
targetPath; + const dirs: string[] = []; + + function walk(dir: string): void { + let entries; + try { + entries = readdirSync(dir, { withFileTypes: true }); + } catch { + return; + } + + for (const entry of entries) { + if (entry.isDirectory() && !IGNORE_DIRS.has(entry.name)) { + const fullPath = join(dir, entry.name); + dirs.push(relative(baseForRelative, fullPath)); + walk(fullPath); + } + } + } + + walk(targetPath); + return dirs; +} diff --git a/cli/src/loaders/plugin-loader.ts b/cli/src/loaders/plugin-loader.ts index 9ecf2f7..b4a43c7 100644 --- a/cli/src/loaders/plugin-loader.ts +++ b/cli/src/loaders/plugin-loader.ts @@ -10,6 +10,7 @@ import type { PluginJsonFile, SkillInfo, } from "../schemas/plugin.js"; +import { findWorkspacePath } from "../utils/devcontainer.js"; export function extractFrontMatter(content: string): Record<string, string> { const match = content.match(/^---\r?\n([\s\S]*?)\r?\n---/); @@ -160,10 +161,16 @@ export function findSettingsPaths(): { if (!source) { try { - const fallback = "/workspaces/.codeforge/config/settings.json"; - const stat = Bun.file(fallback); - if (stat.size !== undefined) { - source = fallback; + const workspaceRoot = findWorkspacePath(); + if (workspaceRoot) { + const fallback = resolve( + workspaceRoot, + ".codeforge/config/settings.json", + ); + const stat = Bun.file(fallback); + if (stat.size !== undefined) { + source = fallback; + } } } catch {} } diff --git a/cli/src/output/index-json.ts b/cli/src/output/index-json.ts new file mode 100644 index 0000000..56272f9 --- /dev/null +++ b/cli/src/output/index-json.ts @@ -0,0 +1,51 @@ +import type { + IndexedSymbol, + IndexStats, + ScanResult, + SearchHit, + TreeEntry, +} from "../schemas/index.js"; + +export function formatSearchJson(hits: SearchHit[]): string { + return JSON.stringify({ results: hits, total: hits.length }, null, 2); +} + +export function formatShowJson( + filePath: string, + symbols: IndexedSymbol[], +): string { + return JSON.stringify( + { file: filePath, symbols,
total: symbols.length }, + null, + 2, + ); +} + +export function formatStatsJson(stats: IndexStats): string { + return JSON.stringify(stats, null, 2); +} + +export function formatTreeJson(entries: TreeEntry[]): string { + return JSON.stringify({ tree: entries }, null, 2); +} + +export function formatBuildJson(result: { + scanned: ScanResult; + symbolCount: number; + durationMs: number; +}): string { + return JSON.stringify( + { + scanned: { + newFiles: result.scanned.newFiles.length, + changedFiles: result.scanned.changedFiles.length, + unchangedFiles: result.scanned.unchangedFiles.length, + deletedFiles: result.scanned.deletedFiles.length, + }, + totalSymbols: result.symbolCount, + durationMs: result.durationMs, + }, + null, + 2, + ); +} diff --git a/cli/src/output/index-text.ts b/cli/src/output/index-text.ts new file mode 100644 index 0000000..70602c3 --- /dev/null +++ b/cli/src/output/index-text.ts @@ -0,0 +1,212 @@ +import chalk from "chalk"; +import type { + IndexedSymbol, + IndexStats, + ScanResult, + SearchHit, + SymbolKind, + TreeEntry, +} from "../schemas/index.js"; + +const KIND_ICONS: Record< + SymbolKind, + { icon: string; color: (s: string) => string } +> = { + function: { icon: "\u0192", color: chalk.cyan }, + class: { icon: "C", color: chalk.yellow }, + interface: { icon: "I", color: chalk.green }, + type: { icon: "T", color: chalk.magenta }, + const: { icon: "c", color: chalk.blue }, + method: { icon: "m", color: (s: string) => chalk.cyan(chalk.dim(s)) }, + enum: { icon: "E", color: chalk.yellow }, +}; + +function kindLabel(kind: SymbolKind): string { + const k = KIND_ICONS[kind]; + return k ? 
k.color(`${k.icon} ${kind}`) : kind; +} + +export function formatSearchText( + hits: SearchHit[], + options: { noColor?: boolean } = {}, +): string { + if (options.noColor) chalk.level = 0; + + if (hits.length === 0) return chalk.dim("No results found."); + + const lines: string[] = []; + + // Group by file + const byFile = new Map(); + for (const hit of hits) { + const group = byFile.get(hit.symbol.filePath) ?? []; + group.push(hit); + byFile.set(hit.symbol.filePath, group); + } + + for (const [filePath, fileHits] of byFile) { + lines.push(chalk.bold.underline(filePath)); + for (const hit of fileHits) { + const { symbol } = hit; + const loc = chalk.dim(`:${symbol.lineStart}-${symbol.lineEnd}`); + const sig = symbol.signature ? chalk.dim(` ${symbol.signature}`) : ""; + lines.push( + ` ${kindLabel(symbol.kind)} ${chalk.bold(symbol.name)}${loc}${sig}`, + ); + if (symbol.docstring) { + const preview = + symbol.docstring.length > 100 + ? symbol.docstring.slice(0, 100) + "..." + : symbol.docstring; + lines.push(` ${chalk.dim(preview)}`); + } + } + lines.push(""); + } + + lines.push(chalk.dim(`${hits.length} result${hits.length === 1 ? "" : "s"}`)); + return lines.join("\n"); +} + +export function formatShowText( + filePath: string, + symbols: IndexedSymbol[], + options: { noColor?: boolean } = {}, +): string { + if (options.noColor) chalk.level = 0; + + const lines: string[] = []; + lines.push(chalk.bold.underline(filePath)); + + if (symbols.length === 0) { + lines.push(chalk.dim(" No symbols found.")); + return lines.join("\n"); + } + + for (const sym of symbols) { + const exported = sym.exported + ? chalk.green("exported") + : chalk.dim("local"); + const loc = chalk.dim(`L${sym.lineStart}-${sym.lineEnd}`); + const parent = sym.parentName ? 
chalk.dim(` (${sym.parentName})`) : ""; + lines.push( + ` ${kindLabel(sym.kind)} ${chalk.bold(sym.name)} ${loc} ${exported}${parent}`, + ); + if (sym.signature) { + lines.push(` ${chalk.dim(sym.signature)}`); + } + } + + lines.push(""); + lines.push( + chalk.dim(`${symbols.length} symbol${symbols.length === 1 ? "" : "s"}`), + ); + return lines.join("\n"); +} + +export function formatStatsText( + stats: IndexStats, + options: { noColor?: boolean } = {}, +): string { + if (options.noColor) chalk.level = 0; + + const lines: string[] = []; + + lines.push(chalk.bold("Codebase Index Statistics")); + lines.push("\u2550".repeat(27)); + lines.push(`Total files: ${stats.totalFiles}`); + lines.push(`Total symbols: ${stats.totalSymbols}`); + lines.push(`Total folders: ${stats.totalFolders}`); + lines.push(`Database size: ${formatBytes(stats.dbSizeBytes)}`); + if (stats.lastBuildTime) { + lines.push(`Last build: ${stats.lastBuildTime}`); + } + + if (Object.keys(stats.byLanguage).length > 0) { + lines.push(""); + lines.push(chalk.bold("By language:")); + const sorted = Object.entries(stats.byLanguage).sort( + ([, a], [, b]) => b.symbols - a.symbols, + ); + for (const [lang, counts] of sorted) { + lines.push( + ` ${lang.padEnd(14)} ${String(counts.files).padStart(5)} files ${String(counts.symbols).padStart(6)} symbols`, + ); + } + } + + return lines.join("\n"); +} + +function formatBytes(bytes: number): string { + if (bytes < 1024) return `${bytes} B`; + if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`; + return `${(bytes / (1024 * 1024)).toFixed(1)} MB`; +} + +export function formatTreeText( + entries: TreeEntry[], + options: { noColor?: boolean } = {}, +): string { + if (options.noColor) chalk.level = 0; + + if (entries.length === 0) return chalk.dim("No entries found."); + + const lines: string[] = []; + renderTreeEntries(entries, lines, ""); + return lines.join("\n"); +} + +function renderTreeEntries( + entries: TreeEntry[], + lines: string[], + prefix: 
string, +): void { + for (let i = 0; i < entries.length; i++) { + const entry = entries[i]; + const isLast = i === entries.length - 1; + const connector = isLast ? "\u2514\u2500\u2500" : "\u251C\u2500\u2500"; + const childPrefix = isLast ? " " : "\u2502 "; + + const icon = + entry.type === "folder" ? chalk.blue("\u25B8") : chalk.dim("\u25AA"); + const name = entry.type === "folder" ? chalk.bold(entry.path) : entry.path; + const count = chalk.dim(` (${entry.symbolCount})`); + const desc = entry.description ? chalk.dim(` - ${entry.description}`) : ""; + + lines.push(`${prefix}${connector} ${icon} ${name}${count}${desc}`); + + if (entry.children && entry.children.length > 0) { + renderTreeEntries(entry.children, lines, prefix + childPrefix); + } + } +} + +export function formatBuildSummary( + result: { scanned: ScanResult; symbolCount: number; durationMs: number }, + options: { noColor?: boolean } = {}, +): string { + if (options.noColor) chalk.level = 0; + + const { scanned, symbolCount, durationMs } = result; + const lines: string[] = []; + + lines.push(chalk.bold("Index Build Complete")); + lines.push("\u2550".repeat(21)); + lines.push( + `New files: ${chalk.green(String(scanned.newFiles.length))}`, + ); + lines.push( + `Changed files: ${chalk.yellow(String(scanned.changedFiles.length))}`, + ); + lines.push( + `Unchanged files: ${chalk.dim(String(scanned.unchangedFiles.length))}`, + ); + lines.push( + `Deleted files: ${chalk.red(String(scanned.deletedFiles.length))}`, + ); + lines.push(`Total symbols: ${symbolCount}`); + lines.push(`Duration: ${durationMs}ms`); + + return lines.join("\n"); +} diff --git a/cli/src/output/review.ts b/cli/src/output/review.ts deleted file mode 100644 index 0835a84..0000000 --- a/cli/src/output/review.ts +++ /dev/null @@ -1,193 +0,0 @@ -import chalk from "chalk"; -import type { - ReviewFindingWithPass, - ReviewResult, - Severity, -} from "../schemas/review.js"; - -const SEPARATOR = "\u2501".repeat(60); - -const SEVERITY_COLORS: Record<Severity, (t: string) =>
string> = { - critical: (t) => chalk.red.bold(t), - high: (t) => chalk.red(t), - medium: (t) => chalk.yellow(t), - low: (t) => chalk.blue(t), - info: (t) => chalk.dim(t), -}; - -function capitalize(s: string): string { - return s.charAt(0).toUpperCase() + s.slice(1); -} - -function formatDuration(ms: number): string { - return `${Math.round(ms / 1000)}s`; -} - -function formatCost(usd: number): string { - return `$${usd.toFixed(2)}`; -} - -function severityTag(severity: Severity): string { - const label = `[${severity.toUpperCase()}]`; - return SEVERITY_COLORS[severity](label); -} - -function formatPassLine( - index: number, - name: string, - costUsd: number, - durationMs: number, - error?: string, -): string { - const label = `Pass ${index + 1}: ${capitalize(name)}`; - const stats = `${formatCost(costUsd)} ${formatDuration(durationMs)}`; - - if (error) { - const errorNote = chalk.yellow(` \u26A0 ${error}`); - return `${label.padEnd(50)}${stats}${errorNote}`; - } - - return `${label.padEnd(50)}${stats}`; -} - -function formatFinding(finding: ReviewFindingWithPass): string[] { - const lines: string[] = []; - const location = finding.line - ? `${finding.file}:${finding.line}` - : finding.file; - - lines.push(`${severityTag(finding.severity)} ${location}`); - - const desc = - finding.description && finding.description !== finding.title - ? 
`${finding.title} \u2014 ${finding.description}` - : finding.title; - lines.push(` ${desc}`); - - if (finding.suggestion) { - lines.push(` \u2192 ${finding.suggestion}`); - } - - lines.push(` ${chalk.dim(`(${finding.passName})`)}`); - - return lines; -} - -function formatSeverityCounts(findings: ReviewFindingWithPass[]): string { - const counts: Record<Severity, number> = { - critical: 0, - high: 0, - medium: 0, - low: 0, - info: 0, - }; - for (const f of findings) counts[f.severity]++; - - const parts: string[] = []; - const entries: [Severity, number][] = [ - ["critical", counts.critical], - ["high", counts.high], - ["medium", counts.medium], - ["low", counts.low], - ["info", counts.info], - ]; - - for (const [severity, count] of entries) { - if (count > 0) { - parts.push(SEVERITY_COLORS[severity](`${count} ${severity}`)); - } - } - - return parts.join(" "); -} - -export function formatReviewText( - result: ReviewResult, - options?: { noColor?: boolean }, -): string { - if (options?.noColor) chalk.level = 0; - - const lines: string[] = []; - - // Header - if (result.scope === "full") { - lines.push(chalk.bold("Full codebase review")); - } else { - lines.push( - chalk.bold( - `Review of ${result.base}..${result.head} (${result.filesChanged} files changed)`, - ), - ); - } - lines.push(""); - - // Pass summary lines - for (const [i, pass] of result.passes.entries()) { - lines.push( - formatPassLine(i, pass.name, pass.costUsd, pass.durationMs, pass.error), - ); - } - - lines.push(""); - lines.push(SEPARATOR); - lines.push(""); - - // Findings - if (result.findings.length === 0) { - lines.push(chalk.green("No issues found.")); - } else { - for (const [i, finding] of result.findings.entries()) { - lines.push(...formatFinding(finding)); - if (i < result.findings.length - 1) lines.push(""); - } - } - - lines.push(""); - lines.push(SEPARATOR); - - // Footer - const score = chalk.bold(`Score: ${result.score}/10`); - const counts = formatSeverityCounts(result.findings); - const cost =
`Total: ${formatCost(result.totalCostUsd)}`; - - const footerParts = [score]; - if (counts) footerParts.push(counts); - footerParts.push(cost); - lines.push(footerParts.join(" \u2502 ")); - - return lines.join("\n"); -} - -export function formatReviewJson(result: ReviewResult): string { - const output = { - base: result.base, - head: result.head, - scope: result.scope, - filesChanged: result.filesChanged, - score: result.score, - findings: result.findings.map((f) => ({ - file: f.file, - line: f.line, - severity: f.severity, - category: f.category, - pass: f.pass, - passName: f.passName, - title: f.title, - description: f.description, - suggestion: f.suggestion, - })), - summary: result.summary, - cost: { - total_usd: result.totalCostUsd, - passes: result.passes.map((p) => ({ - name: p.name, - cost_usd: p.costUsd, - duration_ms: p.durationMs, - findings: p.findings.length, - ...(p.error ? { error: p.error } : {}), - })), - }, - }; - - return JSON.stringify(output, null, 2); -} diff --git a/cli/src/prompts/review.ts b/cli/src/prompts/review.ts deleted file mode 100644 index fb86743..0000000 --- a/cli/src/prompts/review.ts +++ /dev/null @@ -1,71 +0,0 @@ -import { existsSync } from "node:fs"; -import path from "node:path"; -import type { PassName } from "../schemas/review.js"; - -export type PromptMode = "sequential" | "parallel"; - -export interface PassPrompts { - systemPromptFile: string; - userPrompt: string; -} - -function findPackageRoot(from: string): string { - let dir = from; - while (dir !== path.dirname(dir)) { - if (existsSync(path.join(dir, "package.json"))) return dir; - dir = path.dirname(dir); - } - return from; -} - -const PROMPTS_DIR = path.join( - findPackageRoot(import.meta.dir), - "prompts", - "review", -); - -function getUserPromptFilename(pass: PassName, mode: PromptMode): string { - if (pass === "correctness") return "correctness.user.md"; - if (mode === "parallel") return `${pass}.user.md`; - return `${pass}-resume.user.md`; -} - -function 
interpolate( - template: string, - variables: Record<string, string>, -): string { - let content = template; - for (const [key, value] of Object.entries(variables)) { - content = content.replaceAll(`{{${key}}}`, value); - } - return content; -} - -export async function getPassPrompts( - pass: PassName, - mode: PromptMode, - variables: Record<string, string>, -): Promise<PassPrompts> { - const systemPromptFile = path.join(PROMPTS_DIR, `${pass}.system.md`); - const userPromptFile = path.join( - PROMPTS_DIR, - getUserPromptFilename(pass, mode), - ); - - const rawContent = await Bun.file(userPromptFile).text(); - const userPrompt = interpolate(rawContent, variables); - - return { systemPromptFile, userPrompt }; -} - -export function getFullScopePrompt(pass: PassName, include?: string): string { - const scopeInstruction = include - ? `Scan files matching the pattern: ${include}` - : "Scan the project codebase"; - - return `${scopeInstruction} and identify ${pass} issues. - -Use Read, Glob, and Grep tools to explore the codebase. Do not review node_modules, dist, or build output directories.
- -For each finding, specify the exact file path and line number.`; -} diff --git a/cli/src/runners/headless.ts b/cli/src/runners/headless.ts deleted file mode 100644 index dca0e97..0000000 --- a/cli/src/runners/headless.ts +++ /dev/null @@ -1,146 +0,0 @@ -import path from "node:path"; - -export interface HeadlessOptions { - prompt: string; - model?: string; - maxTurns?: number; - maxBudgetUsd?: number; - allowedTools?: string[]; - disallowedTools?: string[]; - permissionMode?: "plan" | "acceptEdits" | "default"; - systemPromptFile?: string; - resume?: string; - jsonSchema?: object; - cwd?: string; -} - -export interface HeadlessResult { - result: string; - sessionId: string; - isError: boolean; - subtype: string; - totalCostUsd: number; - numTurns: number; - durationMs: number; - structuredOutput?: unknown; -} - -interface ClaudeJsonOutput { - type: string; - subtype: string; - is_error: boolean; - result: string; - session_id: string; - total_cost_usd: number; - num_turns: number; -} - -async function discoverClaudeBinary(): Promise<string> { - if (process.env.CLAUDE_BIN) { - const exists = await Bun.file(process.env.CLAUDE_BIN).exists(); - if (exists) return process.env.CLAUDE_BIN; - } - - const localPath = path.join(process.env.HOME ?? "", ".local/bin/claude"); - if (await Bun.file(localPath).exists()) return localPath; - - const proc = Bun.spawn(["which", "claude"], { - stdout: "pipe", - stderr: "pipe", - }); - const stdout = await new Response(proc.stdout).text(); - await proc.exited; - const trimmed = stdout.trim(); - if (trimmed) return trimmed; - - throw new Error( - "Claude CLI not found.
Set CLAUDE_BIN environment variable or install Claude Code.", - ); -} - -function buildArgs(binary: string, options: HeadlessOptions): string[] { - const args = [binary, "-p", options.prompt, "--output-format", "json"]; - - if (options.model) args.push("--model", options.model); - if (options.maxTurns) args.push("--max-turns", String(options.maxTurns)); - if (options.maxBudgetUsd !== undefined) { - args.push("--max-budget-usd", options.maxBudgetUsd.toFixed(2)); - } - if (options.permissionMode) { - args.push("--permission-mode", options.permissionMode); - } - if (options.systemPromptFile) { - args.push("--system-prompt-file", options.systemPromptFile); - } - if (options.resume) args.push("--resume", options.resume); - if (options.jsonSchema) { - args.push("--json-schema", JSON.stringify(options.jsonSchema)); - } - if (options.allowedTools?.length) { - args.push("--allowedTools", ...options.allowedTools); - } - if (options.disallowedTools?.length) { - args.push("--disallowedTools", ...options.disallowedTools); - } - - return args; -} - -export async function runHeadless( - options: HeadlessOptions, -): Promise<HeadlessResult> { - const binary = await discoverClaudeBinary(); - const args = buildArgs(binary, options); - const startTime = Date.now(); - - const proc = Bun.spawn(args, { - cwd: options.cwd, - stdout: "pipe", - stderr: "pipe", - }); - - const stdout = await new Response(proc.stdout).text(); - const stderr = await new Response(proc.stderr).text(); - const exitCode = await proc.exited; - const durationMs = Date.now() - startTime; - - let parsed: ClaudeJsonOutput; - try { - parsed = JSON.parse(stdout) as ClaudeJsonOutput; - } catch { - if (exitCode !== 0) { - throw new Error( - `Claude process exited with code ${exitCode}: ${stderr || stdout}`, - ); - } - return { - result: stdout, - sessionId: "", - isError: true, - subtype: "parse_error", - totalCostUsd: 0, - numTurns: 0, - durationMs, - }; - } - - let structuredOutput: unknown; - if (options.jsonSchema &&
!parsed.is_error) { - try { - structuredOutput = JSON.parse(parsed.result); - } catch { - // structured output parse failed — text result still available - } - } - - return { - result: parsed.result, - sessionId: parsed.session_id, - isError: parsed.is_error, - subtype: parsed.subtype, - totalCostUsd: parsed.total_cost_usd ?? 0, - numTurns: parsed.num_turns ?? 0, - durationMs, - structuredOutput, - }; -} diff --git a/cli/src/runners/review-runner.ts b/cli/src/runners/review-runner.ts deleted file mode 100644 index f292941..0000000 --- a/cli/src/runners/review-runner.ts +++ /dev/null @@ -1,355 +0,0 @@ -import { getFullScopePrompt, getPassPrompts } from "../prompts/review.js"; -import type { - PassName, - PassResult, - ReviewFinding, - ReviewFindingWithPass, - ReviewResult, - ReviewScope, -} from "../schemas/review.js"; -import { findingsJsonSchema } from "../schemas/review.js"; -import type { HeadlessResult } from "./headless.js"; -import { runHeadless } from "./headless.js"; - -export interface ReviewOptions { - scope: ReviewScope; - base: string; - include?: string; - parallel: boolean; - model: string; - maxCost?: number; - passes: 1 | 2 | 3; - verbose: boolean; -} - -const PASS_ORDER: PassName[] = ["correctness", "security", "quality"]; -const BUDGET_WEIGHTS = [0.4, 0.35, 0.25]; -const SEVERITY_SORT: Record<string, number> = { - critical: 0, - high: 1, - medium: 2, - low: 3, - info: 4, -}; -const SCORE_WEIGHTS: Record<string, number> = { - critical: 3, - high: 2, - medium: 1, - low: 0.5, - info: 0, -}; - -export async function detectBaseBranch(): Promise<string> { - for (const branch of ["staging", "main", "master"]) { - const proc = Bun.spawn( - ["git", "rev-parse", "--verify", `refs/heads/${branch}`], - { stdout: "pipe", stderr: "pipe" }, - ); - await proc.exited; - if (proc.exitCode === 0) return branch; - } - throw new Error( - "Could not auto-detect base branch.
Specify --base <branch>.", - ); -} - -async function getDiff( - scope: ReviewScope, - base: string, - include?: string, -): Promise<string> { - if (scope === "full") return ""; - const args = - scope === "staged" - ? ["git", "diff", "--cached"] - : ["git", "diff", `${base}...HEAD`]; - if (include) args.push("--", include); - const proc = Bun.spawn(args, { stdout: "pipe", stderr: "pipe" }); - const stdout = await new Response(proc.stdout).text(); - await proc.exited; - return stdout; -} - -async function getFilesChanged( - scope: ReviewScope, - base: string, - include?: string, -): Promise<number> { - if (scope === "full") return 0; - const args = - scope === "staged" - ? ["git", "diff", "--cached", "--numstat"] - : ["git", "diff", "--numstat", `${base}...HEAD`]; - if (include) args.push("--", include); - const proc = Bun.spawn(args, { stdout: "pipe" }); - const output = await new Response(proc.stdout).text(); - await proc.exited; - return output.trim().split("\n").filter(Boolean).length; -} - -function parseFindings(result: HeadlessResult): ReviewFinding[] { - if (result.structuredOutput && typeof result.structuredOutput === "object") { - const output = result.structuredOutput as { findings?: unknown }; - if (Array.isArray(output.findings)) { - return output.findings as ReviewFinding[]; - } - } - try { - const parsed = JSON.parse(result.result) as { findings?: unknown }; - if (Array.isArray(parsed.findings)) { - return parsed.findings as ReviewFinding[]; - } - } catch { - // text result without structured output - } - return []; -} - -function parseSummary(result: HeadlessResult): string { - if (result.structuredOutput && typeof result.structuredOutput === "object") { - const output = result.structuredOutput as { summary?: unknown }; - if (typeof output.summary === "string") return output.summary; - } - try { - const parsed = JSON.parse(result.result) as { summary?: unknown }; - if (typeof parsed.summary === "string") return parsed.summary; - } catch { - // no structured summary - } -
return ""; -} - -function mergeFindings(passResults: PassResult[]): ReviewFindingWithPass[] { - const seen = new Set<string>(); - const merged: ReviewFindingWithPass[] = []; - - for (const [i, pass] of passResults.entries()) { - for (const finding of pass.findings) { - const key = `${finding.file}:${finding.line}:${finding.title}`; - if (!seen.has(key)) { - seen.add(key); - merged.push({ - ...finding, - pass: i + 1, - passName: pass.name, - }); - } - } - } - - merged.sort( - (a, b) => - (SEVERITY_SORT[a.severity] ?? 5) - (SEVERITY_SORT[b.severity] ?? 5), - ); - return merged; -} - -function calculateScore(findings: ReviewFindingWithPass[]): number { - const totalPoints = findings.reduce( - (sum, f) => sum + (SCORE_WEIGHTS[f.severity] ?? 0), - 0, - ); - return Math.max(1, Math.min(10, Math.round(10 - totalPoints))); -} - -function buildCommonOpts(scope: ReviewScope) { - return { - maxTurns: scope === "full" ? 25 : 10, - permissionMode: "plan" as const, - allowedTools: [ - "Read", - "Glob", - "Grep", - "Bash(git diff *)", - "Bash(git log *)", - "Bash(git show *)", - ], - disallowedTools: ["Write", "Edit", "NotebookEdit"], - jsonSchema: findingsJsonSchema, - }; -} - -async function runSequential( - passOrder: PassName[], - diff: string, - options: ReviewOptions, -): Promise<PassResult[]> { - const commonOpts = buildCommonOpts(options.scope); - const passResults: PassResult[] = []; - let sessionId: string | undefined; - - for (const [i, passName] of passOrder.entries()) { - if (options.verbose) { - process.stderr.write(`Pass ${i + 1}: ${passName}...\n`); - } - - const prompts = await getPassPrompts(passName, "sequential", { - DIFF: diff, - }); - - const spent = passResults.reduce((s, p) => s + p.costUsd, 0); - const effectiveBudget = options.maxCost - ? 
Math.max(0.01, options.maxCost - spent) - : undefined; - - let userPrompt = prompts.userPrompt; - if (options.scope === "full") { - userPrompt = getFullScopePrompt(passName, options.include); - } - - try { - const result = await runHeadless({ - ...commonOpts, - prompt: userPrompt, - systemPromptFile: prompts.systemPromptFile, - model: options.model, - resume: sessionId, - maxBudgetUsd: effectiveBudget, - }); - - sessionId = result.sessionId; - - passResults.push({ - name: passName, - findings: parseFindings(result), - costUsd: result.totalCostUsd, - durationMs: result.durationMs, - sessionId: result.sessionId, - }); - } catch (err) { - const message = err instanceof Error ? err.message : String(err); - passResults.push({ - name: passName, - findings: [], - costUsd: 0, - durationMs: 0, - sessionId: sessionId ?? "", - error: message, - }); - // Clear session if resume failed so next pass starts fresh - if (message.includes("resume")) sessionId = undefined; - } - } - - return passResults; -} - -async function runParallel( - passOrder: PassName[], - diff: string, - options: ReviewOptions, -): Promise<PassResult[]> { - const commonOpts = buildCommonOpts(options.scope); - - const promises = passOrder.map(async (passName, i) => { - if (options.verbose) { - process.stderr.write(`Pass ${i + 1}: ${passName} (parallel)...\n`); - } - - const prompts = await getPassPrompts(passName, "parallel", { - DIFF: diff, - }); - - const budgetForPass = options.maxCost - ? 
options.maxCost * BUDGET_WEIGHTS[i] - : undefined; - - let userPrompt = prompts.userPrompt; - if (options.scope === "full") { - userPrompt = getFullScopePrompt(passName, options.include); - } - - try { - const result = await runHeadless({ - ...commonOpts, - prompt: userPrompt, - systemPromptFile: prompts.systemPromptFile, - model: options.model, - maxBudgetUsd: budgetForPass, - }); - - return { - name: passName, - findings: parseFindings(result), - costUsd: result.totalCostUsd, - durationMs: result.durationMs, - sessionId: result.sessionId, - } as PassResult; - } catch (err) { - return { - name: passName, - findings: [], - costUsd: 0, - durationMs: 0, - sessionId: "", - error: err instanceof Error ? err.message : String(err), - } as PassResult; - } - }); - - return Promise.all(promises); -} - -export async function runReview(options: ReviewOptions): Promise<ReviewResult> { - const [diff, filesChanged] = await Promise.all([ - getDiff(options.scope, options.base, options.include), - getFilesChanged(options.scope, options.base, options.include), - ]); - - if (!diff && options.scope !== "full") { - return { - base: options.base, - head: "HEAD", - filesChanged: 0, - scope: options.scope, - score: 10, - findings: [], - summary: "No changes to review.", - passes: [], - totalCostUsd: 0, - }; - } - - const passOrder = PASS_ORDER.slice(0, options.passes); - - const passResults = options.parallel - ? await runParallel(passOrder, diff, options) - : await runSequential(passOrder, diff, options); - - const findings = mergeFindings(passResults); - const score = calculateScore(findings); - const totalCostUsd = passResults.reduce((s, p) => s + p.costUsd, 0); - - const summaries = passResults - .map((p) => { - const passSummary = p.sessionId - ? 
parseSummary({ - result: "", - sessionId: p.sessionId, - isError: false, - subtype: "", - totalCostUsd: 0, - numTurns: 0, - durationMs: 0, - }) - : ""; - return passSummary; - }) - .filter(Boolean); - - const summary = - summaries.join("\n\n") || - `Review completed with ${findings.length} finding${findings.length === 1 ? "" : "s"} across ${passResults.length} pass${passResults.length === 1 ? "" : "es"}.`; - - return { - base: options.base, - head: "HEAD", - filesChanged, - scope: options.scope, - score, - findings, - summary, - passes: passResults, - totalCostUsd, - }; -} diff --git a/cli/src/schemas/index.ts b/cli/src/schemas/index.ts new file mode 100644 index 0000000..ef23375 --- /dev/null +++ b/cli/src/schemas/index.ts @@ -0,0 +1,73 @@ +export type SymbolKind = + | "function" + | "class" + | "interface" + | "type" + | "const" + | "method" + | "enum"; + +export interface IndexedFile { + path: string; + hash: string; + size: number; + language: string; + lineCount: number; + lastIndexed: string; +} + +export interface IndexedFolder { + path: string; + description: string | null; + fileCount: number; + lastIndexed: string; +} + +export interface IndexedSymbol { + id: number; + name: string; + kind: SymbolKind; + filePath: string; + lineStart: number; + lineEnd: number; + signature: string | null; + docstring: string | null; + parentName: string | null; + exported: boolean; + language: string; +} + +export interface ScanResult { + newFiles: string[]; + changedFiles: string[]; + unchangedFiles: string[]; + deletedFiles: string[]; +} + +export interface IndexStats { + totalFiles: number; + totalSymbols: number; + totalFolders: number; + byLanguage: Record<string, { files: number; symbols: number }>; + lastBuildTime: string | null; + dbSizeBytes: number; +} + +export interface SearchHit { + symbol: IndexedSymbol; + rank: number; +} + +export interface TreeEntry { + path: string; + type: "file" | "folder"; + description?: string; + symbolCount: number; + children?: TreeEntry[]; +} + +export type 
BuildProgressCallback = ( + phase: string, + current: number, + total: number, +) => void; diff --git a/cli/src/schemas/review.ts b/cli/src/schemas/review.ts deleted file mode 100644 index 68bebf8..0000000 --- a/cli/src/schemas/review.ts +++ /dev/null @@ -1,67 +0,0 @@ -export type Severity = "critical" | "high" | "medium" | "low" | "info"; -export type ReviewScope = "diff" | "staged" | "full"; -export type PassName = "correctness" | "security" | "quality"; - -export interface ReviewFinding { - file: string; - line: number | null; - severity: Severity; - category: string; - title: string; - description: string; - suggestion: string | null; -} - -export interface PassResult { - name: PassName; - findings: ReviewFinding[]; - costUsd: number; - durationMs: number; - sessionId: string; - error?: string; -} - -export interface ReviewResult { - base: string; - head: string; - filesChanged: number; - scope: ReviewScope; - score: number; - findings: ReviewFindingWithPass[]; - summary: string; - passes: PassResult[]; - totalCostUsd: number; -} - -export interface ReviewFindingWithPass extends ReviewFinding { - pass: number; - passName: PassName; -} - -/** JSON schema sent to claude --json-schema for structured output */ -export const findingsJsonSchema = { - type: "object" as const, - required: ["findings", "summary"], - properties: { - findings: { - type: "array" as const, - items: { - type: "object" as const, - required: ["file", "severity", "category", "title", "description"], - properties: { - file: { type: "string" as const }, - line: { type: ["number", "null"] as const }, - severity: { - type: "string" as const, - enum: ["critical", "high", "medium", "low", "info"], - }, - category: { type: "string" as const }, - title: { type: "string" as const }, - description: { type: "string" as const }, - suggestion: { type: ["string", "null"] as const }, - }, - }, - }, - summary: { type: "string" as const }, - }, -}; diff --git a/cli/src/utils/context.ts 
b/cli/src/utils/context.ts new file mode 100644 index 0000000..49ff1ea --- /dev/null +++ b/cli/src/utils/context.ts @@ -0,0 +1,41 @@ +import { existsSync } from "fs"; +import { resolveContainer } from "./docker.js"; + +/** + * Detect if the current process is running inside a container. + */ +export function isInsideContainer(): boolean { + return ( + existsSync("/.dockerenv") || + !!process.env.REMOTE_CONTAINERS || + existsSync("/run/.containerenv") + ); +} + +/** + * Resolve the target container ID for proxy mode. + */ +export async function getProxyTarget(containerName?: string): Promise<string> { + const container = await resolveContainer(containerName); + return container.id; +} + +/** + * Proxy a CLI command into a running container. + */ +export async function proxyCommand( + containerId: string, + args: string[], +): Promise<void> { + const proc = Bun.spawn( + ["docker", "exec", containerId, "codeforge", ...args], + { + stdout: "inherit", + stderr: "inherit", + stdin: "inherit", + }, + ); + + const exitCode = await proc.exited; + process.exit(exitCode); +} diff --git a/cli/src/utils/devcontainer.ts b/cli/src/utils/devcontainer.ts new file mode 100644 index 0000000..fdda77b --- /dev/null +++ b/cli/src/utils/devcontainer.ts @@ -0,0 +1,86 @@ +import { existsSync } from "fs"; +import { resolve } from "path"; + +/** + * Find the devcontainer CLI binary. + * Checks PATH first, then falls back to npx. + */ +export async function findDevcontainerCli(): Promise<string> { + const devcontainerCheck = Bun.spawnSync(["which", "devcontainer"], { + stdout: "pipe", + stderr: "pipe", + }); + + if (devcontainerCheck.exitCode === 0) { + return "devcontainer"; + } + + const npxCheck = Bun.spawnSync(["which", "npx"], { + stdout: "pipe", + stderr: "pipe", + }); + + if (npxCheck.exitCode === 0) { + return "npx @devcontainers/cli"; + } + + throw new Error( + "devcontainer CLI not found. 
Install it via:\n npm install -g @devcontainers/cli\n Or install the VS Code Dev Containers extension", + ); +} + +/** + * Run devcontainer up to start or rebuild a devcontainer. + */ +export async function devcontainerUp( + workspacePath: string, + opts?: { rebuild?: boolean }, +): Promise<void> { + const cliBin = await findDevcontainerCli(); + const args = cliBin.split(" "); + args.push("up", "--workspace-folder", workspacePath); + + if (opts?.rebuild) { + args.push("--rebuild"); + } + + const proc = Bun.spawn(args, { + stdout: "inherit", + stderr: "inherit", + }); + + const exitCode = await proc.exited; + if (exitCode !== 0) { + throw new Error(`devcontainer up failed with exit code ${exitCode}`); + } +} + +/** + * Rebuild a devcontainer (convenience wrapper). + */ +export async function devcontainerRebuild( + workspacePath: string, +): Promise<void> { + await devcontainerUp(workspacePath, { rebuild: true }); +} + +/** + * Walk upward from startDir looking for .devcontainer/devcontainer.json. + * Returns the directory containing .devcontainer/, or null if not found. + */ +export function findWorkspacePath(startDir?: string): string | null { + let dir = resolve(startDir || process.cwd()); + + while (true) { + const candidate = resolve(dir, ".devcontainer", "devcontainer.json"); + if (existsSync(candidate)) { + return dir; + } + + const parent = resolve(dir, ".."); + if (parent === dir) { + return null; + } + dir = parent; + } +} diff --git a/cli/src/utils/docker.ts b/cli/src/utils/docker.ts new file mode 100644 index 0000000..5f4fff9 --- /dev/null +++ b/cli/src/utils/docker.ts @@ -0,0 +1,191 @@ +import { basename } from "path"; + +export interface DevcontainerInfo { + id: string; + name: string; + status: string; + workspacePath: string; + image: string; + ports: string; +} + +/** + * Check if the docker CLI is available on PATH. 
+ */ +export function isDockerAvailable(): boolean { + const result = Bun.spawnSync(["which", "docker"], { + stdout: "pipe", + stderr: "pipe", + }); + return result.exitCode === 0; +} + +/** + * Query running devcontainers and parse docker ps JSON output. + */ +export async function listDevcontainers(): Promise<DevcontainerInfo[]> { + const proc = Bun.spawn( + [ + "docker", + "ps", + "--filter", + "label=devcontainer.local_folder", + "--format", + '{"id":"{{.ID}}","name":"{{.Names}}","status":"{{.Status}}","image":"{{.Image}}","ports":"{{.Ports}}","labels":"{{.Labels}}"}', + ], + { stdout: "pipe", stderr: "pipe" }, + ); + + const output = await new Response(proc.stdout).text(); + const exitCode = await proc.exited; + + if (exitCode !== 0) { + throw new Error("Failed to list Docker containers. Is Docker running?"); + } + + const lines = output.trim().split("\n").filter(Boolean); + const containers: DevcontainerInfo[] = []; + + for (const line of lines) { + const raw = JSON.parse(line) as { + id: string; + name: string; + status: string; + image: string; + ports: string; + labels: string; + }; + + let workspacePath = ""; + const labelPairs = raw.labels.split(","); + for (const pair of labelPairs) { + const [key, ...rest] = pair.split("="); + if (key === "devcontainer.local_folder") { + workspacePath = rest.join("="); + break; + } + } + + containers.push({ + id: raw.id, + name: workspacePath ? basename(workspacePath) : raw.name, + status: raw.status, + workspacePath, + image: raw.image, + ports: raw.ports, + }); + } + + return containers; +} + +/** + * Resolve a container by name (workspace basename match). + * If no name is given and exactly one container is running, returns it. + * If multiple containers exist, shows an interactive picker when running in a TTY. + */ +export async function resolveContainer( + name?: string, +): Promise<DevcontainerInfo> { + const containers = await listDevcontainers(); + + if (containers.length === 0) { + throw new Error( + "No running devcontainers found. 
Start one with: codeforge up", + ); + } + + if (name) { + const match = containers.find( + (c) => basename(c.workspacePath) === name || c.name === name, + ); + if (!match) { + const available = containers + .map((c) => basename(c.workspacePath)) + .join(", "); + throw new Error(`Container "${name}" not found. Available: ${available}`); + } + return match; + } + + if (containers.length === 1) { + return containers[0]; + } + + if (!process.stdin.isTTY) { + const available = containers + .map((c) => basename(c.workspacePath)) + .join(", "); + throw new Error( + `Multiple containers running. Specify one with --name: ${available}`, + ); + } + + process.stdout.write("Multiple containers found:\n"); + containers.forEach((c, i) => { + process.stdout.write( + ` ${i + 1}) ${basename(c.workspacePath)} (${c.status})\n`, + ); + }); + process.stdout.write("Select container [1]: "); + + const selection = await new Promise<string>((resolve) => { + let data = ""; + process.stdin.setEncoding("utf-8"); + process.stdin.once("data", (chunk: string) => { + data = chunk.trim(); + resolve(data); + }); + }); + + const index = selection === "" ? 0 : parseInt(selection, 10) - 1; + if (isNaN(index) || index < 0 || index >= containers.length) { + throw new Error("Invalid selection."); + } + + return containers[index]; +} + +/** + * Execute a command inside a container. + */ +export async function dockerExec( + containerId: string, + cmd: string[], + opts?: { interactive?: boolean }, +): Promise<void> { + const args = ["docker", "exec"]; + if (opts?.interactive) { + args.push("-it"); + } + args.push(containerId, ...cmd); + + const proc = Bun.spawn(args, { + stdout: "inherit", + stderr: "inherit", + stdin: opts?.interactive ? "inherit" : undefined, + }); + + const exitCode = await proc.exited; + if (exitCode !== 0) { + throw new Error(`Command exited with code ${exitCode}`); + } +} + +/** + * Stop a running container. 
+ */ +export async function dockerStop(containerId: string): Promise<void> { + const proc = Bun.spawn(["docker", "stop", containerId], { + stdout: "pipe", + stderr: "pipe", + }); + + const exitCode = await proc.exited; + if (exitCode !== 0) { + const stderr = await new Response(proc.stderr).text(); + throw new Error( + `Failed to stop container ${containerId}: ${stderr.trim()}`, + ); + } +} diff --git a/cli/src/utils/platform.ts b/cli/src/utils/platform.ts index 4b82dbc..ae30bbd 100644 --- a/cli/src/utils/platform.ts +++ b/cli/src/utils/platform.ts @@ -29,3 +29,5 @@ export function resolveNormalized(...segments: string[]): string { export function basenameFromPath(filePath: string): string { return basename(normalizePath(filePath)); } + +export { isInsideContainer } from "./context.js"; diff --git a/cli/tests/index-commands.test.ts b/cli/tests/index-commands.test.ts new file mode 100644 index 0000000..40cddb1 --- /dev/null +++ b/cli/tests/index-commands.test.ts @@ -0,0 +1,383 @@ +import type { Database } from "bun:sqlite"; +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { mkdirSync, mkdtempSync, writeFileSync } from "fs"; +import { tmpdir } from "os"; +import { join } from "path"; +import { + closeDatabase, + insertFiles, + insertSymbols, + openDatabase, + upsertFolders, +} from "../src/indexer/db.js"; +import { extractFolderDocs } from "../src/indexer/folders.js"; +import { collectDirectories } from "../src/indexer/scanner.js"; +import { + formatBuildJson, + formatSearchJson, + formatShowJson, + formatStatsJson, + formatTreeJson, +} from "../src/output/index-json.js"; +import { + formatBuildSummary, + formatSearchText, + formatShowText, + formatStatsText, + formatTreeText, +} from "../src/output/index-text.js"; +import type { + IndexedFile, + IndexedFolder, + IndexedSymbol, + IndexStats, + ScanResult, + SearchHit, + TreeEntry, +} from "../src/schemas/index.js"; + +// --- Helpers --- + +function makeFile(overrides: Partial<IndexedFile> = {}): 
IndexedFile { + return { + path: "src/main.ts", + hash: "abc123", + size: 1024, + language: "typescript", + lineCount: 50, + lastIndexed: "2026-03-08 12:00:00", + ...overrides, + }; +} + +function makeSymbol(overrides: Partial<IndexedSymbol> = {}): IndexedSymbol { + return { + id: 1, + name: "myFunction", + kind: "function", + filePath: "src/main.ts", + lineStart: 10, + lineEnd: 20, + signature: "function myFunction(x: number): string", + docstring: "Does something useful", + parentName: null, + exported: true, + language: "typescript", + ...overrides, + }; +} + +function makeHit(overrides: Partial<SearchHit> = {}): SearchHit { + return { + symbol: makeSymbol(), + rank: -1.5, + ...overrides, + }; +} + +// --- Output formatters: text --- + +describe("formatSearchText", () => { + test("formats search hits grouped by file", () => { + const hits = [ + makeHit({ symbol: makeSymbol({ filePath: "a.ts", name: "foo" }) }), + makeHit({ + symbol: makeSymbol({ + filePath: "a.ts", + name: "bar", + id: 2, + lineStart: 30, + lineEnd: 40, + }), + }), + makeHit({ symbol: makeSymbol({ filePath: "b.ts", name: "baz", id: 3 }) }), + ]; + const output = formatSearchText(hits, { noColor: true }); + expect(output).toContain("a.ts"); + expect(output).toContain("b.ts"); + expect(output).toContain("foo"); + expect(output).toContain("bar"); + expect(output).toContain("baz"); + expect(output).toContain("3 results"); + }); + + test("shows no results message for empty hits", () => { + const output = formatSearchText([], { noColor: true }); + expect(output).toContain("No results"); + }); + + test("truncates long docstrings", () => { + const longDoc = "A".repeat(150); + const hits = [makeHit({ symbol: makeSymbol({ docstring: longDoc }) })]; + const output = formatSearchText(hits, { noColor: true }); + expect(output).toContain("..."); + }); +}); + +describe("formatShowText", () => { + test("formats symbols for a file", () => { + const symbols = [ + makeSymbol({ name: "alpha", exported: true }), + makeSymbol({ + name: 
"beta", + exported: false, + id: 2, + lineStart: 25, + lineEnd: 35, + }), + ]; + const output = formatShowText("src/main.ts", symbols, { noColor: true }); + expect(output).toContain("src/main.ts"); + expect(output).toContain("alpha"); + expect(output).toContain("beta"); + expect(output).toContain("exported"); + expect(output).toContain("local"); + expect(output).toContain("2 symbols"); + }); + + test("shows no symbols message for empty list", () => { + const output = formatShowText("src/empty.ts", [], { noColor: true }); + expect(output).toContain("No symbols"); + }); +}); + +describe("formatStatsText", () => { + test("formats stats with language breakdown", () => { + const stats: IndexStats = { + totalFiles: 42, + totalSymbols: 150, + totalFolders: 8, + byLanguage: { + typescript: { files: 30, symbols: 120 }, + python: { files: 12, symbols: 30 }, + }, + lastBuildTime: "2026-03-08 12:00:00", + dbSizeBytes: 2048, + }; + const output = formatStatsText(stats, { noColor: true }); + expect(output).toContain("42"); + expect(output).toContain("150"); + expect(output).toContain("8"); + expect(output).toContain("typescript"); + expect(output).toContain("python"); + expect(output).toContain("2026-03-08"); + }); +}); + +describe("formatTreeText", () => { + test("renders tree with hierarchy", () => { + const entries: TreeEntry[] = [ + { + path: "src", + type: "folder", + description: "Source code", + symbolCount: 50, + children: [{ path: "main.ts", type: "file", symbolCount: 10 }], + }, + ]; + const output = formatTreeText(entries, { noColor: true }); + expect(output).toContain("src"); + expect(output).toContain("main.ts"); + expect(output).toContain("Source code"); + }); + + test("shows no entries message for empty list", () => { + const output = formatTreeText([], { noColor: true }); + expect(output).toContain("No entries"); + }); +}); + +describe("formatBuildSummary", () => { + test("formats build results", () => { + const result = { + scanned: { + newFiles: ["a.ts", 
"b.ts"], + changedFiles: ["c.ts"], + unchangedFiles: ["d.ts", "e.ts", "f.ts"], + deletedFiles: [], + }, + symbolCount: 42, + durationMs: 1234, + }; + const output = formatBuildSummary(result, { noColor: true }); + expect(output).toContain("2"); + expect(output).toContain("1"); + expect(output).toContain("3"); + expect(output).toContain("0"); + expect(output).toContain("42"); + expect(output).toContain("1234"); + }); +}); + +// --- Output formatters: JSON --- + +describe("formatSearchJson", () => { + test("returns valid JSON with results array", () => { + const hits = [makeHit()]; + const json = JSON.parse(formatSearchJson(hits)); + expect(json.results).toHaveLength(1); + expect(json.total).toBe(1); + expect(json.results[0].symbol.name).toBe("myFunction"); + }); + + test("returns empty results for no hits", () => { + const json = JSON.parse(formatSearchJson([])); + expect(json.results).toHaveLength(0); + expect(json.total).toBe(0); + }); +}); + +describe("formatShowJson", () => { + test("returns valid JSON with file and symbols", () => { + const symbols = [makeSymbol()]; + const json = JSON.parse(formatShowJson("src/main.ts", symbols)); + expect(json.file).toBe("src/main.ts"); + expect(json.symbols).toHaveLength(1); + expect(json.total).toBe(1); + }); +}); + +describe("formatStatsJson", () => { + test("returns valid JSON with all stats fields", () => { + const stats: IndexStats = { + totalFiles: 10, + totalSymbols: 50, + totalFolders: 3, + byLanguage: { typescript: { files: 10, symbols: 50 } }, + lastBuildTime: "2026-03-08", + dbSizeBytes: 4096, + }; + const json = JSON.parse(formatStatsJson(stats)); + expect(json.totalFiles).toBe(10); + expect(json.totalSymbols).toBe(50); + expect(json.byLanguage.typescript.files).toBe(10); + }); +}); + +describe("formatTreeJson", () => { + test("returns valid JSON with tree array", () => { + const entries: TreeEntry[] = [ + { path: "src", type: "folder", symbolCount: 5 }, + ]; + const json = JSON.parse(formatTreeJson(entries)); + 
expect(json.tree).toHaveLength(1); + expect(json.tree[0].path).toBe("src"); + }); +}); + +describe("formatBuildJson", () => { + test("returns valid JSON with counts", () => { + const result = { + scanned: { + newFiles: ["a.ts"], + changedFiles: [], + unchangedFiles: ["b.ts"], + deletedFiles: [], + }, + symbolCount: 20, + durationMs: 500, + }; + const json = JSON.parse(formatBuildJson(result)); + expect(json.scanned.newFiles).toBe(1); + expect(json.scanned.changedFiles).toBe(0); + expect(json.totalSymbols).toBe(20); + expect(json.durationMs).toBe(500); + }); +}); + +// --- Scanner: collectDirectories --- + +describe("collectDirectories", () => { + test("finds subdirectories recursively", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "scan-test-")); + mkdirSync(join(tmpDir, "src"), { recursive: true }); + mkdirSync(join(tmpDir, "src", "utils"), { recursive: true }); + mkdirSync(join(tmpDir, "tests"), { recursive: true }); + + const dirs = await collectDirectories(tmpDir); + expect(dirs).toContain("src"); + expect(dirs).toContain(join("src", "utils")); + expect(dirs).toContain("tests"); + }); + + test("ignores node_modules and .git", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "scan-test-")); + mkdirSync(join(tmpDir, "src"), { recursive: true }); + mkdirSync(join(tmpDir, "node_modules", "pkg"), { recursive: true }); + mkdirSync(join(tmpDir, ".git", "refs"), { recursive: true }); + mkdirSync(join(tmpDir, "dist"), { recursive: true }); + + const dirs = await collectDirectories(tmpDir); + expect(dirs).toContain("src"); + expect(dirs).not.toContain("node_modules"); + expect(dirs).not.toContain(".git"); + expect(dirs).not.toContain("dist"); + }); +}); + +// --- Folders: extractFolderDocs --- + +describe("extractFolderDocs", () => { + test("extracts description from README.md", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "folder-test-")); + mkdirSync(join(tmpDir, "src"), { recursive: true }); + writeFileSync( + join(tmpDir, "src", 
"README.md"), + "# Source\n\nThis contains the main source code.\n\n## Details\nMore info here.", + ); + + const folders = await extractFolderDocs(["src"], tmpDir); + expect(folders).toHaveLength(1); + expect(folders[0].path).toBe("src"); + expect(folders[0].description).toBe("This contains the main source code."); + }); + + test("skips headings and badges to find first paragraph", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "folder-test-")); + mkdirSync(join(tmpDir, "lib"), { recursive: true }); + writeFileSync( + join(tmpDir, "lib", "README.md"), + "# Library\n\n![badge](url)\n\n[![badge2](url2)](link)\n\nActual description here.\n", + ); + + const folders = await extractFolderDocs(["lib"], tmpDir); + expect(folders[0].description).toBe("Actual description here."); + }); + + test("returns null description when no README exists", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "folder-test-")); + mkdirSync(join(tmpDir, "empty"), { recursive: true }); + + const folders = await extractFolderDocs(["empty"], tmpDir); + expect(folders[0].description).toBeNull(); + }); + + test("uses YAML overrides when available", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "folder-test-")); + mkdirSync(join(tmpDir, "src"), { recursive: true }); + mkdirSync(join(tmpDir, ".codeforge", "data"), { recursive: true }); + writeFileSync( + join(tmpDir, "src", "README.md"), + "# Source\n\nREADME description.\n", + ); + writeFileSync( + join(tmpDir, ".codeforge", "data", "folders.yaml"), + 'src: "Override description from YAML"\n', + ); + + const folders = await extractFolderDocs(["src"], tmpDir); + expect(folders[0].description).toBe("Override description from YAML"); + }); + + test("counts recognized files in directory", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "folder-test-")); + mkdirSync(join(tmpDir, "src"), { recursive: true }); + writeFileSync(join(tmpDir, "src", "a.ts"), ""); + writeFileSync(join(tmpDir, "src", "b.js"), ""); + 
writeFileSync(join(tmpDir, "src", "c.py"), ""); + writeFileSync(join(tmpDir, "src", "d.md"), ""); // not recognized + + const folders = await extractFolderDocs(["src"], tmpDir); + expect(folders[0].fileCount).toBe(3); + }); +}); diff --git a/cli/tests/indexer-db.test.ts b/cli/tests/indexer-db.test.ts new file mode 100644 index 0000000..413054b --- /dev/null +++ b/cli/tests/indexer-db.test.ts @@ -0,0 +1,382 @@ +import type { Database } from "bun:sqlite"; +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { mkdtempSync, writeFileSync } from "fs"; +import { tmpdir } from "os"; +import { join } from "path"; +import { + closeDatabase, + deleteFileAndSymbols, + getAllFolders, + getFileByPath, + getFileSymbols, + getStats, + insertFiles, + insertSymbols, + openDatabase, + rebuildFts, + searchSymbols, + upsertFolders, +} from "../src/indexer/db.js"; +import type { + IndexedFile, + IndexedFolder, + IndexedSymbol, +} from "../src/schemas/index.js"; + +let db: Database; +let tmpDir: string; +let dbPath: string; + +beforeEach(() => { + tmpDir = mkdtempSync(join(tmpdir(), "codeforge-test-")); + dbPath = join(tmpDir, "test-index.db"); + db = openDatabase(dbPath); +}); + +afterEach(() => { + closeDatabase(db); +}); + +function makeFile(overrides: Partial<IndexedFile> = {}): IndexedFile { + return { + path: "src/main.ts", + hash: "abc123", + size: 1024, + language: "typescript", + lineCount: 50, + lastIndexed: "2026-03-08 12:00:00", + ...overrides, + }; +} + +function makeSymbol( + overrides: Partial<Omit<IndexedSymbol, "id">> = {}, +): Omit<IndexedSymbol, "id"> { + return { + name: "myFunction", + kind: "function", + filePath: "src/main.ts", + lineStart: 10, + lineEnd: 20, + signature: "function myFunction(x: number): string", + docstring: "Does something useful", + parentName: null, + exported: true, + language: "typescript", + ...overrides, + }; +} + +describe("openDatabase", () => { + test("creates database with tables", () => { + const tables = db + .prepare( + "SELECT name FROM sqlite_master WHERE 
type='table' ORDER BY name", + ) + .all() as Array<{ name: string }>; + const names = tables.map((t) => t.name); + expect(names).toContain("files"); + expect(names).toContain("folders"); + expect(names).toContain("symbols"); + }); + + test("enables WAL mode", () => { + const result = db.prepare("PRAGMA journal_mode").get() as { + journal_mode: string; + }; + expect(result.journal_mode).toBe("wal"); + }); + + test("enables foreign keys", () => { + const result = db.prepare("PRAGMA foreign_keys").get() as { + foreign_keys: number; + }; + expect(result.foreign_keys).toBe(1); + }); +}); + +describe("insertFiles and getFileByPath", () => { + test("inserts and retrieves a file", () => { + const file = makeFile(); + insertFiles(db, [file]); + + const retrieved = getFileByPath(db, "src/main.ts"); + expect(retrieved).not.toBeNull(); + expect(retrieved!.path).toBe("src/main.ts"); + expect(retrieved!.hash).toBe("abc123"); + expect(retrieved!.size).toBe(1024); + expect(retrieved!.language).toBe("typescript"); + expect(retrieved!.lineCount).toBe(50); + }); + + test("returns null for nonexistent file", () => { + expect(getFileByPath(db, "nope.ts")).toBeNull(); + }); + + test("upserts on duplicate path", () => { + insertFiles(db, [makeFile({ hash: "first" })]); + insertFiles(db, [makeFile({ hash: "second" })]); + + const retrieved = getFileByPath(db, "src/main.ts"); + expect(retrieved!.hash).toBe("second"); + }); + + test("inserts multiple files in batch", () => { + const files = [ + makeFile({ path: "a.ts", hash: "h1" }), + makeFile({ path: "b.ts", hash: "h2" }), + makeFile({ path: "c.ts", hash: "h3" }), + ]; + insertFiles(db, files); + + expect(getFileByPath(db, "a.ts")).not.toBeNull(); + expect(getFileByPath(db, "b.ts")).not.toBeNull(); + expect(getFileByPath(db, "c.ts")).not.toBeNull(); + }); +}); + +describe("insertSymbols and getFileSymbols", () => { + test("inserts and retrieves symbols for a file", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [ + 
makeSymbol(), + makeSymbol({ name: "anotherFn", lineStart: 25, lineEnd: 30 }), + ]); + + const symbols = getFileSymbols(db, "src/main.ts"); + expect(symbols).toHaveLength(2); + expect(symbols[0].name).toBe("myFunction"); + expect(symbols[1].name).toBe("anotherFn"); + }); + + test("returns symbols ordered by line_start", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [ + makeSymbol({ name: "last", lineStart: 100, lineEnd: 110 }), + makeSymbol({ name: "first", lineStart: 1, lineEnd: 5 }), + makeSymbol({ name: "middle", lineStart: 50, lineEnd: 60 }), + ]); + + const symbols = getFileSymbols(db, "src/main.ts"); + expect(symbols[0].name).toBe("first"); + expect(symbols[1].name).toBe("middle"); + expect(symbols[2].name).toBe("last"); + }); + + test("maps exported boolean correctly", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [ + makeSymbol({ name: "exp", exported: true }), + makeSymbol({ + name: "local", + exported: false, + lineStart: 30, + lineEnd: 40, + }), + ]); + + const symbols = getFileSymbols(db, "src/main.ts"); + const exp = symbols.find((s) => s.name === "exp"); + const local = symbols.find((s) => s.name === "local"); + expect(exp!.exported).toBe(true); + expect(local!.exported).toBe(false); + }); +}); + +describe("deleteFileAndSymbols", () => { + test("removes file and its symbols", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [makeSymbol()]); + + deleteFileAndSymbols(db, "src/main.ts"); + + expect(getFileByPath(db, "src/main.ts")).toBeNull(); + expect(getFileSymbols(db, "src/main.ts")).toHaveLength(0); + }); + + test("does not affect other files", () => { + insertFiles(db, [makeFile({ path: "a.ts" }), makeFile({ path: "b.ts" })]); + insertSymbols(db, [ + makeSymbol({ filePath: "a.ts", name: "fnA" }), + makeSymbol({ filePath: "b.ts", name: "fnB" }), + ]); + + deleteFileAndSymbols(db, "a.ts"); + + expect(getFileByPath(db, "a.ts")).toBeNull(); + expect(getFileByPath(db, "b.ts")).not.toBeNull(); + 
expect(getFileSymbols(db, "b.ts")).toHaveLength(1); + }); +}); + +describe("upsertFolders and getAllFolders", () => { + test("inserts and retrieves folders", () => { + const folders: IndexedFolder[] = [ + { + path: "src", + description: "Source code", + fileCount: 10, + lastIndexed: "2026-03-08 12:00:00", + }, + { + path: "tests", + description: null, + fileCount: 5, + lastIndexed: "2026-03-08 12:00:00", + }, + ]; + upsertFolders(db, folders); + + const all = getAllFolders(db); + expect(all).toHaveLength(2); + expect(all[0].path).toBe("src"); + expect(all[0].description).toBe("Source code"); + expect(all[0].fileCount).toBe(10); + expect(all[1].path).toBe("tests"); + expect(all[1].description).toBeNull(); + }); + + test("upserts on duplicate path", () => { + upsertFolders(db, [ + { + path: "src", + description: "Old", + fileCount: 5, + lastIndexed: "2026-03-01", + }, + ]); + upsertFolders(db, [ + { + path: "src", + description: "New", + fileCount: 15, + lastIndexed: "2026-03-08", + }, + ]); + + const all = getAllFolders(db); + expect(all).toHaveLength(1); + expect(all[0].description).toBe("New"); + expect(all[0].fileCount).toBe(15); + }); +}); + +describe("searchSymbols (FTS5)", () => { + test("finds symbols by name", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [ + makeSymbol({ name: "calculateTotal" }), + makeSymbol({ name: "formatOutput", lineStart: 30, lineEnd: 40 }), + ]); + + const hits = searchSymbols(db, "calculateTotal"); + expect(hits.length).toBeGreaterThan(0); + expect(hits[0].symbol.name).toBe("calculateTotal"); + expect(typeof hits[0].rank).toBe("number"); + }); + + test("finds symbols by signature content", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [ + makeSymbol({ + name: "parse", + signature: "function parse(input: string): AST", + }), + ]); + + const hits = searchSymbols(db, "AST"); + expect(hits.length).toBeGreaterThan(0); + expect(hits[0].symbol.name).toBe("parse"); + }); + + test("finds symbols by docstring 
content", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [ + makeSymbol({ + name: "validate", + docstring: "Validates user authentication tokens", + }), + ]); + + const hits = searchSymbols(db, "authentication"); + expect(hits.length).toBeGreaterThan(0); + expect(hits[0].symbol.name).toBe("validate"); + }); + + test("respects limit parameter", () => { + insertFiles(db, [makeFile()]); + const symbols = Array.from({ length: 10 }, (_, i) => + makeSymbol({ + name: `fn${i}`, + lineStart: i * 10, + lineEnd: i * 10 + 5, + signature: `function fn${i}(): void`, + }), + ); + insertSymbols(db, symbols); + + const hits = searchSymbols(db, "fn", 3); + expect(hits.length).toBeLessThanOrEqual(3); + }); + + test("returns empty for no matches", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [makeSymbol()]); + + const hits = searchSymbols(db, "zzzznonexistent"); + expect(hits).toHaveLength(0); + }); +}); + +describe("getStats", () => { + test("returns correct counts", () => { + insertFiles(db, [ + makeFile({ path: "a.ts", language: "typescript" }), + makeFile({ path: "b.py", language: "python" }), + ]); + insertSymbols(db, [ + makeSymbol({ filePath: "a.ts", name: "fn1" }), + makeSymbol({ filePath: "a.ts", name: "fn2", lineStart: 30, lineEnd: 40 }), + makeSymbol({ filePath: "b.py", name: "fn3", language: "python" }), + ]); + upsertFolders(db, [ + { + path: "src", + description: null, + fileCount: 2, + lastIndexed: "2026-03-08", + }, + ]); + + const stats = getStats(db, dbPath); + expect(stats.totalFiles).toBe(2); + expect(stats.totalSymbols).toBe(3); + expect(stats.totalFolders).toBe(1); + expect(stats.byLanguage.typescript.files).toBe(1); + expect(stats.byLanguage.typescript.symbols).toBe(2); + expect(stats.byLanguage.python.files).toBe(1); + expect(stats.byLanguage.python.symbols).toBe(1); + expect(stats.dbSizeBytes).toBeGreaterThan(0); + expect(stats.lastBuildTime).not.toBeNull(); + }); + + test("returns zeros for empty database", () => { + const stats 
= getStats(db, dbPath); + expect(stats.totalFiles).toBe(0); + expect(stats.totalSymbols).toBe(0); + expect(stats.totalFolders).toBe(0); + expect(Object.keys(stats.byLanguage)).toHaveLength(0); + }); +}); + +describe("rebuildFts", () => { + test("rebuilds FTS index without error", () => { + insertFiles(db, [makeFile()]); + insertSymbols(db, [makeSymbol()]); + + expect(() => rebuildFts(db)).not.toThrow(); + + const hits = searchSymbols(db, "myFunction"); + expect(hits.length).toBeGreaterThan(0); + }); +}); diff --git a/cli/tests/indexer-extractor.test.ts b/cli/tests/indexer-extractor.test.ts new file mode 100644 index 0000000..7fcb838 --- /dev/null +++ b/cli/tests/indexer-extractor.test.ts @@ -0,0 +1,307 @@ +import { describe, expect, test } from "bun:test"; +import { mkdtempSync, writeFileSync } from "fs"; +import { tmpdir } from "os"; +import { join } from "path"; +import { + determineSymbolKind, + extractDocstring, + extractSignature, + extractSymbolName, +} from "../src/indexer/extractor.js"; +import { + getPythonRules, + getRulesForLanguage, + getTypescriptRules, +} from "../src/indexer/rules.js"; +import { + getLanguageForExtension, + hashFileContent, +} from "../src/indexer/scanner.js"; + +describe("extractSignature", () => { + test("strips function body for TypeScript", () => { + const text = + "export function greet(name: string): string {\n return `Hello ${name}`;\n}"; + expect(extractSignature(text, "typescript")).toBe( + "export function greet(name: string): string", + ); + }); + + test("returns full text when no braces", () => { + const text = "export type Foo = string | number"; + expect(extractSignature(text, "typescript")).toBe( + "export type Foo = string | number", + ); + }); + + test("handles multiline signatures before brace", () => { + const text = + "export function foo(\n x: number,\n y: string\n): boolean {\n return true;\n}"; + expect(extractSignature(text, "typescript")).toBe( + "export function foo(\n x: number,\n y: string\n): boolean", + 
); + }); + + test("extracts first line for Python without colon", () => { + const text = "def calculate(x, y):\n return x + y"; + expect(extractSignature(text, "python")).toBe("def calculate(x, y)"); + }); + + test("handles Python class definition", () => { + const text = "class MyClass(Base):\n pass"; + expect(extractSignature(text, "python")).toBe("class MyClass(Base)"); + }); +}); + +describe("extractDocstring", () => { + test("extracts triple-double-quoted Python docstring", () => { + const text = 'def foo():\n """This is the docstring."""\n pass'; + expect(extractDocstring(text, "python")).toBe("This is the docstring."); + }); + + test("extracts multiline Python docstring", () => { + const text = + 'def foo():\n """\n Multi-line\n docstring.\n """\n pass'; + const result = extractDocstring(text, "python"); + expect(result).toContain("Multi-line"); + expect(result).toContain("docstring."); + }); + + test("extracts triple-single-quoted Python docstring", () => { + const text = "def foo():\n '''Single quoted docs.'''\n pass"; + expect(extractDocstring(text, "python")).toBe("Single quoted docs."); + }); + + test("returns null for Python without docstring", () => { + const text = "def foo():\n return 42"; + expect(extractDocstring(text, "python")).toBeNull(); + }); + + test("returns null for TypeScript", () => { + expect(extractDocstring("function foo() {}", "typescript")).toBeNull(); + }); +}); + +describe("extractSymbolName", () => { + test("extracts function name", () => { + expect( + extractSymbolName("function greet(name: string) {}", "ts-function"), + ).toBe("greet"); + }); + + test("extracts exported function name", () => { + expect( + extractSymbolName("export function calculate() {}", "ts-export"), + ).toBe("calculate"); + }); + + test("extracts async function name", () => { + expect( + extractSymbolName("export async function fetchData() {}", "ts-export"), + ).toBe("fetchData"); + }); + + test("extracts class name", () => { + 
expect(extractSymbolName("class MyService {}", "ts-class")).toBe( + "MyService", + ); + }); + + test("extracts interface name", () => { + expect( + extractSymbolName("export interface UserConfig {}", "ts-export"), + ).toBe("UserConfig"); + }); + + test("extracts type alias name", () => { + expect( + extractSymbolName("export type Result = string | Error", "ts-export"), + ).toBe("Result"); + }); + + test("extracts const name", () => { + expect(extractSymbolName("export const MAX_RETRY = 3", "ts-export")).toBe( + "MAX_RETRY", + ); + }); + + test("extracts enum name", () => { + expect( + extractSymbolName("export enum Status { Active, Inactive }", "ts-export"), + ).toBe("Status"); + }); + + test("extracts Python def name", () => { + expect(extractSymbolName("def process_data(input):", "py-function")).toBe( + "process_data", + ); + }); + + test("extracts Python class name", () => { + expect(extractSymbolName("class DataProcessor:", "py-class")).toBe( + "DataProcessor", + ); + }); + + test("returns unknown for unparseable text", () => { + expect(extractSymbolName("???", "ts-export")).toBe("unknown"); + }); +}); + +describe("determineSymbolKind", () => { + test("returns function for ts-function rule", () => { + expect(determineSymbolKind("function foo() {}", "ts-function")).toBe( + "function", + ); + }); + + test("returns class for ts-class rule", () => { + expect(determineSymbolKind("class Foo {}", "ts-class")).toBe("class"); + }); + + test("returns interface for ts-interface rule", () => { + expect(determineSymbolKind("interface Foo {}", "ts-interface")).toBe( + "interface", + ); + }); + + test("returns function for py-function rule", () => { + expect(determineSymbolKind("def foo():", "py-function")).toBe("function"); + }); + + test("returns class for py-class rule", () => { + expect(determineSymbolKind("class Foo:", "py-class")).toBe("class"); + }); + + test("detects function from ts-export text", () => { + expect(determineSymbolKind("export function foo() {}", 
"ts-export")).toBe( + "function", + ); + }); + + test("detects class from ts-export text", () => { + expect(determineSymbolKind("export class Foo {}", "ts-export")).toBe( + "class", + ); + }); + + test("detects interface from ts-export text", () => { + expect(determineSymbolKind("export interface Bar {}", "ts-export")).toBe( + "interface", + ); + }); + + test("detects type from ts-export text", () => { + expect(determineSymbolKind("export type Baz = string", "ts-export")).toBe( + "type", + ); + }); + + test("detects enum from ts-export text", () => { + expect( + determineSymbolKind("export enum Dir { Up, Down }", "ts-export"), + ).toBe("enum"); + }); + + test("detects const from ts-export text", () => { + expect(determineSymbolKind("export const FOO = 1", "ts-export")).toBe( + "const", + ); + }); + + test("defaults to function for unknown", () => { + expect(determineSymbolKind("???", "unknown-rule")).toBe("function"); + }); +}); + +describe("getLanguageForExtension", () => { + test("maps .ts to typescript", () => { + expect(getLanguageForExtension(".ts")).toBe("typescript"); + }); + + test("maps .tsx to typescript", () => { + expect(getLanguageForExtension(".tsx")).toBe("typescript"); + }); + + test("maps .js to javascript", () => { + expect(getLanguageForExtension(".js")).toBe("javascript"); + }); + + test("maps .jsx to javascript", () => { + expect(getLanguageForExtension(".jsx")).toBe("javascript"); + }); + + test("maps .py to python", () => { + expect(getLanguageForExtension(".py")).toBe("python"); + }); + + test("returns null for unsupported extension", () => { + expect(getLanguageForExtension(".rs")).toBeNull(); + expect(getLanguageForExtension(".go")).toBeNull(); + expect(getLanguageForExtension(".md")).toBeNull(); + }); +}); + +describe("hashFileContent", () => { + test("returns consistent SHA-256 hex digest", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "hash-test-")); + const filePath = join(tmpDir, "test.txt"); + writeFileSync(filePath, 
"hello world"); + + const hash1 = await hashFileContent(filePath); + const hash2 = await hashFileContent(filePath); + + expect(hash1).toBe(hash2); + expect(hash1).toHaveLength(64); // SHA-256 hex = 64 chars + }); + + test("produces different hashes for different content", async () => { + const tmpDir = mkdtempSync(join(tmpdir(), "hash-test-")); + const file1 = join(tmpDir, "a.txt"); + const file2 = join(tmpDir, "b.txt"); + writeFileSync(file1, "content A"); + writeFileSync(file2, "content B"); + + const hash1 = await hashFileContent(file1); + const hash2 = await hashFileContent(file2); + + expect(hash1).not.toBe(hash2); + }); +}); + +describe("rules", () => { + test("getTypescriptRules returns valid YAML with expected rule IDs", () => { + const rules = getTypescriptRules(); + expect(rules).toContain("id: ts-jsdoc"); + expect(rules).toContain("id: ts-export"); + expect(rules).toContain("id: ts-function"); + expect(rules).toContain("id: ts-class"); + expect(rules).toContain("id: ts-interface"); + expect(rules).toContain("language: TypeScript"); + }); + + test("getPythonRules returns valid YAML with expected rule IDs", () => { + const rules = getPythonRules(); + expect(rules).toContain("id: py-function"); + expect(rules).toContain("id: py-class"); + expect(rules).toContain("id: py-decorated"); + expect(rules).toContain("language: Python"); + }); + + test("getRulesForLanguage returns TypeScript rules for typescript", () => { + expect(getRulesForLanguage("typescript")).toBe(getTypescriptRules()); + }); + + test("getRulesForLanguage returns TypeScript rules for javascript", () => { + expect(getRulesForLanguage("javascript")).toBe(getTypescriptRules()); + }); + + test("getRulesForLanguage returns Python rules for python", () => { + expect(getRulesForLanguage("python")).toBe(getPythonRules()); + }); + + test("getRulesForLanguage returns null for unsupported language", () => { + expect(getRulesForLanguage("rust")).toBeNull(); + expect(getRulesForLanguage("go")).toBeNull(); 
+ }); +}); diff --git a/cli/tests/review-output.test.ts b/cli/tests/review-output.test.ts deleted file mode 100644 index b906ceb..0000000 --- a/cli/tests/review-output.test.ts +++ /dev/null @@ -1,265 +0,0 @@ -import { describe, expect, test } from "bun:test"; -import { formatReviewJson, formatReviewText } from "../src/output/review.js"; -import type { - PassResult, - ReviewFindingWithPass, - ReviewResult, -} from "../src/schemas/review.js"; - -const makePassResult = (overrides?: Partial<PassResult>): PassResult => ({ - name: "correctness", - findings: [], - costUsd: 0.42, - durationMs: 12000, - sessionId: "sess-001", - ...overrides, -}); - -const makeFinding = ( - overrides?: Partial<ReviewFindingWithPass>, -): ReviewFindingWithPass => ({ - file: "src/auth.ts", - line: 48, - severity: "high", - category: "correctness", - title: "Unchecked null access", - description: "user.email accessed without null check", - suggestion: "Add optional chaining: user?.email", - pass: 1, - passName: "correctness", - ...overrides, -}); - -const makeResult = (overrides?: Partial<ReviewResult>): ReviewResult => ({ - base: "staging", - head: "HEAD", - filesChanged: 5, - scope: "diff", - score: 7, - findings: [makeFinding()], - summary: "Review completed with 1 finding across 1 pass.", - passes: [makePassResult()], - totalCostUsd: 0.42, - ...overrides, -}); - -describe("formatReviewText", () => { - test("includes header with base and head", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("staging..HEAD"); - }); - - test("includes files changed count", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("5 files changed"); - }); - - test("shows full codebase header for full scope", () => { - const output = formatReviewText(makeResult({ scope: "full" }), { - noColor: true, - }); - expect(output).toContain("Full codebase review"); - }); - - test("includes pass summary lines", () => { - const output = formatReviewText(makeResult(), {
noColor: true }); - expect(output).toContain("Pass 1: Correctness"); - expect(output).toContain("$0.42"); - expect(output).toContain("12s"); - }); - - test("shows pass error when present", () => { - const output = formatReviewText( - makeResult({ - passes: [makePassResult({ error: "budget exceeded" })], - }), - { noColor: true }, - ); - expect(output).toContain("budget exceeded"); - }); - - test("includes finding severity tag", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("[HIGH]"); - }); - - test("includes finding file and line", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("src/auth.ts:48"); - }); - - test("includes finding title and description", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("Unchecked null access"); - expect(output).toContain("user.email accessed without null check"); - }); - - test("includes finding suggestion", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("Add optional chaining: user?.email"); - }); - - test("includes pass name attribution", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("(correctness)"); - }); - - test("shows no issues message when findings empty", () => { - const output = formatReviewText(makeResult({ findings: [] }), { - noColor: true, - }); - expect(output).toContain("No issues found."); - }); - - test("includes score in footer", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("Score: 7/10"); - }); - - test("includes total cost in footer", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("Total: $0.42"); - }); - - test("includes severity counts in footer", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - 
expect(output).toContain("1 high"); - }); - - test("handles finding without line number", () => { - const output = formatReviewText( - makeResult({ - findings: [makeFinding({ line: null })], - }), - { noColor: true }, - ); - expect(output).toContain("src/auth.ts"); - expect(output).not.toContain("src/auth.ts:"); - }); - - test("handles finding without suggestion", () => { - const output = formatReviewText( - makeResult({ - findings: [makeFinding({ suggestion: null })], - }), - { noColor: true }, - ); - expect(output).toContain("Unchecked null access"); - expect(output).not.toContain("\u2192"); - }); - - test("renders multiple findings with pass attribution", () => { - const output = formatReviewText( - makeResult({ - findings: [ - makeFinding({ passName: "correctness" }), - makeFinding({ - file: "src/api.ts", - line: 12, - severity: "critical", - title: "SQL injection", - pass: 2, - passName: "security", - }), - ], - }), - { noColor: true }, - ); - expect(output).toContain("(correctness)"); - expect(output).toContain("(security)"); - }); - - test("includes separators", () => { - const output = formatReviewText(makeResult(), { noColor: true }); - expect(output).toContain("\u2501".repeat(60)); - }); - - test("renders multiple passes in summary", () => { - const output = formatReviewText( - makeResult({ - passes: [ - makePassResult({ name: "correctness" }), - makePassResult({ - name: "security", - costUsd: 0.31, - durationMs: 9000, - }), - makePassResult({ - name: "quality", - costUsd: 0.28, - durationMs: 8000, - }), - ], - }), - { noColor: true }, - ); - expect(output).toContain("Pass 1: Correctness"); - expect(output).toContain("Pass 2: Security"); - expect(output).toContain("Pass 3: Quality"); - }); -}); - -describe("formatReviewJson", () => { - test("returns valid JSON", () => { - const output = formatReviewJson(makeResult()); - expect(() => JSON.parse(output)).not.toThrow(); - }); - - test("includes base and head", () => { - const parsed = 
JSON.parse(formatReviewJson(makeResult())); - expect(parsed.base).toBe("staging"); - expect(parsed.head).toBe("HEAD"); - }); - - test("includes scope", () => { - const parsed = JSON.parse(formatReviewJson(makeResult())); - expect(parsed.scope).toBe("diff"); - }); - - test("includes score", () => { - const parsed = JSON.parse(formatReviewJson(makeResult())); - expect(parsed.score).toBe(7); - }); - - test("includes filesChanged", () => { - const parsed = JSON.parse(formatReviewJson(makeResult())); - expect(parsed.filesChanged).toBe(5); - }); - - test("includes findings with pass info", () => { - const parsed = JSON.parse(formatReviewJson(makeResult())); - expect(parsed.findings).toHaveLength(1); - expect(parsed.findings[0].file).toBe("src/auth.ts"); - expect(parsed.findings[0].line).toBe(48); - expect(parsed.findings[0].severity).toBe("high"); - expect(parsed.findings[0].pass).toBe(1); - expect(parsed.findings[0].passName).toBe("correctness"); - }); - - test("includes cost breakdown", () => { - const parsed = JSON.parse(formatReviewJson(makeResult())); - expect(parsed.cost.total_usd).toBe(0.42); - expect(parsed.cost.passes).toHaveLength(1); - expect(parsed.cost.passes[0].name).toBe("correctness"); - expect(parsed.cost.passes[0].cost_usd).toBe(0.42); - expect(parsed.cost.passes[0].duration_ms).toBe(12000); - }); - - test("includes summary", () => { - const parsed = JSON.parse(formatReviewJson(makeResult())); - expect(parsed.summary).toContain("1 finding"); - }); - - test("includes pass error in cost breakdown", () => { - const result = makeResult({ - passes: [makePassResult({ error: "budget exceeded" })], - }); - const parsed = JSON.parse(formatReviewJson(result)); - expect(parsed.cost.passes[0].error).toBe("budget exceeded"); - }); - - test("omits error field when no error", () => { - const parsed = JSON.parse(formatReviewJson(makeResult())); - expect(parsed.cost.passes[0]).not.toHaveProperty("error"); - }); -}); diff --git a/cli/tests/review-runner.test.ts 
b/cli/tests/review-runner.test.ts deleted file mode 100644 index d2f173b..0000000 --- a/cli/tests/review-runner.test.ts +++ /dev/null @@ -1,236 +0,0 @@ -import { describe, expect, test } from "bun:test"; -import type { PassName, ReviewFinding } from "../src/schemas/review.js"; - -describe("review-runner pure functions", () => { - // Test mergeFindings logic - test("deduplicates findings by file:line:title", () => { - const { mergeFindings } = createTestHelpers(); - const passResults = [ - { - name: "correctness" as PassName, - findings: [ - makeFinding({ file: "a.ts", line: 1, title: "Bug" }), - makeFinding({ file: "b.ts", line: 2, title: "Error" }), - ], - costUsd: 0.4, - durationMs: 10000, - sessionId: "s1", - }, - { - name: "security" as PassName, - findings: [ - makeFinding({ file: "a.ts", line: 1, title: "Bug" }), // duplicate - makeFinding({ file: "c.ts", line: 3, title: "Vuln" }), - ], - costUsd: 0.3, - durationMs: 9000, - sessionId: "s1", - }, - ]; - - const merged = mergeFindings(passResults); - expect(merged).toHaveLength(3); // Bug, Error, Vuln — duplicate removed - }); - - test("sorts findings by severity", () => { - const { mergeFindings } = createTestHelpers(); - const passResults = [ - { - name: "correctness" as PassName, - findings: [ - makeFinding({ severity: "low", title: "Low" }), - makeFinding({ severity: "critical", title: "Crit" }), - makeFinding({ severity: "medium", title: "Med" }), - ], - costUsd: 0.4, - durationMs: 10000, - sessionId: "s1", - }, - ]; - - const merged = mergeFindings(passResults); - expect(merged[0].severity).toBe("critical"); - expect(merged[1].severity).toBe("medium"); - expect(merged[2].severity).toBe("low"); - }); - - test("assigns correct pass numbers and names", () => { - const { mergeFindings } = createTestHelpers(); - const passResults = [ - { - name: "correctness" as PassName, - findings: [makeFinding({ title: "A" })], - costUsd: 0.4, - durationMs: 10000, - sessionId: "s1", - }, - { - name: "security" as PassName, 
- findings: [makeFinding({ title: "B" })], - costUsd: 0.3, - durationMs: 9000, - sessionId: "s1", - }, - ]; - - const merged = mergeFindings(passResults); - const findingA = merged.find((f) => f.title === "A"); - const findingB = merged.find((f) => f.title === "B"); - expect(findingA?.pass).toBe(1); - expect(findingA?.passName).toBe("correctness"); - expect(findingB?.pass).toBe(2); - expect(findingB?.passName).toBe("security"); - }); - - // Test calculateScore logic - test("calculates score 10 for no findings", () => { - const { calculateScore } = createTestHelpers(); - expect(calculateScore([])).toBe(10); - }); - - test("deducts 3 points per critical finding", () => { - const { calculateScore } = createTestHelpers(); - const findings = [makeWithPass({ severity: "critical" })]; - expect(calculateScore(findings)).toBe(7); - }); - - test("deducts 2 points per high finding", () => { - const { calculateScore } = createTestHelpers(); - const findings = [makeWithPass({ severity: "high" })]; - expect(calculateScore(findings)).toBe(8); - }); - - test("deducts 1 point per medium finding", () => { - const { calculateScore } = createTestHelpers(); - const findings = [makeWithPass({ severity: "medium" })]; - expect(calculateScore(findings)).toBe(9); - }); - - test("deducts 0.5 points per low finding", () => { - const { calculateScore } = createTestHelpers(); - const findings = [makeWithPass({ severity: "low" })]; - // 10 - 0.5 = 9.5, rounds to 10 - expect(calculateScore(findings)).toBe(10); - }); - - test("info findings don't affect score", () => { - const { calculateScore } = createTestHelpers(); - const findings = [makeWithPass({ severity: "info" })]; - expect(calculateScore(findings)).toBe(10); - }); - - test("score clamps to minimum 1", () => { - const { calculateScore } = createTestHelpers(); - const findings = Array.from({ length: 10 }, () => - makeWithPass({ severity: "critical" }), - ); - expect(calculateScore(findings)).toBe(1); - }); - - test("score clamps to maximum 
10", () => { - const { calculateScore } = createTestHelpers(); - expect(calculateScore([])).toBe(10); - }); - - test("mixed severities calculate correctly", () => { - const { calculateScore } = createTestHelpers(); - // 1 critical (3) + 1 high (2) + 2 medium (2) = 7 points → score 3 - const findings = [ - makeWithPass({ severity: "critical" }), - makeWithPass({ severity: "high" }), - makeWithPass({ severity: "medium" }), - makeWithPass({ severity: "medium" }), - ]; - expect(calculateScore(findings)).toBe(3); - }); -}); - -// --- Helpers --- - -function makeFinding(overrides?: Partial<ReviewFinding>): ReviewFinding { - return { - file: "src/test.ts", - line: 1, - severity: "medium", - category: "correctness", - title: "Test finding", - description: "Test description", - suggestion: null, - ...overrides, - }; -} - -function makeWithPass(overrides?: Partial<ReviewFinding>) { - return { - ...makeFinding(overrides), - pass: 1, - passName: "correctness" as PassName, - }; -} - -/** - * Re-implements the pure functions from review-runner for testing, - * since we can't easily import them (they depend on Bun.spawn internals). - */ -function createTestHelpers() { - const SEVERITY_SORT: Record<string, number> = { - critical: 0, - high: 1, - medium: 2, - low: 3, - info: 4, - }; - const SCORE_WEIGHTS: Record<string, number> = { - critical: 3, - high: 2, - medium: 1, - low: 0.5, - info: 0, - }; - - function mergeFindings( - passResults: { - name: PassName; - findings: ReviewFinding[]; - costUsd: number; - durationMs: number; - sessionId: string; - }[], - ) { - const seen = new Set(); - const merged: (ReviewFinding & { - pass: number; - passName: PassName; - })[] = []; - - for (const [i, pass] of passResults.entries()) { - for (const finding of pass.findings) { - const key = `${finding.file}:${finding.line}:${finding.title}`; - if (!seen.has(key)) { - seen.add(key); - merged.push({ - ...finding, - pass: i + 1, - passName: pass.name, - }); - } - } - } - - merged.sort( - (a, b) => - (SEVERITY_SORT[a.severity] ??
5) - (SEVERITY_SORT[b.severity] ?? 5), - ); - return merged; - } - - function calculateScore(findings: { severity: string }[]): number { - const totalPoints = findings.reduce( - (sum, f) => sum + (SCORE_WEIGHTS[f.severity] ?? 0), - 0, - ); - return Math.max(1, Math.min(10, Math.round(10 - totalPoints))); - } - - return { mergeFindings, calculateScore }; -} diff --git a/container/.codeforge/config/main-system-prompt.md b/container/.codeforge/config/main-system-prompt.md index 91ee80b..78c4fe3 100755 --- a/container/.codeforge/config/main-system-prompt.md +++ b/container/.codeforge/config/main-system-prompt.md @@ -336,29 +336,37 @@ Tests NOT required: -Specs live in `.specs/` at the project root. You (the orchestrator) own spec creation and maintenance. +Specs live in `.specs/` at the project root as directory-based "spec packages." You (the orchestrator) own spec creation and maintenance. -Workflow: features live in `BACKLOG.md` → pulled into `MILESTONES.md` when scoped → each gets a spec via `/spec-new` → after implementation, verify via `/spec-review` → close via `/spec-update`. +Workflow: features live in `BACKLOG.md` → each gets a spec package via `/spec` → after approval, implement via `/build`. Folder structure: ```text .specs/ -├── MILESTONES.md # Current milestone scope -├── BACKLOG.md # Priority-graded feature backlog -├── auth/ # Domain folder -│ └── login-flow.md # Feature spec (~200 lines each) +├── CONSTITUTION.md # Project-level cross-cutting decisions +├── BACKLOG.md # Feature idea parking lot +├── auth/ # Domain folder +│ └── login-flow/ # Spec package (directory) +│ ├── index.md # Human-facing (~50-80 lines) +│ ├── context.md # AI-facing (invariants, schema, constraints) +│ └── groups/ +│ ├── a-credentials.md # AC group with frontmatter +│ └── b-sessions.md # AC group with frontmatter ``` Key rules: -- ~200 lines per spec. Split by feature boundary when longer. +- Every spec is a directory package, not a single file. 
+- `index.md` is the human review surface — decisions, AC summary, scope. Keep under 80 lines. +- `context.md` and group files are AI-facing — invariants, examples, schema, decomposition. - Reference files, don't reproduce them. The code is the source of truth. -- Each spec is independently loadable: domain, status, last-updated, intent, key files, acceptance criteria. +- Spec-level approval: `draft` or `approved`. No per-requirement tagging. +- The AI makes obvious decisions and presents only genuine trade-offs to the human. - Delegate spec writing to the spec-writer agent. -- Requirement tags: `[assumed]` (agent-drafted) vs `[user-approved]` (validated via `/spec-refine`). Never silently upgrade. -- Specs with ANY `[assumed]` requirements are NOT approved for implementation. -Before implementation: check if a spec exists. If `draft` → `/spec-refine` first. If `user-approved` → proceed. -After implementation: `/spec-review` → `/spec-update`. Present any deviations to the user for approval. +Before implementation: check if a spec exists. If `draft` → `/spec` to refine first. If `approved` → proceed. +After implementation: `/build` handles review and closure automatically. Present any deviations to the user for approval. + +Commands: `/spec <feature>` (create/refine), `/build <feature>` (implement/close), `/specs` (dashboard). diff --git a/container/.codeforge/config/orchestrator-system-prompt.md b/container/.codeforge/config/orchestrator-system-prompt.md index a5aebf8..e0cd207 100644 --- a/container/.codeforge/config/orchestrator-system-prompt.md +++ b/container/.codeforge/config/orchestrator-system-prompt.md @@ -257,20 +257,18 @@ Specs and project-level docs live in `.specs/` at the project root. You own spec enforcement. Agents do not update specs without your direction. Before starting implementation: -1. Check if a spec exists for the feature: Glob `.specs/**/*.md` +1. Check if a spec package exists: Glob `.specs/**/index.md` 2. If a spec exists: - - Read it.
Verify `**Approval:**` is `user-approved`. - - If `draft` → STOP. Delegate to documenter for `/spec-refine` first. - - If `user-approved` → proceed. Use acceptance criteria as the definition of done. + - Read `index.md` frontmatter. Verify `approval: approved`. + - If `draft` → STOP. Run `/spec` to refine and approve first. + - If `approved` → proceed. Use acceptance criteria as the definition of done. 3. If no spec exists and the change is non-trivial: - - Delegate to documenter to create one via `/spec-new`. - - Have documenter run `/spec-refine` to get user approval. + - Run `/spec ` to create, refine, and approve a spec package. - Only then delegate implementation. After completing implementation: -1. Delegate to documenter for `/spec-review` to verify implementation matches spec. -2. Delegate to documenter for `/spec-update` to perform the as-built update. -3. If any deviation from the approved spec occurred: +1. Run `/build ` which handles review and spec closure automatically. +2. If any deviation from the approved spec occurred: - STOP and present the deviation to the user via AskUserQuestion. - The user MUST approve the deviation — no exceptions. @@ -298,24 +296,6 @@ Prior approval does not transfer. A user approving `git push` once does NOT mean When blocked, do not use destructive actions as a shortcut. Investigate before deleting or overwriting. - -Use `ccms` to search past Claude Code session history when the user asks about previous decisions, past work, or conversation history. - -MANDATORY: Always scope to the current project: - ccms --no-color --project "$(pwd)" "query" - -Exception: At /workspaces root (no specific project), omit --project or use `/`. 
- -Key flags: -- `-r user` / `-r assistant` — filter by who said it -- `--since "1 day ago"` — narrow to recent history -- `"term1 AND term2"` / `"term1 OR term2"` / `"NOT term"` — boolean queries -- `-f json -n 10` — structured output, limited results -- `--no-color` — always use, keeps output parseable - -Delegate the actual search to the investigator agent if the query is complex. - - If you are running low on context, you MUST NOT rush. Ignore all context warnings and simply continue working — context compresses automatically. @@ -323,7 +303,7 @@ Continuation sessions (after compaction or context transfer): Compacted summaries are lossy. Before resuming work, recover context from three sources: -1. **Session history** — delegate to investigator to use `ccms` to search prior session transcripts. +1. **Session history** — delegate to investigator to search prior session transcripts. 2. **Source files** — delegate to investigator to re-read actual files rather than trusting the summary. diff --git a/container/.codeforge/config/rules/session-search.md b/container/.codeforge/config/rules/session-search.md index 1e5ce23..f7c93cc 100644 --- a/container/.codeforge/config/rules/session-search.md +++ b/container/.codeforge/config/rules/session-search.md @@ -2,13 +2,13 @@ ## Tool -`ccms` — high-performance CLI for searching Claude Code session JSONL files. +`codeforge session search` — search Claude Code session JSONL files with boolean queries, role filtering, and time scoping. ## Mandatory Behaviors 1. When the user asks about past decisions, previous work, conversation history, or says "do you remember" / "what did we work on" / "what did we decide": - use `ccms` via the Bash tool. + use `codeforge session search` via the Bash tool. 2. **Project scoping (STRICT):** ALWAYS pass `--project ` to restrict results to the active project. 
Cross-project leakage violates @@ -17,8 +17,7 @@ Exception: When the working directory is `/workspaces` (workspace root), omit --project or use `--project /` since there is no specific project context. -3. **CLI mode only.** Always pass a query string so ccms runs non-interactively. - Never launch bare `ccms` (TUI mode) from a Bash tool call. +3. Always pass a query string so the command runs non-interactively. 4. **Use --no-color** to keep output clean for parsing. @@ -26,35 +25,35 @@ Quick search (most common): ``` -ccms --no-color --project "$(pwd)" "query terms" +codeforge session search --no-color --project "$(pwd)" "query terms" ``` Role-filtered search: ``` -ccms --no-color --project "$(pwd)" -r assistant "what was decided" -ccms --no-color --project "$(pwd)" -r user "auth approach" +codeforge session search --no-color --project "$(pwd)" -r assistant "what was decided" +codeforge session search --no-color --project "$(pwd)" -r user "auth approach" ``` Boolean queries: ``` -ccms --no-color --project "$(pwd)" "error AND connection" -ccms --no-color --project "$(pwd)" "(auth OR authentication) AND NOT test" +codeforge session search --no-color --project "$(pwd)" "error AND connection" +codeforge session search --no-color --project "$(pwd)" "(auth OR authentication) AND NOT test" ``` Time-scoped search: ``` -ccms --no-color --project "$(pwd)" --since "1 day ago" "recent work" -ccms --no-color --project "$(pwd)" --since "1 week ago" "architecture" +codeforge session search --no-color --project "$(pwd)" --since "1 day ago" "recent work" +codeforge session search --no-color --project "$(pwd)" --since "1 week ago" "architecture" ``` JSON output (for structured parsing): ``` -ccms --no-color --project "$(pwd)" -f json "query" -n 10 +codeforge session search --no-color --project "$(pwd)" -f json "query" -n 10 ``` Statistics only: ``` -ccms --no-color --project "$(pwd)" --stats "" +codeforge session search --no-color --project "$(pwd)" --stats "" ``` ## Output 
Management @@ -64,3 +63,4 @@ ccms --no-color --project "$(pwd)" --stats "" - Use `-r user` when looking for what the user previously asked/requested - Use `--since` to narrow to recent history when appropriate - Use `-f json` when you need structured data (session IDs, timestamps) +- Use `--full-text` to disable content truncation when you need complete messages diff --git a/container/.codeforge/config/rules/spec-workflow.md b/container/.codeforge/config/rules/spec-workflow.md index 7dbd799..8ac6c3a 100644 --- a/container/.codeforge/config/rules/spec-workflow.md +++ b/container/.codeforge/config/rules/spec-workflow.md @@ -1,48 +1,61 @@ # Specification Workflow -Every project uses `.specs/` as the specification directory. These rules are mandatory. +Every project uses `.specs/` as the specification directory. Specs are directory-based "spec packages." These rules are mandatory. ## Rules -1. Every non-trivial feature MUST have a spec before implementation begins. - Use `/spec-new` to create one from the standard template. -2. Every implementation MUST end with an as-built spec update. - Use `/spec-update` to perform the update. -3. Specs should aim for ~200 lines. Split by feature boundary when - significantly longer into separate specs in the domain folder. - Completeness matters more than hitting a number. +1. Every non-trivial feature MUST have a spec package before implementation begins. + Use `/spec` to create one — it handles creation, refinement, and approval in one flow. +2. Every implementation MUST end with spec closure. + `/build` handles this automatically in its final phase. +3. Specs are directory packages, not single files: + ``` + .specs/{domain}/{feature}/ + index.md # Human-facing: intent, decisions, AC summary, scope + context.md # AI-facing: invariants, anti-patterns, schema, constraints + groups/ # Per-group AC files with frontmatter for parallel build + ``` 4. 
Specs MUST reference file paths, never reproduce source code, schemas, or type definitions inline. The code is the source of truth. -5. Each spec file MUST be independently loadable — include domain, - status, last-updated, intent, key files, and acceptance criteria. -6. Before starting a new milestone, MUST run `/spec-check` to audit spec health. -7. To bootstrap `.specs/` for a project that doesn't have one, use `/spec-init`. -8. New specs start with `**Approval:** draft` and all requirements tagged - `[assumed]`. Use `/spec-refine` to validate assumptions with the user - and upgrade to `[user-approved]` before implementation begins. -9. A spec-reminder advisory hook fires at Stop when code was modified but - specs weren't updated. Use `/spec-update` to close the loop. -10. For approved specs, use `/spec-build` to orchestrate the full - implementation lifecycle — plan, build, review, and close the spec - in one pass. Phase 5 handles as-built closure, so a separate - `/spec-update` is not needed afterward. -11. Use `/spec-review` for standalone implementation verification against - a spec — after manual implementation, post-change regression checks, - or pre-release audits. It reads code, verifies requirements and - acceptance criteria, and recommends `/spec-update` when done. +5. `index.md` is the human review surface — keep it under 80 lines. + All AI-facing detail goes in `context.md` and group files. +6. Cross-cutting decisions belong in `.specs/CONSTITUTION.md`, not repeated + in every spec. Use `/spec constitution` to create or update it. +7. `/spec` auto-bootstraps `.specs/` on first use. No separate init needed. +8. New specs start with `approval: draft`. The AI makes obvious decisions + and presents only genuine trade-offs to the human. Once the human + approves decisions and scope, the spec is `approval: approved`. +9. A spec-reminder hook fires at Stop when code was modified but specs + weren't updated. Use `/build` to close the loop. +10. 
For approved specs, use `/build` to orchestrate the full implementation + lifecycle — plan, build, self-healing review, and spec closure in one + pass. No separate update step needed. +11. Use `/specs` to check spec health across the project — status, + staleness, draft specs, and unresolved AI decisions. + +## Commands + +| Command | Purpose | +|---------|---------| +| `/spec ` | Create, refine, and approve a spec package | +| `/spec constitution` | Create or update the project Constitution | +| `/build ` | Implement an approved spec, review, and close | +| `/specs` | Dashboard: spec health across the project | ## Acceptance Criteria Markers -Acceptance criteria use three states during implementation: - | Marker | Meaning | |--------|---------| | `[ ]` | Not started | | `[~]` | Implemented, not yet verified — code written, tests not confirmed | | `[x]` | Verified — tests pass, behavior confirmed | -`/spec-build` Phase 3 flips `[ ]` to `[~]` as criteria are addressed. -Phase 4 upgrades `[~]` to `[x]` after verification. `/spec-update` -treats any remaining `[~]` as `[ ]` if they were never verified. +## Approval Model + +Specs use spec-level approval (not per-requirement): +- `draft` — not ready for implementation. `/build` rejects it. +- `approved` — human has reviewed decisions and scope. Ready to build. -See the system prompt's `` section for the full template, directory structure, and as-built workflow. +The AI captures obvious decisions in "Already Decided." The human reviews +"Needs Your Input" for genuine trade-offs. No `[assumed]`/`[user-approved]` +per-requirement tagging. 
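For reference, a minimal sketch of the frontmatter the rules above describe — only `approval: draft|approved` at the spec level and the `depends_on`/`files_owned` group keys are prescribed by this workflow; every concrete name, path, and extra field below is an illustrative assumption, not a fixed schema:

```yaml
# .specs/auth/login-flow/index.md frontmatter — spec-level approval
# (illustrative; only the approval field and its two values are prescribed)
approval: draft

# .specs/auth/login-flow/groups/b-sessions.md frontmatter — group keys
# drive parallel task decomposition; the group name and file path are
# hypothetical examples
depends_on:
  - a-credentials        # this group builds only after group a completes
files_owned:
  - src/auth/session.ts  # files no other group may modify in parallel
```

A `/build` run would reject the package while `approval: draft`; flipping it to `approved` after human review is what unlocks implementation.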
diff --git a/container/.codeforge/config/settings.json b/container/.codeforge/config/settings.json index ef8420b..191fe9e 100644 --- a/container/.codeforge/config/settings.json +++ b/container/.codeforge/config/settings.json @@ -3,19 +3,19 @@ "autoCompact": true, "alwaysThinkingEnabled": true, "env": { - "ANTHROPIC_MODEL": "claude-opus-4-6", - "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6", - "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-5-20250929", + "ANTHROPIC_MODEL": "claude-opus-4-6[1m]", + "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6[1m]", + "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6", "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5-20251001", - "BASH_DEFAULT_TIMEOUT_MS": "240000", - "BASH_MAX_TIMEOUT_MS": "600000", + "BASH_DEFAULT_TIMEOUT_MS": "120000", + "BASH_MAX_TIMEOUT_MS": "300000", "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "64000", "MAX_MCP_OUTPUT_TOKENS": "10000", "MAX_THINKING_TOKENS": "63999", "MCP_TIMEOUT": "120000", "MCP_TOOL_TIMEOUT": "30000", - "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "95", - "CLAUDE_CODE_SHELL": "zsh", + "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "90", + "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "500000", "FORCE_AUTOUPDATE_PLUGINS": "1", "ENABLE_TOOL_SEARCH": "auto:5", @@ -26,10 +26,11 @@ "ENABLE_CLAUDE_CODE_SM_COMPACT": "1", "CLAUDE_CODE_FORCE_GLOBAL_CACHE": "1", "CLAUDE_CODE_PLAN_MODE_INTERVIEW_PHASE": "true", - "CLAUDE_CODE_PLAN_V2_AGENT_COUNT": "3", + "CLAUDE_CODE_PLAN_V2_AGENT_COUNT": "5", "CLAUDE_CODE_PLAN_MODE_REQUIRED": "true", + "CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR": "true", - "CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY": "5", + "CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY": "10", "CLAUDE_CODE_MAX_RETRIES": "1", "BASH_MAX_OUTPUT_LENGTH": "15000", "TASK_MAX_OUTPUT_LENGTH": "64000" @@ -44,7 +45,7 @@ "defaultMode": "plan", "additionalDirectories": [] }, - "model": "opus", + "model": "opus[1m]", "enabledMcpjsonServers": [], "disabledMcpjsonServers": [], "hooks": {}, diff --git a/container/.codeforge/config/writing-system-prompt.md 
b/container/.codeforge/config/writing-system-prompt.md index 0d45380..e53e10b 100644 --- a/container/.codeforge/config/writing-system-prompt.md +++ b/container/.codeforge/config/writing-system-prompt.md @@ -178,8 +178,3 @@ When writing for a specific POV character, define these elements in the project' - Always include suggestions with questions. Never ask an open-ended question without proposed answers. --- - -# SESSION HISTORY - -When asked about past writing sessions, use `ccms` to search session history. -Always scope to the current project: `ccms --no-color --project "$(pwd)" "query"` diff --git a/container/.devcontainer/CHANGELOG.md b/container/.devcontainer/CHANGELOG.md index 5e1de44..7a3b315 100644 --- a/container/.devcontainer/CHANGELOG.md +++ b/container/.devcontainer/CHANGELOG.md @@ -1,7 +1,45 @@ # CodeForge Devcontainer Changelog +## v2.1.1 — 2026-03-13 + +### Workspace Scope Guard + +- Fix `/dev/null` false positive — redirects to system paths (`/dev/`, `/proc/`, `/sys/`, etc.) 
are now allowed regardless of the primary command, not just for system commands like `git` or `pip` +- Fix CWD drift — scope root is now persisted on first invocation per session, preventing `cd` commands in Bash from silently changing the enforced scope boundary +- CWD context injector now uses the same persisted scope root, keeping advisory context aligned with enforcement + +## v2.1.0 — 2026-03-13 + +### Spec Workflow v2 — "Spec Packages" + +- **Breaking:** Replaced all 8 spec commands with 3: `/spec` (create & refine), `/build` (implement & close), `/specs` (dashboard) +- Specs are now directory-based "spec packages" with separated human and AI content: + - `index.md` — human-facing entry point (~50-80 lines): intent, decisions, AC summary, scope + - `context.md` — AI-facing shared context: invariants, anti-patterns, schema intent, constraints + - `groups/*.md` — AC groups with YAML frontmatter for parallel agent decomposition +- Added Constitution support (`.specs/CONSTITUTION.md`) for project-level cross-cutting decisions +- Simplified approval model: spec-level `draft`/`approved` replaces per-requirement `[assumed]`/`[user-approved]` tagging +- AI makes obvious decisions autonomously, presents only genuine trade-offs to the human +- `[ai-decided]` workflow: AI records autonomous decisions during build for post-completion review +- Group frontmatter (`depends_on`, `files_owned`) drives automatic task decomposition for team builds +- Dropped MILESTONES.md and ROADMAP.md — replaced with simple BACKLOG.md idea parking lot +- Updated all 8 agent skill lists, system prompts, orchestrator prompt, skill-suggester, and 8 docs pages +- Ships with a complete example spec package (webhook delivery system) as reference + +### CLI v0.1.0 (Experimental) + +- Initial release of the `codeforge` CLI — session search, plugin management, config deployment, codebase indexing, and devcontainer management +- New `codeforge index` command group — build and search a codebase symbol 
index (build, search, show, stats, tree, clean) +- New `codeforge container` command group — manage devcontainers from the host (up, down, rebuild, exec, ls, shell) +- Container proxy — CLI commands auto-proxy into the running devcontainer when run from the host + ## v2.0.3 — 2026-03-03 +### CLI Feature + +- Rewrote `codeforge-cli` devcontainer feature to use a self-bootstrapping wrapper instead of `npm install -g` — the CLI now runs directly from workspace source via `bun`, auto-installing dependencies on first use +- Removed `ccms` from `cc-tools` tool listing (replaced by `codeforge session search`) + ### Workspace Scope Guard - Fix scope guard blocking project root access from subdirectory CWDs — now detects git repository root and uses it as scope boundary diff --git a/container/.devcontainer/CLAUDE.md b/container/.devcontainer/CLAUDE.md index b895519..5721c79 100644 --- a/container/.devcontainer/CLAUDE.md +++ b/container/.devcontainer/CLAUDE.md @@ -39,7 +39,7 @@ Declared in `settings.json` under `enabledPlugins`, auto-activated on start: - **agent-system** — 21 custom agents (4 workhorse + 17 specialist) + built-in agent redirection - **skill-engine** — 22 general coding skills + auto-suggestion -- **spec-workflow** — 8 spec lifecycle skills + spec-reminder hook +- **spec-workflow** — 3 spec lifecycle skills (`/spec`, `/build`, `/specs`) + spec-reminder hook - **session-context** — Git state injection, TODO harvesting, commit reminders - **auto-code-quality** — Auto-format + auto-lint + advisory test runner - **workspace-scope-guard** — Blocks writes outside working directory diff --git a/container/.devcontainer/README.md b/container/.devcontainer/README.md index d567a16..2944128 100644 --- a/container/.devcontainer/README.md +++ b/container/.devcontainer/README.md @@ -197,6 +197,7 @@ claude --resume # Resume previous session | Tool | Description | |------|-------------| | `claude` | Claude Code CLI | +| `codeforge` | CodeForge CLI (experimental) — session 
search, plugin management, indexing | | `cc` | Wrapper with auto-configuration | | `ccusage` | Token usage analyzer | | `ccburn` | Visual token burn rate tracker with pace indicators | @@ -350,73 +351,75 @@ Skills in `plugins/devs-marketplace/plugins/skill-engine/skills/` provide domain `api-design` · `ast-grep-patterns` · `claude-agent-sdk` · `claude-code-headless` · `debugging` · `dependency-management` · `docker` · `docker-py` · `documentation-patterns` · `fastapi` · `git-forensics` · `migration-patterns` · `performance-profiling` · `pydantic-ai` · `refactoring-patterns` · `security-checklist` · `skill-building` · `sqlite` · `svelte5` · `team` · `testing` · `worktree` -### Spec Skills (8) — `spec-workflow` plugin +### Spec Skills (3) — `spec-workflow` plugin Skills in `plugins/devs-marketplace/plugins/spec-workflow/skills/`: -`spec-build` · `spec-check` · `spec-init` · `spec-new` · `spec-refine` · `spec-review` · `spec-update` · `specification-writing` +`spec` · `build` · `specs` ## Specification Workflow -CodeForge includes a specification-driven development workflow. Every non-trivial feature gets a spec before implementation begins. +CodeForge includes a specification-driven development workflow using directory-based "spec packages." Every non-trivial feature gets a spec package before implementation begins. ### Quick Start ```bash -/spec-init # Bootstrap .specs/ directory (first time only) -/spec-new auth-flow # Create a feature spec (domain is inferred) -/spec-refine auth-flow # Validate assumptions with user -# ... implement the feature ... -/spec-update auth-flow # As-built update after implementation -/spec-check # Audit all specs for health +/spec auth-flow # Create, refine, and approve a spec package +/build auth-flow # Implement from spec — plan, build, review, close +/specs # Dashboard: spec health across the project ``` ### The Lifecycle -1. **Backlog** — features live in `.specs/BACKLOG.md` with priority grades (P0–P3) -2. 
**Milestone** — when starting a milestone, pull features from backlog into `.specs/MILESTONES.md` -3. **Spec** — `/spec-new` creates a spec from the standard template with all requirements tagged `[assumed]` -4. **Refine** — `/spec-refine` walks through every assumption with the user, converting `[assumed]` → `[user-approved]`. The spec's approval status moves from `draft` → `user-approved`. **No implementation begins until approved.** -5. **Implement** — build the feature using the spec's acceptance criteria as the definition of done -6. **Update** — `/spec-update` performs the as-built update: sets status, checks off criteria, adds implementation notes -7. **Health check** — `/spec-check` audits all specs for staleness, missing sections, unapproved status, and other issues +1. **Backlog** — feature ideas live in `.specs/BACKLOG.md` +2. **Spec** — `/spec ` creates a spec package. The AI drafts everything, presents only genuine trade-off decisions to the human, and makes obvious decisions itself. +3. **Approve** — human reviews decisions + AC completeness in `index.md` (~50-80 lines). Once confirmed, spec is `approved`. +4. **Build** — `/build ` implements autonomously: plan, build with spec-first testing, self-healing review loop, closure with summary report. +5. **Review** — human reads the Completion Summary Report, smoke tests, and reviews AI decisions. +6. **Health check** — `/specs` audits all spec packages for staleness, drafts, and unresolved AI decisions. ### Approval Workflow -Specs use a two-level approval system: +Specs use spec-level approval: -- **Requirement-level:** each requirement starts as `[assumed]` (AI hypothesis) and becomes `[user-approved]` after explicit user validation via `/spec-refine` -- **Spec-level:** the `**Approval:**` field starts as `draft` and becomes `user-approved` when all requirements pass review +- **`draft`** — not ready for implementation. `/build` rejects it. +- **`approved`** — human has reviewed decisions and scope. 
Ready to build. + +The AI makes obvious decisions (tagged "Already Decided") and presents genuine trade-offs ("Needs Your Input"). No per-requirement `[assumed]`/`[user-approved]` tagging. A spec-reminder advisory hook fires at Stop when code was modified but specs weren't updated. -### Skills Reference +### Commands Reference -| Skill | Purpose | -|-------|---------| -| `/spec-init` | Bootstrap `.specs/` directory with MILESTONES and BACKLOG | -| `/spec-new` | Create a feature spec from the standard template | -| `/spec-refine` | Validate assumptions and get user approval (required before implementation) | -| `/spec-update` | As-built update after implementation | -| `/spec-check` | Audit all specs for health issues | -| `/spec-build` | Orchestrate full implementation from an approved spec (plan, build, review, close) | -| `/spec-review` | Standalone deep implementation review against a spec | -| `/specification-writing` | EARS format templates and acceptance criteria patterns | +| Command | Purpose | +|---------|---------| +| `/spec ` | Create, refine, and approve a spec package | +| `/spec constitution` | Create or update project-level Constitution | +| `/build ` | Implement from spec — plan, build, review, close | +| `/specs` | Dashboard: spec health across the project | ### Directory Structure ``` .specs/ -├── MILESTONES.md # Milestone tracker linking to feature specs -├── BACKLOG.md # Priority-graded feature backlog -├── auth/ # Domain folder -│ ├── login-flow.md # Feature spec -│ └── oauth.md # Feature spec -└── search/ # Domain folder - └── full-text.md # Feature spec +├── CONSTITUTION.md # Project-level cross-cutting decisions +├── BACKLOG.md # Feature idea parking lot +├── auth/ # Domain folder +│ └── login-flow/ # Spec package (directory) +│ ├── index.md # Human-facing entry point +│ ├── context.md # AI-facing shared context +│ └── groups/ # AC groups for parallel build +│ ├── a-credentials.md +│ └── b-sessions.md +└── search/ + └── full-text/ + ├── 
index.md + ├── context.md + └── groups/ + └── a-indexing.md ``` -All specs live in domain subfolders. Specs aim for ~200 lines each; split into separate specs in the domain folder when longer. +Every spec is a directory package. `index.md` is the human review surface. Everything else is for the AI. ## Project Manager diff --git a/container/.devcontainer/features/codeforge-cli/README.md b/container/.devcontainer/features/codeforge-cli/README.md index 40c73f4..0330a36 100644 --- a/container/.devcontainer/features/codeforge-cli/README.md +++ b/container/.devcontainer/features/codeforge-cli/README.md @@ -1,14 +1,18 @@ # CodeForge CLI (codeforge-cli) -Installs the [CodeForge CLI](https://github.com/AnExiledDev/CodeForge/tree/main/cli) globally via npm. Provides the `codeforge` command for code review, session search, plugin management, and configuration. +> **Warning: Experimental** — The CodeForge CLI is under active development. Commands and interfaces may change between releases. -Requires Node.js (for npm install) and Bun (runtime for the CLI binary). +Installs a self-bootstrapping wrapper for the [CodeForge CLI](https://github.com/AnExiledDev/CodeForge/tree/main/cli). The `codeforge` command runs directly from workspace source (`/workspaces/cli`) — no npm publish required. + +On first invocation the wrapper auto-runs `bun install` if `node_modules/` is missing, then execs `bun src/index.ts` with all arguments forwarded. + +Requires Bun (declared as an `installsAfter` dependency). ## Options | Option | Type | Default | Description | |--------|------|---------|-------------| -| `version` | string | `latest` | Version to install. Use a specific semver or `'none'` to skip. | +| `version` | string | `latest` | Use `'none'` to skip installation entirely. 
| ## Usage diff --git a/container/.devcontainer/features/codeforge-cli/devcontainer-feature.json b/container/.devcontainer/features/codeforge-cli/devcontainer-feature.json index 024b610..7adfdde 100644 --- a/container/.devcontainer/features/codeforge-cli/devcontainer-feature.json +++ b/container/.devcontainer/features/codeforge-cli/devcontainer-feature.json @@ -2,7 +2,7 @@ "id": "codeforge-cli", "version": "1.0.0", "name": "CodeForge CLI", - "description": "Installs the CodeForge CLI for code review, session search, plugin management, and configuration", + "description": "Installs a self-bootstrapping wrapper that runs the CodeForge CLI from workspace source for session search, plugin management, and configuration (experimental)", "documentationURL": "https://github.com/AnExiledDev/CodeForge/tree/main/cli", "options": { "version": { diff --git a/container/.devcontainer/features/codeforge-cli/install.sh b/container/.devcontainer/features/codeforge-cli/install.sh index cfd9bf1..510481f 100755 --- a/container/.devcontainer/features/codeforge-cli/install.sh +++ b/container/.devcontainer/features/codeforge-cli/install.sh @@ -11,31 +11,34 @@ if [ "${VERSION}" = "none" ]; then exit 0 fi -echo "[codeforge-cli] Starting installation..." -echo "[codeforge-cli] Version: ${VERSION}" - -# Source NVM if available -if [ -f /usr/local/share/nvm/nvm.sh ]; then - set +u - source /usr/local/share/nvm/nvm.sh - set -u -fi +echo "[codeforge-cli] Installing self-bootstrapping wrapper..." + +# Write the wrapper script that runs the CLI from workspace source. +# The workspace is not mounted during feature install (Docker build), +# so the wrapper defers bun install to first invocation. +cat > /usr/local/bin/codeforge <<'WRAPPER' +#!/bin/bash +set -euo pipefail + +CLI_DIR="${WORKSPACE_ROOT:-/workspaces}/cli" +BUN="${BUN:-$(command -v bun 2>/dev/null || echo "$HOME/.bun/bin/bun")}" -# Validate npm is available -if ! 
command -v npm &>/dev/null; then - echo "[codeforge-cli] ERROR: npm not found. Ensure Node.js is installed." >&2 +if [ ! -d "$CLI_DIR" ]; then + echo "codeforge: CLI source not found at $CLI_DIR" >&2 + echo "Ensure the workspace is mounted and contains the cli/ directory." >&2 exit 1 fi -# Install CodeForge CLI globally via npm -if [ "${VERSION}" = "latest" ]; then - npm install -g codeforge-dev-cli -else - npm install -g "codeforge-dev-cli@${VERSION}" +if [ ! -d "$CLI_DIR/node_modules" ]; then + echo "codeforge: bootstrapping dependencies..." >&2 + "$BUN" install --cwd "$CLI_DIR" --frozen-lockfile >/dev/null 2>&1 || \ + "$BUN" install --cwd "$CLI_DIR" >/dev/null 2>&1 fi -npm cache clean --force 2>/dev/null || true -# Verify installation -codeforge --version +exec "$BUN" "$CLI_DIR/src/index.ts" "$@" +WRAPPER + +chmod +x /usr/local/bin/codeforge -echo "[codeforge-cli] Installation complete" +echo "[codeforge-cli] Wrapper installed at /usr/local/bin/codeforge" +echo "[codeforge-cli] CLI will bootstrap from workspace source on first use" diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/architect.md b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/architect.md index 91c66e2..92fc3b2 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/architect.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/architect.md @@ -17,10 +17,8 @@ memory: scope: project skills: - api-design - - spec-new - - spec-update - - spec-init - - spec-review + - spec + - specs hooks: PreToolUse: - matcher: Bash diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/documenter.md b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/documenter.md index e32d9cd..b9f559f 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/documenter.md +++ 
b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/documenter.md @@ -20,12 +20,9 @@ memory: scope: project skills: - documentation-patterns - - specification-writing - - spec-new - - spec-update - - spec-review - - spec-refine - - spec-check + - spec + - build + - specs --- # Documenter Agent @@ -287,19 +284,11 @@ Use text-based diagrams when helpful (Mermaid syntax preferred). Keep diagrams s ### Spec Lifecycle Operations -**Create** (`/spec-new`): Build a new spec from the template. Set status to `planned`, approval to `draft`, all requirements `[assumed]`. +**Create & Refine** (`/spec`): Create a spec package, refine decisions with the human, and approve. AI makes obvious decisions, presents genuine trade-offs. Auto-bootstraps `.specs/` on first use. -**Refine** (`/spec-refine`): Walk through assumptions with the user. Upgrade validated requirements from `[assumed]` to `[user-approved]`. Set approval to `user-approved` when all requirements are validated. +**Build & Close** (`/build`): Implement from an approved spec. Plan, build with spec-first testing, self-healing review loop, and spec closure. Phase 3 flips `[ ]` to `[~]`. Phase 4 upgrades `[~]` to `[x]` after verification. -**Build** (`/spec-build`): Orchestrate implementation from an approved spec. Phase 3 flips `[ ]` to `[~]`. Phase 4 upgrades `[~]` to `[x]` after verification. - -**Review** (`/spec-review`): Verify implementation matches spec. Read code, verify requirements, check acceptance criteria. - -**Update** (`/spec-update`): As-built closure. Set status to `implemented` or `partial`. Check off verified criteria. Add Implementation Notes for deviations. Update file paths. - -**Check** (`/spec-check`): Audit spec health across the project. Find stale, incomplete, or missing specs. - -**Init** (`/spec-init`): Bootstrap `.specs/` for a new project. +**Dashboard** (`/specs`): Audit spec health across the project. Find stale, incomplete, draft, or unresolved specs. 
### As-Built Workflow diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/generalist.md b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/generalist.md index 4f8b33a..3f979a4 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/generalist.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/generalist.md @@ -19,11 +19,9 @@ permissionMode: default memory: scope: project skills: - - spec-new - - spec-update - - spec-check - - spec-init - - spec-review + - spec + - build + - specs --- # Generalist Agent diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/implementer.md b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/implementer.md index 506fe5b..2e050d7 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/implementer.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/implementer.md @@ -17,7 +17,7 @@ memory: skills: - refactoring-patterns - migration-patterns - - spec-update + - build hooks: Stop: - type: command diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/migrator.md b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/migrator.md index b6becfd..595fee1 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/migrator.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/migrator.md @@ -19,7 +19,7 @@ memory: scope: user skills: - migration-patterns - - spec-update + - build --- # Migrator Agent diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/refactorer.md b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/refactorer.md index 1354715..e166fec 100644 --- 
a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/refactorer.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/refactorer.md @@ -19,7 +19,7 @@ memory: scope: project skills: - refactoring-patterns - - spec-update + - build hooks: PostToolUse: - matcher: Edit diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/spec-writer.md b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/spec-writer.md index 1e33cd6..32f4e0d 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/spec-writer.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/spec-writer.md @@ -17,13 +17,8 @@ permissionMode: plan memory: scope: user skills: - - specification-writing - - spec-new - - spec-update - - spec-check - - spec-init - - spec-refine - - spec-review + - spec + - specs --- # Spec Writer Agent @@ -74,8 +69,8 @@ Your Open Questions section IS your question-surfacing mechanism. Make it promin - **NEVER** write vague requirements like "the system should be fast" or "the UI should be user-friendly." Every requirement must be specific, measurable, and testable. - **NEVER** combine multiple independent requirements into a single statement. One requirement per line — this makes requirements individually testable and trackable. - **NEVER** present decisions as settled facts unless the user explicitly approved them. Tech choices, architecture decisions, scope boundaries, performance targets, and behavioral defaults that you chose without user input MUST go in `## Open Questions` with options and trade-offs — not in Requirements as decided items. -- **ALL** requirements you generate MUST be tagged `[assumed]`. You never produce `[user-approved]` requirements — only `/spec-refine` does that after explicit user validation. -- **ALL** specs you produce MUST carry `**Approval:** draft`. 
After presenting a draft, state: "This spec requires `/spec-refine` before implementation can begin. All requirements are marked [assumed] until user-approved." +- **ALL** specs you produce MUST carry `approval: draft` in the frontmatter. After presenting a draft, state: "This spec requires `/spec` refinement before implementation can begin." +- Specs use spec-level approval (draft/approved), not per-requirement tagging. The AI makes obvious decisions and presents genuine trade-offs to the human during `/spec` refinement. - **Aim for ~200 lines per spec.** When a spec grows beyond that, recommend splitting into separate specs in the domain folder. Shorter specs are easier to consume and maintain, but complex features sometimes need more @@ -292,7 +287,7 @@ NFR-2 [assumed]: [EARS requirement] - [External system or module this feature depends on] ## Resolved Questions -[Populated by `/spec-refine`. Decisions explicitly approved by the user.] +[Populated during `/spec` refinement. Decisions explicitly approved by the user.] ## Open Questions [Group related unknowns. 
For each question, provide:] diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/test-writer.md b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/test-writer.md index e14accc..ca33182 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/test-writer.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/agent-system/agents/test-writer.md @@ -19,7 +19,7 @@ memory: scope: project skills: - testing - - spec-update + - build hooks: Stop: - type: command diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/prompt-snippets/README.md b/container/.devcontainer/plugins/devs-marketplace/plugins/prompt-snippets/README.md index a2bc1bb..b3db9e1 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/prompt-snippets/README.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/prompt-snippets/README.md @@ -22,7 +22,7 @@ Type `/ps` followed by a snippet name to inject a behavioral directive for the r | `ship` | Commit, push, and create a PR | | `deep` | Thorough investigation, leave no stone unturned | | `hold` | Do the work but don't commit or push | -| `recall` | Search session history with ccms for prior context | +| `recall` | Search session history for prior context | | `wait` | When done, stop — no suggestions or follow-ups | ### Composing diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/prompt-snippets/skills/ps/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/prompt-snippets/skills/ps/SKILL.md index 61f0504..e783566 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/prompt-snippets/skills/ps/SKILL.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/prompt-snippets/skills/ps/SKILL.md @@ -23,7 +23,7 @@ If `$ARGUMENTS` does not match any snippet name, list all available snippets and | `ship` | Commit all staged changes, push to remote, and create a 
pull request. | | `deep` | Be thorough and comprehensive. Investigate in depth, consider edge cases, leave no stone unturned. | | `hold` | Complete the current task but do not commit, push, or publish. Await my review before any git operations. | -| `recall` | Search past session history with `ccms --no-color --project "$(pwd)"` to find prior decisions, discussions, and context relevant to the current task. Summarize what you find before proceeding. | +| `recall` | Search past session history with `codeforge session search --no-color --project "$(pwd)"` to find prior decisions, discussions, and context relevant to the current task. Summarize what you find before proceeding. | | `wait` | When done, stop. Do not suggest next steps, ask follow-up questions, or continue with related work. Await further instructions. | ## Composing Snippets diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/skill-engine/scripts/skill-suggester.py b/container/.devcontainer/plugins/devs-marketplace/plugins/skill-engine/scripts/skill-suggester.py index f6639de..0cd66e0 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/skill-engine/scripts/skill-suggester.py +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/skill-engine/scripts/skill-suggester.py @@ -69,8 +69,15 @@ "terms": ["fastapi", "uvicorn", "starlette", "sse-starlette"], "negative": ["pydanticai", "pydantic-ai", "pydantic ai"], "context_guards": [ - "fastapi", "fast api", "api", "endpoint", "route", - "uvicorn", "rest", "server", "web", + "fastapi", + "fast api", + "api", + "endpoint", + "route", + "uvicorn", + "rest", + "server", + "web", ], "priority": 7, }, @@ -90,7 +97,12 @@ "terms": ["aiosqlite", "better-sqlite3"], "negative": ["elasticsearch", "algolia", "meilisearch", "postgres", "mysql"], "context_guards": [ - "sqlite", "sql", "database", "db", "query", "table", + "sqlite", + "sql", + "database", + "db", + "query", + "table", ], "priority": 7, }, @@ -139,7 +151,12 @@ ], 
"terms": ["pydanticai", "RunContext", "VercelAIAdapter", "FallbackModel"], "context_guards": [ - "pydantic", "agent", "ai", "model", "tool", "llm", + "pydantic", + "agent", + "ai", + "model", + "tool", + "llm", ], "priority": 7, }, @@ -187,14 +204,23 @@ ("optimize docker image", 1.0), ], "terms": ["dockerfile", "compose.yaml", "BuildKit"], - "negative": ["docker-py", "docker py", "docker sdk", "docker from python", - "aiodocker", "dockerclient"], + "negative": [ + "docker-py", + "docker py", + "docker sdk", + "docker from python", + "aiodocker", + "dockerclient", + ], "context_guards": [ - "docker", "container", "compose", "image", "dockerfile", + "docker", + "container", + "compose", + "image", + "dockerfile", ], "priority": 7, }, - # ------------------------------------------------------------------ # Practice / pattern skills (priority 5) # ------------------------------------------------------------------ @@ -249,8 +275,17 @@ ], "terms": ["diagnose", "troubleshoot", "OOMKilled", "ECONNREFUSED"], "context_guards": [ - "log", "crash", "fail", "bug", "container", "stack", - "trace", "exception", "runtime", "service", "process", + "log", + "crash", + "fail", + "bug", + "container", + "stack", + "trace", + "exception", + "runtime", + "service", + "process", ], "priority": 5, }, @@ -301,21 +336,7 @@ "context_guards": ["git", "commit", "branch", "history", "repo"], "priority": 5, }, - "specification-writing": { - "phrases": [ - ("write a spec", 0.8), - ("write requirements", 0.7), - ("define requirements", 0.7), - ("acceptance criteria", 0.6), - ("user stories", 0.6), - ("use ears format", 1.0), - ("given/when/then", 0.8), - ("write given/when/then scenarios", 1.0), - ("structure requirements", 0.7), - ], - "terms": ["specification", "ears", "gherkin", "given when then"], - "priority": 5, - }, + # specification-writing merged into /spec skill (spec-workflow plugin) "performance-profiling": { "phrases": [ ("profile this code", 0.9), @@ -331,8 +352,14 @@ ], "terms": 
["cProfile", "py-spy", "scalene", "flamegraph", "hyperfine"], "context_guards": [ - "profile", "profiler", "benchmark", "performance", "slow", - "latency", "bottleneck", "memory", + "profile", + "profiler", + "benchmark", + "performance", + "slow", + "latency", + "bottleneck", + "memory", ], "priority": 5, }, @@ -363,8 +390,14 @@ ], "terms": ["pip-audit", "npm audit", "cargo audit", "govulncheck"], "context_guards": [ - "dependency", "dependencies", "package", "packages", - "npm", "pip", "cargo", "audit", + "dependency", + "dependencies", + "package", + "packages", + "npm", + "pip", + "cargo", + "audit", ], "priority": 5, }, @@ -384,7 +417,6 @@ "context_guards": ["agent", "agents", "teammate", "teammates"], "priority": 5, }, - # ------------------------------------------------------------------ # Meta / generic skills (priority 3) # ------------------------------------------------------------------ @@ -416,8 +448,14 @@ "terms": ["docstring", "jsdoc", "tsdoc", "rustdoc", "Sphinx"], # Note: "docs" omitted — it overlaps with the phrase "update the docs" "context_guards": [ - "documentation", "docstring", "readme", "jsdoc", "api doc", - "rustdoc", "tsdoc", "sphinx", + "documentation", + "docstring", + "readme", + "jsdoc", + "api doc", + "rustdoc", + "tsdoc", + "sphinx", ], "priority": 3, }, @@ -436,112 +474,68 @@ "terms": ["migrate", "migration"], # Note: "upgrade", "version", "modernize" omitted — overlap with phrases "context_guards": [ - "framework", "breaking", "compatibility", "deprecated", - "legacy", "esm", "commonjs", + "framework", + "breaking", + "compatibility", + "deprecated", + "legacy", + "esm", + "commonjs", ], "priority": 3, }, - # ------------------------------------------------------------------ # Spec-workflow command skills (priority 10) # ------------------------------------------------------------------ - "spec-build": { + "spec": { + "phrases": [ + ("create a spec", 0.8), + ("new spec", 0.8), + ("new feature spec", 0.9), + ("write a spec for", 
0.9), + ("spec this feature", 0.9), + ("start a new spec", 0.9), + ("plan a feature", 0.2), + ("add a spec", 0.8), + ("refine the spec", 0.9), + ("approve the spec", 0.8), + ("write requirements", 0.7), + ("acceptance criteria", 0.6), + ("use ears format", 1.0), + ("set up specs", 0.8), + ("initialize specs", 0.9), + ("create constitution", 0.9), + ], + "terms": ["spec", "specification", "ears"], + "context_guards": ["spec", "specification", ".specs"], + "priority": 10, + }, + "build": { "phrases": [ ("implement the spec", 0.9), ("build from spec", 0.9), ("building from spec", 0.9), ("building from the spec", 0.9), ("start building", 0.2), - ("spec-build", 1.0), ("implement this feature", 0.2), ("build what the spec describes", 1.0), - ("run spec-build", 1.0), + ("run build", 0.8), ], - "terms": ["spec-build"], + "terms": ["build"], "context_guards": ["spec", "specification", ".specs"], "priority": 10, }, - "spec-review": { - "phrases": [ - ("review the spec", 0.8), - ("check spec adherence", 0.9), - ("verify implementation", 0.3), - ("spec-review", 1.0), - ("does code match spec", 0.9), - ("audit implementation", 0.4), - ("run spec-review", 1.0), - ("regression check", 0.3), - ], - "terms": ["spec-review"], - "context_guards": ["spec", "specification", ".specs"], - "priority": 10, - }, - "spec-check": { + "specs": { "phrases": [ ("check spec health", 0.9), ("audit specs", 0.9), ("which specs are stale", 1.0), ("find missing specs", 0.9), ("review spec quality", 0.9), - ("run spec-check", 1.0), ("are my specs up to date", 0.9), + ("spec dashboard", 0.9), ], - "terms": ["spec-check"], - "priority": 10, - }, - "spec-init": { - "phrases": [ - ("initialize specs", 0.9), - ("specs directory", 0.7), - ("set up specs", 0.8), - ("bootstrap specs", 0.9), - ("start using specs", 0.8), - ("create spec directory", 0.9), - ("init specs", 0.9), - ("set up .specs", 1.0), - ], - "terms": ["spec-init"], - "priority": 10, - }, - "spec-new": { - "phrases": [ - ("create a spec", 0.8), 
- ("new spec", 0.8), - ("new feature spec", 0.9), - ("write a spec for", 0.9), - ("spec this feature", 0.9), - ("start a new spec", 0.9), - ("plan a feature", 0.2), - ("add a spec", 0.8), - ], - "terms": ["spec-new"], - "context_guards": ["spec", "specification", ".specs"], - "priority": 10, - }, - "spec-refine": { - "phrases": [ - ("refine the spec", 0.9), - ("review spec assumptions", 0.9), - ("validate spec decisions", 0.9), - ("approve the spec", 0.8), - ("walk me through the spec", 0.8), - ("check spec for assumptions", 0.9), - ("iterate on the spec", 0.8), - ], - "terms": ["spec-refine"], - "priority": 10, - }, - "spec-update": { - "phrases": [ - ("update the spec", 0.8), - ("mark spec as implemented", 0.9), - ("as-built update", 0.9), - ("finish the spec", 0.7), - ("close the spec", 0.7), - ("update spec status", 0.8), - ("sync spec with code", 0.8), - ], - "terms": ["spec-update"], + "terms": ["specs"], "priority": 10, }, "worktree": { diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/.claude-plugin/plugin.json b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/.claude-plugin/plugin.json index ad3e896..e1975cd 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/.claude-plugin/plugin.json +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "spec-workflow", - "description": "Specification lifecycle management: creation, refinement, building, reviewing, updating, and auditing", + "description": "Specification lifecycle management with directory-based spec packages: /spec (create & refine), /build (implement & close), /specs (dashboard)", "author": { "name": "AnExiledDev" } diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/README.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/README.md index deadbd1..fa557fb 100644 --- 
a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/README.md +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/README.md @@ -1,50 +1,60 @@ # spec-workflow -Claude Code plugin that manages the full specification lifecycle: creating, refining, building, reviewing, updating, and auditing feature specs. Includes an advisory hook that reminds about spec updates when code changes but specs don't. +Claude Code plugin for specification-driven development. Manages the full spec lifecycle with directory-based "spec packages" designed for AI-first implementation with 1M+ token context windows and team-based parallel execution. ## What It Does Two capabilities: -1. **Spec lifecycle skills** — 8 skills that cover the complete journey from bootstrapping a `.specs/` directory to closing out an as-built spec after implementation. +1. **Spec lifecycle skills** — 3 commands covering creation, implementation, and health monitoring. Specs are directory packages with human-facing and AI-facing content separated by audience. -2. **Spec reminder hook** — A `Stop` hook that fires when source code was modified but no `.specs/` files were updated, advising Claude to run `/spec-update`. +2. **Spec reminder hook** — A `Stop` hook that fires when source code was modified but no `.specs/` files were updated. 
-### Skill Catalog +### Commands -| Skill | Slash Command | Purpose | -|-------|---------------|---------| -| spec-init | `/spec-init` | Bootstrap `.specs/` directory with BACKLOG.md, MILESTONES.md, ROADMAP.md | -| spec-new | `/spec-new` | Create a new feature spec from EARS template | -| spec-refine | `/spec-refine` | Validate assumptions with user, upgrade requirements to `[user-approved]` | -| spec-build | `/spec-build` | Orchestrate full implementation: plan, build, review, close | -| spec-check | `/spec-check` | Audit all specs for health issues | -| spec-review | `/spec-review` | Verify implementation against a spec | -| spec-update | `/spec-update` | As-built closure: update spec to match implementation | -| specification-writing | `/skill specification-writing` | Domain knowledge for writing high-quality specs | +| Command | Purpose | +|---------|---------| +| `/spec {feature}` | Create, refine, and approve a spec package | +| `/spec constitution` | Create or update the project Constitution | +| `/build {feature}` | Implement an approved spec — plan, build, review, close | +| `/specs` | Dashboard: spec health across all specs | ### Spec Lifecycle ``` -/spec-init Bootstrap .specs/ directory +/spec Create spec package, refine decisions with user, approve | -/spec-new Create feature spec (draft, [assumed] requirements) +/build Full implementation lifecycle: + | Phase 1: Discovery & gate check + | Phase 2: Planning & task decomposition + | Phase 3: Build ([ ] -> [~]) with spec-first testing + | Phase 4: Self-healing review loop ([~] -> [x]) + | Phase 5: Closure & summary report | -/spec-refine Validate with user -> [user-approved] requirements - | -/spec-build 5-phase implementation orchestration: - | Phase 1: Discovery - | Phase 2: Planning - | Phase 3: Building ([ ] -> [~]) - | Phase 4: Review ([~] -> [x]) - | Phase 5: Closure (as-built update) - | -/spec-review Standalone verification (post-change audits) - | -/spec-update Manual as-built closure - | -/spec-check Health audit
across all specs +/specs Health dashboard across all specs +``` + +### Spec Package Structure + +Every spec is a directory with separated human and AI content: + ``` +.specs/{domain}/{feature}/ + index.md # Human reviews this (~50-80 lines) + context.md # AI reads this (invariants, schema, constraints) + groups/ + a-*.md # AC groups with frontmatter for parallel agents + b-*.md +``` + +### Key Design Principles + +- **Human reviews only decisions + scope.** `index.md` is ~50-80 lines. Everything else is for the AI. +- **AI makes obvious decisions.** Only genuine trade-offs are presented to the human. +- **Spec-level approval.** No per-requirement `[assumed]`/`[user-approved]` tagging. The spec is either `draft` or `approved`. +- **Directory-based always.** No single-file format. Consistent structure regardless of spec size. +- **Constitution captures cross-cutting decisions.** Project-level patterns, conventions, and boundaries live in `.specs/CONSTITUTION.md`. +- **Group frontmatter drives parallelism.** `depends_on` and `files_owned` enable automatic task decomposition for team builds. ### Acceptance Criteria Markers @@ -54,12 +64,14 @@ Two capabilities: | `[~]` | Implemented, not yet verified | | `[x]` | Verified — tests pass, behavior confirmed | -### Approval and Requirement Tags +### `[ai-decided]` Workflow -- `**Approval:** draft` — Spec is in draft, not ready for implementation -- `**Approval:** user-approved` — Spec reviewed and approved by user -- `[assumed]` — Requirement inferred by Claude, needs validation -- `[user-approved]` — Requirement explicitly approved by user +During `/build`, when the AI encounters a decision not covered by the spec or Constitution: +1. Makes its best choice (Constitution → codebase patterns → best practice) +2. Records in the group file's AI Decisions table +3. Continues building (does NOT stop) +4. Summary Report presents all AI decisions for user review +5. 
User approves, overrides, or promotes to Constitution ## How It Works @@ -68,54 +80,18 @@ Two capabilities: ``` Claude stops responding (Stop event) | - +-> Stop fires + +-> spec-reminder.py | - +-> spec-reminder.py - | - +-> .specs/ directory exists? - | | - | +-> No -> Silent exit (no output) - | +-> Yes -> Continue - | - +-> Source code modified this session? - | | - | +-> No -> Silent exit - | +-> Yes -> Continue - | - +-> .specs/ files also modified? - | - +-> Yes -> Silent exit (already updated) - +-> No -> Inject advisory: "Run /spec-update" + +-> .specs/ exists? No -> silent exit + +-> Code modified? No -> silent exit + +-> Specs modified? Yes -> silent exit (already updated) + +-> Inject advisory: "Run /build or /spec" ``` ### Monitored Source Directories -The spec reminder watches for changes in these directories: - `src/`, `lib/`, `app/`, `pkg/`, `internal/`, `cmd/`, `tests/`, `api/`, `frontend/`, `backend/`, `packages/`, `services/`, `components/`, `pages/`, `routes/` -### Exit Code Behavior - -| Exit Code | Meaning | -|-----------|---------| -| 0 | Advisory injected (or silent — no action needed) | - -The hook never blocks operations. - -### Error Handling - -| Scenario | Behavior | -|----------|----------| -| No `.specs/` directory | Silent exit | -| Not a git repository | Silent exit | -| JSON parse failure | Silent exit | - -### Timeouts - -| Hook | Timeout | -|------|---------| -| Spec reminder (Stop) | 8s | - ## Installation ### CodeForge DevContainer @@ -124,16 +100,8 @@ Pre-installed and activated automatically — no setup needed. ### From GitHub -Use this plugin in any Claude Code setup: - -1. Clone the [CodeForge](https://github.com/AnExiledDev/CodeForge) repository: - - ```bash - git clone https://github.com/AnExiledDev/CodeForge.git - ``` - -2. Enable the plugin in your `.claude/settings.json`: - +1. Clone the [CodeForge](https://github.com/AnExiledDev/CodeForge) repository +2. 
Enable in `.claude/settings.json`: ```json { "enabledPlugins": { @@ -142,51 +110,39 @@ Use this plugin in any Claude Code setup: } ``` - Replace `` with the absolute path to your CodeForge clone. - ## Plugin Structure ``` spec-workflow/ +-- .claude-plugin/ -| +-- plugin.json # Plugin metadata +| +-- plugin.json +-- hooks/ -| +-- hooks.json # Stop hook registration +| +-- hooks.json +-- scripts/ -| +-- spec-reminder.py # Spec update advisory (Stop) +| +-- spec-reminder.py # Stop hook: spec update advisory +-- skills/ -| +-- spec-init/ # Bootstrap .specs/ directory +| +-- spec/ # /spec — create, refine, approve | | +-- SKILL.md | | +-- references/ +| | +-- index-template.md +| | +-- context-template.md +| | +-- group-template.md +| | +-- constitution-template.md | | +-- backlog-template.md -| | +-- milestones-template.md -| | +-- roadmap-template.md -| +-- spec-new/ # Create new feature spec -| | +-- SKILL.md -| | +-- references/ -| | +-- template.md -| +-- spec-refine/ # Validate assumptions with user -| | +-- SKILL.md -| +-- spec-build/ # Full implementation orchestration +| | +-- ears-patterns.md +| | +-- example-webhook/ # Complete example spec package +| +-- build/ # /build — implement, review, close | | +-- SKILL.md | | +-- references/ | | +-- review-checklist.md -| +-- spec-check/ # Spec health audit -| | +-- SKILL.md -| +-- spec-review/ # Implementation verification -| | +-- SKILL.md -| +-- spec-update/ # As-built closure -| | +-- SKILL.md -| +-- specification-writing/ # Domain knowledge skill +| | +-- summary-report-template.md +| +-- specs/ # /specs — health dashboard | +-- SKILL.md -| +-- references/ -| +-- criteria-patterns.md -| +-- ears-templates.md -+-- README.md # This file ++-- README.md ``` ## Requirements - Python 3.11+ - Git (for detecting file changes) -- Claude Code with plugin hook support (skills) +- Claude Code with plugin hook support diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/scripts/spec-reminder.py 
b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/scripts/spec-reminder.py index e1f0b0d..f143067 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/scripts/spec-reminder.py +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/scripts/spec-reminder.py @@ -4,7 +4,7 @@ On Stop, checks if source code was modified but no .specs/ files were updated. Injects an advisory reminder as additionalContext pointing the user to -/spec-update. +/build or /spec. Only fires when a .specs/ directory exists (project uses the spec system). @@ -138,10 +138,9 @@ def main(): message = ( f"[Spec Reminder] Code was modified in {dirs_str} " "but no specs were updated. " - "Use /spec-review to verify implementation against the spec, " - "then /spec-update to close the loop. " - "Use /spec-new if no spec exists for this feature, " - "or /spec-refine if the spec is still in draft status." + "Use /build to implement from an approved spec and close the loop. " + "Use /spec if no spec exists yet. " + "Use /specs to check overall spec health." ) if session_id: diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/SKILL.md new file mode 100644 index 0000000..454b834 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/SKILL.md @@ -0,0 +1,280 @@ +--- +name: build +description: >- + Implements an approved spec package through the full lifecycle: discovery, + task decomposition, parallel build, self-healing review, and spec closure. + Reads Constitution + spec package, auto-creates tasks from group frontmatter, + and generates a Completion Summary Report. USE WHEN the user asks to + "build the spec", "implement this feature", "build from spec", "start + building", "run build", or works with approved spec packages. 
+ DO NOT USE for creating or refining specs (use /spec) or checking + spec health (use /specs). +version: 1.0.0 +argument-hint: "[feature-name or spec-path]" +--- + +# Implement & Close Spec Package + +## Mental Model + +An approved spec package is a complete implementation contract. The human has reviewed decisions and scope. The AI's job is to build everything described in the spec, verify it works, and close the loop — without human intervention. + +The build process reads three layers of context: +1. **Constitution** (`.specs/CONSTITUTION.md`) — project-level decisions +2. **Spec package** (`index.md` + `context.md` + `groups/*.md`) — feature decisions + ACs +3. **Codebase** — existing code, patterns, dependencies + +Group frontmatter drives task decomposition. Each group becomes a work unit assigned to an agent (or executed sequentially for simple specs). The `depends_on` field determines execution order. + +--- + +## Workflow + +### Phase 1: Discovery & Gate Check + +#### 1a: Find the Spec + +``` +Glob: .specs/**/{feature}/index.md OR .specs/**/index.md +``` + +Match by `$ARGUMENTS`. If ambiguous, list matching specs and ask. + +#### 1b: Gate Check + +Read `index.md` frontmatter. Verify `approval: approved`. + +- If `approved` → proceed +- If `draft` → **STOP**: "This spec is not approved. Run `/spec {feature}` to refine and approve it." + +If status is `implemented`, ask: "This spec is already implemented. Re-build, update, or error?" + +#### 1c: Load Full Context + +Read in order: +1. `.specs/CONSTITUTION.md` (if exists and populated) +2. `index.md` — decisions, AC summary, scope +3. `context.md` — invariants, anti-patterns, integration context, schema, constraints +4. 
All `groups/*.md` — full ACs, frontmatter, file ownership + +Extract: +- All acceptance criteria and their current markers +- Group dependency graph from `depends_on` fields +- File ownership map from `files_owned` fields +- Existing files vs new files to create + +### Phase 2: Planning & Decomposition + +#### 2a: Assess Complexity + +Count groups and total ACs. Apply thresholds: + +| Indicator | Solo | Team | +|-----------|------|------| +| Groups | 1-2 | 3+ | +| Total ACs | ≤6 | 7+ | +| Independent groups (no depends_on) | 1 | 2+ | + +If 2+ indicators point to Team → recommend team spawning. + +#### 2b: Create Implementation Plan + +Use `EnterPlanMode`. The plan MUST include: + +1. **Spec reference** — path to spec package +2. **Constitution summary** — key decisions that affect this build +3. **Group execution order** — respecting `depends_on` graph +4. **Per-group breakdown:** + - Files to create/modify (from `files_owned`) + - ACs to implement (from `criteria`) + - Tests to write (derived from ACs) + - Dependencies on other groups +5. **Team composition** (if applicable): + - Which specialist agent types for which groups + - File ownership boundaries (no overlap) +6. **Phase 3-5 instructions** (preserved for post-plan execution) + +Present via `ExitPlanMode`. Wait for approval. + +#### 2c: Team Setup (if applicable) + +If team recommended and approved: +1. `TeamCreate` with feature name +2. Create tasks from group frontmatter — each group = one task +3. Spawn teammates using specialist agents matching the work +4. Assign groups respecting `depends_on` ordering + +### Phase 3: Implementation + +Execute the plan. For each group (solo or via team): + +#### 3a: Agent Context Loading + +Each implementing agent reads: +- `index.md` — for decisions +- `context.md` — for invariants, anti-patterns, schema, constraints +- Their assigned group file — for full ACs with examples + +#### 3b: Build Loop (per AC) + +For each acceptance criterion in the group: + +1. 
**Write test FROM THE SPEC** — derive test from the AC's Given/When/Then + Example. NOT from the implementation. + ```python + def test_webhook_registration(): + """Verifies: AC-1 (integrations/webhook-delivery)""" + # Test derived from spec's Given/When/Then + ``` +2. **Implement** — write the code to make the test pass +3. **Mark `[~]`** — update the AC marker in the group file +4. **Record AI decisions** — if the AI encounters a decision not in the spec or Constitution, record it in the group's `## AI Decisions` table and continue + +#### 3c: Invariant Check + +After implementing all ACs in a group, verify the group's work against `context.md` invariants. Every invariant must hold. + +#### 3d: Progress Tracking + +Update group frontmatter: `status: in_progress`, `owner: {agent-name}` + +### Phase 4: Review & Fix Loop + +**This is a FIX LOOP, not a report.** The AI finds issues and fixes them. + +#### 4a: Re-read Spec with Fresh Eyes + +Re-read the full spec package. Compare implementation against: +- Every AC's EARS criterion text +- Every invariant in context.md +- Every anti-pattern (verify none were violated) +- Schema intent (verify models match) +- Constraints (verify file locations, patterns) + +#### 4b: Run All Tests + +Run the full test suite (not just feature tests). Check for: +- All new tests passing +- No regressions in existing tests +- Coverage of all ACs + +#### 4c: Fix Issues + +For each issue found: +1. Fix the code +2. Re-run affected tests +3. Verify the fix doesn't break invariants + +**Loop until:** All ACs pass OR an issue is genuinely unfixable (document as discrepancy). + +#### 4d: Upgrade Markers + +For each AC where the test passes: upgrade `[~]` → `[x]` in the group file. 
+ +Update group frontmatter: `status: verified` + +### Phase 5: Closure + +#### 5a: Update Spec Status + +In `index.md` frontmatter: +- Set `status: implemented` (all ACs `[x]`) or `partial` (some remain `[ ]`/`[~]`) +- Set `last_updated` to today + +#### 5b: Generate Completion Summary Report + +Use `references/summary-report-template.md`. Include: + +- **AC Results:** Table of all ACs with pass/fail status and test file paths +- **AI Decisions:** Aggregated from all group files +- **Concerns:** Edge cases, performance observations, security notes +- **Discrepancies:** Gaps between spec and implementation (if any) + +Present the report to the user. + +#### 5c: Shutdown Team (if applicable) + +Send shutdown requests to all teammates. Wait for confirmation. Clean up. + +--- + +## Spec-First Testing (Hard Rule) + +Every test MUST be derived from the spec's acceptance criteria, NOT from the implementation. + +**Why:** When tests are derived from code, they validate current behavior — including bugs. Spec-derived tests validate intended behavior. + +**How:** +- Read the AC's Given/When/Then + Example +- Write the test to match the spec's expected behavior +- Include traceability: `"""Verifies: AC-N (domain/feature)"""` + +**Traceability enables:** +- Automated verification that every AC has a test +- Impact analysis when spec changes +- No orphan tests (every test maps to an AC) + +--- + +## `[ai-decided]` Workflow + +When the AI encounters a decision not covered by Constitution or spec: + +1. Make best choice: Constitution patterns → codebase patterns → best practice +2. Record in the group file's `## AI Decisions` table: + ``` + | AD-1 | Connection pool size | 10 connections | Matches existing httpx config | + ``` +3. **Continue building** — do NOT stop for user input +4. Summary Report presents all decisions for post-build review +5. 
User can: approve, override (AI re-implements), or promote to Constitution + +--- + +## AC Markers + +| Marker | Meaning | Set By | +|--------|---------|--------| +| `[ ]` | Not started | `/spec` (creation) | +| `[~]` | Implemented, not yet verified | Phase 3 (after code written) | +| `[x]` | Verified — tests pass | Phase 4 (after test passes) | + +Markers live in group files, inline with each AC heading. + +--- + +## Persistence & Resumability + +If interrupted mid-build: +- Group frontmatter `status` shows which groups are done +- AC markers show which criteria are addressed +- To resume: re-run `/build {feature}` — it detects partial state and continues from where it left off + +--- + +## Ambiguity Policy + +- If `$ARGUMENTS` matches multiple specs, list them and ask +- If a group's `depends_on` references a non-existent group, warn and ask +- If `files_owned` lists overlap between groups, warn and ask how to resolve +- If the Constitution is missing or empty, proceed with codebase-inferred decisions (note as AI decisions) +- If Phase 4 reveals significant gaps, present them to the user before closing + +--- + +## Anti-Patterns + +- **Skipping the plan:** Always plan before building. The plan is the user's approval gate. +- **Tests from code:** Tests validate spec intent, not implementation details. Derive from ACs. +- **Optimistic markers:** Never mark `[x]` without a passing test. Every `[x]` has evidence. +- **Scope creep:** Build ONLY what's in the spec. Additions go in the backlog. +- **Silent failures:** Every Phase 4 issue must be fixed or explicitly documented as a discrepancy. +- **Skipping Phase 5:** The spec-reminder hook catches this, but close the loop immediately.
+ +--- + +## Reference Files + +| File | Contents | +|------|----------| +| `references/review-checklist.md` | Phase 4 review checklist | +| `references/summary-report-template.md` | Completion Summary Report format | diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/references/review-checklist.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/references/review-checklist.md new file mode 100644 index 0000000..7ad716d --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/references/review-checklist.md @@ -0,0 +1,94 @@ +# Phase 4 Review Checklist + +Use this checklist during `/build` Phase 4 (Review & Fix Loop). Walk every item. Fix issues before proceeding to Phase 5. + +--- + +## 1. Acceptance Criteria Coverage + +For each AC in every group file: + +- [ ] Implementation exists (code written) +- [ ] Test exists with traceability comment (`Verifies: AC-N`) +- [ ] Test passes +- [ ] Test is derived from spec (Given/When/Then), not from implementation +- [ ] Marker upgraded from `[~]` to `[x]` after test passes + +**Flag:** Any AC without a passing test cannot be marked `[x]`. + +## 2. Invariant Compliance + +For each invariant in `context.md`: + +- [ ] Invariant holds across ALL implemented code (not just the AC that seems related) +- [ ] No code path violates the invariant under any condition +- [ ] Edge cases don't create invariant violations + +**Flag:** Invariant violations are high-priority fixes. + +## 3. Anti-Pattern Check + +For each anti-pattern in `context.md`: + +- [ ] No code matches the described anti-pattern +- [ ] No "clever" workarounds that technically avoid the anti-pattern but violate its intent + +**Flag:** Anti-pattern matches indicate specification gaming. + +## 4. 
Schema Verification + +Compare implemented models/migrations against `context.md` Schema Intent: + +- [ ] All tables created with correct column names and types +- [ ] Constraints match (PKs, FKs, NOT NULL, CHECK, UNIQUE) +- [ ] Indexes created as specified +- [ ] No extra columns or tables added beyond spec + +## 5. Integration Correctness + +Compare code against `context.md` Integration Context: + +- [ ] Dependencies used with correct method signatures +- [ ] Error handling matches documented behavior (e.g., catches expected exceptions) +- [ ] No undocumented dependencies introduced + +## 6. Constraint Compliance + +From `context.md` Constraints: + +- [ ] Files created in specified locations +- [ ] Patterns followed (referenced files used as templates) +- [ ] "Must NOT" prohibitions respected +- [ ] No files created outside the spec's file ownership + +## 7. Decision Compliance + +From `index.md` Decisions: + +- [ ] Every "Needs Your Input" decision implemented as the user chose +- [ ] Every "Already Decided" decision implemented as specified +- [ ] Any deviations documented as AI Decisions with reasoning + +## 8. Scope Check + +From `index.md` Out of Scope: + +- [ ] No code implements out-of-scope features +- [ ] No "helpful" additions beyond the spec +- [ ] No functionality that serves a different feature + +## 9. Code Quality + +- [ ] Error handling at appropriate boundaries +- [ ] No hardcoded values that should be configurable +- [ ] Functions are short and single-purpose +- [ ] Type hints on all function signatures +- [ ] No regressions in existing tests (run full suite) + +## 10. 
AI Decision Audit + +For each AI Decision recorded in group files: + +- [ ] Decision was genuinely not covered by Constitution or spec +- [ ] Reasoning is clear and defensible +- [ ] No reasonable alternative would clearly have been a better choice diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/references/summary-report-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/references/summary-report-template.md new file mode 100644 index 0000000..42eb6ca --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/build/references/summary-report-template.md @@ -0,0 +1,77 @@ +# Completion Summary Report Template + +Generated by `/build` Phase 5 after implementation and review. + +--- + +```markdown +## Completion Summary Report + +**Feature:** [name] +**Spec:** .specs/[domain]/[feature]/ +**Status:** implemented | partial +**Completed:** YYYY-MM-DD + +### Acceptance Criteria Results + +| AC | Description | Result | Test | +|----|------------|--------|------| +| AC-1 | [brief] | [x] PASS | `tests/path/test_file.py::test_name` | +| AC-2 | [brief] | [x] PASS | `tests/path/test_file.py::test_name` | +| AC-3 | [brief] | [ ] FAIL | `tests/path/test_file.py::test_name` — [reason] | + +**Summary:** N/M criteria verified. Tests: X passed, Y failed. + +### AI Decisions Made + +[Aggregated from all group files] + +| # | Group | Decision | Choice | Reasoning | +|---|-------|----------|--------|-----------| +| AD-1 | B: Delivery | [what] | [choice] | [why] | + +**Action needed:** Review AI decisions. Approve, override, or promote to Constitution. + +### Concerns + +[Issues the AI wants to flag — edge cases not covered by ACs, performance +observations, security considerations, technical debt created.] + +- None | [list] + +### Discrepancies + +[Gaps between spec and implementation. Things that don't match but were
+ +- None | [list with explanation] + +### Files Created/Modified + +[Grouped by area, matching spec's Constraints section] + +*Models:* +- `src/models/webhook.py` — created + +*Services:* +- `src/services/webhook_service.py` — created + +*Tests:* +- `tests/unit/services/test_webhook_service.py` — created + +### Next Steps + +- [ ] Smoke test the feature manually +- [ ] Review AI decisions (approve/override/promote) +- [ ] [Any feature-specific next steps] +``` + +--- + +## Report Guidelines + +- **Be honest about failures.** If an AC didn't pass, say so clearly. +- **AI Decisions are the primary review surface.** The human focuses on these. +- **Concerns are for the human's judgment.** Flag things the spec didn't cover. +- **Discrepancies are NOT failures.** They're documented deviations with reasoning. +- **File list helps the human navigate.** Include every file created or modified. diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-build/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-build/SKILL.md deleted file mode 100644 index 5dd809d..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-build/SKILL.md +++ /dev/null @@ -1,356 +0,0 @@ ---- -name: spec-build -description: >- - Orchestrates full implementation of an approved specification through - 5 phases: discovery, planning, building, review, and closure. USE WHEN - the user asks to "implement the spec", "build from spec", "start building - the feature", "implement this feature", "build what the spec describes", - "run spec-build", or works with phased implementation workflows. - DO NOT USE for creating, refining, or updating specs — use spec-new, - spec-refine, or spec-update instead. 
-version: 0.2.0 -argument-hint: "[spec-path]" ---- - -# Spec-Driven Implementation - -## Mental Model - -An approved spec is a contract — it defines exactly what to build, what to skip, and how to verify success. This skill takes a `user-approved` spec and orchestrates the full implementation lifecycle: plan the work, build it, review everything against the spec, and close the loop. No separate `/spec-update` run is needed afterward — Phase 5 performs full as-built closure. - -The workflow is five phases executed in strict order. Each phase has a clear gate before the next can begin. - -``` -/spec-new -> /spec-refine -> /spec-build - | - +-> Phase 1: Discovery & Gate Check - +-> Phase 2: Implementation Planning - +-> Phase 3: Implementation - +-> Phase 4: Comprehensive Review - +-> Phase 5: Spec Closure -``` - -> **Note:** Phase 4's review functionality is also available standalone via `/spec-review` for features implemented outside of `/spec-build`. - ---- - -## Acceptance Criteria Markers - -During implementation, acceptance criteria use three states: - -| Marker | Meaning | -|--------|---------| -| `[ ]` | Not started | -| `[~]` | Implemented, not yet verified — code written, tests not confirmed | -| `[x]` | Verified — tests pass, behavior confirmed | - -Phase 3 flips `[ ]` to `[~]` as criteria are addressed in code. Phase 4 upgrades `[~]` to `[x]` after verification. This convention is the only spec edit during active implementation. - ---- - -## CRITICAL: Planning Before Implementation - -Phase 2 generates an implementation plan. This plan MUST be created and approved before any code changes begin in Phase 3. Use `EnterPlanMode` to create the plan. The plan MUST include Phases 3, 4, and 5 instructions verbatim — these phases run after plan approval, and the instructions must be preserved so they execute correctly even across context boundaries. - -Do NOT skip planning. Do NOT begin writing code during Phase 2. 
The plan is a contract with the user — get approval first. - ---- - -## Complexity Assessment - -Before planning, assess the spec's complexity to determine whether team spawning would benefit the implementation. - -**Complexity indicators** — if two or more apply, the spec is complex: -- 8+ functional requirements (FR-*) -- Cross-layer work (backend + frontend + tests spanning different frameworks) -- 3+ independent workstreams that could run in parallel -- Multiple services or modules affected - -### When Complexity is High: Recommend Team Spawning - -Decompose work into parallel workstreams and recommend team composition using the project's existing custom agents. These agents carry frontloaded skills, safety hooks, and tailored instructions — always prefer them over generalist agents. - -**Recommended compositions by spec type:** - -| Spec Type | Teammates | -|-----------|-----------| -| Full-stack feature | researcher + test-writer + documenter | -| Backend-heavy | researcher + test-writer | -| Security-sensitive | security-auditor + test-writer | -| Refactoring work | refactorer + test-writer | -| Multi-service | researcher per service + test-writer | - -**Available specialist agents:** `architect`, `bash-exec`, `claude-guide`, `debug-logs`, `dependency-analyst`, `documenter`, `explorer`, `generalist`, `git-archaeologist`, `migrator`, `perf-profiler`, `refactorer`, `researcher`, `security-auditor`, `spec-writer`, `statusline-config`, `test-writer` - -Use `generalist` only when no specialist matches the workstream. Hard limit: 3-5 active teammates maximum. - -**When complexity is low** (< 8 requirements, single layer, sequential work): skip team spawning, implement directly in the main thread. Still follow all 5 phases. - -The user can override the team recommendation in either direction. 
- ---- - -## Phase 1: Discovery & Gate Check - -### Step 1: Find the Spec - -``` -Glob: .specs/**/*.md -``` - -Match by `$ARGUMENTS` — the user provides a feature name or path. If ambiguous, list matching specs and ask which one to implement. - -### Step 2: Read the Full Spec - -Read every line. Extract structured data: - -- **All `[user-approved]` requirements** — every FR-* and NFR-* with their EARS-format text -- **All acceptance criteria** — every `[ ]` checkbox item -- **Key Files** — existing files to read for implementation context -- **Dependencies** — prerequisite features, systems, or libraries -- **Out of Scope** — explicit exclusions that define boundaries to respect - -### Step 3: Gate Check - -**Hard gate**: Verify the spec has `**Approval:** user-approved`. - -- If `user-approved` -> proceed to Step 4 -- If `draft` or missing -> **STOP**. Print: "This spec is not approved for implementation. Run `/spec-refine ` first to validate assumptions and get user approval." Do not continue. - -This gate is non-negotiable. Draft specs contain unvalidated assumptions — building against them risks wasted work. - -### Step 4: Build Context - -Read every file listed in the spec's `## Key Files` section. These are the files the spec author identified as most relevant to implementation. Understanding them is prerequisite to planning. - -After reading, note: -- Which key files exist vs. which are new (to be created) -- Patterns, conventions, and interfaces in existing files -- Any dependencies or constraints discovered in the code - -### Step 5: Assess Complexity - -Apply the complexity indicators from the assessment section above. Note the result for Phase 2 — it determines whether to recommend team spawning. - ---- - -## Phase 2: Implementation Planning - -**Do NOT write any code in this phase.** This phase produces a plan only. - -Use `EnterPlanMode` to enter plan mode. Create a structured implementation plan covering: - -### Plan Structure - -1. 
**Spec Reference** — path to the spec file, domain, feature name -2. **Complexity Assessment** — indicators found, team recommendation (if applicable) -3. **Requirement-to-File Mapping** — each FR-*/NFR-* mapped to specific file changes -4. **Implementation Steps** — ordered by dependency, grouped by related requirements: - - For each step: files to create/modify, requirements addressed, acceptance criteria to verify - - Mark which steps depend on others completing first -5. **Out-of-Scope Boundaries** — items from the spec's Out of Scope section, noted as "do not touch" -6. **Verification Checkpoints** — acceptance criteria listed as checkpoints after each logical group of steps - -### Preserving Phase Instructions - -The plan MUST include the following phases verbatim so they survive context across the implementation session. Include them as a "Post-Implementation Phases" section in the plan: - -**Phase 3 instructions**: Execute steps, flip `[ ]` to `[~]` after addressing each criterion in code. - -**Phase 4 instructions**: Run comprehensive review using the Spec Implementation Review Checklist at `skills/spec-build/references/review-checklist.md`. Walk every requirement, verify every criterion, audit code quality, check spec consistency. Produce a summary report. - -**Phase 5 instructions**: Update spec status, add Implementation Notes, update Key Files, add Discrepancies, set Last Updated date. - -### Team Plan (if applicable) - -If complexity assessment recommends team spawning, the plan should additionally include: -- Workstream decomposition with clear boundaries -- Teammate assignments by specialist type -- Task dependencies between workstreams -- Integration points where workstreams converge - -Present the plan via `ExitPlanMode` and wait for explicit user approval before proceeding. - ---- - -## Phase 3: Implementation - -Execute the approved plan step by step. This is where code gets written. - -### Execution Rules - -1. 
**Follow the plan order** — implement steps in the sequence approved by the user -2. **Live spec updates** — after completing work on an acceptance criterion, immediately edit the spec file: - - Flip `[ ]` to `[~]` for criteria addressed in code - - This is the ONLY spec edit during Phase 3 — no structural changes to the spec -3. **Track requirement coverage** — mentally track which FR-*/NFR-* requirements have been addressed as you work through the steps -4. **Note deviations** — if the implementation must deviate from the plan (unexpected constraint, better approach discovered, missing dependency), note the deviation for Phase 4. Do not silently diverge. -5. **Respect boundaries** — do not implement anything listed in the spec's Out of Scope section - -### If Using a Team - -If team spawning was approved in Phase 2: - -1. Create the team using `TeamCreate` -2. Create tasks in the team task list mapped to spec requirements -3. Spawn teammates using the recommended specialist agent types -4. Assign tasks by domain match -5. Coordinate integration points as workstreams converge -6. Collect results and ensure all `[ ]` criteria are flipped to `[~]` - -### Progress Tracking - -The spec file itself is the progress tracker. At any point during Phase 3: -- `[ ]` criteria = not yet addressed -- `[~]` criteria = addressed in code, awaiting verification -- Count of `[~]` vs total criteria shows implementation progress - ---- - -## Phase 4: Comprehensive Review - -The most critical phase. Audit everything built against the spec. Use the Spec Implementation Review Checklist at `skills/spec-build/references/review-checklist.md` as the authoritative guide. - -### 4A: Requirement Coverage Audit - -Walk through every FR-* and NFR-* requirement from the spec: - -1. For each requirement: identify the specific files and functions that address it -2. Verify the implementation matches the EARS-format requirement text -3. Flag requirements that were missed entirely -4. 
Flag requirements only partially addressed -5. Flag code written outside the spec's scope (scope creep) - -### 4B: Acceptance Criteria Verification - -For each `[~]` criterion in the spec: - -1. Find or write the corresponding test -2. Run the test and confirm it passes -3. If the test passes -> upgrade `[~]` to `[x]` in the spec -4. If the test fails -> note the failure, do not upgrade -5. For criteria without tests: write the test, run it, then decide - -Report any criteria that cannot be verified and explain why. - -### 4C: Code Quality Review - -Check the implementation against code quality standards: - -- Error handling at appropriate boundaries -- No hardcoded values that should be configurable -- Function sizes within limits (short, single-purpose) -- Nesting depth within limits -- Test coverage for new code paths -- No regressions in existing tests - -### 4D: Spec Consistency Check - -Compare implemented behavior against each EARS requirement: - -- Does the code actually do what each requirement says? -- Are there behavioral differences between spec intent and actual implementation? -- Are Key Files in the spec still accurate? Any new files missing from the list? -- Are there files created during implementation that should be added? - -### 4E: Summary Report - -Present a structured summary to the user: - -``` -## Implementation Review Summary - -**Requirements:** N/M addressed (list any gaps) -**Acceptance Criteria:** N verified [x], M in progress [~], K not started [ ] -**Deviations from Plan:** (list any, or "None") -**Discrepancies Found:** (spec vs reality gaps, or "None") -**Code Quality Issues:** (list any, or "None") - -**Recommendation:** Proceed to Phase 5 / Fix issues first (with specific list) -``` - -If issues are found, address them before moving to Phase 5. If issues require user input, present them and wait for direction. - ---- - -## Phase 5: Spec Closure - -The final phase. Update the spec to reflect what was actually built. 
This replaces the need for a separate `/spec-update` run. - -### Step 1: Update Status - -Set `**Status:**` to: -- `implemented` — if all acceptance criteria are `[x]` -- `partial` — if any criteria remain `[ ]` or `[~]` - -### Step 2: Update Metadata - -- Set `**Last Updated:**` to today's date (YYYY-MM-DD) -- Preserve `**Approval:** user-approved` — never downgrade - -### Step 3: Add Implementation Notes - -In the `## Implementation Notes` section, document: - -- **Deviations from the original spec** — what changed and why -- **Key design decisions** — choices made during implementation not in the original spec -- **Trade-offs accepted** — what was sacrificed and the reasoning -- **Surprising findings** — edge cases, performance characteristics, limitations discovered - -Reference file paths, not code. Keep notes concise. - -### Step 4: Update Key Files - -In `## Key Files`: -- Add files created during implementation -- Remove files that no longer exist -- Update paths that changed -- Verify every path listed actually exists - -### Step 5: Add Discrepancies - -In `## Discrepancies`, document any gaps between spec intent and actual build: -- Requirements that were met differently than specified -- Behavioral differences from the original EARS requirements -- Scope adjustments that happened during implementation - -If no discrepancies exist, leave the section empty or note "None." - -### Step 6: Final Message - -Print: "Implementation complete. Spec updated to `[status]`. Run `/spec-check` to verify spec health." - ---- - -## Persistence Policy - -Complete all five phases. Stop only when: -- Gate check fails in Phase 1 (spec not approved) — hard stop -- User explicitly requests stop -- A genuine blocker requires user input that cannot be resolved - -If interrupted mid-phase, resume from the last completed step. Phase 3 progress is tracked via acceptance criteria markers in the spec — `[~]` markers show exactly where implementation left off. - -Do not skip phases. 
Do not combine phases. Each phase exists because it surfaces different types of issues. Phase 4 in particular catches problems that are invisible during Phase 3. - ---- - -## Ambiguity Policy - -- If `$ARGUMENTS` matches multiple specs, list them and ask the user which to implement. -- If a spec has no acceptance criteria, warn the user and suggest adding criteria before implementation. Offer to proceed anyway if the user confirms. -- If Key Files reference paths that don't exist, note this in Phase 1 and proceed — they may be files to create. -- If the spec has both `[assumed]` and `[user-approved]` requirements, the gate check still fails — all requirements must be `[user-approved]` before implementation begins. -- If Phase 4 reveals significant gaps, do not silently proceed to Phase 5. Present the gaps and get user direction on whether to fix them first or document them as discrepancies. -- If the spec is already `implemented`, ask: is this a re-implementation, an update, or an error? - ---- - -## Anti-Patterns - -- **Skipping the plan**: Jumping from Phase 1 to Phase 3 without a plan leads to unstructured work and missed requirements. Always plan first. -- **Optimistic verification**: Marking `[~]` as `[x]` without running the actual test. Every `[x]` must be backed by a passing test or confirmed behavior. -- **Scope creep during implementation**: Building features not in the spec because they "seem useful." Respect Out of Scope boundaries. -- **Deferring Phase 4**: "I'll review later" means "I won't review." Phase 4 runs immediately after Phase 3. -- **Silent deviations**: Changing the implementation approach without noting it. Every deviation gets documented in Phase 4/5. -- **Skipping Phase 5**: The spec-reminder hook will catch this, but it's better to close the loop immediately. Phase 5 is not optional. 
diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-build/references/review-checklist.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-build/references/review-checklist.md deleted file mode 100644 index b740383..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-build/references/review-checklist.md +++ /dev/null @@ -1,175 +0,0 @@ -# Spec Implementation Review Checklist - -Comprehensive checklist for spec implementation reviews. Used by `/spec-build` Phase 4 and `/spec-review`. Walk through every section methodically. Do not skip sections — each catches different categories of issues. - ---- - -## 4A: Requirement Coverage Audit - -For each FR-* requirement in the spec: - -- [ ] Identify the file(s) and function(s) that implement this requirement -- [ ] Verify the implementation matches the EARS-format requirement text -- [ ] Confirm the requirement is fully addressed (not partially) -- [ ] Note if the requirement was met through a different approach than planned - -For each NFR-* requirement in the spec: - -- [ ] Identify how the non-functional requirement is enforced (e.g., timeout config, index, validation) -- [ ] Verify measurable NFRs have been tested or measured (response time, throughput, size limits) -- [ ] Confirm the NFR is met under expected conditions, not just ideal conditions - -Cross-checks: - -- [ ] Every FR-* has corresponding code — no requirements were skipped -- [ ] Every NFR-* has corresponding enforcement — no hand-waving -- [ ] No code was written that doesn't map to a requirement (scope creep check) -- [ ] Out of Scope items from the spec were NOT implemented - ---- - -## 4B: Acceptance Criteria Verification - -For each criterion currently marked `[~]` (implemented, not yet verified): - -- [ ] Locate the corresponding test (unit, integration, or manual verification) -- [ ] If no test exists: write one -- [ ] Run the 
test -- [ ] If test passes: upgrade `[~]` to `[x]` in the spec -- [ ] If test fails: note the failure, keep as `[~]`, document the issue - -Summary checks: - -- [ ] Count total criteria vs. verified `[x]` — report the ratio -- [ ] Any criteria still `[ ]` (not started)? Flag as missed -- [ ] Any criteria that cannot be tested? Document why and note as discrepancy -- [ ] Do the tests actually verify the criterion, or just exercise the code path? - ---- - -## 4C: Code Quality Review - -### Error Handling - -- [ ] Errors are caught at appropriate boundaries (not swallowed, not over-caught) -- [ ] Error messages are informative (include context, not just "error occurred") -- [ ] External call failures (I/O, network, subprocess) have explicit handling -- [ ] No bare except/catch-all that hides real errors - -### Code Structure - -- [ ] Functions are short and single-purpose -- [ ] Nesting depth is within limits (2-3 for Python, 3-4 for other languages) -- [ ] No duplicated logic that should be extracted -- [ ] Names are descriptive (functions, variables, parameters) - -### Hardcoded Values - -- [ ] No magic numbers without explanation -- [ ] Configuration values that may change are externalized (not inline) -- [ ] File paths, URLs, and credentials are not hardcoded - -### Test Quality - -- [ ] New code has corresponding tests -- [ ] Tests verify behavior, not implementation details -- [ ] Tests cover happy path, error cases, and key edge cases -- [ ] No over-mocking that makes tests trivially pass -- [ ] Existing tests still pass (no regressions introduced) - -### Dependencies - -- [ ] New imports/dependencies are necessary (no unused imports) -- [ ] No circular dependencies introduced -- [ ] Third-party dependencies are justified (not added for trivial functionality) - ---- - -## 4D: Spec Consistency Check - -### Requirement-to-Implementation Fidelity - -- [ ] Re-read each EARS requirement and compare against the actual implementation -- [ ] For "When [event], the 
system shall [action]" — does the code handle that event and perform that action? -- [ ] For "If [unwanted condition], the system shall [action]" — is the unwanted condition detected and handled? -- [ ] For ubiquitous requirements ("The system shall...") — is the behavior always active? - -### Key Files Accuracy - -- [ ] Every file in the spec's Key Files section still exists at that path -- [ ] New files created during implementation are listed in Key Files -- [ ] Deleted or moved files have been removed/updated in Key Files -- [ ] File descriptions in Key Files are still accurate - -### Schema and API Consistency - -- [ ] If the spec has a Schema/Data Model section, verify referenced files are current -- [ ] If the spec has API Endpoints, verify routes match the implementation -- [ ] Any new endpoints or schema changes are reflected in the spec - -### Behavioral Alignment - -- [ ] Edge cases discovered during implementation are documented -- [ ] Performance characteristics match NFR expectations -- [ ] Integration points work as the spec describes -- [ ] Default values and fallback behaviors match spec intent - ---- - -## 4E: Summary Report Template - -After completing sections 4A through 4D, compile findings into this format: - -``` -## Implementation Review Summary - -**Spec:** [feature name] ([spec file path]) -**Date:** YYYY-MM-DD - -### Requirement Coverage -- Functional: N/M addressed -- Non-Functional: N/M addressed -- Gaps: [list or "None"] - -### Acceptance Criteria -- [x] Verified: N -- [~] Implemented, pending verification: N -- [ ] Not started: N -- Failures: [list or "None"] - -### Code Quality -- Issues found: [list or "None"] -- Regressions: [list or "None"] - -### Spec Consistency -- Key Files updates needed: [list or "None"] -- Discrepancies: [list or "None"] - -### Deviations from Plan -[list or "None"] - -### Recommendation -[ ] Proceed to Phase 5 — all clear -[ ] Fix issues first: [specific list] -[ ] Requires user input: [specific questions] 
-``` - ---- - -## When to Fail the Review - -The review should recommend "fix issues first" when: - -- Any FR-* requirement has no corresponding implementation -- Any acceptance criterion test fails -- Existing tests regress (new code broke something) -- Code was written outside the spec's scope without user approval -- Critical error handling is missing (crashes on expected error conditions) - -The review should recommend "proceed to Phase 5" when: - -- All requirements have corresponding implementations -- All acceptance criteria are `[x]` (or `[~]` with documented reason) -- No test regressions -- Code quality is acceptable (no critical issues) -- Discrepancies are documented, not hidden diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-check/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-check/SKILL.md deleted file mode 100644 index def3afc..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-check/SKILL.md +++ /dev/null @@ -1,104 +0,0 @@ ---- -name: spec-check -description: >- - Audits all specifications in a project for health issues including stale - status, missing sections, unapproved drafts, and assumed requirements. - USE WHEN the user asks to "check spec health", "audit specs", "which - specs are stale", "find missing specs", "review spec quality", - "run spec-check", "are my specs up to date", or works with .specs/ - directory maintenance and specification metadata. - DO NOT USE for single-spec code review or implementation verification - — use spec-review for deep code-level audits against one spec. -version: 0.2.0 -argument-hint: "[domain or path]" -context: fork -agent: explorer ---- - -# Spec Health Audit - -Audit all specifications in the current project and report their health status. 
- -## Workflow - -### Step 1: Discover Specs - -``` -Glob: .specs/**/*.md -``` - -If `.specs/` does not exist, report: "No specification directory found. Use `/spec-new` to create your first spec." - -Exclude non-spec files: -- `MILESTONES.md` -- `BACKLOG.md` -- `LESSONS_LEARNED.md` -- Files in `archive/` - -### Step 2: Read Each Spec - -For each spec file, extract: -- **Feature name** from the `# Feature: [Name]` header -- **Domain** from the `**Domain:**` field -- **Status** from the `**Status:**` field -- **Last Updated** from the `**Last Updated:**` field -- **Approval** from the `**Approval:**` field (default `draft` if missing) -- **Line count** (wc -l) -- **Sections present** — check for each required section header -- **Acceptance criteria** — count total, count checked `[x]`, count in-progress `[~]` -- **Requirements** — count total, count `[assumed]`, count `[user-approved]` -- **Discrepancies** — check if section has content - -### Step 3: Flag Issues - -For each spec, check these conditions: - -| Issue | Condition | Severity | -|-------|-----------|----------| -| **Unapproved** | Approval is `draft` or missing | High | -| **Assumed requirements** | Has requirements tagged `[assumed]` | Medium | -| **Stale** | Status is `planned` but Last Updated is >30 days ago | High | -| **Incomplete** | Missing required sections (Intent, Acceptance Criteria, Key Files, Requirements, Out of Scope) | High | -| **Long spec** | Exceeds ~200 lines — consider splitting | Info | -| **No criteria** | Acceptance Criteria section is empty or has no checkboxes | High | -| **Open discrepancies** | Discrepancies section has content | Medium | -| **Missing as-built** | Status is `implemented` but Implementation Notes is empty | Medium | -| **Stale paths** | Key Files references paths that don't exist | Low | -| **Draft + implemented** | Status is `implemented` but Approval is `draft` — approval gate was bypassed | High | -| **Inconsistent approval** | Approval is `user-approved` 
but spec has `[assumed]` requirements | High | -| **In-progress criteria** | Has acceptance criteria marked `[~]` (implemented, not yet verified) | Info | - -### Step 4: Report - -Output a summary table: - -``` -## Spec Health Report - -| Feature | Domain | Status | Approval | Updated | Lines | Issues | -|---------|--------|--------|----------|---------|-------|--------| -| Session History | sessions | implemented | user-approved | 2026-02-08 | 74 | None | -| Auth Flow | auth | planned | draft | 2026-01-15 | 45 | Unapproved, Stale (33 days) | -| Settings Page | ui | partial | draft | 2026-02-05 | 210 | Unapproved, Long spec | - -## Issues Found - -### High Priority -- **Auth Flow** (`.specs/auth/auth-flow.md`): Status is `planned` but last updated 33 days ago. Either implementation is stalled or the spec needs an as-built update. - -### Medium Priority -- **Settings Page** (`.specs/ui/settings-page.md`): 210 lines — consider splitting into separate specs in the domain folder. - -### Suggested Actions -1. Run `/spec-refine auth-flow` to validate assumptions and get user approval -2. Run `/spec-review auth-flow` to verify implementation against the spec -3. Run `/spec-update auth-flow` to update the auth flow spec -4. Split settings-page.md into sub-specs - -### Approval Summary -- **User-approved:** 1 spec -- **Draft (needs /spec-refine):** 2 specs -- **Assumed requirements across all specs:** 8 -``` - -If no issues are found, report: "All specs healthy. N specs across M domains. All user-approved."
diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/SKILL.md deleted file mode 100644 index 85e445b..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/SKILL.md +++ /dev/null @@ -1,104 +0,0 @@ ---- -name: spec-init -description: >- - Bootstraps the .specs/ directory structure for a project, creating - MILESTONES.md and BACKLOG.md from starter templates so spec-new has - a home. USE WHEN the user asks to "initialize specs", "set up specs", - "bootstrap specs", "start using specs", "create spec directory", - "init specs for this project", "set up .specs", or works with first- - time specification setup and project onboarding. - DO NOT USE if .specs/ already exists — use spec-check to audit health - or spec-new to add individual specs. -version: 0.2.0 ---- - -# Initialize Specification Directory - -## Mental Model - -Before any spec can be created, the project needs a `.specs/` directory with its supporting files: a MILESTONES tracker (what each milestone delivers) and a BACKLOG (deferred items). This skill bootstraps that structure so `/spec-new` has a home. - ---- - -## Workflow - -### Step 1: Check Existing State - -``` -Glob: .specs/**/*.md -``` - -**If `.specs/` already exists:** -- Report current state: how many specs, domains, whether MILESTONES.md and BACKLOG.md exist -- Suggest `/spec-check` to audit health instead -- Do NOT recreate or overwrite anything -- Stop here - -**If `.specs/` does not exist:** proceed to Step 2. - -### Step 2: Create Directory Structure - -Create the `.specs/` directory at the project root. - -### Step 3: Create MILESTONES.md - -Write `.specs/MILESTONES.md` using the template from `references/milestones-template.md`. - -### Step 4: Create BACKLOG.md - -Write `.specs/BACKLOG.md` using the template from `references/backlog-template.md`. 
- -### Step 5: Retroactive Documentation - -Ask the user: - -> "Are there existing features in this project that should be documented retroactively? I can help create specs for them using `/spec-new`." - -If yes, guide the user through creating a spec for each feature using `/spec-new`. - -If no, proceed to Step 6. - -### Step 6: Report - -Summarize what was created: - -``` -## Spec Directory Initialized - -Created: -- `.specs/` directory -- `.specs/MILESTONES.md` — milestone tracker -- `.specs/BACKLOG.md` — deferred items list - -Next steps: -- Add features to `BACKLOG.md` with priority grades (P0–P3) -- Pull features into a milestone in `MILESTONES.md` when ready to scope -- Use `/spec-new <feature-name>` to create a spec (domain is inferred) -- Use `/spec-refine <feature-name>` to validate before implementation -- After implementing, use `/spec-review <feature-name>` to verify against the spec -- Then use `/spec-update` to close the loop -- Use `/spec-check` to audit spec health at any time -``` - ---- - -## Constraints - -- **Never overwrite** an existing `.specs/` directory or its contents. -- Templates are starting points — the user will extend them as the project grows. - ---- - -## Ambiguity Policy - -- If the user runs this in a workspace root with multiple projects, ask which project to initialize. -- If `.specs/` exists but is missing MILESTONES.md or BACKLOG.md, offer to create only the missing files.
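
The guarded bootstrap in Steps 1-4 plus the never-overwrite constraint reduce to a few lines; a sketch in Python, where the paths match this skill but the function name is illustrative:

```python
from pathlib import Path

def init_specs(project_root: Path, templates: Path) -> list[str]:
    """Create .specs/ with MILESTONES.md and BACKLOG.md, creating only what is missing."""
    specs = project_root / ".specs"
    specs.mkdir(exist_ok=True)
    created = []
    for target_name, template_name in [("MILESTONES.md", "milestones-template.md"),
                                       ("BACKLOG.md", "backlog-template.md")]:
        target = specs / target_name
        if not target.exists():  # never overwrite (Constraints + Ambiguity Policy)
            target.write_text((templates / template_name).read_text())
            created.append(target_name)
    return created
```

A second run returns an empty list: existing files are left untouched, which also covers the partial-initialization case in the Ambiguity Policy.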
- ---- - -## Reference Files - -| File | Contents | -|------|----------| -| `references/milestones-template.md` | Starter MILESTONES with milestone table format | -| `references/backlog-template.md` | Starter BACKLOG with item format | diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/backlog-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/backlog-template.md deleted file mode 100644 index 9018f4a..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/backlog-template.md +++ /dev/null @@ -1,23 +0,0 @@ -# Backlog - -Priority-graded feature and infrastructure backlog. Items are pulled into milestones when ready to scope and spec. See `MILESTONES.md` for the milestone workflow. - -## P0 — High Priority - -- [ ] [Feature] — [Description] - -## P1 — Important - -- [ ] [Feature] — [Description] - -## P2 — Desired - -- [ ] [Feature] — [Description] - -## P3 — Nice to Have - -- [ ] [Feature] — [Description] - -## Infrastructure & CI - -- [ ] [Item] — [Description] diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/milestones-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/milestones-template.md deleted file mode 100644 index bb7fee6..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/milestones-template.md +++ /dev/null @@ -1,32 +0,0 @@ -# Milestones - -> Features are organized by domain in `.specs/`. Milestones group features -> into deliverable increments. See `BACKLOG.md` for the feature backlog. - -## How Milestones Work - -1. **Backlog** — All desired features live in `BACKLOG.md`, graded by priority. -2. **Milestone scoping** — When ready to plan a deliverable, pull features from the backlog. -3. 
**Spec first** — Each feature gets a spec (via `/spec-new`) before implementation begins. -4. **Ship** — A milestone is done when all its specs are implemented and verified. - -Only the **current milestone** is defined in detail. Everything else is backlog. - -## Released - -_None yet._ - -## Current - -### [Milestone Name] - -- [ ] `domain/feature-name.md` — [Brief description] -- [ ] `domain/feature-name.md` — [Brief description] - -## Next - -> Scoped from `BACKLOG.md` when the current milestone is complete. - -## Out of Scope - -- [Items explicitly not planned] diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/roadmap-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/roadmap-template.md deleted file mode 100644 index fce785f..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-init/references/roadmap-template.md +++ /dev/null @@ -1,33 +0,0 @@ -# Roadmap - -> Features live in the priority-graded backlog until pulled into a version. -> Versions are scoped and spec'd when ready to build — not pre-assigned. -> See `BACKLOG.md` for the feature backlog. - -## How Versioning Works - -1. **Backlog** — All desired features live in `BACKLOG.md`, graded by priority. -2. **Version scoping** — When ready to start a new version, pull features from the backlog. -3. **Spec first** — Each feature in a version gets a spec before implementation begins. -4. **Ship** — Version is done when all its specs are implemented and verified. - -Only the **next version** is defined in detail. Everything else is backlog. - -## Released - -_None yet._ - -## Current - -### v0.1.0 — [Name] 🔧 - -- [ ] [Feature pulled from backlog] -- [ ] [Feature pulled from backlog] - -## Next - -> Scoped from `BACKLOG.md` when current version is complete. 
- -## Out of Scope - -- [Items explicitly not planned] diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-new/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-new/SKILL.md deleted file mode 100644 index 18bd574..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-new/SKILL.md +++ /dev/null @@ -1,113 +0,0 @@ ---- -name: spec-new -description: >- - Creates a new feature specification from the standard EARS template - with domain inference, acceptance criteria, and requirement tagging. - USE WHEN the user asks to "create a spec", "new feature spec", "write - a spec for", "spec this feature", "start a new spec", "plan a feature", - "add a spec", or works with .specs/ directory and feature planning. - DO NOT USE for updating existing specs after implementation — use - spec-update instead. Not for refining draft specs — use spec-refine. -version: 0.2.0 -argument-hint: "[feature-name] [domain]" ---- - -# Create New Feature Specification - -## Mental Model - -A specification is a contract between the person requesting a feature and the person building it. Writing the spec BEFORE implementation forces you to think through edge cases, acceptance criteria, and scope boundaries while changes are cheap — before any code exists. - -Every project uses `.specs/` as the specification directory. Specs are domain-organized, independently loadable, and should aim for ~200 lines. - ---- - -## Workflow - -### Step 1: Parse Arguments - -Extract the feature name from `$ARGUMENTS`: -- **Feature name**: kebab-case identifier (e.g., `session-history`, `auth-flow`) - -If the feature name is missing, ask the user what they want to spec. - -**Note:** Features should be pulled from the project's backlog (`BACKLOG.md`) into a milestone before creating a spec. If the feature isn't in the backlog yet, add it first, then assign it to a milestone. 
- -### Step 2: Determine Domain and File Path - -Analyze the feature name and description to infer an appropriate domain folder: -- Look at existing domain folders in `.specs/` for a natural fit -- Consider the feature's area: `auth`, `search`, `ui`, `api`, `onboarding`, etc. -- Present the inferred domain to the user for confirmation or override - -The file path is always: `.specs/{domain}/{feature-name}.md` - -If `.specs/` does not exist at the project root, create it. - -If `.specs/{domain}/` does not exist, create it. - -### Step 3: Create the Spec File - -Write the file using the standard template from `references/template.md`. - -Pre-fill: -- **Domain**: from the inferred/confirmed domain -- **Status**: `planned` -- **Last Updated**: today's date (YYYY-MM-DD) -- **Approval**: `draft` -- **Feature name**: from arguments - -Leave all other sections as placeholders for the user to fill. - -### Step 4: Guide Content Creation - -After creating the file, guide the user through filling it out: - -1. **Intent** — What problem does this solve? Who has this problem? (2-3 sentences) -2. **Acceptance Criteria** — Use the `specification-writing` skill for EARS format and Given/When/Then patterns -3. **Key Files** — Glob the codebase to identify existing files relevant to this feature -4. **Schema / Data Model** — Reference file paths only, never inline schemas -5. **API Endpoints** — Table format: Method | Path | Description -6. **Requirements** — EARS format, numbered FR-1, FR-2, NFR-1, etc. Tag all requirements `[assumed]` at creation time — they become `[user-approved]` only after explicit user validation via `/spec-refine`. -7. **Dependencies** — What this feature depends on -8. **Out of Scope** — Explicit non-goals to prevent scope creep -9. 
**Resolved Questions** — Leave empty at creation; populated by `/spec-refine` - -### Step 5: Validate - -Before finishing: -- [ ] If the file exceeds ~200 lines, consider splitting into separate specs in the domain folder -- [ ] No source code, SQL, or type definitions reproduced inline -- [ ] Status is `planned` and Approval is `draft` -- [ ] All required sections present (even if some are "N/A" or "TBD") -- [ ] Acceptance criteria are testable -- [ ] All requirements are tagged `[assumed]` - -After validation, inform the user: **"This spec MUST go through `/spec-refine` before implementation begins.** All requirements are marked `[assumed]` until explicitly validated." - -The `/spec-refine` skill walks through every `[assumed]` requirement with the user, validates tech decisions and scope boundaries, and upgrades approved items to `[user-approved]`. The spec's `**Approval:**` becomes `user-approved` only after all requirements pass review. - ---- - -## Sizing Guidelines - -- **Aim for ~200 lines per spec.** If a feature needs more, consider splitting into separate specs in the domain folder. -- **Reference, don't reproduce.** Write `see src/engine/db/migrations/002.sql lines 48-70` — never paste the SQL. -- **Independently loadable.** Each spec file must be useful without loading any other file. -- **EARS format for requirements.** Use the `specification-writing` skill for templates and examples. - ---- - -## Ambiguity Policy - -- If the user doesn't specify a domain, infer one from the feature name and existing `.specs/` structure, then confirm with the user. -- If the feature scope is unclear, write a minimal spec with `## Open Questions` listing what needs clarification. -- If a spec already exists for this feature, inform the user and suggest `/spec-update` instead. 
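
The Step 5 validation list lends itself to a quick mechanical pass before handing the spec back. A sketch assuming the template's field syntax; the function name and message wording are illustrative:

```python
import re

def lint_new_spec(spec_text: str) -> list[str]:
    """Pre-finish checks from Step 5 for a freshly created spec."""
    problems = []
    lines = spec_text.splitlines()
    if len(lines) > 200:
        problems.append(f"{len(lines)} lines — consider splitting (~200 line target)")
    for line in lines:
        # every FR-*/NFR-* requirement must carry [assumed] at creation time
        if re.match(r"-\s*(FR|NFR)-\d+", line) and "[assumed]" not in line:
            problems.append(f"untagged requirement: {line.strip()}")
    if "**Status:** planned" not in spec_text:
        problems.append("Status must be `planned` at creation")
    if "**Approval:** draft" not in spec_text:
        problems.append("Approval must be `draft` at creation")
    return problems
```

Testability of acceptance criteria and the no-inline-code rule still need human judgment; the lint only catches the structural misses.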
- ---- - -## Reference Files - -| File | Contents | -|------|----------| -| `references/template.md` | Full standard template with field descriptions and examples | diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-new/references/template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-new/references/template.md deleted file mode 100644 index 877761b..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-new/references/template.md +++ /dev/null @@ -1,139 +0,0 @@ -# Specification Template - -Standard template for all feature specifications. Copy this structure when creating a new spec. - ---- - -## Template - -```markdown -# Feature: [Name] - -**Domain:** [domain-name] -**Status:** planned -**Last Updated:** YYYY-MM-DD -**Approval:** draft - -## Intent - -[What problem does this solve? Who has this problem? What's the cost of not solving it? 2-3 sentences.] - -## Acceptance Criteria - -[Testable criteria. Use Given/When/Then for complex flows, checklists for simple features, or tables for business rules. Every criterion must be verifiable.] - -[Markers: `[ ]` = not started, `[~]` = implemented but not yet verified, `[x]` = verified (tests pass). During `/spec-build`, criteria progress from `[ ]` to `[~]` during implementation, then to `[x]` after review.] - -- [ ] [Criterion 1] -- [ ] [Criterion 2] -- [ ] [Criterion 3] - -## Key Files - -[File paths most relevant to implementation — paths an implementer should read first.] - -**Backend:** -- `src/path/to/file.py` — [brief description] - -**Frontend:** -- `src/web/path/to/component.svelte` — [brief description] - -**Tests:** -- `tests/path/to/test_file.py` — [brief description] - -## Schema / Data Model - -[Reference migration files and model files by path. Describe what changes — do NOT paste DDL, Pydantic models, or TypeScript interfaces.] 
- -- New table: `table_name` — see `src/db/migrations/NNN.sql` -- Modified: `existing_table` — added `column_name` column - -## API Endpoints - -| Method | Path | Description | -|--------|------|-------------| -| GET | `/api/resource` | List resources with pagination | -| POST | `/api/resource` | Create a new resource | - -## Requirements - -### Functional Requirements - -- FR-1 [assumed]: [EARS format requirement — see specification-writing skill for templates] -- FR-2 [assumed]: When [event], the system shall [action]. -- FR-3 [assumed]: If [unwanted condition], then the system shall [action]. - -### Non-Functional Requirements - -- NFR-1 [assumed]: The system shall respond to [endpoint] within [N]ms at the [percentile] percentile. -- NFR-2 [assumed]: [Security, accessibility, scalability requirement] - -## Dependencies - -- [External system, library, or feature this depends on] -- [Blocked by: feature X must ship first] - -## Out of Scope - -- [Explicit non-goal 1 — prevents scope creep] -- [Explicit non-goal 2] - -## Resolved Questions - -[Decisions explicitly approved by the user via `/spec-refine`. Each entry: decision topic, chosen option, date, brief rationale.] - -## Implementation Notes - -[Post-implementation only. Leave empty in planned specs. After building, document what actually shipped vs. what was planned.] - -## Discrepancies - -[Post-implementation only. Document gaps between spec intent and actual build. Prevents next session from re-planning decided work.] 
-``` - ---- - -## Field Descriptions - -| Section | Required | When to Fill | -|---------|----------|-------------| -| Intent | Always | At creation | -| Acceptance Criteria | Always | At creation | -| Key Files | Always | At creation (update post-implementation) | -| Schema / Data Model | If applicable | At creation | -| API Endpoints | If applicable | At creation | -| Requirements | Always | At creation | -| Dependencies | If applicable | At creation | -| Out of Scope | Always | At creation | -| Implementation Notes | Post-implementation | After building | -| Discrepancies | Post-implementation | After building | - -## Status Values - -| Status | Meaning | -|--------|---------| -| `planned` | Spec written, implementation not started | -| `partial` | Some acceptance criteria implemented, work ongoing | -| `implemented` | All acceptance criteria met, as-built notes complete | - -## Approval Workflow - -| Tag | Meaning | -|-----|---------| -| `[assumed]` | Requirement was drafted by AI or inferred — treated as a hypothesis | -| `[user-approved]` | Requirement was explicitly reviewed and approved by the user via `/spec-refine` | - -| Approval Status | Meaning | -|-----------------|---------| -| `draft` | Spec has unvalidated assumptions — NOT approved for implementation | -| `user-approved` | All requirements are `[user-approved]` — ready for implementation | - -## Acceptance Criteria Markers - -| Marker | Meaning | -|--------|---------| -| `[ ]` | Not started | -| `[~]` | Implemented, not yet verified — code written, tests not confirmed | -| `[x]` | Verified — tests pass, behavior confirmed | - -**Workflow:** `/spec-new` creates → `/spec-refine` validates → `/spec-build` implements + closes the loop (or implement manually → `/spec-review` verifies → `/spec-update` closes the loop). 
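
Because the markers are fixed strings, progress against acceptance criteria can be tallied mechanically; a sketch assuming the exact `- [ ]` / `- [~]` / `- [x]` checkbox syntax from this template:

```python
import re
from collections import Counter

def criteria_progress(spec_text: str) -> Counter:
    """Tally acceptance-criteria markers: ' ' not started, '~' implemented, 'x' verified."""
    return Counter(re.findall(r"^- \[( |~|x)\]", spec_text, flags=re.MULTILINE))
```

For example, a spec with one criterion in each state yields a count of 1 per marker, which is the shape `/spec-check` reports in its summary.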
diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-refine/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-refine/SKILL.md deleted file mode 100644 index 7c7caa1..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-refine/SKILL.md +++ /dev/null @@ -1,197 +0,0 @@ ---- -name: spec-refine -description: >- - Guides iterative user-driven spec refinement through structured - questioning rounds that validate assumptions, tech decisions, and scope - boundaries. USE WHEN the user asks to "refine the spec", "review spec - assumptions", "validate spec decisions", "approve the spec", "walk me - through the spec", "check spec for assumptions", "iterate on the spec", - or works with [assumed] requirements needing user-approved upgrade. - DO NOT USE for creating new specs (use spec-new) or for post- - implementation updates (use spec-update). -version: 0.2.0 -argument-hint: "[spec-path]" ---- - -# Iterative Spec Refinement - -## Mental Model - -A draft spec is a hypothesis, not a commitment. Every requirement, tech decision, and scope boundary in a draft is an assumption until the user explicitly validates it. This skill systematically mines a spec for unvalidated assumptions, presents each to the user for review via structured questions, and iterates until every decision has explicit user approval. - -No implementation begins on a spec with `**Approval:** draft`. This skill is the gate. - ---- - -## Workflow - -### Step 1: Load & Inventory - -Find the target spec: -- If `$ARGUMENTS` contains a path or feature name, use it directly -- Otherwise, glob `.specs/**/*.md` and ask the user which spec to refine - -Read the full spec. 
Catalog: -- Every section and whether it has content -- The `**Approval:**` status (should be `draft`) -- All requirements and their current markers (`[assumed]` vs `[user-approved]`) -- The `## Open Questions` section (if any) -- The `## Resolved Questions` section (if any) - -If the spec is already `**Approval:** user-approved` and all requirements are `[user-approved]`, report this and ask if the user wants to re-review anyway. - -### Step 2: Assumption Mining - -Scan each section systematically for unvalidated decisions. Look for: - -| Category | What to look for | -|----------|-----------------| -| **Tech decisions** | Database choices, auth mechanisms, API formats, libraries, protocols | -| **Scope boundaries** | What's included/excluded without stated rationale | -| **Performance targets** | Numbers (response times, limits, thresholds) that were assumed | -| **Architecture choices** | Where logic lives, service boundaries, data flow patterns | -| **Behavioral defaults** | Error handling, retry logic, fallback behavior, timeout values | -| **Unstated dependencies** | Systems, services, or libraries the spec assumes exist | -| **Security assumptions** | Auth requirements, data sensitivity, access control patterns | - -For each assumption found, prepare a question with 2-4 alternatives including the current assumption. - -Present findings via `AskUserQuestion` in rounds of 1-4 questions. Group related assumptions together. Example: - -``` -Question: "Which authentication mechanism should this feature use?" -Options: -- JWT with refresh tokens (current assumption) -- Session cookies with httpOnly flag -- OAuth2 with external provider -``` - -Record each answer. After the user responds, check: did any answer reveal new assumptions or contradictions? If yes, add follow-up questions to the queue. - -### Step 3: Requirement Validation - -Walk through every requirement tagged `[assumed]`: - -1. **Read the requirement** aloud to the user (via the question text) -2. 
**Assess** — is it specific? testable? complete? -3. **Present via AskUserQuestion** with options: - - Approve as-is - - Needs revision (user provides direction via "Other") - - Remove (not needed) - - Defer to Open Questions (not decidable yet) - -Process requirements in batches of 1-4 per question round. Prioritize: -- Requirements with the most ambiguity first -- Requirements that other requirements depend on -- Requirements involving tech decisions or external systems - -For approved requirements, update the marker from `[assumed]` to `[user-approved]`. -For revised requirements, rewrite per user direction and mark `[user-approved]`. -For removed requirements, delete them. -For deferred requirements, move to `## Open Questions`. - -### Step 4: Acceptance Criteria Review - -For each acceptance criterion: -1. Is it measurable and testable? -2. Does it map to a specific requirement? -3. Are there requirements without corresponding criteria? - -Present gaps to the user: -- Missing criteria for existing requirements -- Criteria that don't map to any requirement -- Criteria with vague or unmeasurable outcomes - -Get approval on each criterion or batch of related criteria. - -### Step 5: Scope & Dependency Audit - -Review the spec from four perspectives: - -**User perspective:** -- Does the feature solve the stated problem? -- Are there user needs not addressed? -- Is the scope too broad or too narrow? - -**Developer perspective:** -- Is this implementable with the current architecture? -- Are the key files accurate? -- Are there missing technical constraints? - -**Security perspective:** -- Are there data sensitivity issues? -- Is authentication/authorization addressed? -- Are there input validation gaps? - -**Operations perspective:** -- Deployment considerations? -- Monitoring and observability needs? -- Rollback strategy needed? - -Surface any missing items via `AskUserQuestion`. Get explicit decisions on scope boundaries and dependency completeness. 
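
Once the user has decided, the approve path in Step 3 is a one-line text edit; a sketch with an illustrative helper name (revision, removal, and deferral would be handled similarly, and per the Anti-Patterns, never call this before the item has been presented to the user):

```python
import re

def upgrade_requirement(spec_text: str, req_id: str) -> str:
    """Flip one requirement's [assumed] tag to [user-approved] after explicit approval."""
    return re.sub(rf"^(- {req_id}) \[assumed\]:", r"\1 [user-approved]:",
                  spec_text, count=1, flags=re.MULTILINE)
```

Only the named requirement changes; every other `[assumed]` line is left for its own review round.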
- -### Step 6: Final Approval - -1. Present a summary of all changes made during refinement: - - Assumptions resolved (count) - - Requirements approved/revised/removed - - New criteria added - - Scope changes - -2. Ask for final approval via `AskUserQuestion`: - - "Approve spec — all decisions validated, ready for implementation" - - "More refinement needed — specific concerns remain" - -3. On approval: - - Set `**Approval:** user-approved` - - Update `**Last Updated:**` to today - - Verify all requirements are tagged `[user-approved]` - - Populate `## Resolved Questions` with the decision trail from this session - -4. On "more refinement needed": - - Ask what concerns remain - - Loop back to the relevant phase - ---- - -## Convergence Rules - -- After each phase, check: did answers from this phase raise new questions? If yes, run another questioning round before advancing. -- The skill does NOT terminate until ALL of: - - Every `[assumed]` requirement is resolved (approved, revised, removed, or deferred) - - All acceptance criteria are reviewed - - The user gives explicit final approval -- If the user wants to stop early, leave `**Approval:** draft` and note remaining items in `## Open Questions`. - ---- - -## Resolved Questions Format - -Each resolved question follows this format: - -```markdown -1. **[Decision topic]** — [Chosen option] (user-approved, YYYY-MM-DD) - - Options considered: [list] - - Rationale: [brief user reasoning or context] -``` - -Keep entries concise — decision + options + rationale in 2-3 lines each. - ---- - -## Ambiguity Policy - -- If the spec has no `**Approval:**` field, treat it as `draft` and add the field. -- If requirements lack `[assumed]`/`[user-approved]` tags, treat all as `[assumed]`. -- If the user says "approve everything" without reviewing individual items, warn that blanket approval defeats the purpose — offer to fast-track by presenting summaries of each batch. 
-- If the spec is very short (< 30 lines), the full 6-phase process may be unnecessary. Adapt: merge phases 2-4 into a single review pass. Still require explicit final approval. -- If the user provides a feature name that matches multiple specs, list them and ask which to refine. - ---- - -## Anti-Patterns - -- **Rubber-stamping**: Presenting assumptions and immediately suggesting "approve all." Every assumption gets its own question with real alternatives. -- **Leading questions**: "Should we use JWT as planned?" is leading. Present alternatives neutrally: "Which auth mechanism should this feature use? Options: JWT, sessions, OAuth2." -- **Skipping phases**: Every phase surfaces different types of assumptions. Don't skip phases even if earlier phases had few findings. -- **Silent upgrades**: Never change `[assumed]` to `[user-approved]` without presenting the item to the user first. diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-review/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-review/SKILL.md deleted file mode 100644 index 1ddbf89..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-review/SKILL.md +++ /dev/null @@ -1,233 +0,0 @@ ---- -name: spec-review -description: >- - Performs a standalone deep implementation review by reading code and - verifying full adherence to a specification's requirements and acceptance - criteria. USE WHEN the user asks to "review the spec", "verify - implementation", "does code match spec", "audit implementation", - "check spec adherence", "run spec-review", "regression check", or - works with post-implementation verification and pre-release audits. - DO NOT USE for batch metadata audits across all specs (use spec-check) - or for updating spec status after review (use spec-update). 
-version: 0.2.0 -argument-hint: "[spec-path]" ---- - -# Spec Implementation Review - -## Mental Model - -A spec is a contract — but contracts only matter if someone verifies them. `/spec-review` is the verification step: given a spec and the code that claims to implement it, does the code actually do what the spec says? - -This is a standalone, single-spec, deep implementation review. Unlike `/spec-check` (which audits metadata health across all specs without reading code) and unlike `/spec-build` Phase 4 (which is locked inside the build workflow), `/spec-review` can be invoked independently at any time after implementation exists. - -Use cases: - -- **Manual implementation** — you built a feature without `/spec-build` and want to verify the work before running `/spec-update` -- **Post-change regression check** — re-verify after modifying an already-implemented feature -- **Pre-release audit** — confirm a feature still matches its spec before shipping -- **Onboarding verification** — check if what's in the code matches what the spec says - -``` -Lifecycle positioning: - -/spec-new → /spec-refine → implement (manually or via /spec-build) → /spec-review → /spec-update - -Or with /spec-build (which has its own Phase 4): -/spec-new → /spec-refine → /spec-build (includes review) → done - -/spec-review is independent — usable at any time after implementation exists. 
-``` - ---- - -## Relationship to Other Skills - -| Skill | What it does | How `/spec-review` differs | -|-------|-------------|---------------------------| -| `/spec-check` | Batch metadata audit (all specs, no code reading) | Single-spec deep code audit | -| `/spec-build` Phase 4 | Same depth, but embedded in the build workflow | Standalone, invokable independently | -| `/spec-update` | Updates spec metadata after implementation | `/spec-review` audits first, then recommends `/spec-update` | - ---- - -## Spec Edits During Review - -`/spec-review` makes limited spec edits — just enough to record what it verified: - -- Upgrade `[ ]` or `[~]` → `[x]` for criteria verified by passing tests -- Downgrade `[x]` → `[ ]` if a previously-verified criterion now fails (regression) -- Add entries to `## Discrepancies` for gaps found -- Update `## Key Files` if paths are stale (files moved/deleted/added) -- Update `**Last Updated:**` date - -It does NOT change `**Status:**` or add `## Implementation Notes` — that's `/spec-update`'s job. Clear boundary: `/spec-review` verifies and records findings; `/spec-update` closes the loop. - ---- - -## Workflow - -### Step 1: Discovery - -**Find the spec.** Match `$ARGUMENTS` (feature name or path) against: - -``` -Glob: .specs/**/*.md -``` - -If ambiguous, list matching specs and ask which one to review. 
- -**Read the full spec.** Extract: - -- All FR-* and NFR-* requirements with their EARS-format text -- All acceptance criteria with current markers (`[ ]`, `[~]`, `[x]`) -- Key Files — every file path listed -- Out of Scope — boundaries to respect -- Discrepancies — any existing entries - -**Gate check.** `/spec-review` works on any spec with implementation to review: - -| Approval | Status | Action | -|----------|--------|--------| -| `user-approved` | `planned` | Proceed (reviewing work done against approved spec) | -| `user-approved` | `partial` or `implemented` | Proceed (re-reviewing) | -| `draft` | any | **Warn**: "This spec is `draft`. Requirements may not be validated. Consider running `/spec-refine` first. Proceed anyway?" | - -Unlike `/spec-build` which hard-blocks on `draft` (because it's about to write code), `/spec-review` is read-heavy — reviewing existing code against a draft spec is still useful, even if the spec itself isn't finalized. - -**Read every Key File.** Read all files listed in the spec's `## Key Files` section. These are the files the spec author identified as implementing the feature. Understanding them is prerequisite to the audit. - ---- - -### Step 2: Requirement Coverage Audit - -Walk every FR-* and NFR-* requirement from the spec. Use the Spec Implementation Review Checklist at `spec-build/references/review-checklist.md` sections 4A and 4D as the authoritative guide. - -**For each FR-* requirement:** - -1. Identify the file(s) and function(s) that implement it -2. Verify the implementation matches the EARS-format requirement text -3. Confirm the requirement is fully addressed (not partially) -4. Note if the requirement was met through a different approach than planned - -**For each NFR-* requirement:** - -1. Identify how the non-functional requirement is enforced -2. Verify measurable NFRs have been tested or measured -3. 
Confirm the NFR is met under expected conditions, not just ideal conditions - -**Cross-checks:** - -- Every FR-* has corresponding code — no requirements were skipped -- Every NFR-* has corresponding enforcement — no hand-waving -- No code was written that doesn't map to a requirement (scope creep check) -- Out of Scope items from the spec were NOT implemented - ---- - -### Step 3: Acceptance Criteria Verification - -For each acceptance criterion, locate or write the corresponding test. Use the Spec Implementation Review Checklist at `spec-build/references/review-checklist.md` sections 4B and 4C as the authoritative guide. - -**For each criterion:** - -1. Locate the corresponding test (unit, integration, or manual verification) -2. If no test exists: **write one** — verification requires evidence, "no test exists" is not a valid review outcome -3. Run the test -4. If test passes → upgrade marker to `[x]` in the spec -5. If test fails → note the failure, set marker to `[ ]`, document the issue - -**Summary checks:** - -- Count total criteria vs. 
verified `[x]` — report the ratio -- Flag any criteria still `[ ]` (not started or regressed) -- Flag any criteria that cannot be tested — document why and note as discrepancy -- Verify tests actually test the criterion, not just exercise the code path - -**Code quality spot-check** per checklist section 4C: - -- Error handling at appropriate boundaries -- No hardcoded values that should be configurable -- Functions short and single-purpose -- Nesting depth within limits -- No regressions in existing tests - ---- - -### Step 4: Report & Spec Updates - -#### Summary Report - -Present a structured report: - -``` -## Spec Implementation Review - -**Spec:** [feature name] ([spec file path]) -**Date:** YYYY-MM-DD -**Reviewer:** /spec-review - -### Requirement Coverage -- Functional: N/M addressed -- Non-Functional: N/M addressed -- Gaps: [list or "None"] - -### Acceptance Criteria -- [x] Verified: N -- [~] Implemented, pending: N -- [ ] Not started / regressed: N -- Failures: [list or "None"] - -### Code Quality -- Issues found: [list or "None"] -- Regressions: [list or "None"] - -### Spec Consistency -- Key Files updates needed: [list or "None"] -- Discrepancies: [list or "None"] - -### Recommendation -[ ] All clear — run `/spec-update` to close the loop -[ ] Fix issues first: [specific list] -[ ] Requires user input: [specific questions] -``` - -#### Spec Edits - -Apply limited edits to the spec file: - -1. **Acceptance criteria markers** — update based on test results: - - Passed tests: upgrade `[ ]` or `[~]` → `[x]` - - Failed tests: downgrade `[x]` → `[ ]` (regression), keep `[ ]` or `[~]` as-is -2. **Discrepancies** — add entries for any gaps found between spec and implementation -3. **Key Files** — update paths if files moved, were deleted, or new files were created -4. **Last Updated** — set to today's date - -Do NOT modify `**Status:**` or `## Implementation Notes` — those are `/spec-update`'s responsibility. 
- -#### Next Action - -Based on the review outcome, recommend: - -- **All clear**: "Run `/spec-update` to close the loop and mark the spec as implemented." -- **Issues found**: "Fix the issues listed above, then re-run `/spec-review` to verify." -- **User input needed**: Present specific questions and wait for direction. - ---- - -## Ambiguity Policy - -- If `$ARGUMENTS` matches multiple specs, list them and ask which to review. -- If a spec has no acceptance criteria, warn and offer to review requirements only. -- If Key Files reference paths that don't exist, flag them as stale in the report and update the spec's Key Files section. -- If the spec has no requirements section, warn that there's nothing to audit against and suggest running `/spec-new` or `/spec-update` to add requirements. -- If all criteria are already `[x]`, still run the full review — regressions happen. - ---- - -## Anti-Patterns - -- **Skipping test verification**: Marking criteria as `[x]` without running actual tests. Every `[x]` must be backed by a passing test or confirmed behavior. -- **Reviewing without reading code**: The review must read the implementation files, not just check metadata. That's what `/spec-check` is for. -- **Modifying implementation**: `/spec-review` is a review, not a fix. Report issues; don't fix them. The user decides what to do next. -- **Changing spec status**: `/spec-review` records findings. `/spec-update` changes status. Respect the boundary. 
diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-update/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-update/SKILL.md deleted file mode 100644 index 884a254..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec-update/SKILL.md +++ /dev/null @@ -1,151 +0,0 @@ ---- -name: spec-update -description: >- - Performs the as-built spec update after implementation, closing the loop - between what was planned and what was built by setting status, checking - off acceptance criteria, and adding implementation notes. USE WHEN the - user asks to "update the spec", "mark spec as implemented", "as-built - update", "finish the spec", "close the spec", "update spec status", - "sync spec with code", or works with post-implementation documentation. - DO NOT USE for verifying code against a spec (use spec-review first) - or for creating new specs (use spec-new). -version: 0.2.0 -argument-hint: "[spec-path]" ---- - -# As-Built Spec Update - -## Mental Model - -Specs that say "planned" after code ships cause the next AI session to re-plan already-done work. The as-built update is the final step of every implementation — it closes the loop between what was planned and what was built. - -This is not optional. Every implementation ends with a spec update. - ---- - -## Approval Gate - -Before performing an as-built update, check the spec's `**Approval:**` status: -- If `user-approved` → proceed with the update -- If `draft` → warn the user: "This spec is still `draft`. It should have gone through `/spec-refine` before implementation. Run `/spec-refine` now to validate, or proceed with the as-built update if the user confirms." - -This is a warning, not a blocker — the user decides whether to refine first or update as-is. 
- -For manually-implemented features (not using `/spec-build`), consider running `/spec-review` first to verify implementation adherence before updating the spec. - ---- - -## The 6-Step Workflow - -### Step 1: Find the Spec - -``` -Glob: .specs/**/*.md -``` - -Search for the feature name in spec file names and content. If the user provides a spec path or feature name as `$ARGUMENTS`, use that directly. - -If no spec exists: -- For substantial changes: create one using `/spec-new` -- For trivial changes (bug fixes, config): note "spec not needed" and stop - -### Step 2: Set Status - -Update the `**Status:**` field: -- `implemented` — all acceptance criteria are met -- `partial` — some criteria met, work ongoing or deferred - -Never leave status as `planned` after implementation work has been done. - -### Step 3: Check Off Acceptance Criteria - -Review each acceptance criterion in the spec: -- Mark as `[x]` if the criterion is met and verified (tests pass, behavior confirmed) -- Leave as `[ ]` if not yet implemented -- Add a note next to deferred criteria explaining why -- If a criterion is marked `[~]` (implemented but not yet verified from a `/spec-build` run), treat it as `[ ]` — verify it now and upgrade to `[x]` if confirmed, or leave as `[ ]` if unverifiable - -If criteria were met through different means than originally planned, note the deviation. - -### Step 4: Add Implementation Notes - -In the `## Implementation Notes` section, document: -- **Deviations from the original spec** — what changed and why -- **Key design decisions made during implementation** — choices that weren't in the spec -- **Surprising findings** — edge cases discovered, performance characteristics, limitations -- **Trade-offs accepted** — what was sacrificed and why - -Keep notes concise. Reference file paths, not code. 
- -### Step 5: Update File Paths - -In the `## Key Files` section: -- Add files that were created during implementation -- Remove files that no longer exist -- Update paths that moved - -Verify paths exist before listing them. Use absolute project-relative paths. - -### Step 6: Update Metadata - -- Set `**Last Updated:**` to today's date (YYYY-MM-DD) -- Verify `**Domain:**` is correct -- Preserve the `**Approval:**` status — do NOT downgrade `user-approved` to `draft` -- If the as-built update introduces new decisions not in the original spec, add them to `## Resolved Questions` if the user confirmed them, or `## Open Questions` if they were assumed during implementation - ---- - -## Handling Edge Cases - -### Spec Already "Implemented" - -If the spec is already marked `implemented` and new changes affect the feature: -1. Check if acceptance criteria still hold -2. Update Implementation Notes with the new changes -3. Add any new Discrepancies between spec and current code -4. Update Last Updated date - -### No Spec Exists - -If there is no spec for the feature: -1. Ask: is this a substantial feature or a minor fix? -2. For substantial features: create one with `/spec-new`, then update it -3. For minor fixes: no spec needed — report this and stop - -### Spec Has Unresolved Discrepancies - -If the `## Discrepancies` section has open items: -1. Check if the current implementation resolves any of them -2. Remove resolved discrepancies -3. 
Add any new discrepancies discovered - ---- - -## Validation Checklist - -Before finishing the update: -- [ ] Status reflects the actual implementation state -- [ ] All implemented acceptance criteria are checked off -- [ ] Implementation Notes document deviations from original spec -- [ ] File paths in Key Files are accurate and verified -- [ ] Last Updated date is today -- [ ] `**Domain:**` is correct for the spec's location -- [ ] `**Approval:**` status is preserved (not downgraded) -- [ ] New implementation decisions are tracked in Resolved Questions or Open Questions -- [ ] If the spec has grown past ~200 lines, note it and suggest splitting into separate specs in the domain folder -- [ ] If `**Approval:**` is still `draft`, user was warned and confirmed proceeding -- [ ] No source code was pasted inline (references only) - ---- - -## Ambiguity Policy - -- If unclear which spec to update, list all candidates and ask the user. -- If the implementation deviated significantly from the spec, document it - honestly in Implementation Notes — do not retroactively change the original - requirements to match what was built. -- If acceptance criteria are ambiguous about whether they're met, note the - ambiguity in Discrepancies rather than checking them off optimistically. -- A spec-reminder advisory hook fires at Stop when code was modified but - specs weren't updated. If you see "[Spec Reminder]" in context, that's - the trigger — use this skill to resolve it. 
diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/SKILL.md new file mode 100644 index 0000000..75d2a4d --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/SKILL.md @@ -0,0 +1,271 @@ +--- +name: spec +description: >- + Creates and refines feature specifications as directory-based "spec packages" + with AI-driven decision-making. Handles the full pre-implementation lifecycle: + codebase analysis, spec drafting, decision surfacing, human refinement, and + approval. Auto-bootstraps .specs/ and Constitution on first use. + USE WHEN the user asks to "create a spec", "spec this feature", "new spec", + "plan a feature", "refine the spec", "approve the spec", "set up specs", + "create a constitution", or works with .specs/ directory and feature planning. + DO NOT USE for implementing specs (use /build) or checking spec health + (use /specs). +version: 1.0.0 +argument-hint: "[feature-name] or [constitution]" +--- + +# Create & Refine Spec Package + +## Mental Model + +A spec package is a directory containing everything needed to implement a feature without human intervention during build. The human's job is to provide intent, make trade-off decisions, and confirm scope. The AI's job is everything else: drafting acceptance criteria, writing examples, identifying invariants, designing schema, and planning parallel decomposition. + +The spec package has two audiences: +- **Human** reads `index.md` (~50-80 lines): intent, decisions, AC summary, scope +- **AI** reads everything: `context.md` (invariants, schema, integration details) + `groups/*.md` (full ACs with examples) + +Every spec is a directory. No exceptions. 
+ +``` +.specs/{domain}/{feature-name}/ + index.md # Human entry point + context.md # AI-facing shared context + groups/ + a-{name}.md # AC group with frontmatter + b-{name}.md + ... +``` + +--- + +## Workflow + +### Step 0: Bootstrap Check + +Before anything else, check if `.specs/` exists at the project root. + +**If missing:** Create it silently with: +- `.specs/CONSTITUTION.md` — from `references/constitution-template.md` +- `.specs/BACKLOG.md` — from `references/backlog-template.md` + +Inform the user: "Created `.specs/` with a Constitution template and Backlog. The Constitution captures project-level decisions so feature specs don't repeat them. Fill it in when you're ready — `/spec constitution` can help." + +**If `.specs/CONSTITUTION.md` exists but is still the empty template:** Note this for Step 3 — the AI will need to infer cross-cutting decisions from the codebase instead. + +### Step 1: Parse Intent + +If `$ARGUMENTS` is `constitution`: +- Jump to the **Constitution Flow** (below) + +Otherwise, extract the feature name from `$ARGUMENTS` or conversation: +- Normalize to kebab-case (e.g., `webhook delivery` → `webhook-delivery`) +- If missing, ask the user what they want to build + +Check if a spec package already exists at `.specs/*/{feature-name}/`: +- If yes: load it and enter **Refinement Mode** (Step 5) +- If no: proceed to **Creation Mode** (Step 2) + +### Step 2: Codebase Analysis + +Before drafting anything, build context: + +1. **Read Constitution** (if populated) — extract all cross-cutting decisions +2. **Scan codebase** for: + - Existing patterns: file structure, naming conventions, frameworks in use + - Related code: files that touch the feature's domain + - Dependencies: services, models, utilities the feature will likely use + - Test patterns: existing test structure and conventions +3. **Read existing specs** — scan `.specs/` for related features to avoid conflicts + +This analysis informs every subsequent step. 
The AI doesn't ask the human questions it could answer by reading the codebase. + +### Step 3: Draft Spec Package + +Create the full directory structure in one pass: + +**Determine domain:** Infer from the feature name + existing `.specs/` domains. Confirm with user if ambiguous. + +**Create `.specs/{domain}/{feature-name}/`** with all files: + +#### 3a: Draft `index.md` + +Use `references/index-template.md`. Fill in: + +- **Frontmatter:** feature name, domain, status=planned, approval=draft, size estimate, group list +- **Intent:** Draft 2-3 sentences from the user's description +- **Decisions — Needs Your Input:** Identify genuine trade-offs where the human's judgment matters. These are decisions where: + - Multiple viable options exist with different trade-offs + - The choice affects user experience or business logic + - The AI cannot determine the right answer from codebase context alone +- **Decisions — Already Decided:** Decisions the AI made because: + - Only one sane option exists (e.g., HMAC-SHA256 for webhook signing) + - The Constitution already specifies the choice + - The codebase already establishes a pattern +- **AC Summary Table:** One-liner per acceptance criterion — enough for the human to check completeness +- **Out of Scope:** Explicit non-goals + +**Decision surfacing is the critical skill here.** The AI must distinguish "this has real trade-offs the human should weigh" from "this is obvious and I should just decide it." Err on the side of deciding — the human can always override in the "Already Decided" section. + +#### 3b: Draft `context.md` + +Use `references/context-template.md`. 
Fill in: + +- **Invariants:** Things that must ALWAYS be true regardless of which AC is being implemented +- **Anti-Patterns:** Explicit "do NOT" examples that prevent specification gaming +- **Integration Context:** Dependency details inline — methods, object shapes, behavioral notes from related code discovered in Step 2 +- **Schema Intent:** Data model design — column names, types, constraints, indexes. NOT DDL. +- **Constraints:** File paths grouped by area, pattern references, prohibitions, dependencies + +#### 3c: Draft Group Files + +Use `references/group-template.md`. For each logical group: + +- **Identify natural groupings** from AC dependencies and file ownership +- **Name groups** with letter prefix for ordering: `a-registration.md`, `b-delivery.md` +- **Write frontmatter:** group letter, name, criteria list, status=pending, depends_on, files_owned +- **Write full ACs** with: + - EARS-format criterion text (When/If/While/Where patterns) + - Given/When/Then test clarity + - Inline examples (concrete I/O per AC) + +Use `references/ears-patterns.md` for EARS format guidance. + +**File ownership must not overlap between groups.** If two groups need to modify the same file, either: +- Assign the file to one group and have the other read-only +- Split the file's responsibilities more clearly +- Note the coordination requirement in both group files + +### Step 4: Present to Human + +Show the user `index.md` content. Frame the review around three questions: + +1. **"Here are the decisions I need your input on."** Present the "Needs Your Input" table. Use `AskUserQuestion` with concrete options for each decision. + +2. **"Here are the decisions I already made."** Present the "Already Decided" table. Ask: "Any of these you'd change?" + +3. **"Here's what I'll build — anything missing?"** Present the AC summary table. This is a completeness check, not a detailed review. + +Also present: Intent (for accuracy) and Out of Scope (for confirmation). 
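+
+A minimal sketch of how the "Needs Your Input" round might read (the decisions shown are hypothetical, for illustration only, not drawn from any real spec):
+
+```
+## Decisions — Needs Your Input
+
+| # | Decision | Options | Trade-off |
+|---|----------|---------|-----------|
+| D-1 | Retry backoff | fixed / exponential / exponential + jitter | predictability vs. thundering-herd protection |
+| D-2 | Failed-delivery visibility | log only / dashboard status / alert | noise vs. operator awareness |
+```
+
+Each "Needs Your Input" row becomes one `AskUserQuestion` prompt with its concrete options; "Already Decided" rows are batched into a single "any of these you'd change?" confirmation.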
+ +**Do NOT:** +- Walk through every AC in detail — the human trusts the AI to write good ACs +- Ask about decisions the Constitution already covers +- Present decisions with only one viable option as "needs your input" +- Ask multiple rounds of 15 questions — batch efficiently, 1-4 questions per round + +### Step 5: Refine & Finalize + +After the human responds: + +1. **Update decisions** based on human input +2. **Propagate changes** to context.md and group files if decisions affect them +3. **Add new ACs** if the human identified gaps +4. **Update Resolved Questions** in index.md with the decision trail +5. **Set `approval: approved`** and update `last_updated` + +If the human says "more refinement needed" — loop back to Step 4 with updated content. + +If the human wants to stop early — leave `approval: draft`. + +### Step 6: Completion + +Print: +``` +Spec package created: .specs/{domain}/{feature-name}/ + + index.md — {N} decisions, {M} acceptance criteria + context.md — invariants, schema, integration context + groups/ — {K} groups ready for parallel build + +Status: approved. Run /build {feature-name} to implement. +``` + +--- + +## Constitution Flow + +When `$ARGUMENTS` is `constitution`: + +1. Read the existing `.specs/CONSTITUTION.md` (if any) +2. Analyze the codebase comprehensively: + - Package files (package.json, pyproject.toml, Cargo.toml, go.mod) + - Framework usage and patterns + - File structure and naming conventions + - Existing test setup + - Auth patterns + - Error handling patterns + - Logging setup + - Database and ORM usage +3. Draft a Constitution from `references/constitution-template.md`, filling every section with concrete findings +4. Present to the human for review — this is the ONE document worth careful human review since it affects every future spec +5. Save to `.specs/CONSTITUTION.md` + +--- + +## Refinement Mode + +When a spec package already exists (detected in Step 1): + +1. 
Read `index.md` frontmatter — check approval status +2. If `draft` — present current state and ask what to change +3. If `approved` — warn this is already approved, ask if they want to re-refine +4. Apply changes through the same Step 4-5 flow +5. Update `last_updated` in frontmatter + +--- + +## Spec-Level Approval + +There is NO per-requirement approval tagging. The spec is either `draft` or `approved` as a whole. + +- **`draft`** — not ready for implementation. `/build` will reject it. +- **`approved`** — human has reviewed decisions and scope. Ready for `/build`. + +The AI makes obvious decisions and tags them in the "Already Decided" section. The human can override any of them during refinement. Once the human says "looks good" — the whole spec is approved. + +--- + +## AC Markers + +Acceptance criteria use three states during implementation by `/build`: + +| Marker | Meaning | +|--------|---------| +| `[ ]` | Not started | +| `[~]` | Implemented, not yet verified — code written, tests not confirmed | +| `[x]` | Verified — tests pass, behavior confirmed | + +`/spec` creates all ACs as `[ ]`. Only `/build` changes these markers. + +--- + +## Ambiguity Policy + +- If the feature name matches an existing spec, enter Refinement Mode +- If the domain is ambiguous, present 2-3 options and ask +- If the feature scope is unclear, draft a minimal spec and flag gaps for the human +- If the Constitution is empty/template, infer cross-cutting decisions from the codebase and note them in the "Already Decided" section — suggest the user run `/spec constitution` to formalize them + +--- + +## Anti-Patterns + +- **Over-asking:** Presenting 10+ decisions that have obvious answers. The human's time is the most expensive resource. Ask only genuine trade-offs. +- **Under-deciding:** Leaving decisions vague because "the human should decide." If the codebase or Constitution gives a clear answer, just decide it. 
+- **Scope inflation:** Adding ACs for "nice to have" features the user didn't ask for. Stick to the stated intent. +- **Empty context.md:** If you can't identify invariants, anti-patterns, or integration context, the feature is too vague — go back to the human for clarity. +- **Overlapping file ownership:** Two groups owning the same file causes merge conflicts during parallel build. Always resolve ownership. + +--- + +## Reference Files + +| File | Contents | +|------|----------| +| `references/index-template.md` | index.md template with frontmatter schema | +| `references/context-template.md` | context.md template with all sections | +| `references/group-template.md` | Group file template with frontmatter schema | +| `references/constitution-template.md` | CONSTITUTION.md template | +| `references/backlog-template.md` | BACKLOG.md template | +| `references/ears-patterns.md` | EARS format patterns and examples | +| `references/example-webhook/` | Complete example spec package (webhook delivery system) | diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/backlog-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/backlog-template.md new file mode 100644 index 0000000..cd658fe --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/backlog-template.md @@ -0,0 +1,21 @@ +# Backlog Template + +Simple idea parking lot. No priority grading, no complexity estimates — just a list of things to build someday. + +--- + +```markdown +# Backlog + +Feature ideas for future specs. Pick one and run `/spec {feature-name}` to flesh it out. + +## Ideas + +- [feature idea] — [one-line description] + +## Parked + +[Features explicitly deferred with a reason.]
+ +- [deferred feature] — [why it was deferred] +``` diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/constitution-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/constitution-template.md new file mode 100644 index 0000000..f548d3f --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/constitution-template.md @@ -0,0 +1,98 @@ +# Constitution Template + +The Constitution captures project-level cross-cutting decisions that every feature spec inherits. Written once, evolved as the project grows. Every section should contain concrete decisions, not placeholders. + +When running `/spec constitution`, the AI analyzes the codebase and fills in as much as possible, then presents to the human for review. + +--- + +```markdown +# Project Constitution + +**Project:** [name] +**Last Updated:** YYYY-MM-DD + +## Tech Stack + +- **Language:** [e.g., Python 3.12] +- **Framework:** [e.g., FastAPI 0.115] +- **Database:** [e.g., PostgreSQL 16 via SQLAlchemy 2.0 async] +- **Key Libraries:** [e.g., Pydantic 2.x, structlog, httpx, alembic] +- **Package Manager:** [e.g., uv, npm, bun] + +## Architecture + +- **File Structure:** + ``` + src/ + api/ # Route handlers + models/ # SQLAlchemy models + schemas/ # Pydantic request/response schemas + services/ # Business logic + repos/ # Data access (repository pattern) + jobs/ # Background jobs + tests/ + unit/ # Unit tests mirroring src/ structure + integration/ # API-level tests + ``` +- **Module Boundaries:** [what can import what] +- **Data Access Pattern:** [repository pattern, direct ORM, raw SQL] + +## API Conventions + +- **Format:** [REST, GraphQL, RPC] +- **Error Responses:** [RFC 7807, custom format, include example] +- **Pagination:** [cursor-based, offset-based, keyset] +- **Versioning:** [URL prefix /v1, header, none] +- **ID Format:** [prefixed UUIDs like 
usr_, tsk_, sequential integers] +- **Timestamps:** [ISO 8601, UTC always] + +## Auth & Security + +- **Mechanism:** [JWT with refresh tokens, session cookies, OAuth2] +- **Token Lifecycle:** [access expiry, refresh expiry, rotation policy] +- **Password Hashing:** [bcrypt cost 12, argon2id] +- **Input Validation:** [where and how — Pydantic at handler boundary, etc.] +- **CORS:** [policy per environment] + +## Testing + +- **Framework:** [pytest + httpx, vitest, jest] +- **Directory:** [tests/ mirroring src/, flat, colocated] +- **Naming:** [test_{module}_{scenario}_{expected}, describe/it blocks] +- **Coverage Target:** [80% ideal, 60% minimum] +- **Test Database:** [SQLite in-memory for unit, PostgreSQL for integration] +- **Fixtures:** [shared in conftest.py, factory pattern] + +## Code Patterns + +- **Error Handling:** [domain exceptions in services, handler catches and converts] +- **Logging:** [structlog JSON, log at service boundaries] +- **Configuration:** [env vars via Pydantic Settings, no hardcoded values] +- **Naming:** [snake_case functions, PascalCase classes, UPPER_SNAKE constants] +- **Async:** [all I/O async, asyncio] + +## Boundaries + +### Always +- [Run full test suite before declaring work complete] +- [Include type hints on all function signatures] +- [Handle errors at appropriate boundaries] +- [Use existing utilities before creating new ones] + +### Never +- [Add new dependencies without justification in the spec] +- [Hardcode secrets, URLs, or environment-specific values] +- [Modify database schema without a migration file] +- [Skip writing tests for new code paths] +- [Use type: ignore or noqa without explaining why] +``` + +--- + +## Usage Notes + +- **Sizing:** As detailed as needed. 200-400 lines is normal for a mature project. Context is free in a 1M token window. 
+- **Evolution:** When `/build` produces `[ai-decided]` items that reveal gaps (e.g., "AI chose cursor pagination because Constitution doesn't specify"), promote those to Constitution entries. +- **Override:** Feature specs can override Constitution decisions by noting it explicitly in their Decisions table: "D-N: Override Constitution — using X instead of Y because [reason]" +- **Not a style guide:** The Constitution captures architectural decisions, not coding style preferences. Use linters and formatters for style. diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/context-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/context-template.md new file mode 100644 index 0000000..60c8532 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/context-template.md @@ -0,0 +1,86 @@ +# context.md Template + +Shared AI context read by every implementing agent. Contains cross-cutting concerns for THIS feature — not project-level concerns (those live in the Constitution). + +No frontmatter needed. Pure content. + +--- + +```markdown +## Invariants + +[Things that must ALWAYS be true, regardless of which AC is being implemented. +Every agent should verify their work against this list. These are defensive +assertions that catch specification gaming and cross-agent inconsistencies.] + +- [invariant 1] +- [invariant 2] + +## Anti-Patterns + +[Explicit "do NOT" examples. Prevent the AI from satisfying literal spec +requirements while violating the design intent.] + +- **Do NOT [thing]** — because [why] +- **Do NOT [thing]** — because [why] + +## Integration Context + +[Dependency details inline so implementing agents don't need to read other files. +Include method signatures, object shapes, and behavioral notes for every +dependency the feature interacts with.] 
+ +**[Dependency Name]** (`path/to/file`): +- `method_name(params) -> return_type` — [behavioral notes] +- Object shape: [key fields and their types] +- [Important behavioral note, e.g., "raises XError on failure"] + +## Schema Intent + +[Data model design — specific enough to build models and migrations without +guessing, but NOT raw DDL. Column names, types, constraints, indexes.] + +**[table_name]:** + +| Column | Type | Constraints | Notes | +|--------|------|-------------|-------| +| id | text | PK | Prefixed UUID, e.g. `whk_` + UUID4 | +| [col] | [type] | [constraints] | [notes] | + +Indexes: [list] +Unique constraints: [list] + +## Constraints + +[Architectural boundaries, file locations, patterns, and prohibitions.] + +**Files (by area):** + +*[Area 1]:* +- `src/path/to/file.py` — [purpose] + +*[Area 2]:* +- `src/path/to/file.py` — [purpose] + +**Patterns:** +- Follow `src/path/to/reference.py` for [pattern type] + +**Must NOT:** +- [prohibition 1] +- [prohibition 2] + +**Depends on:** +- [spec, feature, or system this depends on] +``` + +--- + +## Section Guidelines + +| Section | Purpose | When to Include | +|---------|---------|----------------| +| Invariants | Global assertions all agents must respect | Always — even simple features have invariants | +| Anti-Patterns | Prevent specification gaming | When there are non-obvious "wrong" implementations | +| Integration Context | Inline dependency details | When agents interact with existing code | +| Schema Intent | Data model design | When the feature creates or modifies tables | +| Constraints | File paths, patterns, prohibitions | Always — every feature has file locations | diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/ears-patterns.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/ears-patterns.md new file mode 100644 index 0000000..3c0137f --- /dev/null +++ 
b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/ears-patterns.md @@ -0,0 +1,124 @@ +# EARS Patterns for Acceptance Criteria + +EARS (Easy Approach to Requirements Syntax) provides five structured patterns for writing unambiguous requirements. Use these in AC criterion text within group files. + +--- + +## The Five EARS Patterns + +### 1. Ubiquitous (always true) + +**Pattern:** The system shall [action]. + +``` +The system shall log all authentication events with timestamp and user ID. +The system shall encrypt data at rest using AES-256. +``` + +**Use for:** System-wide behaviors, security invariants, logging requirements. Often better placed in Constitution or Invariants than individual ACs. + +### 2. Event-Driven (when something happens) + +**Pattern:** When [event], the system shall [action]. + +``` +When a user submits valid login credentials, the system shall return a JWT token. +When a webhook endpoint returns a non-2xx status, the system shall schedule a retry. +``` + +**Use for:** Most acceptance criteria. Triggered behaviors with clear cause and effect. + +### 3. State-Driven (while a condition holds) + +**Pattern:** While [condition], the system shall [action]. + +``` +While the endpoint has two active secrets, the system shall include both signatures. +While the user session is active, the system shall refresh the token automatically. +``` + +**Use for:** Behaviors that depend on ongoing state, not a one-time trigger. + +### 4. Unwanted Behavior (if bad thing, then protection) + +**Pattern:** If [unwanted condition], then the system shall [response]. + +``` +If a webhook URL points to a private IP range, then the system shall reject registration. +If the request body exceeds 10MB, then the system shall return 413 Payload Too Large. +``` + +**Use for:** Error handling, validation, security boundaries, edge cases. + +### 5. 
Optional Feature (where configured) + +**Pattern:** Where [feature is configured], the system shall [action]. + +``` +Where event filtering is enabled on an endpoint, the system shall deliver only matching events. +Where rate limiting is configured, the system shall enforce the configured threshold. +``` + +**Use for:** Configurable behaviors, feature flags, per-tenant settings. + +--- + +## Combining EARS with Given/When/Then + +Each AC should have: +1. **EARS criterion** — the requirement (what the system SHALL do) +2. **Given/When/Then** — the test scenario (how to verify it) +3. **Example** — concrete I/O (what it looks like in practice) + +```markdown +## AC-3: Private IP Rejection + +If a webhook URL points to a private IP range (10.0.0.0/8, 172.16.0.0/12, +192.168.0.0/16, 127.0.0.0/8, ::1), then the system shall reject registration +with a clear error message. + +→ Given URL "https://192.168.1.1/hook", + when POST /api/v1/projects/proj_abc/webhooks, + then 422 with RFC 7807 error + +Example: +→ `POST /api/v1/projects/proj_abc/webhooks` + ```json + { "url": "https://192.168.1.1/hook", "events": ["task.created"] } + ``` +→ `422 Unprocessable Entity` + ```json + { + "type": "taskforge/invalid-webhook-url", + "title": "URL must not point to a private network", + "status": 422, + "detail": "192.168.1.1 is in a private range (192.168.0.0/16)." + } + ``` +``` + +--- + +## Compound Patterns + +Combine patterns when needed: + +``` +When [event] AND while [condition], the system shall [action]. +When [event], if [guard], then the system shall [action]. +``` + +Example: +``` +When a delivery attempt fails AND while the retry count is below 5, +the system shall schedule the next retry with exponential backoff. +``` + +--- + +## Common Mistakes + +- **Vague verbs:** "handle", "manage", "process" — replace with specific actions: "return", "store", "enqueue", "reject" +- **Missing actor:** "The endpoint should be validated" — by whom? 
"The system shall validate the endpoint URL" +- **Untestable criteria:** "The system shall be fast" — replace with measurable: "The system shall respond within 200ms at the 95th percentile" +- **Implementation details:** "The system shall use a Redis sorted set" — that's a decision, not a requirement. Put it in the Decisions table. diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/context.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/context.md new file mode 100644 index 0000000..c841c02 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/context.md @@ -0,0 +1,135 @@ +## Invariants + +- `delivery_id` is globally unique — UUID4, prefixed `del_`. Generated once per event-endpoint pair, reused across retries. +- No HTTP requests to private IP ranges, ever — defense in depth even if URL validation was bypassed. +- All webhook endpoint URLs are HTTPS — no HTTP, no exceptions, no fallback. +- Signing secrets (`whsec_*`) are NEVER logged, NEVER in API error responses, NEVER in delivery logs, NEVER returned after initial creation. +- Event `timestamp` = time the event occurred in the source system, NOT delivery/retry time. +- Response bodies in delivery logs are truncated to 1,024 bytes. Never store more. +- The `ping` event type is reserved for system use (AC-11) and cannot be subscribed to. +- All IDs use prefixed UUIDs per Constitution: `whk_` (endpoints), `del_` (deliveries), `wdl_` (log entries). +- Endpoints in `disabled` status receive zero deliveries. Check before enqueue, not after. + +## Anti-Patterns + +- **Do NOT retry on 4xx client errors** — 4xx means consumer rejected it. Only retry on 5xx and timeouts. +- **Do NOT block event producers on delivery** — task creation completes immediately. Delivery is async via job queue. 
+- **Do NOT store signing secrets in delivery logs** — log delivery_id, status, timing — never secrets or outbound headers. +- **Do NOT use sequential integer IDs** — prefixed UUIDs per Constitution. +- **Do NOT send deliveries to disabled endpoints** — check status BEFORE enqueuing. +- **Do NOT deliver events the endpoint didn't subscribe to** — filter BEFORE enqueuing. +- **Do NOT include signing secret in any API response except initial creation (AC-1).** +- **Do NOT implement delivery synchronously in API handlers** — always via job queue. +- **Do NOT use `requests` library** — use `httpx` (async) per Constitution. + +## Integration Context + +**Auth Middleware** (`src/middleware/auth.py`): +- Extracts JWT from `Authorization: Bearer <token>` header +- Sets `request.state.user` with: `id: str`, `email: str`, `projects: list[ProjectRole]` +- `ProjectRole`: `{"project_id": "proj_abc", "role": "owner"}` +- Role check for webhooks: `role == "owner"` for the target project +- Returns 401 (no token), 403 (wrong role) + +**Email Service** (`src/services/email.py`): +- `await email_service.send(to: str, subject: str, body: str) -> str` +- Returns email delivery ID. Handles retries internally. +- Raises `EmailDeliveryError` on permanent failure. Plain text only. + +**Job Runner** (`src/jobs/base.py`): +- Subclass `BaseJob`, implement `async def execute(self, payload: dict) -> None` +- Enqueue: `await job_queue.enqueue(job_name, payload, run_at=None)` +- `run_at` supports future scheduling (use for retry backoff delays) +- Failed jobs logged but NOT auto-retried by runner. Retry logic is explicit. + +**Event System** (`src/services/event_bus.py` — NEW, part of this feature): +- `event_bus.emit(event_type: str, project_id: str, data: dict)` +- `event_bus.subscribe("*", handler)` or `event_bus.subscribe("task.created", handler)` +- Synchronous in-process dispatch. Subscribers must be fast (just enqueue a job). 
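The event bus contract above is small enough to sketch. A minimal illustration — the `emit`/`subscribe` signatures and the `"*"` wildcard come from this spec; the internals are hypothetical, not the actual implementation:

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Minimal sketch of the src/services/event_bus.py contract described above."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[..., Any]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[..., Any]) -> None:
        # "*" subscribes the handler to every event type.
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, project_id: str, data: dict) -> None:
        # Synchronous in-process dispatch: handlers run inline, so they
        # must be fast (e.g. just enqueue a delivery job).
        for handler in self._handlers[event_type] + self._handlers["*"]:
            handler(event_type, project_id, data)

event_bus = EventBus()
seen: list[tuple] = []
event_bus.subscribe("task.created", lambda *args: seen.append(args))
event_bus.emit("task.created", "proj_abc", {"id": "tsk_789"})
```

In the real service a subscriber body would be a single `job_queue.enqueue(...)` call, keeping dispatch fast.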
+ +## Schema Intent + +**webhook_endpoints:** + +| Column | Type | Constraints | Notes | +|--------|------|-------------|-------| +| id | text | PK | `whk_` + UUID4 | +| project_id | text | FK → projects.id, NOT NULL | | +| url | text | NOT NULL | Validated: HTTPS, no private IPs | +| description | text | nullable | Human-readable label | +| events | jsonb | NOT NULL | Array of event type strings | +| secret_current | text | NOT NULL | `whsec_` + 32-byte base64url | +| secret_previous | text | nullable | During rotation window | +| status | text | NOT NULL, CHECK IN ('active','disabled') | Default: 'active' | +| disabled_at | timestamptz | nullable | Set by health check | +| failure_streak_start | timestamptz | nullable | Reset on success | +| created_at | timestamptz | NOT NULL, default now() | | +| updated_at | timestamptz | NOT NULL, default now() | | + +Indexes: `(project_id)`, `(project_id, status)` + +**webhook_deliveries:** + +| Column | Type | Constraints | Notes | +|--------|------|-------------|-------| +| id | text | PK | `wdl_` + UUID4 | +| endpoint_id | text | FK → webhook_endpoints.id, NOT NULL | | +| delivery_id | text | NOT NULL | `del_` + UUID4, same across retries | +| event_type | text | NOT NULL | | +| attempt_number | integer | NOT NULL | 1-based, max 6 | +| status | text | NOT NULL, CHECK IN ('pending','success','failed') | | +| http_status | integer | nullable | Null if timeout/connection error | +| response_body | text | nullable | Truncated to 1,024 bytes | +| response_time_ms | integer | nullable | | +| error_message | text | nullable | | +| payload_snapshot | jsonb | NOT NULL | Full envelope for replay | +| created_at | timestamptz | NOT NULL, default now() | | + +Indexes: `(endpoint_id, created_at DESC)`, `(delivery_id)`, `(created_at)` +Unique constraint: `(delivery_id, attempt_number)` + +## Constraints + +**Files (by area):** + +*Models & Schemas:* +- `src/models/webhook.py` — SQLAlchemy: WebhookEndpoint, WebhookDelivery +- 
`src/schemas/webhook.py` — Pydantic schemas for all request/response types + +*Data Access:* +- `src/repos/webhook_repo.py` — WebhookRepo: endpoint CRUD, delivery log queries, retention + +*Business Logic:* +- `src/services/webhook_service.py` — Registration, URL validation, secret management +- `src/services/webhook_delivery_service.py` — Delivery, signing, retry, failure tracking +- `src/services/event_bus.py` — NEW: pub/sub event system + +*API Layer:* +- `src/api/webhooks.py` — All webhook endpoints + +*Background Jobs:* +- `src/jobs/webhook_delivery_job.py` — Async delivery worker +- `src/jobs/webhook_health_check_job.py` — Daily 14-day failure check +- `src/jobs/webhook_log_retention_job.py` — Daily log cleanup + +*Database:* +- `alembic/versions/xxx_add_webhooks.py` — Migration + +**Patterns:** +- Service class: follow `src/services/project_service.py` +- Job class: follow `src/jobs/base.py` (BaseJob subclass) +- Repository: follow `src/repos/project_repo.py` +- URL validation: `ipaddress` stdlib for private range checks + +**Must NOT:** +- Add new infrastructure dependencies (no Redis, RabbitMQ, Celery) +- Store response bodies > 1KB +- Allow delivery to non-HTTPS URLs or private IPs +- Expose signing secrets after initial creation +- Block API handlers on webhook delivery + +**Depends on:** +- Auth middleware (`src/middleware/auth.py`) — existing +- Email service (`src/services/email.py`) — existing +- Job runner (`src/jobs/base.py`) — existing +- Projects model (`src/models/project.py`) — FK target diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/a-registration.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/a-registration.md new file mode 100644 index 0000000..135a9f6 --- /dev/null +++ 
b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/a-registration.md @@ -0,0 +1,101 @@ +--- +group: A +name: Registration & Configuration +criteria: [AC-1, AC-2, AC-3] +status: pending +owner: null +depends_on: [] +files_owned: + - src/models/webhook.py + - src/schemas/webhook.py + - src/repos/webhook_repo.py + - src/services/webhook_service.py + - src/api/webhooks.py + - alembic/versions/xxx_add_webhooks.py + - tests/unit/services/test_webhook_service.py + - tests/integration/test_webhooks_api.py +--- + +## AC-1: Webhook Endpoint Registration + +When a project owner registers a webhook endpoint, the system shall validate the URL (HTTPS required, no private IPs), generate a signing secret, and return the endpoint ID and secret. + +→ Given authenticated owner of project proj_abc, + when POST /api/v1/projects/proj_abc/webhooks with valid HTTPS URL, + then 201 with endpoint object including id and secret + +Example: +→ `POST /api/v1/projects/proj_abc/webhooks` + ```json + { + "url": "https://partner.example.com/taskforge-hook", + "events": ["task.created", "task.status_changed"], + "description": "Sync tasks to Partner CRM" + } + ``` +→ `201 Created` + ```json + { + "id": "whk_a1b2c3d4", + "url": "https://partner.example.com/taskforge-hook", + "events": ["task.created", "task.status_changed"], + "description": "Sync tasks to Partner CRM", + "secret": "whsec_dGhpcyBpcyBhIDMyLWJ5dGUgcmFuZG9tIHZhbHVl", + "status": "active", + "created_at": "2026-03-13T12:00:00Z" + } + ``` + +Note: The secret is ONLY returned on creation. Subsequent GET requests omit it. + +## AC-2: Event Filter Configuration + +When a project owner configures event filters on an endpoint, the system shall deliver only matching event types to that endpoint. 
+ +→ Given endpoint whk_a1b2c3d4 subscribed to ["task.created"], + when a task.completed event fires for the same project, + then no delivery attempt is made to that endpoint + +Example: +→ `PATCH /api/v1/projects/proj_abc/webhooks/whk_a1b2c3d4` + ```json + { "events": ["task.created", "task.completed"] } + ``` +→ `200 OK` + ```json + { + "id": "whk_a1b2c3d4", + "events": ["task.created", "task.completed"], + "status": "active" + } + ``` + +## AC-3: Private IP Rejection + +If a webhook URL points to a private IP range (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, ::1), then the system shall reject registration with a clear error message. + +→ Given URL "https://192.168.1.1/hook", + when POST /api/v1/projects/proj_abc/webhooks, + then 422 with RFC 7807 error + +Example: +→ `POST /api/v1/projects/proj_abc/webhooks` + ```json + { "url": "https://192.168.1.1/hook", "events": ["task.created"] } + ``` +→ `422 Unprocessable Entity` + ```json + { + "type": "taskforge/invalid-webhook-url", + "title": "URL must not point to a private network", + "status": 422, + "detail": "192.168.1.1 is in a private range (192.168.0.0/16). Webhook URLs must resolve to public IP addresses." 
+ } + ``` + +--- + +## AI Decisions + +| # | Decision | Choice | Reasoning | +|---|----------|--------|-----------| diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/b-delivery.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/b-delivery.md new file mode 100644 index 0000000..5f28cf9 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/b-delivery.md @@ -0,0 +1,141 @@ +--- +group: B +name: Delivery & Signing +criteria: [AC-4, AC-5, AC-6, AC-7, AC-15] +status: pending +owner: null +depends_on: + - a-registration +files_owned: + - src/services/webhook_delivery_service.py + - src/services/event_bus.py + - src/jobs/webhook_delivery_job.py + - tests/unit/services/test_webhook_delivery_service.py + - tests/unit/services/test_event_bus.py +--- + +## AC-4: Signed Payload Delivery + +When an event occurs that matches an active endpoint's subscription, the system shall deliver a signed payload within 30 seconds of the event. + +→ Given active endpoint subscribed to task.created, + when a task is created in the project, + then POST to endpoint URL within 30s with signed envelope payload + +Example (what the consumer's server receives): +``` +POST https://partner.example.com/taskforge-hook +Content-Type: application/json +X-TaskForge-Signature: sha256=a1b2c3d4e5f6... +X-TaskForge-Delivery-ID: del_x1y2z3 +X-TaskForge-Event: task.created +``` +```json +{ + "event": "task.created", + "timestamp": "2026-03-13T12:05:00Z", + "delivery_id": "del_x1y2z3", + "project_id": "proj_abc", + "data": { + "id": "tsk_789", + "title": "Implement webhook system", + "status": "in_progress", + "assignee_id": "usr_456", + "created_at": "2026-03-13T12:05:00Z" + } +} +``` + +Consumer responds `200 OK` → delivery marked successful. 
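The producer side of this delivery can be sketched to complement the consumer pseudocode in AC-5. Hedged: the header names, envelope fields, and `sha256=` prefix come from this spec; the helper names (`sign_payload`, `build_headers`) are illustrative, not a spec'd API:

```python
import hashlib
import hmac
import json

def sign_payload(secret: str, body: bytes) -> str:
    # AC-5: hex-encoded HMAC-SHA256 over the exact body bytes.
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def build_headers(secret: str, body: bytes, delivery_id: str, event: str) -> dict[str, str]:
    # Illustrative helper — only the header names are spec'd.
    return {
        "Content-Type": "application/json",
        "X-TaskForge-Signature": sign_payload(secret, body),
        "X-TaskForge-Delivery-ID": delivery_id,  # AC-6: stable across retries
        "X-TaskForge-Event": event,
    }

envelope = {"event": "task.created", "delivery_id": "del_x1y2z3", "project_id": "proj_abc"}
body = json.dumps(envelope, separators=(",", ":")).encode()
headers = build_headers("whsec_example", body, "del_x1y2z3", "task.created")
```

Signing the serialized bytes (not the dict) matters: the consumer verifies the raw request body, so any re-serialization on the producer side would break signature verification.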
+ +## AC-5: HMAC-SHA256 Signature + +The system shall sign every delivery with HMAC-SHA256 using the endpoint's current secret, included in the X-TaskForge-Signature header as `sha256=<hex digest>`. + +→ Given delivery payload body as bytes, + when signature computed, + then header value = "sha256=" + hex(hmac_sha256(secret_bytes, body_bytes)) + +Example (consumer verification pseudocode): +```python +import hmac, hashlib +expected = hmac.new(secret.encode(), request.body, hashlib.sha256).hexdigest() +received = request.headers["X-TaskForge-Signature"].removeprefix("sha256=") +is_valid = hmac.compare_digest(expected, received) +``` + +## AC-6: Idempotency Key + +The system shall include an idempotency key (X-TaskForge-Delivery-ID) in every delivery. Retries of the same event to the same endpoint shall use the same delivery ID. + +→ Given a failed delivery del_x1y2z3 being retried, + when the retry is sent, + then X-TaskForge-Delivery-ID = "del_x1y2z3" (unchanged) + +Example: +``` +Attempt 1: X-TaskForge-Delivery-ID: del_x1y2z3 +Attempt 2: X-TaskForge-Delivery-ID: del_x1y2z3 (same) +Attempt 3: X-TaskForge-Delivery-ID: del_x1y2z3 (same) +``` + +## AC-7: Dual-Signature During Rotation + +While an endpoint has two active secrets (rotation window), the system shall sign with the newer secret and include both signatures comma-separated. + +→ Given endpoint with secret_current (v2) and secret_previous (v1), + when delivery sent, + then X-TaskForge-Signature = "sha256=<sig_current>,sha256=<sig_previous>" + +Example: +``` +X-TaskForge-Signature: sha256=abc123...,sha256=def456... +``` +Consumer verifies against their known secret. If EITHER matches, accept. + +## AC-15: Envelope Payload Format + +Every webhook payload shall follow the envelope format with event, timestamp, delivery_id, project_id, and data fields. 
+ +→ Given any event type, + when delivered to any endpoint, + then payload matches the envelope schema exactly + +Example (task.status_changed): +```json +{ + "event": "task.status_changed", + "timestamp": "2026-03-13T14:30:00Z", + "delivery_id": "del_sts_456", + "project_id": "proj_abc", + "data": { + "id": "tsk_789", + "previous_status": "in_progress", + "new_status": "completed", + "changed_by": "usr_456" + } +} +``` + +Example (comment.created): +```json +{ + "event": "comment.created", + "timestamp": "2026-03-13T15:00:00Z", + "delivery_id": "del_cmt_789", + "project_id": "proj_abc", + "data": { + "id": "cmt_321", + "task_id": "tsk_789", + "author_id": "usr_456", + "body": "Looks good, shipping tomorrow." + } +} +``` + +--- + +## AI Decisions + +| # | Decision | Choice | Reasoning | +|---|----------|--------|-----------| diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/c-retry.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/c-retry.md new file mode 100644 index 0000000..8a93bd7 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/c-retry.md @@ -0,0 +1,112 @@ +--- +group: C +name: Retry & Failure Handling +criteria: [AC-8, AC-9, AC-10, AC-11] +status: pending +owner: null +depends_on: + - b-delivery +files_owned: + - src/jobs/webhook_health_check_job.py +--- + +Note: Group C adds retry methods to `src/services/webhook_delivery_service.py` (owned by Group B). Coordinate: Group B implements `deliver()`, `sign_payload()`, `build_envelope()`. Group C adds `schedule_retry()`, `record_failure()`, `check_failure_streak()`, `handle_re_enable()`. 
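The retry schedule that `schedule_retry()` must honor can be sketched as a pure computation. Assumption: each delay is measured from the previous attempt, matching the T+1m … T+14h36m progression in AC-8; `retry_times` is an illustrative name, not a spec'd API:

```python
from datetime import datetime, timedelta

# AC-8 delays between consecutive attempts: 1min, 5min, 30min, 2hr, 12hr.
BACKOFF = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=12),
]

def retry_times(first_attempt: datetime) -> list[datetime]:
    # Cumulative schedule relative to the first attempt:
    # T+1m, T+6m, T+36m, T+2h36m, T+14h36m.
    times, t = [], first_attempt
    for delay in BACKOFF:
        t += delay
        times.append(t)
    return times
```

In the real job each timestamp would be passed as `run_at` to `job_queue.enqueue(...)` (see the Job Runner notes in context.md).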
+ +## AC-8: Exponential Backoff Retry + +If an endpoint returns a non-2xx status or the request times out (>10s), then the system shall retry with exponential backoff: 1min, 5min, 30min, 2hr, 12hr. + +→ Given endpoint returns 500 on first attempt at T, + when retry schedule executes, + then attempts at T+1m, T+6m, T+36m, T+2h36m, T+14h36m + +Example (delivery log progression): +``` +Attempt 1: T+0s → 500 Internal Server Error (42ms) +Attempt 2: T+1m → 503 Service Unavailable (108ms) +Attempt 3: T+6m → timeout (10,001ms) +Attempt 4: T+36m → 200 OK ✓ (89ms) +``` + +## AC-9: Exhausted Retries + +If all 5 retries fail for a delivery, the system shall mark it as "failed" and store the last error response (status code + first 1KB of body). + +→ Given 6 total attempts (initial + 5 retries) all fail for del_x1y2z3, + when last retry fails, + then delivery status = "failed", error details stored + +Example (delivery record after exhausting retries): +```json +{ + "delivery_id": "del_x1y2z3", + "status": "failed", + "attempts": 6, + "last_http_status": 503, + "last_response_body": "Service Temporarily Unavailable...", + "error_message": "All 6 attempts failed. Last: timeout after 10001ms" +} +``` + +## AC-10: Auto-Disable After 14-Day Failure + +If an endpoint has failed every delivery for 14 consecutive days, the system shall disable the endpoint and send a notification email to the project owner. + +→ Given endpoint whk_a1b2c3d4 with failure_streak_start 14+ days ago + and zero successful deliveries in that window, + when the daily health check job runs, + then endpoint status = "disabled", disabled_at set, email sent + +Example (email): +``` +Subject: Webhook endpoint disabled — partner.example.com + +Your webhook endpoint "Sync tasks to Partner CRM" has been automatically +disabled after 14 consecutive days of delivery failures. 
+ +Endpoint: https://partner.example.com/taskforge-hook +Last error: 503 Service Unavailable +Failed deliveries in last 14 days: 47 + +To re-enable: PATCH /api/v1/projects/proj_abc/webhooks/whk_a1b2c3d4 +{ "status": "active" } +``` + +## AC-11: Ping Test on Re-Enable + +When a disabled endpoint is re-enabled by the owner, the system shall send a test ping event before resuming normal delivery. + +→ Given disabled endpoint whk_a1b2c3d4, + when owner PATCH /webhooks/whk_a1b2c3d4 { "status": "active" }, + then system sends "ping" event, waits for 2xx, then resumes delivery + +Example (success): +→ System sends ping: + ```json + { + "event": "ping", + "timestamp": "2026-03-27T09:00:00Z", + "delivery_id": "del_ping_abc", + "project_id": "proj_abc", + "data": { "message": "Webhook re-enabled. This is a test delivery." } + } + ``` +→ Ping gets 200 → endpoint re-enabled, delivery resumes + +Example (failure): +→ Ping gets 503 → endpoint stays disabled: + ```json + { + "type": "taskforge/webhook-ping-failed", + "title": "Endpoint did not respond to test ping", + "status": 409, + "detail": "Sent test ping but received 503. Fix the endpoint and try again." 
+ } + ``` + +--- + +## AI Decisions + +| # | Decision | Choice | Reasoning | +|---|----------|--------|-----------| diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/d-logs.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/d-logs.md new file mode 100644 index 0000000..fe51237 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/groups/d-logs.md @@ -0,0 +1,94 @@ +--- +group: D +name: Delivery Logs & Debugging +criteria: [AC-12, AC-13, AC-14] +status: pending +owner: null +depends_on: + - b-delivery +files_owned: + - src/jobs/webhook_log_retention_job.py +--- + +Note: Group D adds delivery log query methods to `src/repos/webhook_repo.py` (owned by Group A): `list_deliveries()`, `get_delivery()`, `delete_old_deliveries()`. Also adds delivery log + replay endpoints to `src/api/webhooks.py` (owned by Group A). Coordinate with Group A on shared files. + +## AC-12: Delivery Log Storage + +The system shall store a delivery log for each attempt including: timestamp, HTTP status, response time, delivery ID, event type, attempt number, and response body (first 1KB). 
+ +→ Given any delivery attempt (success or failure), + when GET /webhooks/:id/deliveries, + then paginated list with full attempt details + +Example: +→ `GET /api/v1/projects/proj_abc/webhooks/whk_a1b2c3d4/deliveries?limit=20` +→ `200 OK` + ```json + { + "deliveries": [ + { + "id": "wdl_log1", + "delivery_id": "del_x1y2z3", + "event_type": "task.created", + "attempt_number": 1, + "status": "failed", + "http_status": 500, + "response_time_ms": 342, + "response_body": "{\"error\": \"internal server error\"}", + "created_at": "2026-03-13T12:05:01Z" + }, + { + "id": "wdl_log2", + "delivery_id": "del_x1y2z3", + "event_type": "task.created", + "attempt_number": 2, + "status": "success", + "http_status": 200, + "response_time_ms": 89, + "response_body": "OK", + "created_at": "2026-03-13T12:06:01Z" + } + ], + "next_cursor": "wdl_log0", + "has_more": true + } + ``` + +## AC-13: Delivery Replay + +When a project owner requests a delivery replay, the system shall re-send the original payload with a new delivery ID and fresh signature. + +→ Given past delivery del_x1y2z3, + when POST /webhooks/:id/deliveries/del_x1y2z3/replay, + then new delivery with original event data, new del_ ID, fresh signature + +Example: +→ `POST /api/v1/projects/proj_abc/webhooks/whk_a1b2c3d4/deliveries/del_x1y2z3/replay` +→ `202 Accepted` + ```json + { + "delivery_id": "del_replay_n3w", + "original_delivery_id": "del_x1y2z3", + "status": "pending", + "message": "Replay enqueued. Original payload re-sent with new delivery ID." + } + ``` + +## AC-14: 30-Day Log Retention + +The system shall retain delivery logs for 30 days, then delete them. + +→ Given delivery log entries older than 30 days, + when the daily retention job runs, + then entries with created_at < (now - 30 days) are deleted + +Testing note: No API example — background job. 
Verify: +- Create log with created_at = 31 days ago → deleted after job +- Create log with created_at = 29 days ago → NOT deleted + +--- + +## AI Decisions + +| # | Decision | Choice | Reasoning | +|---|----------|--------|-----------| diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/index.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/index.md new file mode 100644 index 0000000..a059400 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/example-webhook/index.md @@ -0,0 +1,84 @@ +--- +feature: Webhook Delivery System +domain: integrations +status: planned +approval: approved +size: L +last_updated: 2026-03-13 +groups: + - a-registration + - b-delivery + - c-retry + - d-logs +--- + +# Webhook Delivery System + +## Intent + +Third-party integrations need real-time notifications when events occur in TaskForge (task created, status changed, comment added). Currently integrations must poll the API, which wastes resources and introduces latency. Webhooks are the #1 partner request and a blocker for the Zapier partnership. 
+ +## Decisions + +### Needs Your Input + +| # | Question | Options | AI Recommendation | +|---|----------|---------|-------------------| +| D-1 | Delivery guarantee | At-least-once with idempotency keys / Exactly-once / At-most-once | At-least-once (industry standard — Stripe, GitHub, Shopify) | +| D-2 | Retry policy | Exponential backoff (1m,5m,30m,2h,12h) / Fixed 5min interval / No retries | Exponential backoff over ~15hr | +| D-3 | Queue backend | Existing PG job queue / Add Redis+Bull | PG job queue (no new infra, handles our scale) | + +### Already Decided + +| # | Decision | Choice | Why | +|---|----------|--------|-----| +| D-4 | Signature mechanism | HMAC-SHA256 per endpoint | Industry standard, no real alternative | +| D-5 | Payload format | JSON envelope (event, timestamp, delivery_id, project_id, data) | Only sane choice for typed event routing | +| D-6 | Secret format | `whsec_` + 32-byte base64url | Follows project's prefix convention | +| D-7 | Delivery timeout | 10 seconds | Fast enough for queue, generous for consumers | +| D-8 | Event filtering | Per-endpoint subscription list | Partners care about 2-3 event types, not firehose | +| D-9 | Secret rotation | Dual-secret window with both signatures | Zero-downtime (Stripe pattern) | +| D-10 | Delivery log retention | 30 days | Enough for debugging, within storage budget | +| D-11 | Failure threshold | Disable after 14 days of 100% failure | Long enough for outages, short enough to stop waste | + +## Acceptance Criteria + +| AC | Group | Summary | +|----|-------|---------| +| AC-1 | Registration | Endpoint registration with URL validation and secret generation | +| AC-2 | Registration | Event filter configuration per endpoint | +| AC-3 | Registration | Private IP rejection (SSRF prevention) | +| AC-4 | Delivery | Signed payload delivery within 30s of event | +| AC-5 | Delivery | HMAC-SHA256 signature in X-TaskForge-Signature header | +| AC-6 | Delivery | Idempotency key (delivery ID) stable across 
retries | +| AC-7 | Delivery | Dual-signature during secret rotation window | +| AC-8 | Retry | Exponential backoff on non-2xx / timeout | +| AC-9 | Retry | Mark failed after exhausting 5 retries, store last error | +| AC-10 | Retry | Auto-disable endpoint after 14-day failure streak | +| AC-11 | Retry | Ping test on re-enable before resuming delivery | +| AC-12 | Logs | Delivery log with full attempt details | +| AC-13 | Logs | Delivery replay with new ID and fresh signature | +| AC-14 | Logs | 30-day log retention cleanup | +| AC-15 | Delivery | Envelope payload format for all event types | + +## Out of Scope + +- Webhook management UI (separate spec) +- GraphQL subscriptions (different pattern, backlog) +- Payload transformation per endpoint (v2 feature) +- Rate limiting deliveries (future if abuse occurs) +- Custom HTTP headers per endpoint (not requested) + +## Resolved Questions + +1. **Delivery guarantee** — At-least-once with idempotency keys (approved, 2026-03-13) + Considered: exactly-once (requires client tracking), at-most-once (data loss risk). + +2. **Queue backend** — PostgreSQL job queue (approved, 2026-03-13) + Considered: Redis+Bull (new dependency), RabbitMQ (overkill), SQS (vendor lock-in). + +3. **SSRF prevention** — URL validation at registration time (approved, 2026-03-13) + Considered: runtime DNS check (race condition), egress proxy (infra complexity). + +4. **Secret rotation** — Dual-secret window (approved, 2026-03-13) + Considered: single secret with downtime, versioned signatures (consumer complexity). 
diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/group-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/group-template.md new file mode 100644 index 0000000..04097ec --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/group-template.md @@ -0,0 +1,88 @@ +# Group File Template + +AC group files are the unit of work for parallel agents. Each group contains related acceptance criteria with full detail: EARS criterion text, Given/When/Then clarity, and inline examples. + +--- + +```markdown +--- +group: [A] +name: [Human-Readable Group Name] +criteria: [AC-1, AC-2, AC-3] +status: pending +owner: null +depends_on: [] +files_owned: + - [src/path/to/file.py] + - [src/path/to/other.py] +--- + +## AC-1: [Short Title] + +[EARS-format criterion: When/If/While/Where pattern] + +→ Given [setup], + when [trigger], + then [expected result] + +Example: +→ `[METHOD] [path]` + ```json + [request body] + ``` +→ `[STATUS]` + ```json + [response body] + ``` + +[Additional notes if needed — edge cases, behavioral clarifications] + +## AC-2: [Short Title] + +[Next criterion with same structure...] + +--- + +## AI Decisions + +[Post-implementation only. Filled by /build when the AI encounters decisions +not covered by Constitution or spec.] + +| # | Decision | Choice | Reasoning | +|---|----------|--------|-----------| +``` + +--- + +## Frontmatter Schema + +| Field | Type | Required | Values | +|-------|------|----------|--------| +| `group` | string | yes | Letter identifier: A, B, C... 
| +| `name` | string | yes | Human-readable group name | +| `criteria` | list | yes | AC IDs in this group (e.g., `[AC-1, AC-2]`) | +| `status` | enum | yes | `pending` \| `in_progress` \| `verified` | +| `owner` | string/null | yes | Agent name during build, null when unassigned | +| `depends_on` | list | yes | Group file stems this group depends on (e.g., `[a-registration]`) | +| `files_owned` | list | yes | File paths this group exclusively owns | + +## File Ownership Rules + +- **One owner per file.** No two groups should list the same file in `files_owned`. +- **Read vs write:** A group may READ files owned by other groups but must NOT MODIFY them. +- **Shared file coordination:** If two groups genuinely need to modify the same file, either: + - Split responsibilities (Group A creates the file, Group B adds methods later) + - Note the coordination in both group files with clear method boundaries + - Assign the file to one group and have the other depend on it via `depends_on` + +## AC Writing Guidelines + +Each AC should include: + +1. **Title** — short, descriptive (`## AC-1: Webhook Endpoint Registration`) +2. **EARS criterion** — structured requirement text using When/If/While/Where +3. **Given/When/Then** — test-friendly format below the criterion +4. **Example** — concrete I/O showing request and response (for API features) or input/output (for logic) +5. **Notes** — edge cases or behavioral clarifications (optional) + +Use `references/ears-patterns.md` for EARS format guidance. 
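The one-owner-per-file rule above is mechanically checkable once each group's frontmatter is parsed. A sketch of that check, operating on already-parsed `files_owned` lists keyed by group file stem (the function name and input shape are illustrative, not part of the workflow):

```python
def find_ownership_conflicts(
        groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each multiply-claimed path to the groups that claim it.

    `groups` maps a group file stem (e.g. "a-registration") to its
    `files_owned` list. An empty result means the one-owner-per-file
    rule holds; anything else needs a split, a coordination note, or
    a depends_on edge as described above.
    """
    owners: dict[str, list[str]] = {}
    for stem, files in groups.items():
        for path in files:
            owners.setdefault(path, []).append(stem)
    return {path: stems for path, stems in owners.items() if len(stems) > 1}
```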
diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/index-template.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/index-template.md new file mode 100644 index 0000000..d004318 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/spec/references/index-template.md @@ -0,0 +1,88 @@ +# index.md Template + +Standard template for the human-facing entry point of a spec package. + +--- + +```markdown +--- +feature: [Feature Name] +domain: [domain-folder-name] +status: planned +approval: draft +size: S | M | L +last_updated: YYYY-MM-DD +groups: + - a-[group-name] + - b-[group-name] +--- + +# [Feature Name] + +## Intent + +[2-3 sentences: What problem does this solve? Who has this problem? Why solve it now?] + +## Decisions + +### Needs Your Input + +[Genuine trade-offs where the human's judgment matters. Only decisions with multiple viable options.] + +| # | Question | Options | AI Recommendation | +|---|----------|---------|-------------------| +| D-1 | [what needs deciding] | [option A] / [option B] / [option C] | [recommended + why] | + +### Already Decided + +[Decisions the AI made because only one sane option exists, the Constitution specifies it, or the codebase establishes a clear pattern. Human can override any of these.] + +| # | Decision | Choice | Why | +|---|----------|--------|-----| +| D-N | [what was decided] | [chosen option] | [rationale] | + +## Acceptance Criteria + +[One-liner per AC. Enough for the human to check completeness, not detail.] + +| AC | Group | Summary | +|----|-------|---------| +| AC-1 | [group name] | [one-line description of what's tested] | + +## Out of Scope + +- [Non-goal 1 — prevents scope creep] +- [Non-goal 2] + +## Resolved Questions + +[Decision trail from refinement. Populated during /spec refinement rounds.] + +1. 
**[Decision topic]** — [Chosen option] (approved, YYYY-MM-DD) + Considered: [alternatives]. Rationale: [why]. +``` + +--- + +## Frontmatter Schema + +| Field | Type | Required | Values | +|-------|------|----------|--------| +| `feature` | string | yes | Human-readable feature name | +| `domain` | string | yes | Domain folder name (kebab-case) | +| `status` | enum | yes | `planned` \| `partial` \| `implemented` | +| `approval` | enum | yes | `draft` \| `approved` | +| `size` | enum | yes | `S` (hours) \| `M` (1-2 days) \| `L` (3-5 days) | +| `last_updated` | date | yes | YYYY-MM-DD format | +| `groups` | list | yes | Ordered list of group file stems (without .md) | + +## Section Guidelines + +| Section | Human Reviews? | Guidelines | +|---------|---------------|------------| +| Intent | Yes | 2-3 sentences. Problem + audience + urgency. | +| Decisions — Needs Input | Yes (primary focus) | Only genuine trade-offs. 2-4 options per decision. | +| Decisions — Already Decided | Glances | AI explains choices. Human overrides if needed. | +| AC Summary | Yes (completeness check) | One-liners only. No examples, no Given/When/Then. | +| Out of Scope | Yes | Explicit non-goals. Prevents scope creep during build. | +| Resolved Questions | No (reference) | Auto-populated from refinement session. | diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/SKILL.md deleted file mode 100644 index 39625e4..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/SKILL.md +++ /dev/null @@ -1,327 +0,0 @@ ---- -name: specification-writing -description: >- - Teaches EARS requirement formats, Given/When/Then acceptance criteria, - and structured specification patterns for feature definitions. 
USE WHEN - the user asks to "write requirements", "use EARS format", "define - acceptance criteria", "write Given/When/Then scenarios", "create a - feature spec", "structure requirements", "write user stories", or works - with Gherkin syntax, FR/NFR numbering, and completeness checklists. - DO NOT USE for managing the spec lifecycle (create, refine, build, - review, update) — use the dedicated spec-* skills instead. -version: 0.2.0 ---- - -# Specification Writing - -## Mental Model - -Specifications are **contracts between humans** -- between the person requesting a feature and the person building it. The goal is to eliminate ambiguity so that both parties agree on what "done" means before work begins. - -A specification is not prose. It's a structured document with testable claims. Every requirement should be verifiable: can you write a test (automated or manual) that proves the requirement is met? If you can't test it, it's not a requirement -- it's a wish. - -The most common source of project failure is not bad code but bad specifications. Specifically: -- **Missing edge cases** -- "What happens when the list is empty?" -- **Ambiguous language** -- "The system should respond quickly" (how quickly?) -- **Implicit assumptions** -- "Users will authenticate" (how? OAuth? Password? SSO?) -- **Missing error cases** -- "The system saves the file" (what if the disk is full?) - -Write specifications with a hostile reader in mind -- someone who will interpret every ambiguity in the worst possible way. If a requirement can be misunderstood, it will be. - ---- - -## Spec Sizing Guidelines - -Specifications are loaded into AI context windows with limited capacity. Design for consumption. - -**Recommended target:** ~200 lines per spec file. When a spec grows beyond that, consider splitting into sub-specs (one per sub-feature) with a concise overview linking them. Complex features may justify longer specs — completeness matters more than hitting a number. 
- -**Reference, don't reproduce:** Never inline source code, SQL DDL, Pydantic models, or TypeScript interfaces. Reference the file path and line range instead. The code is the source of truth — duplicated snippets go stale silently. - -**Structure for independent loading:** Each spec file must be useful on its own. Include: version, status, last-updated date, intent, key file paths, and acceptance criteria in every spec. - ---- - -## EARS Requirement Formats - -EARS (Easy Approach to Requirements Syntax) provides five templates that eliminate the most common ambiguities in natural-language requirements. Each template has a specific trigger pattern. - -### Ubiquitous - -Requirements that are always active, with no trigger condition: - -``` -The shall . -``` - -**Example:** The API shall return responses in JSON format. - -### Event-Driven - -Requirements triggered by a specific event: - -``` -When , the shall . -``` - -**Example:** When a user submits a login form with invalid credentials, the system shall display an error message and increment the failed login counter. - -### State-Driven - -Requirements that apply while the system is in a specific state: - -``` -While , the shall . -``` - -**Example:** While the system is in maintenance mode, the API shall return HTTP 503 for all non-health-check endpoints. - -### Unwanted Behavior - -Requirements for handling error conditions and edge cases: - -``` -If , then the shall . -``` - -**Example:** If the database connection pool is exhausted, then the system shall queue incoming requests for up to 30 seconds before returning HTTP 503. - -### Optional Feature - -Requirements that depend on a configurable feature: - -``` -Where , the shall . -``` - -**Example:** Where two-factor authentication is enabled, the system shall require a TOTP code after successful password verification. - -> **Deep dive:** See `references/ears-templates.md` for EARS format templates with filled examples for each pattern type. 
- ---- - -## Acceptance Criteria Patterns - -Acceptance criteria define when a requirement is satisfied. Use these patterns to write criteria that are directly testable. - -### Given/When/Then (Gherkin) - -The most structured pattern. Each scenario is a test case: - -```gherkin -Feature: User Login - - Scenario: Successful login with valid credentials - Given a registered user with email "alice@example.com" - And the user has a verified account - When the user submits the login form with correct credentials - Then the system returns a 200 response with an auth token - And the auth token expires in 24 hours - - Scenario: Failed login with invalid password - Given a registered user with email "alice@example.com" - When the user submits the login form with an incorrect password - Then the system returns a 401 response - And the failed login attempt is logged - And the response does not reveal whether the email exists -``` - -**When to use:** Complex workflows with multiple actors, preconditions, or state transitions. Best for user-facing features. - -### Checklist - -A flat list of verifiable statements. Simpler than Gherkin but less precise: - -```markdown -## Acceptance Criteria: Password Reset - -- [ ] User receives reset email within 60 seconds of request -- [ ] Reset link expires after 1 hour -- [ ] Reset link is single-use (invalidated after first use) -- [ ] Password must meet strength requirements (min 12 chars, 1 uppercase, 1 number) -- [ ] All existing sessions are invalidated after password change -- [ ] User receives confirmation email after successful reset -``` - -**When to use:** Simpler features where the preconditions are obvious and each criterion is independent. 
- -### Table-Driven - -For requirements with multiple input/output combinations: - -```markdown -## Discount Rules - -| Customer Type | Order Total | Discount | Notes | -|---------------|-------------|----------|-------| -| Standard | < $50 | 0% | | -| Standard | >= $50 | 5% | | -| Premium | < $50 | 5% | Minimum premium discount | -| Premium | >= $50 | 10% | | -| Premium | >= $200 | 15% | Max discount cap | -| Employee | any | 25% | Requires valid employee ID | -``` - -**When to use:** Business rules with multiple conditions and outcomes. The table format makes gaps and overlaps visible. - -> **Deep dive:** See `references/criteria-patterns.md` for acceptance criteria examples across different domains. - ---- - -## Specification Structure - -A complete specification follows this structure. Not every section is needed for every feature -- scale the document to the complexity. - -Every spec file starts with metadata: - -``` -# Feature: [Name] -**Domain:** [domain-name] -**Status:** implemented | partial | planned -**Last Updated:** YYYY-MM-DD -**Approval:** draft | user-approved -``` - -Status tells you whether to trust it, version tells you where it belongs, last-updated tells you when it was last verified. Approval tells you whether decisions in the spec have been explicitly validated by the user (`user-approved`) or are AI-generated hypotheses (`draft`). - -### 1. Problem Statement -What problem does this feature solve? Who has this problem? What's the cost of not solving it? (2-3 sentences) - -### 2. Scope -What's in scope and what's explicitly out of scope? Out-of-scope items prevent scope creep. - -```markdown -## Scope - -**In scope:** -- User-initiated password reset via email -- Password strength validation -- Session invalidation on reset - -**Out of scope:** -- Admin-initiated password reset (separate spec) -- Password expiration policies -- Account recovery without email access -``` - -### 3. 
User Stories -Who are the actors and what do they want to achieve? - -```markdown -As a [registered user], I want to [reset my password via email] -so that [I can regain access to my account when I forget my password]. - -As a [security admin], I want to [see password reset audit logs] -so that [I can detect suspicious reset patterns]. -``` - -### 4. Functional Requirements -Use EARS format. Number each requirement for traceability: - -```markdown -- FR-1 [assumed]: When a user requests a password reset, the system shall send a reset email - to the registered email address within 60 seconds. -- FR-2 [assumed]: The reset link shall contain a cryptographically random token (min 32 bytes). -- FR-3 [assumed]: If the reset token is expired or already used, then the system shall display - an error message and offer to send a new reset email. - -Tag each requirement `[assumed]` when first written. Requirements become `[user-approved]` only after explicit user validation via `/spec-refine`. -``` - -### 5. Non-Functional Requirements -Performance, security, scalability, accessibility: - -```markdown -- NFR-1 [assumed]: The password reset endpoint shall respond within 200ms (p95). -- NFR-2 [assumed]: Reset tokens shall be stored as bcrypt hashes, not plaintext. -- NFR-3 [assumed]: The reset flow shall be accessible with screen readers (WCAG 2.1 AA). -``` - -### 6. Edge Cases -The cases nobody thinks about until they happen: - -```markdown -- What if the user requests multiple resets before using any link? - → Only the most recent token is valid; previous tokens are invalidated. -- What if the email is associated with multiple accounts? - → Send separate reset links for each account. -- What if the user's email provider is down? - → The system logs the failure and retries up to 3 times over 5 minutes. -``` - -### 7. Out of Scope -Explicit non-goals to prevent scope creep (can reference the Scope section or expand here). - -### 8. 
Resolved Questions -Decisions explicitly approved by the user via `/spec-refine`. Each entry: decision topic, chosen option, options considered, date, brief rationale. This section starts empty and is populated during the refinement process. - -### 9. Key Files -Source files most relevant to this feature — paths an implementer should read. - -### 10. Implementation Notes -Post-implementation only. Capture deviations from the original spec — what changed and why. - -### 11. Discrepancies -Gaps between spec intent and actual build. Prevents the next session from re-planning decided work. - ---- - -## Completeness Checklist - -Before marking a specification as ready for implementation, verify: - -**Happy path:** -- [ ] Primary use case described with acceptance criteria -- [ ] All actors identified (user, admin, system, external service) -- [ ] Success response/outcome defined - -**Error cases:** -- [ ] Invalid input handled (empty, too long, wrong type, malicious) -- [ ] External service failures handled (timeout, 500, unavailable) -- [ ] Concurrent access conflicts addressed -- [ ] Rate limiting defined for public-facing endpoints - -**Boundary conditions:** -- [ ] Empty collections (zero items) -- [ ] Maximum limits defined (max file size, max items, max length) -- [ ] Pagination for unbounded lists -- [ ] Time zones and date boundaries - -**Performance:** -- [ ] Response time targets (p50, p95, p99) -- [ ] Throughput requirements (requests per second) -- [ ] Data volume expectations (rows, storage size) - -**Security:** -- [ ] Authentication required? Which methods? -- [ ] Authorization rules per role -- [ ] Data sensitivity classification -- [ ] Audit logging requirements - -**Accessibility:** -- [ ] WCAG compliance level specified -- [ ] Keyboard navigation requirements -- [ ] Screen reader compatibility - ---- - -## Ambiguity Policy - -These defaults apply when the user does not specify a preference. 
State the assumption when making a choice: - -- **Format:** Default to EARS format for requirements and Given/When/Then for acceptance criteria. Use checklists for simple features with obvious preconditions. -- **Detail level:** Default to enough detail that a developer unfamiliar with the codebase could implement the feature without asking clarifying questions. -- **Non-functional requirements:** Always include response time targets (default: 200ms p95 for API endpoints, 3s for page loads) and note when these are assumptions. -- **Edge cases:** Always include at least: empty input, maximum input, concurrent access, and external service failure. -- **Out of scope:** Always include an out-of-scope section, even if brief, to establish boundaries. -- **Numbering:** Number all requirements (FR-1, NFR-1) for traceability in code reviews and tests. -- **Approval markers:** All requirements start as `[assumed]`. Only `/spec-refine` with explicit user validation upgrades them to `[user-approved]`. Spec-level `**Approval:**` starts as `draft` and becomes `user-approved` only when all requirements are `[user-approved]`. 
- ---- - -## Reference Files - -| File | Contents | -|------|----------| -| `references/ears-templates.md` | EARS format templates with filled examples for each pattern type, including compound requirements and requirement hierarchies | -| `references/criteria-patterns.md` | Acceptance criteria examples organized by domain: authentication, payments, file upload, search, notifications, and data import | diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/references/criteria-patterns.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/references/criteria-patterns.md deleted file mode 100644 index 1be3344..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/references/criteria-patterns.md +++ /dev/null @@ -1,245 +0,0 @@ -# Acceptance Criteria Patterns by Domain - -Examples of acceptance criteria organized by common feature domains. 
- -## Contents - -- [Authentication](#authentication) -- [Payments](#payments) -- [File Upload](#file-upload) -- [Search](#search) -- [Notifications](#notifications) -- [Data Import](#data-import) -- [Cross-Domain Edge Cases](#cross-domain-edge-cases) - ---- - -## Authentication - -### Login Flow - -```gherkin -Feature: User Login - - Scenario: Successful login with email and password - Given a verified user with email "alice@example.com" - When the user submits the login form with correct credentials - Then the system returns a 200 response with an auth token - And the token expires in 24 hours - And a "login_success" event is logged with the user ID - - Scenario: Login with unverified email - Given a user with email "bob@example.com" who has not verified their email - When the user submits the login form with correct credentials - Then the system returns a 403 response - And the response body contains "Please verify your email address" - And a new verification email is sent - - Scenario: Login with invalid credentials - Given no user exists with email "unknown@example.com" - When the user submits the login form with email "unknown@example.com" - Then the system returns a 401 response - And the response body contains "Invalid email or password" - And the response time is similar to a valid-email failure (timing-safe) - - Scenario: Account lockout after repeated failures - Given a user with email "alice@example.com" - When the user submits 5 incorrect passwords within 10 minutes - Then the account is locked for 15 minutes - And subsequent login attempts return "Account temporarily locked" - And a security alert email is sent to the user -``` - -### Password Reset - -**Checklist format:** - -- [ ] Reset email sent within 60 seconds of request -- [ ] Reset token is a cryptographically random 32-byte value -- [ ] Token expires after 1 hour -- [ ] Token is single-use (invalidated after first use) -- [ ] Using an expired or used token shows "This link has expired" with 
option to request a new one -- [ ] New password must meet strength requirements -- [ ] All existing sessions invalidated after successful reset -- [ ] Reset request for non-existent email returns success (no enumeration) -- [ ] Rate limited to 3 reset requests per email per hour - ---- - -## Payments - -### Checkout Flow - -```gherkin -Feature: Order Checkout - - Scenario: Successful payment - Given a cart with items totaling $49.99 - And the user has a valid payment method on file - When the user confirms the checkout - Then the payment is authorized for $49.99 - And the order status changes to "confirmed" - And a confirmation email is sent with the order details - And inventory is decremented for each item - - Scenario: Payment declined - Given a cart with items totaling $49.99 - When the payment gateway returns "card_declined" - Then the order status remains "pending" - And the user sees "Your card was declined. Please try another payment method." - And inventory is NOT decremented - And no confirmation email is sent - - Scenario: Payment gateway timeout - Given a cart with items totaling $49.99 - When the payment gateway does not respond within 10 seconds - Then the system retries once after 3 seconds - And if the retry also fails, shows "Payment processing is delayed" - And the order enters "payment_pending" status - And a background job checks payment status every 60 seconds for 30 minutes -``` - -### Discount Rules - -**Table-driven format:** - -| Customer Type | Order Total | Coupon | Expected Discount | Final Price | -|---------------|-------------|--------|-------------------|-------------| -| Standard | $30.00 | None | 0% | $30.00 | -| Standard | $30.00 | SAVE10 | 10% | $27.00 | -| Premium | $30.00 | None | 5% | $28.50 | -| Premium | $30.00 | SAVE10 | 10% (higher wins) | $27.00 | -| Premium | $100.00 | SAVE10 | 15% (premium tier) | $85.00 | -| Any | $0.00 | SAVE10 | 0% | $0.00 | -| Standard | $30.00 | EXPIRED| 0% + error shown | $30.00 | - ---- - -## 
File Upload - -### Image Upload - -```gherkin -Feature: Profile Image Upload - - Scenario: Successful image upload - Given the user is on the profile settings page - When the user uploads a valid JPEG image under 5MB - Then the image is resized to 256x256 pixels - And the image is stored in the CDN - And the user's profile displays the new image within 5 seconds - - Scenario: File too large - When the user uploads an image larger than 5MB - Then the upload is rejected before the file is fully transferred - And the error message reads "Image must be under 5MB. Your file is [X]MB." - - Scenario: Invalid file type - When the user uploads a .exe file renamed to .jpg - Then the system validates the file's MIME type (not just extension) - And rejects the upload with "Supported formats: JPEG, PNG, WebP" - - Scenario: Concurrent uploads - When the user uploads two images simultaneously - Then only the last uploaded image is saved as the profile picture - And both uploads complete without errors -``` - ---- - -## Search - -### Full-Text Search - -**Checklist format:** - -- [ ] Empty search query returns validation error, not all results -- [ ] Search results appear within 500ms for queries across 1M documents -- [ ] Results are ranked by relevance (BM25 or equivalent) -- [ ] Search highlights matching terms in results with `` tags -- [ ] Queries with no results show "No results found" with spelling suggestions -- [ ] Special characters in queries are escaped (no injection) -- [ ] Results are paginated with 20 items per page -- [ ] Search query is preserved in the URL for shareability -- [ ] Minimum query length: 2 characters -- [ ] Maximum query length: 200 characters - ---- - -## Notifications - -### Email Notifications - -```gherkin -Feature: Notification Preferences - - Scenario: User opts out of marketing emails - Given a user subscribed to all notification types - When the user unchecks "Marketing updates" in notification preferences - Then marketing emails stop within 
24 hours - And transactional emails (receipts, password resets) continue normally - And the preference change is logged for compliance - - Scenario: Notification delivery failure - Given a notification is queued for delivery - When the email provider returns a 5xx error - Then the system retries after 1 minute, 5 minutes, and 30 minutes - And after 3 failures, marks the notification as "failed" - And does NOT send further retries for this notification - And the failure is recorded in the admin dashboard -``` - ---- - -## Data Import - -### CSV Import - -```gherkin -Feature: User Data Import - - Scenario: Valid CSV import - Given an admin uploads a CSV with 500 valid user records - When the import is processed - Then all 500 users are created with correct field mapping - And the admin sees a summary: "500 created, 0 skipped, 0 errors" - And each user receives a welcome email - - Scenario: CSV with validation errors - Given a CSV where row 3 has an invalid email and row 7 has a duplicate email - When the import is processed - Then valid rows (498) are imported successfully - And invalid rows are skipped with error details: - | Row | Field | Error | - | 3 | email | "not.valid" is not a valid email format | - | 7 | email | "alice@example.com" already exists | - And the admin can download an error report CSV - - Scenario: Large file import - Given a CSV with 100,000 records - When the import is initiated - Then the import runs asynchronously (not blocking the UI) - And the admin sees a progress indicator - And the import completes within 5 minutes - And the system sends an email when import finishes -``` - ---- - -## Cross-Domain Edge Cases - -These edge cases apply to most features and should be checked: - -```markdown -## Universal Edge Cases - -- [ ] Empty input: What happens when required fields are blank? -- [ ] Maximum length: What happens at the field's max length? At max + 1? -- [ ] Unicode: Does the feature handle emoji, CJK characters, RTL text? 
-- [ ] Concurrent access: What if two users edit the same resource simultaneously? -- [ ] Network interruption: What if connectivity drops mid-operation? -- [ ] Timezone: Do date-dependent features work correctly across timezones? -- [ ] Pagination boundary: What happens when viewing the last page as items are deleted? -- [ ] Authorization: Can the feature be accessed without authentication? With wrong role? -- [ ] Idempotency: What happens if the same request is sent twice? -``` diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/references/ears-templates.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/references/ears-templates.md deleted file mode 100644 index af049be..0000000 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specification-writing/references/ears-templates.md +++ /dev/null @@ -1,239 +0,0 @@ -# EARS Requirement Templates - -Templates and filled examples for each EARS (Easy Approach to Requirements Syntax) pattern type. - -## Contents - -- [Ubiquitous Requirements](#ubiquitous-requirements) -- [Event-Driven Requirements](#event-driven-requirements) -- [State-Driven Requirements](#state-driven-requirements) -- [Unwanted Behavior Requirements](#unwanted-behavior-requirements) -- [Optional Feature Requirements](#optional-feature-requirements) -- [Compound Requirements](#compound-requirements) -- [Writing Tips](#writing-tips) - ---- - -## Ubiquitous Requirements - -**Template:** -``` -The shall . -``` - -Requirements that are always active, with no trigger condition. These define invariant behaviors. - -### Examples - -``` -The API shall return responses in JSON format with UTF-8 encoding. - -The system shall log all authentication events with timestamp, user ID, and outcome. - -The application shall enforce HTTPS for all client-server communication. - -The database shall store timestamps in UTC. 
- -The API shall include a request-id header in every response. -``` - -### Anti-patterns - -``` -❌ The system should be fast. - → Not testable. How fast? Measured how? - -❌ The system shall be user-friendly. - → Not testable. Define specific interaction requirements. - -✅ The API shall respond to health check requests within 50ms (p99). -✅ The login form shall support keyboard navigation (tab order: email → password → submit). -``` - ---- - -## Event-Driven Requirements - -**Template:** -``` -When , the shall . -``` - -Requirements triggered by a specific, detectable event. The event is the precondition. - -### Examples - -``` -When a user submits a registration form, the system shall validate all fields -and return validation errors within 200ms. - -When a payment transaction fails, the system shall: - 1. Log the failure with transaction ID, error code, and timestamp. - 2. Send a failure notification to the user within 60 seconds. - 3. Release the reserved inventory. - -When a file upload exceeds 50MB, the system shall reject the upload with -HTTP 413 and a message indicating the maximum file size. - -When a user's session has been inactive for 30 minutes, the system shall -invalidate the session and redirect to the login page. - -When the system receives a webhook event with an unrecognized event type, -the system shall log the event payload and return HTTP 200 (acknowledge but ignore). -``` - -### Anti-patterns - -``` -❌ When the user does something wrong, show an error. - → What action? What error? How displayed? - -✅ When the user submits a form with an invalid email format, - the system shall display an inline error message below the email field - stating "Please enter a valid email address". -``` - ---- - -## State-Driven Requirements - -**Template:** -``` -While , the shall . -``` - -Requirements that apply continuously while the system is in a specific state. 
- -### Examples - -``` -While the system is in maintenance mode, the API shall return HTTP 503 -with a "Retry-After" header for all endpoints except /health. - -While a user account is locked, the system shall reject all login attempts -and display a message with the unlock time. - -While the message queue depth exceeds 10,000 messages, the system shall -activate the secondary consumer group. - -While the database is performing a backup, the system shall serve read -requests from the read replica and queue write requests. - -While the system is operating in degraded mode, the dashboard shall display -a banner indicating limited functionality and estimated recovery time. -``` - ---- - -## Unwanted Behavior Requirements - -**Template:** -``` -If , then the shall . -``` - -Requirements for handling errors, failures, and edge cases. These cover what happens when things go wrong. - -### Examples - -``` -If the external payment gateway does not respond within 5 seconds, -then the system shall retry once after 2 seconds, and if the retry -also fails, return a "payment processing delayed" message to the user. - -If the database connection pool is exhausted, then the system shall -queue incoming requests for up to 30 seconds before returning HTTP 503. - -If a user attempts to access a resource they do not own, then the system -shall return HTTP 403, log the access attempt with the user ID and resource ID, -and increment the security audit counter. - -If the uploaded file contains an unsupported MIME type, then the system shall -reject the file with a message listing the supported types. - -If the disk usage exceeds 90%, then the system shall send an alert to the -operations team and begin purging temporary files older than 24 hours. -``` - ---- - -## Optional Feature Requirements - -**Template:** -``` -Where , the shall . -``` - -Requirements that depend on a configurable feature flag or setting. 
- -### Examples - -``` -Where two-factor authentication is enabled, the system shall require -a TOTP code after successful password verification. - -Where the audit log feature is enabled, the system shall record all -CRUD operations with the actor, action, resource, and timestamp. - -Where dark mode is enabled, the system shall render all pages using -the dark color palette defined in the theme configuration. - -Where rate limiting is configured, the system shall enforce the configured -request limit per API key per minute and return HTTP 429 when exceeded. - -Where email notifications are enabled for a user, the system shall send -a daily digest of unread notifications at the user's configured time. -``` - ---- - -## Compound Requirements - -Complex requirements often combine multiple EARS patterns: - -### Event + Unwanted Behavior - -``` -When a user submits a password reset request: - - The system shall send a reset email within 60 seconds. - - If the email address is not associated with an account, then the system - shall still return a success message (to prevent email enumeration). - - If the email service is unavailable, then the system shall queue the - email for retry and inform the user that the email may be delayed. -``` - -### State + Event - -``` -While the system is in read-only mode: - - When a user attempts a write operation, the system shall return HTTP 503 - with a message indicating when write access will be restored. - - When an admin issues a "restore write access" command, the system shall - exit read-only mode and process any queued write operations in order. -``` - -### Requirement Hierarchies - -For complex features, use parent-child numbering: - -``` -FR-1: User Registration - FR-1.1: When a user submits the registration form, the system shall - create an account and send a verification email. - FR-1.2: If the email is already registered, then the system shall - display "An account with this email already exists". 
- FR-1.3: The system shall require passwords of at least 12 characters - with at least one uppercase letter and one digit. - FR-1.4: Where CAPTCHA is enabled, the registration form shall include - a CAPTCHA challenge before submission. -``` - ---- - -## Writing Tips - -1. **One requirement per statement.** Don't combine multiple behaviors in one sentence. -2. **Use "shall" for requirements, "should" for recommendations, "may" for optional.** This is standard requirement language (RFC 2119). -3. **Be specific about quantities.** Not "quickly" but "within 200ms". Not "many" but "up to 1000". -4. **Name the actor.** "The system shall..." or "The user shall..." -- never the passive "It should be done". -5. **State the observable behavior.** Requirements describe what the system does, not how it does it internally. diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specs/SKILL.md b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specs/SKILL.md new file mode 100644 index 0000000..c1efa61 --- /dev/null +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/spec-workflow/skills/specs/SKILL.md @@ -0,0 +1,115 @@ +--- +name: specs +description: >- + Dashboard showing the health and status of all spec packages in the project. + Scans .specs/ directories, parses frontmatter, and reports on spec status, + staleness, draft specs, and unresolved AI decisions. USE WHEN the user asks + to "check specs", "show spec status", "spec dashboard", "which specs are + stale", "audit specs", "list specs", "spec health", or works with .specs/ + directory overview. DO NOT USE for creating specs (use /spec) or building + specs (use /build). +version: 1.0.0 +argument-hint: "" +--- + +# Spec Dashboard + +## Mental Model + +A quick health check across all specs in the project. Scans `.specs/` directories, parses `index.md` frontmatter from each spec package, and presents a summary. No modifications — read-only. 
+ +--- + +## Workflow + +### Step 1: Scan + +``` +Glob: .specs/**/index.md +``` + +For each `index.md` found, parse the YAML frontmatter. + +Also check: +- `.specs/CONSTITUTION.md` — exists and is populated (not just template)? +- `.specs/BACKLOG.md` — exists? + +### Step 2: Collect Metrics + +For each spec package, extract from frontmatter: +- `feature` — name +- `domain` — domain folder +- `status` — planned / partial / implemented +- `approval` — draft / approved +- `size` — S / M / L +- `last_updated` — date +- `groups` — count + +Also scan group files for: +- AC marker counts: `[ ]`, `[~]`, `[x]` +- AI Decision count (rows in `## AI Decisions` tables) +- Unresolved AI decisions (no `User Verdict` entry) + +### Step 3: Present Dashboard + +``` +## Spec Dashboard + +**Project:** {project name from Constitution or directory} +**Constitution:** {populated / template-only / missing} +**Backlog:** {N ideas / missing} +**Total Specs:** {count} + +### By Status + +| Status | Count | Specs | +|--------|-------|-------| +| Planned | N | feature-a, feature-b | +| Partial | N | feature-c | +| Implemented | N | feature-d, feature-e | + +### Attention Needed + +| Spec | Issue | Action | +|------|-------|--------| +| feature-x | Draft — not approved | Run `/spec feature-x` to refine | +| feature-y | Stale — last updated 45 days ago | Review and update | +| feature-z | 3 unresolved AI decisions | Review AI decisions in group files | +| feature-w | Partial — 2/5 ACs unverified | Resume `/build feature-w` | + +### Summary + +- {N} specs ready to build (planned + approved) +- {N} specs in progress (partial) +- {N} specs complete (implemented) +- {N} AI decisions awaiting review +``` + +### Step 4: Recommendations + +Based on findings, suggest actions: + +- **No Constitution:** "Run `/spec constitution` to capture project-level decisions." +- **Draft specs:** "Run `/spec {feature}` to refine and approve." 
+- **Stale specs:** "Review — specs not updated in 30+ days may be outdated." +- **Partial builds:** "Resume `/build {feature}` to complete implementation." +- **Unresolved AI decisions:** "Review and approve, override, or promote to Constitution." + +--- + +## Staleness Rules + +| Condition | Stale? | +|-----------|--------| +| `status: implemented`, any age | No — completed specs don't go stale | +| `status: planned`, `last_updated` > 30 days | Yes — may be abandoned | +| `status: partial`, `last_updated` > 14 days | Yes — build may be stuck | +| `approval: draft`, `last_updated` > 7 days | Yes — needs refinement | + +--- + +## Ambiguity Policy + +- If `.specs/` doesn't exist: "No specs found. Run `/spec {feature}` to create your first spec package." +- If spec packages have malformed frontmatter: report the error, skip the spec, continue scanning +- If a spec directory lacks `context.md` or `groups/`: flag as incomplete structure diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/workspace-scope-guard/scripts/guard-workspace-scope.py b/container/.devcontainer/plugins/devs-marketplace/plugins/workspace-scope-guard/scripts/guard-workspace-scope.py index 45d20a8..d33b9fa 100755 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/workspace-scope-guard/scripts/guard-workspace-scope.py +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/workspace-scope-guard/scripts/guard-workspace-scope.py @@ -184,6 +184,37 @@ def resolve_scope_root(cwd: str) -> str: return cwd +def get_stable_scope_root(session_id: str | None) -> str: + """Return the persisted scope root, computing and caching on first call. + + Uses a session-scoped temp file so the scope root survives CWD drift + caused by Bash ``cd`` commands within the session. 
+ """ + tmp_path = f"/tmp/claude-scope-root-{session_id}" if session_id else None + + if tmp_path: + try: + with open(tmp_path, "r") as f: + cached = f.read().strip() + if cached: + return cached + except FileNotFoundError: + pass + + # First invocation (or no session_id): compute from actual CWD + cwd = os.path.realpath(os.getcwd()) + scope_root = resolve_scope_root(cwd) + + if tmp_path: + try: + with open(tmp_path, "w") as f: + f.write(scope_root) + except OSError: + pass # Best-effort; fall back to computed value + + return scope_root + + def get_target_path(tool_name: str, tool_input: dict) -> str | None: """Extract the target path from tool input. @@ -321,6 +352,8 @@ def check_bash_scope(command: str, cwd: str) -> None: if not skip_layer1: for target, resolved in resolved_targets: + if any(resolved.startswith(sp) for sp in SYSTEM_PATH_PREFIXES): + continue if not is_in_scope(resolved, cwd) and not is_allowlisted(resolved): detail = f" (resolved: {resolved})" if resolved != target else "" print( @@ -353,11 +386,10 @@ def main(): input_data = json.load(sys.stdin) tool_name = input_data.get("tool_name", "") tool_input = input_data.get("tool_input", {}) + session_id = input_data.get("session_id") - # Resolve CWD with realpath for consistent comparison with resolved targets - cwd = os.path.realpath(os.getcwd()) - # Expand scope to project root when running inside a worktree - scope_root = resolve_scope_root(cwd) + # Use persisted scope root to prevent CWD drift from Bash cd + scope_root = get_stable_scope_root(session_id) # --- Bash tool: separate code path --- if tool_name == "Bash": @@ -404,11 +436,7 @@ def main(): # Out of scope — BLOCK for ALL tools detail = f" (resolved: {resolved})" if resolved != target_path else "" - scope_info = ( - f"scope root ({scope_root})" - if scope_root != cwd - else f"working directory ({scope_root})" - ) + scope_info = f"scope root ({scope_root})" print( f"Blocked: {tool_name} targets '{target_path}'{detail} which is outside " 
f"the {scope_info}. Move to that project's directory " diff --git a/container/.devcontainer/plugins/devs-marketplace/plugins/workspace-scope-guard/scripts/inject-workspace-cwd.py b/container/.devcontainer/plugins/devs-marketplace/plugins/workspace-scope-guard/scripts/inject-workspace-cwd.py index 1c3730d..6c1d3bd 100644 --- a/container/.devcontainer/plugins/devs-marketplace/plugins/workspace-scope-guard/scripts/inject-workspace-cwd.py +++ b/container/.devcontainer/plugins/devs-marketplace/plugins/workspace-scope-guard/scripts/inject-workspace-cwd.py @@ -49,6 +49,37 @@ def resolve_scope_root(cwd: str) -> str: return cwd +def get_stable_scope_root(session_id: str | None) -> str: + """Return the persisted scope root, computing and caching on first call. + + Shares the same temp file as guard-workspace-scope.py so both scripts + agree on the scope root for a given session. + """ + tmp_path = f"/tmp/claude-scope-root-{session_id}" if session_id else None + + if tmp_path: + try: + with open(tmp_path, "r") as f: + cached = f.read().strip() + if cached: + return cached + except FileNotFoundError: + pass + + # First invocation (or no session_id): compute from actual CWD + cwd = os.path.realpath(os.getcwd()) + scope_root = resolve_scope_root(cwd) + + if tmp_path: + try: + with open(tmp_path, "w") as f: + f.write(scope_root) + except OSError: + pass # Best-effort; fall back to computed value + + return scope_root + + def main(): cwd = os.path.realpath(os.getcwd()) try: @@ -56,13 +87,15 @@ def main(): # Some hook events provide cwd override cwd = os.path.realpath(input_data.get("cwd", cwd)) hook_event = input_data.get("hook_event_name", "PreToolUse") + session_id = input_data.get("session_id") except (json.JSONDecodeError, ValueError): hook_event = "PreToolUse" + session_id = None - scope_root = resolve_scope_root(cwd) + scope_root = get_stable_scope_root(session_id) context = ( - f"Working Directory: {cwd} — restrict all file operations to this directory unless explicitly 
instructed otherwise.\n" + f"Working Directory: {scope_root} — restrict all file operations to this directory unless explicitly instructed otherwise.\n" f"All file operations and commands MUST target paths within {scope_root}. " f"Do not read, write, or execute commands against paths outside this directory." ) diff --git a/container/.devcontainer/scripts/setup-aliases.sh b/container/.devcontainer/scripts/setup-aliases.sh index c20fac8..f19f3b5 100755 --- a/container/.devcontainer/scripts/setup-aliases.sh +++ b/container/.devcontainer/scripts/setup-aliases.sh @@ -104,7 +104,7 @@ cc-tools() { printf " %-20s %s\n" "COMMAND" "STATUS" echo " ────────────────────────────────────" for cmd in claude cc ccw ccraw cc-orc codeforge ccusage ccburn claude-monitor \\ - ccms ct cargo ruff biome dprint shfmt shellcheck hadolint \\ + ct cargo ruff biome dprint shfmt shellcheck hadolint \\ ast-grep tree-sitter pyright typescript-language-server \\ agent-browser gh docker git jq tmux bun go infocmp; do if command -v "\$cmd" >/dev/null 2>&1; then diff --git a/docs/src/content/docs/customization/rules.md b/docs/src/content/docs/customization/rules.md index bd80b17..42c1d39 100644 --- a/docs/src/content/docs/customization/rules.md +++ b/docs/src/content/docs/customization/rules.md @@ -40,13 +40,13 @@ This rule works in concert with the [Workspace Scope Guard](../plugins/workspace ### Spec Workflow (`spec-workflow.md`) -Mandates specification-driven development. Every non-trivial feature requires a spec before implementation begins, and every implementation ends with an as-built spec update. +Mandates specification-driven development using directory-based spec packages. Every non-trivial feature requires a spec before implementation begins, and every implementation ends with automated spec closure. 
Key requirements: -- Use `/spec-new` to create specs from the standard template -- Use `/spec-update` after implementation to close the loop -- Specs live in `.specs/` organized by domain folders -- Run `/spec-check` before starting new milestones +- Use `/spec` to create, refine, and approve spec packages +- Use `/build` to implement from spec — includes review and closure +- Specs live in `.specs/` as directory packages organized by domain +- Use `/specs` to check spec health across the project ### Session Search (`session-search.md`) diff --git a/docs/src/content/docs/features/agents.md b/docs/src/content/docs/features/agents.md index 75abb67..c98693d 100644 --- a/docs/src/content/docs/features/agents.md +++ b/docs/src/content/docs/features/agents.md @@ -124,14 +124,14 @@ A technical writing specialist that creates and maintains README files, API docu ### documenter -Full Opus documentation-patterns specification-writing +Full Opus documentation-patterns spec build specs -A documentation and specification lifecycle agent. Handles READMEs, API docs, inline documentation, and architectural guides alongside the full spec workflow — creating, refining, reviewing, and closing specifications. Carries 7 frontloaded skills covering both documentation patterns and all spec operations. Unlike doc-writer, the documenter works directly (no worktree isolation) and owns the spec lifecycle. Never modifies source code logic. +A documentation and specification lifecycle agent. Handles READMEs, API docs, inline documentation, and architectural guides alongside the full spec workflow — creating, refining, and building from spec packages. Carries 4 frontloaded skills covering documentation patterns and all 3 spec operations. Unlike doc-writer, the documenter works directly (no worktree isolation) and owns the spec lifecycle. Never modifies source code logic. 
**When activated:** "Document this module," "write a README," "create a spec and document the feature," specification lifecycle tasks that combine docs and specs. :::tip[documenter vs doc-writer] -Use **doc-writer** for pure documentation tasks (READMEs, docstrings, API docs) where worktree isolation is preferred. Use **documenter** when documentation and specification work are interleaved — it has the full spec skill set (spec-new, spec-refine, spec-review, spec-update, spec-check) built in. +Use **doc-writer** for pure documentation tasks (READMEs, docstrings, API docs) where worktree isolation is preferred. Use **documenter** when documentation and specification work are interleaved — it has the full spec skill set (`/spec`, `/build`, `/specs`) built in. ::: ### explorer @@ -172,7 +172,7 @@ A git history forensics specialist that traces code evolution, finds when bugs w ### implementer -Full (worktree) Opus refactoring-patterns migration-patterns spec-update +Full (worktree) Opus refactoring-patterns migration-patterns build A full-stack implementation agent that handles all code modifications: writing new features, fixing bugs, refactoring existing code, and executing migrations. Runs tests after every edit via a Stop hook to catch regressions immediately. Works in a git worktree so your main branch stays clean. The broadest-scope implementation agent — use the more focused refactorer, migrator, or test-writer when the task is clearly within one domain. @@ -256,9 +256,9 @@ A senior application security engineer that audits codebases for vulnerabilities ### spec-writer -Read-only Opus specification-writing +Read-only Opus spec specs -A requirements engineer that creates structured technical specifications using the EARS (Easy Approach to Requirements Syntax) format for requirements and Given/When/Then patterns for acceptance criteria. 
Grounds every specification in the actual codebase state — reads existing code, tests, and interfaces before writing requirements. +A requirements engineer that creates structured spec packages using the EARS (Easy Approach to Requirements Syntax) format for acceptance criteria and Given/When/Then patterns for test clarity. Grounds every specification in the actual codebase state — reads existing code, tests, and interfaces before writing requirements. **When activated:** "Write a spec for," "define requirements," "create acceptance criteria," specification authoring. @@ -280,14 +280,14 @@ A specialist for configuring the Claude Code terminal statusline. Converts shell ### tester -Full (worktree) Opus testing spec-update +Full (worktree) Opus testing build -A test suite creation and verification agent. Analyzes existing code, writes comprehensive tests, and ensures all tests pass before completing via a Stop hook. Supports pytest, Vitest, Jest, Go testing, and Rust test frameworks. Works in a git worktree. Functionally equivalent to test-writer with the addition of the spec-update skill for closing the spec loop after writing tests. +A test suite creation and verification agent. Analyzes existing code, writes comprehensive tests, and ensures all tests pass before completing via a Stop hook. Supports pytest, Vitest, Jest, Go testing, and Rust test frameworks. Works in a git worktree. Functionally equivalent to test-writer with the addition of the build skill for closing the spec loop after writing tests. **When activated:** "Write tests for," "add test coverage," "create integration tests," test creation tasks. :::tip[tester vs test-writer] -These agents are nearly identical. **tester** includes the spec-update skill for spec-driven workflows. **test-writer** is the built-in replacement agent. Both produce the same quality of tests — use whichever surfaces based on your request. +These agents are nearly identical. 
**tester** includes the `/build` skill for spec-driven workflows. **test-writer** is the built-in replacement agent. Both produce the same quality of tests — use whichever surfaces based on your request. ::: ### test-writer @@ -312,20 +312,20 @@ A senior test engineer that analyzes existing code and writes comprehensive test | debug-logs | Read-only | Sonnet | -- | -- | debugging | | dependency-analyst | Read-only | Haiku | -- | Yes | dependency-management | | doc-writer | Full | Opus | Worktree | -- | documentation-patterns | -| documenter | Full | Opus | -- | -- | documentation-patterns, specification-writing, spec-new, spec-update, spec-review, spec-refine, spec-check | +| documenter | Full | Opus | -- | -- | documentation-patterns, spec, build, specs | | explorer | Read-only | Haiku | -- | -- | ast-grep-patterns | | generalist | Full | Inherited | -- | -- | spec workflow | | git-archaeologist | Read-only | Haiku | -- | -- | git-forensics | -| implementer | Full | Opus | Worktree | -- | refactoring-patterns, migration-patterns, spec-update | +| implementer | Full | Opus | Worktree | -- | refactoring-patterns, migration-patterns, build | | investigator | Read-only | Sonnet | -- | -- | documentation-patterns, git-forensics, performance-profiling, debugging, dependency-management, ast-grep-patterns | | migrator | Full | Opus | Worktree | -- | migration-patterns | | perf-profiler | Read-only | Sonnet | -- | Yes | performance-profiling | | refactorer | Full | Opus | Worktree | -- | refactoring-patterns | | researcher | Read-only | Sonnet | -- | -- | documentation-patterns | | security-auditor | Read-only | Sonnet | -- | Yes | security-checklist | -| spec-writer | Read-only | Opus | -- | -- | specification-writing | +| spec-writer | Read-only | Opus | -- | -- | spec, specs | | statusline-config | Full | Sonnet | -- | -- | -- | -| tester | Full | Opus | Worktree | -- | testing, spec-update | +| tester | Full | Opus | Worktree | -- | testing, build | | test-writer | 
Full | Opus | Worktree | -- | testing | ## Access Levels at a Glance diff --git a/docs/src/content/docs/features/skills.md b/docs/src/content/docs/features/skills.md index e1ea8e5..2cfae32 100644 --- a/docs/src/content/docs/features/skills.md +++ b/docs/src/content/docs/features/skills.md @@ -187,11 +187,11 @@ Git history analysis techniques for tracing code evolution and investigating cha **Auto-suggested when:** You mention git history, git blame, bisect, "when was this changed," or code archaeology. -### specification-writing +### spec (spec-workflow plugin) -EARS format specification authoring for structured technical requirements. Covers the five requirement types (ubiquitous, event-driven, state-driven, optional, unwanted) and Given/When/Then acceptance criteria patterns. +Spec package creation and refinement. Creates directory-based spec packages with EARS acceptance criteria, decision tables, invariants, and parallel decomposition groups. Includes EARS patterns reference, constitution template, and a complete example spec package. -**Key topics:** EARS requirement syntax, Given/When/Then acceptance criteria, requirement traceability, spec organization in `.specs/` directories, approval workflow (draft to approved). +**Key topics:** EARS requirement syntax, Given/When/Then acceptance criteria, spec package structure (`index.md`, `context.md`, `groups/`), Constitution, AI decision workflow. **Auto-suggested when:** You mention writing specs, defining requirements, acceptance criteria, or EARS format. 
@@ -225,7 +225,7 @@ Git worktree creation, management, and cleanup for parallel development workflow |----------|--------|-------| | **Frameworks** | fastapi, svelte5, pydantic-ai, docker, docker-py, sqlite | Framework-specific patterns and APIs | | **Practices** | testing, debugging, refactoring-patterns, security-checklist, api-design, documentation-patterns, performance-profiling, dependency-management, migration-patterns | Methodologies and established patterns | -| **Claude & CodeForge** | claude-code-headless, claude-agent-sdk, skill-building, git-forensics, specification-writing, ast-grep-patterns, team, worktree | Building on and extending the Claude ecosystem | +| **Claude & CodeForge** | claude-code-headless, claude-agent-sdk, skill-building, git-forensics, ast-grep-patterns, team, worktree | Building on and extending the Claude ecosystem | :::note[Skills vs Agents] Skills and agents serve different purposes. An **agent** is a specialized Claude instance with specific tools and constraints — it *does work*. A **skill** is a knowledge pack that *informs work* — it provides the patterns, best practices, and domain knowledge that make an agent (or the main Claude session) more effective. Many agents have associated skills that load automatically when the agent is spawned. @@ -235,4 +235,4 @@ Skills and agents serve different purposes. 
An **agent** is a specialized Claude - [Skill Engine Plugin](../plugins/skill-engine/) — how the skill engine works - [Agents](./agents/) — agents that leverage skills -- [Spec Workflow](../plugins/spec-workflow/) — the specification-writing skill in action +- [Spec Workflow](../plugins/spec-workflow/) — the spec lifecycle skills (`/spec`, `/build`, `/specs`) diff --git a/docs/src/content/docs/features/tools.md b/docs/src/content/docs/features/tools.md index 8cf13b4..0f243e5 100644 --- a/docs/src/content/docs/features/tools.md +++ b/docs/src/content/docs/features/tools.md @@ -5,7 +5,7 @@ sidebar: order: 4 --- -CodeForge installs 22 tools and utilities in your DevContainer, covering session management, code quality, language runtimes, and development infrastructure. Every tool is on your `PATH` from the first terminal session — no manual installation required. +CodeForge installs 23 tools and utilities in your DevContainer, covering session management, code quality, language runtimes, and development infrastructure. Every tool is on your `PATH` from the first terminal session — no manual installation required. ## Session & Claude Tools @@ -150,6 +150,30 @@ cc-tools This displays a formatted table showing every tool, whether it is installed, and its version number. +### codeforge — CodeForge CLI (Experimental) + +:::caution[Experimental] +The CodeForge CLI is under active development. Commands and interfaces may change between releases. +::: + +Multi-command CLI for development workflows — session search, plugin management, configuration, codebase indexing, and devcontainer management. + +```bash +# Search session history +codeforge session search "authentication approach" + +# List plugins and their status +codeforge plugin list + +# Build a codebase symbol index +codeforge index build + +# Manage devcontainers +codeforge container ls +``` + +When run outside the container, commands auto-proxy into the running devcontainer. 
Use `--local` to run against the host filesystem. + ## Code Quality Tools These tools are used both manually and automatically by the [Auto Code Quality Plugin](../plugins/auto-code-quality/) to maintain code standards. @@ -230,6 +254,7 @@ CodeForge uses `uv` as the default Python package manager. It is significantly f | 20 | `tree-sitter` | Intelligence | Syntax tree parsing | | 21 | `pyright` | Intelligence | Python LSP server | | 22 | `typescript-language-server` | Intelligence | TypeScript/JS LSP server | +| 23 | `codeforge` | Session | CodeForge CLI — session search, plugins, indexing _(experimental)_ | ## Related diff --git a/docs/src/content/docs/getting-started/first-session.md b/docs/src/content/docs/getting-started/first-session.md index 9da25e1..4568c07 100644 --- a/docs/src/content/docs/getting-started/first-session.md +++ b/docs/src/content/docs/getting-started/first-session.md @@ -72,10 +72,10 @@ The **test writer agent** generates tests that follow your project's existing pa ### Start a Feature with a Spec ``` -/spec-new +/spec my-feature ``` -This skill walks you through creating a feature specification. Specs bring structure to development — you define what you're building before writing code. See the [Spec Workflow plugin](../plugins/spec-workflow/) for the full lifecycle. +This skill creates a spec package for your feature — the AI drafts everything, presents decisions that need your input, and gets your approval. See the [Spec Workflow plugin](../plugins/spec-workflow/) for the full lifecycle. ### Check Your Tools @@ -100,7 +100,7 @@ claude-dashboard CodeForge includes **21 specialized agents** and **38 skills** that activate automatically based on what you're working on. You don't need to memorize names — just describe what you want, and Claude delegates to the right specialist. The examples in "What to Try First" above show this in action. 
- **[Agents](../features/agents/)** — specialized AI personas for architecture, debugging, testing, security, migrations, and more -- **[Skills](../features/skills/)** — domain-specific knowledge packs (FastAPI, Docker, Svelte, debugging patterns, etc.) that the skill engine suggests automatically or you invoke with slash commands like `/spec-new` +- **[Skills](../features/skills/)** — domain-specific knowledge packs (FastAPI, Docker, Svelte, debugging patterns, etc.) that the skill engine suggests automatically or you invoke with slash commands like `/spec` ## Understanding the Status Line @@ -115,7 +115,7 @@ Instead of "fix the bug," try "the login endpoint returns 500 when the email fie ::: :::tip[Use the spec workflow for features] -For anything beyond a simple bug fix, start with `/spec-new`. Writing a spec first helps Claude (and you) think through the design before writing code. The spec becomes a living document that tracks what was built and why. +For anything beyond a simple bug fix, start with `/spec`. Writing a spec first helps Claude (and you) think through the design before writing code. The spec becomes a living document that tracks what was built and why. ::: :::tip[Let agents do their thing] diff --git a/docs/src/content/docs/plugins/agent-system.md b/docs/src/content/docs/plugins/agent-system.md index 72cdbde..1fa7c17 100644 --- a/docs/src/content/docs/plugins/agent-system.md +++ b/docs/src/content/docs/plugins/agent-system.md @@ -122,7 +122,7 @@ These agents investigate, analyze, and report — they never modify files. 
| **git-archaeologist** | Git history analysis, blame, bisect, forensics | Haiku | git-forensics | | **perf-profiler** | Performance profiling, bottleneck identification | Sonnet | performance-profiling | | **security-auditor** | Security audit, vulnerability assessment, OWASP checks | Sonnet | security-checklist | -| **spec-writer** | Specification authoring and refinement | Opus | specification-writing | +| **spec-writer** | Specification authoring and refinement | Opus | spec, specs | ### Full-Access Agents @@ -172,7 +172,7 @@ model: opus permissionMode: plan skills: - api-design - - spec-new + - spec hooks: PreToolUse: - matcher: Bash diff --git a/docs/src/content/docs/plugins/skill-engine.md b/docs/src/content/docs/plugins/skill-engine.md index 9ec2c34..78f0026 100644 --- a/docs/src/content/docs/plugins/skill-engine.md +++ b/docs/src/content/docs/plugins/skill-engine.md @@ -81,7 +81,7 @@ Skills for working with Claude Code itself and extending CodeForge. | **worktree** | Git worktree lifecycle, EnterWorktree tool, `.worktreeinclude` setup, parallel workflows | :::note[Cross-Plugin Skills] -The `specification-writing` skill and the spec lifecycle skills (`spec-new`, `spec-build`, etc.) live in the [Spec Workflow](./spec-workflow/) plugin, not the skill engine. However, the skill-suggester registers keywords for them so they are auto-suggested alongside skill-engine skills. +The spec lifecycle skills (`/spec`, `/build`, `/specs`) live in the [Spec Workflow](./spec-workflow/) plugin, not the skill engine. However, the skill-suggester registers keywords for them so they are auto-suggested alongside skill-engine skills. 
::: ## Skill Activation Patterns @@ -95,8 +95,8 @@ Here is a sampling of the phrases and terms that trigger each category of skill, | "Refactor this function" | refactoring-patterns | | "Check for security vulnerabilities" | security-checklist | | "Profile this code" or "find the bottleneck" | performance-profiling | -| "Create a spec for this feature" | spec-new | -| "Build from the spec" | spec-build | +| "Create a spec for this feature" | spec | +| "Build from the spec" | build | | "Debug the logs" or "what went wrong" | debugging | | "Spawn a team" or "work in parallel" | team | | "Search with ast-grep" | ast-grep-patterns | @@ -179,5 +179,5 @@ skill-engine/ - [Skills Reference](../features/skills/) — detailed per-skill documentation - [Agent System](./agent-system/) — agents that carry pre-loaded skills -- [Spec Workflow](./spec-workflow/) — the specification-writing skill powers spec authoring +- [Spec Workflow](./spec-workflow/) — spec lifecycle skills (`/spec`, `/build`, `/specs`) - [Hooks](../customization/hooks/) — how the skill suggester hook integrates diff --git a/docs/src/content/docs/plugins/spec-workflow.md b/docs/src/content/docs/plugins/spec-workflow.md index c9c901d..785874a 100644 --- a/docs/src/content/docs/plugins/spec-workflow.md +++ b/docs/src/content/docs/plugins/spec-workflow.md @@ -1,224 +1,184 @@ --- title: Spec Workflow -description: The specification workflow plugin manages the full lifecycle of feature specifications — from creation through implementation to as-built closure. +description: Directory-based specification packages for AI-first development — create specs, build from specs, and track spec health with 3 commands. sidebar: order: 4 --- -The spec workflow plugin enforces a specification-driven development process. Every non-trivial feature gets a spec before implementation begins, and every implementation ends with an as-built spec update that documents what was actually built. 
This creates a reliable loop: plan the work, do the work, record what happened. +The spec workflow plugin enforces a specification-driven development process optimized for AI-first development with large context windows and parallel agent teams. Every non-trivial feature gets a spec package before implementation begins, and every implementation ends with automated closure that documents what was actually built. -Why does this matter? Specs force you to think through edge cases, acceptance criteria, and scope boundaries while changes are cheap — before any code exists. The as-built closure step catches drift between what was planned and what was delivered. Together, they give you a living record of every feature in your project. +## Why Spec Packages? -## The Specification Lifecycle +Specs have two audiences with fundamentally different needs: +- **Humans** need to review decisions and confirm scope (~2 minutes) +- **AI agents** need complete implementation detail (invariants, schema, examples) -The spec workflow follows a clear seven-stage lifecycle. Each stage has a dedicated slash command, and each stage feeds into the next: +Spec packages separate these concerns. The human reads `index.md` (~50-80 lines). The AI reads everything else. No one wastes time on content meant for the other audience. -``` -/spec-init --> /spec-new --> /spec-refine --> /spec-build --> /spec-review --> /spec-update - | | | | | | -Bootstrap Create a Validate Implement Verify code Close the -.specs/ draft spec assumptions the feature vs. spec loop - - /spec-check (audit health — runs independently) -``` - -### Stage 1: Initialize (`/spec-init`) - -Bootstrap the `.specs/` directory at your project root. This creates the directory structure, `MILESTONES.md` for tracking releases, and `BACKLOG.md` for capturing deferred work. You only run this once per project. 
+## The Three Commands -### Stage 2: Create (`/spec-new`) +The entire spec lifecycle uses three commands: -Create a new feature specification from the standard template. The command infers a domain folder from the feature name (e.g., `auth/`, `search/`, `api/`) and generates a structured Markdown file with sections for intent, acceptance criteria, requirements, dependencies, and scope boundaries. - -New specs always start with: -- **Status:** `planned` -- **Approval:** `draft` -- All requirements tagged `[assumed]` - -This is intentional. Draft specs contain unvalidated assumptions — they should not be implemented until those assumptions are confirmed. +| Command | Purpose | When to Use | +|---------|---------|-------------| +| `/spec <name>` | Create, refine, and approve a spec package | Starting any non-trivial feature | +| `/build <name>` | Implement from spec — plan, build, review, close | When spec is approved and ready | +| `/specs` | Health dashboard across all specs | Anytime — check project health | -### Stage 3: Refine (`/spec-refine`) +### `/spec` — Create & Refine -Walk through every `[assumed]` requirement with the user, validating tech decisions and scope boundaries. As each requirement is confirmed, it upgrades from `[assumed]` to `[user-approved]`. The spec's approval status changes to `user-approved` only after all requirements pass review. +Creates a directory-based spec package. The AI analyzes your codebase, drafts everything, and presents only the decisions that need human input. -:::caution[Do Not Skip Refinement] -The `/spec-build` command enforces a hard gate: it refuses to implement any spec that is not `user-approved`. Building against draft specs with unvalidated assumptions risks wasted work. Always refine first. -::: +``` +/spec webhook-delivery +``` -### Stage 4: Build (`/spec-build`) +The AI will: +1. Analyze your codebase for patterns and context +2. Create `.specs/integrations/webhook-delivery/` with all files +3.
Present decisions that need your input (genuine trade-offs only) +4. Make obvious decisions itself (you can override any) +5. Show the AC list for completeness checking +6. Finalize and approve on your confirmation -The most powerful command in the workflow. It orchestrates the full implementation lifecycle in five phases: +**First-time use?** `/spec` auto-creates `.specs/` with a Constitution template and Backlog. No separate setup needed. -1. **Discovery and Gate Check** — reads the spec, verifies approval status, builds context from key files -2. **Implementation Planning** — creates a structured plan mapping requirements to file changes, enters plan mode for user approval -3. **Implementation** — executes the plan step by step, flipping acceptance criteria from `[ ]` to `[~]` as each is addressed -4. **Comprehensive Review** — audits every requirement, verifies acceptance criteria with tests, checks code quality and spec consistency -5. **Spec Closure** — updates status, adds implementation notes, documents discrepancies +**Constitution?** Run `/spec constitution` to capture project-level decisions (tech stack, patterns, conventions) that every spec inherits. -Because Phase 5 performs full as-built closure, you do not need a separate `/spec-update` run after using `/spec-build`. +### `/build` — Implement & Close -:::tip[Team Spawning for Complex Specs] -When a spec has 8+ requirements or spans multiple layers (backend, frontend, tests), `/spec-build` automatically recommends spawning a team of specialist agents. A researcher explores patterns, a test-writer creates tests in a worktree, and a doc-writer updates documentation — all working in parallel. -::: +Takes an approved spec and builds everything autonomously: -### Stage 5: Review (`/spec-review`) +``` +/build webhook-delivery +``` -Standalone implementation verification. Use this after manual implementation, for post-change regression checks, or during pre-release audits. 
It reads the code, verifies every requirement and acceptance criterion against the implementation, and recommends `/spec-update` when done. +Five phases, zero human intervention: +1. **Discovery** — reads Constitution + spec package, verifies approval +2. **Planning** — decomposes groups into tasks, recommends team or solo +3. **Building** — spec-first testing, implements each AC, marks `[~]` +4. **Review** — self-healing fix loop (finds issues, fixes them, re-tests), marks `[x]` +5. **Closure** — generates Completion Summary Report, updates spec status -### Stage 6: Update (`/spec-update`) +The Summary Report shows: AC results, AI decisions made, concerns, and discrepancies. That's your review surface. -Close the as-built loop. Updates the spec to reflect what was actually built — sets status, checks off acceptance criteria, adds implementation notes for deviations, and updates file paths. Use this after manual implementation or when the spec-reminder hook nudges you. +### `/specs` — Dashboard -### Stage 7: Check (`/spec-check`) +Quick health check across all specs: -Audit spec health across the project. Run this before starting a new milestone to ensure all specs are current, acceptance criteria are complete, and no specs have gone stale. +``` +/specs +``` -## Slash Commands Reference +Shows specs by status, flags stale specs, lists draft specs awaiting approval, and surfaces unresolved AI decisions. -| Command | Purpose | When to Use | -|---------|---------|-------------| -| `/spec-init` | Bootstrap `.specs/` directory | Once per project, at project start | -| `/spec-new <name>` | Create a new feature spec | Starting any non-trivial feature | -| `/spec-refine <name>` | Validate and approve requirements | After creating a draft spec, before implementation | -| `/spec-build <name>` | Full implementation from spec | When the spec is approved and ready to build | -| `/spec-review <name>` | Verify implementation vs.
spec | After manual implementation or for regression checks | -| `/spec-update` | As-built closure | After implementation (if not using `/spec-build`) | -| `/spec-check` | Health audit of all specs | Before milestones, during planning | +## Spec Package Structure -## Directory Convention - -Specs live in `.specs/` at the project root, organized by domain. Each domain gets its own folder, and each feature gets its own Markdown file within that folder. +Every spec is a directory: ``` -.specs/ -├── MILESTONES.md # Milestone tracker linking to feature specs -├── BACKLOG.md # Deferred items not yet scheduled -├── auth/ # Domain folder -│ ├── login-flow.md # Feature spec -│ └── oauth-providers.md # Feature spec -├── search/ # Domain folder -│ └── full-text-search.md -└── onboarding/ - └── user-signup.md +.specs/integrations/webhook-delivery/ + index.md # Human entry point (~50-80 lines) + context.md # AI-facing shared context + groups/ + a-registration.md # AC group with frontmatter + b-delivery.md # AC group with frontmatter + c-retry.md + d-logs.md ``` -Only `MILESTONES.md` and `BACKLOG.md` live at the `.specs/` root. Everything else goes in domain subfolders. +### `index.md` — Human Review Surface + +The ONLY file humans need to read. Contains: +- **Intent** — what and why (2-3 sentences) +- **Decisions — Needs Your Input** — genuine trade-offs for human judgment +- **Decisions — Already Decided** — obvious choices the AI made (overridable) +- **AC Summary** — one-liner per criterion (completeness check) +- **Out of Scope** — explicit non-goals -:::note[Spec Sizing] -Aim for roughly 200 lines per spec. If a feature needs significantly more, split it into separate specs in the domain folder. Completeness matters more than hitting a number, but very long specs are hard to load and review. -::: +### `context.md` — AI Context -## The Approval Workflow +Read by every implementing agent. 
Contains: +- **Invariants** — always-true assertions +- **Anti-Patterns** — "do NOT" examples preventing specification gaming +- **Integration Context** — dependency details inline +- **Schema Intent** — data model design (not DDL) +- **Constraints** — file paths, patterns, prohibitions -The approval workflow prevents premature implementation. Here is how a spec progresses from idea to approved contract: +### Group Files — Work Units -**1. Draft with assumptions** — When you create a spec with `/spec-new`, every requirement is tagged `[assumed]`. This signals that the requirement reflects the spec author's best guess, not confirmed user intent. +Each group file has YAML frontmatter driving parallel decomposition: -```markdown -## Requirements -- FR-1: The system shall send email notifications on order completion. [assumed] -- FR-2: WHEN a notification fails, the system shall retry 3 times. [assumed] -- NFR-1: Notification latency shall not exceed 5 seconds. [assumed] +```yaml +--- +group: A +name: Registration & Configuration +criteria: [AC-1, AC-2, AC-3] +status: pending +owner: null +depends_on: [] +files_owned: + - src/models/webhook.py + - src/schemas/webhook.py +--- ``` -**2. Refinement** — Running `/spec-refine` walks through each `[assumed]` requirement interactively. You confirm, modify, or reject each one. Confirmed requirements upgrade to `[user-approved]`. +Followed by full acceptance criteria with EARS patterns, Given/When/Then, and inline examples. -```markdown -## Requirements -- FR-1: The system shall send email notifications on order completion. [user-approved] -- FR-2: WHEN a notification fails, the system shall retry 3 times with exponential backoff. [user-approved] -- NFR-1: Notification latency shall not exceed 10 seconds. [user-approved] -``` +## The Constitution -**3. Gate check** — `/spec-build` verifies `**Approval:** user-approved` before proceeding. If any requirements remain `[assumed]`, the gate check fails and implementation is blocked. 
+`.specs/CONSTITUTION.md` captures project-level decisions: +- Tech stack and frameworks +- Architecture and file structure +- API conventions (error format, pagination, versioning) +- Auth and security patterns +- Testing conventions +- Code patterns and naming +- Boundaries (always do / never do) -## Acceptance Criteria Markers +`/build` reads the Constitution before any feature spec. Feature specs inherit these decisions — they don't repeat them. -During implementation, acceptance criteria use three states that track progress from "not started" through "implemented" to "verified": +## Acceptance Criteria Markers | Marker | Meaning | Set By | |--------|---------|--------| -| `[ ]` | Not started — criterion has not been addressed in code | `/spec-new` (initial state) | -| `[~]` | Implemented, not yet verified — code is written but tests are not confirmed | `/spec-build` Phase 3 | -| `[x]` | Verified — tests pass, behavior confirmed | `/spec-build` Phase 4 | +| `[ ]` | Not started | `/spec` | +| `[~]` | Implemented, not yet verified | `/build` Phase 3 | +| `[x]` | Verified — tests pass | `/build` Phase 4 | -This three-state system prevents false confidence. A criterion marked `[~]` means "someone wrote code for this but nobody has verified it works." Only after a test passes (or behavior is confirmed) does it graduate to `[x]`. +## AI Decisions -If `/spec-update` runs after manual implementation, any `[~]` markers that were never verified revert to `[ ]`. +During `/build`, when the AI encounters a decision not in the spec or Constitution: +1. Makes its best choice +2. Records it in the group file's AI Decisions table +3. Continues building (does NOT stop) +4. Summary Report presents all decisions +5. You approve, override, or promote to Constitution ## The Spec Reminder Hook -The plugin includes a `spec-reminder.py` hook that fires on the `Stop` event. When Claude finishes a turn, the hook checks two conditions: - -1. Were source code files modified? 
(files in `src/`, `lib/`, `app/`, `tests/`, `api/`, and other standard code directories) -2. Were any `.specs/` files also modified? - -If code changed but specs did not, the hook injects a reminder: - -> *[Spec Reminder] Code was modified in src/, tests/ but no specs were updated. Use /spec-review to verify implementation against the spec, then /spec-update to close the loop. Use /spec-new if no spec exists for this feature, or /spec-refine if the spec is still in draft status.* - -This ensures the as-built loop is always closed. The reminder only fires when a `.specs/` directory exists (meaning the project uses the spec system). - -:::note[The Reminder Is Advisory] -The spec reminder blocks the turn to surface the message, but it is not destructive. It gives you the opportunity to update specs before moving on. You can address it immediately or note it for later. -::: +A `Stop` hook fires when code was modified but specs weren't updated. It's advisory — reminds you to close the loop with `/build` or `/spec`. ## A Practical Example -Here is a typical workflow for implementing a "user notification preferences" feature: - -``` -1. /spec-new notification-preferences - → Creates .specs/notifications/notification-preferences.md - → Status: planned, Approval: draft - → All requirements tagged [assumed] - -2. /spec-refine notification-preferences - → Walks through each requirement interactively - → User confirms email preferences, rejects SMS for now - → Requirements upgrade to [user-approved] - → Approval: user-approved - -3. /spec-build notification-preferences - → Phase 1: Reads spec, verifies approval, explores key files - → Phase 2: Creates implementation plan, gets user approval - → Phase 3: Implements step by step, flips [ ] to [~] - → Phase 4: Runs tests, verifies criteria, upgrades [~] to [x] - → Phase 5: Updates spec status to "implemented" - -4. Done! Spec reflects what was actually built. ``` - -## Spec Template - -Every spec follows a standard structure. 
Here are the key sections: - -```markdown -# Feature: [Name] -**Domain:** [domain-name] -**Status:** implemented | partial | planned -**Last Updated:** YYYY-MM-DD -**Approval:** draft | user-approved - -## Intent -## Acceptance Criteria -## Key Files -## Schema / Data Model (reference file paths only) -## API Endpoints (Method | Path | Description) -## Requirements (EARS format: FR-1, NFR-1) -## Dependencies -## Out of Scope -## Resolved Questions -## Implementation Notes (post-implementation only) -## Discrepancies (spec vs reality gaps) +1. /spec webhook-delivery + → Creates .specs/integrations/webhook-delivery/ + → AI presents 3 trade-off decisions + AC list + → You make decisions, confirm scope → approved + +2. /build webhook-delivery + → Reads Constitution + spec package + → Decomposes into 4 parallel groups + → Builds, tests, reviews, fixes issues + → Generates Summary Report with 15/15 ACs verified + → Presents 2 AI decisions for your review + +3. Done. Smoke test the feature. Review AI decisions. ``` -Requirements use the EARS (Easy Approach to Requirements Syntax) format with five patterns: Ubiquitous, Event-Driven, State-Driven, Unwanted Behavior, and Optional Feature. The `specification-writing` skill provides detailed guidance and templates for EARS format. 
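The acceptance-criteria markers can be tallied mechanically. A minimal Python sketch of how a `/specs`-style health count might work, assuming markers appear as `- [ ]` / `- [~]` / `- [x]` checklist items in spec files (the function and regex are hypothetical, not the plugin's actual implementation):

```python
import re

# Hypothetical sketch: tally the three AC marker states in a spec's
# Markdown, the way a /specs-style dashboard might summarize health.
MARKER_RE = re.compile(r"^\s*-\s*\[( |~|x)\]", re.MULTILINE)

def ac_health(markdown: str) -> dict:
    counts = {" ": 0, "~": 0, "x": 0}
    for m in MARKER_RE.finditer(markdown):
        counts[m.group(1)] += 1
    return {
        "not_started": counts[" "],   # [ ] — no code yet
        "implemented": counts["~"],   # [~] — code written, unverified
        "verified": counts["x"],      # [x] — tests pass
    }
```

Feeding it a group file's Markdown gives per-group counts that a dashboard could aggregate across `.specs/`.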
- ## Related -- [Specification Writing Skill](../features/skills/) — EARS format guidance and spec templates -- [Agent System](./agent-system/) — the spec-writer and architect agents support spec creation +- [Agent System](./agent-system/) — specialist agents support parallel spec builds - [Ticket Workflow](./ticket-workflow/) — tickets complement specs with issue tracking - [Hooks](../customization/hooks/) — how the spec-reminder hook integrates -- [Commands Reference](../reference/commands/) — full command reference diff --git a/docs/src/content/docs/reference/changelog.md b/docs/src/content/docs/reference/changelog.md index a8b7332..e6df265 100644 --- a/docs/src/content/docs/reference/changelog.md +++ b/docs/src/content/docs/reference/changelog.md @@ -47,6 +47,39 @@ For minor and patch updates, you can usually just rebuild the container. Check t ## Version History +## v2.1.1 — 2026-03-13 + +### Workspace Scope Guard + +- Fix `/dev/null` false positive — redirects to system paths (`/dev/`, `/proc/`, `/sys/`, etc.) 
are now allowed regardless of the primary command, not just for system commands like `git` or `pip` +- Fix CWD drift — scope root is now persisted on first invocation per session, preventing `cd` commands in Bash from silently changing the enforced scope boundary +- CWD context injector now uses the same persisted scope root, keeping advisory context aligned with enforcement + +## v2.1.0 — 2026-03-13 + +### Spec Workflow v2 — "Spec Packages" + +- **Breaking:** Replaced all 8 spec commands with 3: `/spec` (create & refine), `/build` (implement & close), `/specs` (dashboard) +- Specs are now directory-based "spec packages" with separated human and AI content: + - `index.md` — human-facing entry point (~50-80 lines): intent, decisions, AC summary, scope + - `context.md` — AI-facing shared context: invariants, anti-patterns, schema intent, constraints + - `groups/*.md` — AC groups with YAML frontmatter for parallel agent decomposition +- Added Constitution support (`.specs/CONSTITUTION.md`) for project-level cross-cutting decisions +- Simplified approval model: spec-level `draft`/`approved` replaces per-requirement `[assumed]`/`[user-approved]` tagging +- AI makes obvious decisions autonomously, presents only genuine trade-offs to the human +- `[ai-decided]` workflow: AI records autonomous decisions during build for post-completion review +- Group frontmatter (`depends_on`, `files_owned`) drives automatic task decomposition for team builds +- Dropped MILESTONES.md and ROADMAP.md — replaced with simple BACKLOG.md idea parking lot +- Updated all 8 agent skill lists, system prompts, orchestrator prompt, skill-suggester, and 8 docs pages +- Ships with a complete example spec package (webhook delivery system) as reference + +### CLI v0.1.0 (Experimental) + +- Initial release of the `codeforge` CLI — session search, plugin management, config deployment, codebase indexing, and devcontainer management +- New `codeforge index` command group — build and search a codebase symbol 
index (build, search, show, stats, tree, clean) +- New `codeforge container` command group — manage devcontainers from the host (up, down, rebuild, exec, ls, shell) +- Container proxy — CLI commands auto-proxy into the running devcontainer when run from the host + ## v2.0.3 — 2026-03-03 ### Workspace Scope Guard diff --git a/docs/src/content/docs/reference/commands.md b/docs/src/content/docs/reference/commands.md index 2f5f211..d21cd59 100644 --- a/docs/src/content/docs/reference/commands.md +++ b/docs/src/content/docs/reference/commands.md @@ -101,13 +101,10 @@ Slash commands for specification-driven development. These are used within Claud | Command | Purpose | Example | |---------|---------|---------| -| `/spec-init` | Bootstrap the `.specs/` directory with templates | `/spec-init` | -| `/spec-new <name>` | Create a new feature specification from the standard template | `/spec-new user-signup` | -| `/spec-refine <name>` | Validate assumptions, get user approval before implementation | `/spec-refine user-signup` | -| `/spec-build <name>` | Orchestrate full implementation: plan, build, review, and close | `/spec-build user-signup` | -| `/spec-review <name>` | Verify implementation against spec requirements | `/spec-review user-signup` | -| `/spec-update` | As-built spec closure after implementation | `/spec-update` | -| `/spec-check` | Audit spec health -- find stale, incomplete, or unapproved specs | `/spec-check` | +| `/spec <name>` | Create, refine, and approve a spec package | `/spec user-signup` | +| `/spec constitution` | Create or update the project Constitution | `/spec constitution` | +| `/build <name>` | Implement from spec — plan, build, review, and close | `/build user-signup` | +| `/specs` | Dashboard: spec health across the project | `/specs` | ## Ticket Workflow Slash Commands @@ -146,6 +143,24 @@ Standalone slash commands for git operations within Claude Code sessions. These 4. Presents findings — user selects what to include in review, create as issues, or ignore 5.
Posts review comment to PR (never approves or merges) +## CodeForge CLI Commands (Experimental) + +:::caution[Experimental] +The `codeforge` CLI is under active development. Commands and interfaces may change between releases. +::: + +The `codeforge` command provides development workflow tools. When run outside the container, commands auto-proxy into the running devcontainer. Use `--local` to bypass proxying. + +| Command Group | Subcommands | Description | +|---------------|-------------|-------------| +| `codeforge session` | `search`, `list`, `show` | Search and browse Claude Code session history | +| `codeforge task` | `search` | Search tasks | +| `codeforge plan` | `search` | Search plans | +| `codeforge plugin` | `list`, `show`, `enable`, `disable`, `hooks`, `agents`, `skills` | Manage Claude Code plugins | +| `codeforge config` | `show`, `apply` | View and deploy configuration | +| `codeforge index` | `build`, `search`, `show`, `stats`, `tree`, `clean` | Build and search a codebase symbol index | +| `codeforge container` | `up`, `down`, `rebuild`, `exec`, `ls`, `shell` | Manage CodeForge devcontainers | + ## GitHub CLI The GitHub CLI (`gh`) is pre-installed for repository operations. @@ -181,7 +196,8 @@ Commands come from different sources in the CodeForge setup: | Shell aliases | `cc`, `claude`, `ccw`, `ccraw`, `cc-orc`, `check-setup` | `setup-aliases.sh` writes to `.bashrc`/`.zshrc` | | Shell functions | `cc-tools` | `setup-aliases.sh` writes to `.bashrc`/`.zshrc` | | DevContainer features | `ccusage`, `ccburn`, `ruff`, `biome`, `sg`, `dbr`, etc. | `install.sh` in each feature directory | -| Slash commands | `/spec-new`, `/ticket:new`, `/ship`, `/pr:review`, `/ps`, etc. | Skill SKILL.md files in plugin directories | +| CodeForge CLI | `codeforge session`, `codeforge index`, `codeforge container`, etc. | `codeforge-cli` devcontainer feature | +| Slash commands | `/spec`, `/build`, `/ticket:new`, `/ship`, `/pr:review`, `/ps`, etc. 
| Skill SKILL.md files in plugin directories | | External features | `gh`, `docker`, `node`, `bun` | Installed via `devcontainer.json` features | :::tip[Listing All Tools] diff --git a/docs/src/content/docs/reference/index.md b/docs/src/content/docs/reference/index.md index 426cf93..0327f71 100644 --- a/docs/src/content/docs/reference/index.md +++ b/docs/src/content/docs/reference/index.md @@ -44,12 +44,10 @@ This section is a lookup resource for CodeForge internals. Use it when you need | Command | What It Does | |---------|-------------| -| `/spec-new <name>` | Create a new feature spec | -| `/spec-refine <name>` | Validate assumptions, get user approval | -| `/spec-build <name>` | Full implementation lifecycle from spec | -| `/spec-review <name>` | Verify implementation against spec | -| `/spec-update` | As-built spec closure | -| `/spec-check` | Audit spec health | +| `/spec <name>` | Create, refine, and approve a spec package | +| `/spec constitution` | Create or update project-level Constitution | +| `/build <name>` | Implement from spec — plan, build, review, close | +| `/specs` | Dashboard: spec health across the project | ## Key Configuration Files