bump: fail the per-entry check-dispatch step when a dispatch fails

The dispatch step logged each failed gh workflow run as a warning and exited 0, so a transient API error or rate limit could leave a per-entry bump PR missing a required check while the bump run still showed green. The composite action skips slugs with an open PR, so the stranded PR was never retried.

Attempt every dispatch (one failure must not strand the other branches), record failures via a temp file (the while loop runs in a pipe subshell), then emit an error and exit non-zero if any dispatch failed, so the bump run goes red and the affected PR can be re-dispatched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
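
The failure-tracking pattern this message describes can be sketched roughly as follows. This is an illustration under assumptions, not the workflow's actual script: `dispatch` is a hypothetical stand-in for the real `gh workflow run` call (here it fails for any branch name containing `bad`), and `dispatch_all` and the branch names are invented for the example.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real dispatch call; an assumption for this
# sketch. It fails for any branch whose name contains "bad".
dispatch() {
  case "$1" in
    *bad*) return 1 ;;
    *)     return 0 ;;
  esac
}

# Attempt a dispatch for every branch given as an argument, then fail once
# at the end if any attempt failed.
dispatch_all() {
  local failures
  failures="$(mktemp)"

  # The while loop is the right-hand side of a pipe, so it runs in a
  # subshell: a variable set inside it would be lost when the loop ends.
  # Appending to a temp file lets the failures outlive the subshell.
  printf '%s\n' "$@" | while IFS= read -r branch; do
    if ! dispatch "$branch"; then
      echo "warning: dispatch failed for $branch" >&2
      echo "$branch" >> "$failures"
    fi
  done

  if [ -s "$failures" ]; then
    echo "error: dispatch failed for: $(tr '\n' ' ' < "$failures")" >&2
    rm -f "$failures"
    return 1
  fi

  rm -f "$failures"
  return 0
}
```

Calling `dispatch_all bump/a bump/bad-b bump/c` still attempts all three branches, warns about the second, and returns non-zero only after the loop finishes, which is the behavior that turns the run red without stranding the other branches.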
bump: dispatch all three required checks per per-entry PR
2026-06-21 17:23:39 +00:00 · 2026-05-29 17:08:25 -05:00 · 2026-05-29 16:52:06 -05:00 · 2026-05-29 16:52:06 -05:00 · 2026-05-29 16:52:06 -05:00
20 changed files with 291 additions and 1716 deletions
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
--- a/plugins/code-modernization/.claude-plugin/plugin.json
+++ b/plugins/code-modernization/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
  "name": "code-modernization",
-  "description": "Modernize legacy codebases (COBOL, legacy Java/C++, monolith web apps) with a structured preflight / assess / map / extract-rules / brief / reimagine / transform / harden workflow, an interactive topology viewer, and specialist review agents",
+  "description": "Modernize legacy codebases (COBOL, legacy Java/C++, monolith web apps) with a structured assess → map → extract-rules → brief → reimagine/transform → harden workflow and specialist review agents",
  "author": {
    "name": "Anthropic",
    "email": "support@anthropic.com"
--- a/plugins/code-modernization/README.md
+++ b/plugins/code-modernization/README.md
@@ -7,7 +7,7 @@ A structured workflow and set of specialist agents for modernizing legacy codeba
 Legacy modernization fails most often not because the target technology is wrong, but because teams skip steps: they transform code before understanding it, reimagine architecture before extracting business rules, or ship without a harness that would catch behavior drift. This plugin enforces a sequence:

 ```
-preflight → assess → map → extract-rules → brief → reimagine | transform → harden
+assess → map → extract-rules → brief → reimagine | transform → harden
 ```

 The discovery commands (`assess`, `map`, `extract-rules`) build artifacts under `analysis/<system>/`. The `brief` command synthesizes them into an approval gate. The build commands (`reimagine`, `transform`) write new code under `modernized/`. The `harden` command audits the legacy system and produces a reviewable remediation patch. Each step has a dedicated slash command, and specialist agents (legacy analyst, business rules extractor, architecture critic, security auditor, test engineer) are invoked from within those commands — or directly — to keep the work honest.
@@ -20,36 +20,25 @@ Commands take a `<system-dir>` argument and assume the system being modernized l
 mkdir -p legacy && ln -s /path/to/your/legacy/codebase legacy/billing
 ```

-## What to give Claude
+## Optional tooling

-The commands degrade gracefully, but each of these makes the output meaningfully better — run `/modernize-preflight <system-dir>` to check all of them at once and get a readiness report:
-
- **Analysis tools**: [`scc`](https://github.com/boyter/scc) (LOC + complexity + COCOMO) or [`cloc`](https://github.com/AlDanial/cloc); [`lizard`](https://github.com/terryyin/lizard) for portfolio mode. Without them, metrics fall back to `find`/`wc` and get coarser.
- **A working build toolchain** for the legacy stack (e.g. GnuCOBOL for COBOL) — required before `/modernize-transform` can prove behavioral equivalence, and verified by preflight with a real smoke compile against your code.
- **The whole system in the tree**: deployment descriptors (JCL, CICS definitions, route configs), copybooks/includes, and DDL/schemas. Entry-point detection and data lineage in `/modernize-map` are guesswork without them.
- **Production telemetry** (optional): an observability MCP server or batch job logs enable the runtime overlay in `/modernize-assess` and timing annotations on critical paths.
+`/modernize-assess` works best with [`scc`](https://github.com/boyter/scc) (LOC + complexity + COCOMO) or [`cloc`](https://github.com/AlDanial/cloc), and falls back to `find`/`wc` if neither is installed. Portfolio mode also benefits from [`lizard`](https://github.com/terryyin/lizard) (cyclomatic complexity). The commands degrade gracefully without them, but the metrics will be coarser.

 ## Commands

 The commands are designed to be run in order, but each produces a standalone artifact so you can stop, review, and resume.

-### `/modernize-preflight <system-dir> [target-stack]`
-Environment readiness check, meant to run first: detects the legacy stack, checks analysis tooling, **smoke-compiles a real source file** with the legacy toolchain (the errors this surfaces — missing copybooks, wrong dialect flags — are the ones that otherwise appear mid-transform), inventories missing includes / deployment descriptors / binary-only artifacts, and probes for telemetry. Produces `analysis/<system>/PREFLIGHT.md` with a per-command Ready / Ready-with-gaps / Not-ready verdict.
-
 ### `/modernize-assess <system-dir>`  — or — `/modernize-assess --portfolio <parent-dir>`
 Inventory the legacy codebase: languages, line counts, complexity, build system, integrations, technical debt, security posture, documentation gaps, and a COCOMO-derived effort estimate. Produces `analysis/<system>/ASSESSMENT.md` and `analysis/<system>/ARCHITECTURE.mmd`. Spawns `legacy-analyst` (×2) and `security-auditor` in parallel for deep reads. With `--portfolio`, sweeps every subdirectory of a parent directory and writes a sequencing heat-map to `analysis/portfolio.html`.

 ### `/modernize-map <system-dir>`
-
-![Interactive topology map of AWS CardDemo — domains as containers, modules sized by lines of code, dependency edges colored by kind, entry points ringed](assets/topology-viewer-screenshot.jpg)
-
-Build a dependency and topology map of the **legacy** system: program/module call graph, data lineage (programs ↔ data stores), entry points, dead-end candidates, and 2–4 traced business flows each anchored to a persona (the claimant, the operator, the auditor — not the maintainer). Writes a re-runnable extraction script and produces `analysis/<system>/topology.json` plus `analysis/<system>/TOPOLOGY.html` — an **interactive zoomable map** (circle-pack of domains/modules sized by LOC, dependency edges with per-kind toggles, search, click-for-details sidebar, and a walkthrough mode that plays each persona flow as a numbered path with a plain-language narrative). Built from a template shipped with the plugin, so it works on systems far too dense for a static diagram. Small domain-level `call-graph.mmd`, `data-lineage.mmd`, and `critical-path.mmd` are still exported for docs and PRs.
+Build a dependency and topology map of the **legacy** system: program/module call graph, data lineage (programs ↔ data stores), entry points, dead-end candidates, and one traced critical-path business flow. Writes a re-runnable extraction script and produces `analysis/<system>/topology.json` (machine-readable), `analysis/<system>/TOPOLOGY.html` (rendered Mermaid + architect observations), and standalone `call-graph.mmd`, `data-lineage.mmd`, and `critical-path.mmd`.

 ### `/modernize-extract-rules <system-dir> [module-pattern]`
 Mine the business rules embedded in the legacy code — calculations, validations, eligibility, state transitions, policies — into Given/When/Then "Rule Cards" with `file:line` citations and confidence ratings. Spawns three `business-rules-extractor` agents in parallel (calculations, validations, lifecycle). Produces `analysis/<system>/BUSINESS_RULES.md` and `analysis/<system>/DATA_OBJECTS.md`.

 ### `/modernize-brief <system-dir> [target-stack]`
-Synthesize the discovery artifacts into a phased **Modernization Brief** — the single document a steering committee approves and engineering executes: target architecture, strangler-fig phase plan with entry/exit criteria, persona-based business walkthroughs (the section non-technical approvers actually read), behavior contract, validation strategy, open questions, and an approval block. Reads `ASSESSMENT.md`, `TOPOLOGY.html`, and `BUSINESS_RULES.md` and **stops if any are missing** — run the discovery commands first. Produces `analysis/<system>/MODERNIZATION_BRIEF.md` and enters plan mode as a human-in-the-loop gate.
+Synthesize the discovery artifacts into a phased **Modernization Brief** — the single document a steering committee approves and engineering executes: target architecture, strangler-fig phase plan with entry/exit criteria, behavior contract, validation strategy, open questions, and an approval block. Reads `ASSESSMENT.md`, `TOPOLOGY.html`, and `BUSINESS_RULES.md` and **stops if any are missing** — run the discovery commands first. Produces `analysis/<system>/MODERNIZATION_BRIEF.md` and enters plan mode as a human-in-the-loop gate.

 ### `/modernize-reimagine <system-dir> <target-vision>`
 Greenfield rebuild from extracted intent rather than a structural port. Mines a spec (`analysis/<system>/AI_NATIVE_SPEC.md`), designs a target architecture and has it adversarially reviewed (`analysis/<system>/REIMAGINED_ARCHITECTURE.md`), then **scaffolds services with executable acceptance tests** under `modernized/<system>-reimagined/` and writes a `CLAUDE.md` knowledge handoff for the new system. Two human-in-the-loop checkpoints. Spawns `business-rules-extractor`, `legacy-analyst` (×2), `architecture-critic`, and general-purpose scaffolding agents.
@@ -57,9 +46,6 @@ Greenfield rebuild from extracted intent rather than a structural port. Mines a
 ### `/modernize-transform <system-dir> <module> <target-stack>`
 Surgical, single-module strangler-fig rewrite. Plans first (HITL gate), then writes characterization tests via `test-engineer`, then an idiomatic target implementation under `modernized/<system>/<module>/`, proves equivalence by running the tests, and produces `TRANSFORMATION_NOTES.md` mapping legacy → modern with deliberate deviations called out. Reviewed by `architecture-critic`.

-### `/modernize-status <system-dir>`
-Read-only progress report: artifact inventory with timestamps per workflow stage, staleness flags (e.g. a brief older than the assessment it was built from), secrets-hygiene checks (quarantine file gitignored and never committed), and the single most useful next command. Run it anytime you come back to a modernization after a break.
-
 ### `/modernize-harden <system-dir>`
 Security hardening pass on the **legacy** system: OWASP/CWE scan, dependency CVEs, secrets, injection. Spawns `security-auditor`. Produces `analysis/<system>/SECURITY_FINDINGS.md` ranked Critical / High / Medium / Low and a reviewed `analysis/<system>/security_remediation.patch` with minimal fixes for the Critical/High findings. The patch is reviewed by a second `security-auditor` pass before you see it. **Never edits `legacy/`** — you review and apply the patch yourself when ready, then re-run to verify. Useful as a pre-modernization step when the legacy system will keep running in production during the migration.

@@ -95,21 +81,17 @@ This plugin ships commands and agents, but modernization projects benefit from a
      "Edit(modernized/**)"
    ],
    "deny": [
-      "Edit(legacy/**)",
-      "Write(legacy/**)"
+      "Edit(legacy/**)"
    ]
  }
 }
 ```

-Adjust `legacy/` and `modernized/` to match your actual layout. The key invariants: `Edit`/`Write` under `legacy/` are denied, and writes are scoped to `analysis/` (for documents) and `modernized/` (for the new code). Note this guards the file tools — shell commands that mutate files (`sed -i`, `git apply`) still go through the normal Bash permission prompt, so review those prompts with the same invariant in mind. Every command in this plugin respects this — `/modernize-harden` writes a patch to `analysis/` rather than editing `legacy/` in place.
+Adjust `legacy/` and `modernized/` to match your actual layout. The key invariants: `Edit` under `legacy/` is denied, and writes are scoped to `analysis/` (for documents) and `modernized/` (for the new code). Every command in this plugin respects this — `/modernize-harden` writes a patch to `analysis/` rather than editing `legacy/` in place.

 ## Typical Workflow

 ```bash
-# 0. Check the environment is ready (tools, toolchain, source completeness)
-/modernize-preflight billing
-
 # 1. Inventory the legacy system (or sweep a portfolio of them)
 /modernize-assess billing

@@ -130,9 +112,6 @@ Adjust `legacy/` and `modernized/` to match your actual layout. The key invarian

 # 6. Security-harden the legacy system that's still in production
 /modernize-harden billing
-
-# Anytime: where am I, what's stale, what's next
-/modernize-status billing
 ```

 ## License
--- a/plugins/code-modernization/assets/topology-viewer-screenshot.jpg
+++ b/plugins/code-modernization/assets/topology-viewer-screenshot.jpg
--- a/plugins/code-modernization/assets/topology-viewer.html
+++ b/plugins/code-modernization/assets/topology-viewer.html
--- a/plugins/code-modernization/commands/modernize-brief.md
+++ b/plugins/code-modernization/commands/modernize-brief.md
@@ -8,19 +8,10 @@ single document a steering committee approves and engineering executes.

 Target stack: `$2` (if blank, recommend one based on the assessment findings).

-Read `analysis/$1/ASSESSMENT.md`, `analysis/$1/topology.json` (plus the
-`.mmd` files alongside it — do NOT read `TOPOLOGY.html`, it's an
-interactive viewer with the data minified inside), and
-`analysis/$1/BUSINESS_RULES.md` first. If any are missing, say so and
-stop — they come from `/modernize-assess`, `/modernize-map`, and
-`/modernize-extract-rules` respectively. Run those first.
-
-**Staleness check:** compare modification times. If any input is newer
-than an existing `MODERNIZATION_BRIEF.md`, the brief is being justifiably
-regenerated; but if an existing brief is newer than all inputs and the
-user re-ran this command anyway, ask what changed. Either way, note the
-input timestamps in the brief's header so reviewers can see what it was
-built from.
+Read `analysis/$1/ASSESSMENT.md`, `analysis/$1/TOPOLOGY.html` (and the `.mmd`
+files alongside it), and `analysis/$1/BUSINESS_RULES.md` first. If any are
+missing, say so and stop — they come from `/modernize-assess`, `/modernize-map`,
+and `/modernize-extract-rules` respectively. Run those first.

 ## The Brief

@@ -40,38 +31,28 @@ fewest-dependencies first. For each phase:
 - Scope (which legacy modules, which target services)
 - Entry criteria (what must be true to start)
 - Exit criteria (what tests/metrics prove it's done)
- Estimated effort (person-months, same unit as the assessment's COCOMO
-  figure — convert deliberately if you present weeks)
+- Estimated effort (person-weeks, derived from COCOMO + complexity data)
 - Risk level + top 2 risks + mitigation

 Render the phases as a Mermaid `gantt` chart.

-### 4. Business Walkthroughs
-For each persona flow in `analysis/$1/topology.json` (`flows` — produced
-by `/modernize-map`), a short narrative table: persona, what happens in
-business language, which legacy modules implement it today, and which
-phase from §3 replaces each. This is the section non-technical approvers
-actually read — it connects "Phase 2" to "what happens when a customer
-files a claim". If topology.json has no flows, derive 2–3 walkthroughs
-from the entry points and say they need SME confirmation.
-
-### 5. Behavior Contract
+### 4. Behavior Contract
 List the **P0 rules** from BUSINESS_RULES.md (the ones tagged `Priority: P0` —
 money, regulatory, data integrity) that MUST be proven equivalent before any
 phase ships. These become the regression suite. Flag any P0 rule with
 Confidence < High as a blocker requiring SME confirmation before its phase
 starts.

-### 6. Validation Strategy
+### 5. Validation Strategy
 State which combination applies: characterization tests, contract tests,
 parallel-run / dual-execution diff, property-based tests, manual UAT.
 Justify per phase.

-### 7. Open Questions
+### 6. Open Questions
 Anything requiring human/SME decision before Phase 1 starts. Each as a
 checkbox the approver must tick.

-### 8. Approval Block
+### 7. Approval Block
 ```
 Approved by: ________________  Date: __________
 Approval covers: Phase 1 only | Full plan
@@ -79,7 +60,6 @@ Approval covers: Phase 1 only | Full plan

 ## Present

-Present a summary of the brief and **stop — write nothing further until
-the user explicitly approves** (use plan mode if the session supports
-it). This gate is the human-in-the-loop control point; "no objection" is
-not approval.
+Enter **plan mode** and present a summary of the brief. Do NOT proceed to any
+transformation until the user explicitly approves. This gate is the
+human-in-the-loop control point.
--- a/plugins/code-modernization/commands/modernize-map.md
+++ b/plugins/code-modernization/commands/modernize-map.md
@@ -55,124 +55,50 @@ re-run and audited. Have it write a machine-readable
 `analysis/$1/topology.json` and print a human summary. Run it; show the
 summary (cap at ~200 lines for very large estates).

-`topology.json` must follow this schema — it feeds the interactive viewer:
-
-```json
-{
-  "system": "<display name>",
-  "root": {
-    "id": "sys", "name": "<system>", "kind": "system",
-    "children": [
-      { "id": "dom:<domain>", "name": "<Domain>", "kind": "domain",
-        "children": [
-          { "id": "<MODULE>", "name": "<MODULE>", "kind": "module",
-            "language": "cobol", "loc": 1234, "file": "src/MODULE.cbl" }
-        ] },
-      { "id": "dom:data", "name": "Data stores", "kind": "domain",
-        "children": [
-          { "id": "ds:<NAME>", "name": "<NAME>", "kind": "datastore" }
-        ] }
-    ]
-  },
-  "edges": [
-    { "source": "<id>", "target": "<id>", "kind": "call" }
-  ],
-  "entryPoints": ["<id>", "..."],
-  "deadEnds": ["<id>", "..."],
-  "observations": ["<architect observation>", "..."],
-  "flows": [
-    { "name": "<business flow>", "persona": "<who experiences it>",
-      "description": "<one sentence, plain language>",
-      "steps": [
-        { "label": "<business-language step>", "nodes": ["<id>", "<id>"] }
-      ] }
-  ]
-}
-```
-
- Group leaf modules under `domain` containers (use the domains from
-  `/modernize-assess` if available). Leaf kinds: `module`, `datastore`,
-  `job`, `screen`. `loc` drives circle size — include it for modules.
- Edge kinds: `call` (direct), `dispatch` (dynamic/router), `read`,
-  `write`. Every edge endpoint must be a leaf id that exists in the tree.
- `deadEnds`: the dead-end candidates from the extraction, rendered with
-  a dashed outline in the viewer. Apply the suppression rules above —
-  anything that could be the target of an unresolved dynamic call does
-  NOT belong here; record that uncertainty in `observations` instead.
- **Datastore ids and names must be logical identifiers** — DD name,
-  dataset name, table/schema name, at most host:port. If the resolved
-  config value is a URL or DSN, strip userinfo and credential query
-  params before it goes anywhere in topology.json: the file gets
-  committed and the viewer displays names verbatim. Never copy raw
-  config values into `observations`.
- `observations`: 3–7 architect observations — tight coupling clusters,
-  single points of failure, service-extraction candidates, data stores
-  with too many writers, dispatch targets the extraction could not
-  resolve.
- `flows` is the **persona walkthrough** section — see below.
-
-## Persona flows
-
-Trace **2–4 end-to-end business flows**, each anchored to a persona —
-the people who experience the system, not the people who maintain it
-(e.g. for a benefits system: the claimant, the caseworker, the auditor;
-for billing: the customer, the billing operator). For each flow:
-
- `name` + one-sentence `description` in plain business language —
-  something a steering committee member relates to ("a claimant files a
-  weekly claim"), not a data-flow label ("CLM batch ingest").
- `steps`: 3–8 steps, each with a business-language `label` and the
-  `nodes` (programs + data stores) that implement that step, in
-  execution order.
-
-This is the bridge between the technical map and non-technical
-stakeholders: the same diagram answers "which program does X" for
-engineers and "what happens when someone files a claim" for everyone else.
-
 ## Render

-`analysis/$1/TOPOLOGY.html` is an **interactive map**: a zoomable
-circle-pack of the whole system (domains as containers, modules sized by
-LOC) with dependency edges, search, per-node detail sidebar, edge-kind
-toggles, and a flow-walkthrough mode that plays each persona flow as a
-numbered path. Build it from the template that ships with this plugin —
-do not hand-write the viewer:
+From the extracted data, generate **three Mermaid diagrams** and write them
+to `analysis/$1/TOPOLOGY.html` as a self-contained page that renders in any
+browser.

-```bash
-python3 - "${CLAUDE_PLUGIN_ROOT}/assets/topology-viewer.html" analysis/$1 <<'EOF'
-import json, sys
-tpl_path, out_dir = sys.argv[1], sys.argv[2]
-tpl = open(tpl_path).read()
-marker = "/*__TOPOLOGY_DATA__*/ null"
-assert marker in tpl, f"injection marker not found in {tpl_path}"
-data = json.dumps(json.load(open(f"{out_dir}/topology.json")))
-open(f"{out_dir}/TOPOLOGY.html", "w").write(
-    tpl.replace(marker, "/*__TOPOLOGY_DATA__*/ " + data))
-print(f"wrote {out_dir}/TOPOLOGY.html")
-EOF
+The HTML page must use: dark `#1e1e1e` background, `#d4d4d4` text,
+`#cc785c` for `<h2>`/accents, `system-ui` font, all CSS **inline** (no
+external stylesheets). Load Mermaid from a CDN in `<head>`:
+
+```html
+<script type="module">
+  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
+  mermaid.initialize({ startOnLoad: true, theme: 'dark' });
+</script>
 ```

-The viewer is fully self-contained (the d3 subset it needs is inlined in
-the template) — it works offline and on air-gapped networks. If the
-`python3` invocation fails to find the template,
-`${CLAUDE_PLUGIN_ROOT}` was not substituted — report that rather than
-hand-writing a viewer.
+Each diagram goes in a `<pre class="mermaid">...</pre>` block. Do **not**
+wrap diagrams in markdown ` ``` ` fences inside the HTML.

-Mermaid stays for **small, exportable** diagrams. Generate standalone
-`.mmd` files for reuse in docs and PRs — but keep each under ~40 edges;
-collapse to domain level if the full graph is bigger (dense Mermaid
-becomes unreadable, which is exactly what the interactive map is for):
+1. **`graph TD` — Module call graph.** Cluster by domain (use `subgraph`).
+   Highlight entry points in a distinct style. Cap at ~40 nodes — if larger,
+   show domain-level with one expanded domain.

- `analysis/$1/call-graph.mmd` — domain-level `graph TD`, entry points
-  highlighted
- `analysis/$1/data-lineage.mmd` — `graph LR`, programs → data stores,
-  read vs write marked
- `analysis/$1/critical-path.mmd` — `flowchart TD` of the primary flow
-  from `flows`, annotated with p50/p99 wall-clock if telemetry is
-  available (see `/modernize-assess` Step 4)
+2. **`graph LR` — Data lineage.** Programs → data stores.
+   Mark read vs write edges.
+
+3. **`flowchart TD` — Critical path.** Trace ONE end-to-end business flow
+   (e.g., "monthly billing run" or "process payment") through every program
+   and data store it touches, in execution order. If production telemetry is
+   available (see `/modernize-assess` Step 4), annotate each step with its
+   p50/p99 wall-clock.
+
+Also export the three diagrams as standalone `.mmd` files for re-use:
+`analysis/$1/call-graph.mmd`, `analysis/$1/data-lineage.mmd`,
+`analysis/$1/critical-path.mmd`.
+
+## Annotate
+
+Below each `<pre class="mermaid">` block in TOPOLOGY.html, add a `<ul>`
+with 3-5 **architect observations**: tight coupling clusters, single
+points of failure, candidates for service extraction, data stores
+touched by too many writers.

 ## Present

-Tell the user to open `analysis/$1/TOPOLOGY.html` in a browser, and to
-try: search for a module, click it to see its connections, and pick a
-persona flow from the walkthrough dropdown.
+Tell the user to open `analysis/$1/TOPOLOGY.html` in a browser.
--- a/plugins/code-modernization/commands/modernize-preflight.md
+++ b/plugins/code-modernization/commands/modernize-preflight.md
@@ -1,98 +0,0 @@
---
-description: Environment readiness check — analysis tools, build toolchain, source completeness, telemetry access
-argument-hint: <system-dir> [target-stack]
---
-
-Check whether this environment is ready to analyze — and eventually
-transform — `legacy/$1`, and tell the user exactly what to fix before the
-other commands run into it. Modernization sessions fail late and
-confusingly when this isn't done: assessment metrics silently degrade
-without analysis tools, characterization tests can't run without a build
-toolchain, and dependency maps come out wrong when half the source isn't
-in the tree.
-
-Run every check even when an early one fails — the point is one complete
-readiness report, not the first error.
-
-## Check 1 — Detect the stack
-
-Fingerprint `legacy/$1` from file extensions and manifests: languages,
-build system, deployment/config descriptors. This drives which checks
-below apply. Report what was detected and the rough file split.
-
-## Check 2 — Analysis tooling
-
-For each, check availability (`command -v`) and report version, what it's
-used for, and what degrades without it:
-
-| Tool | Used by | Without it |
-|---|---|---|
-| `scc` (or `cloc`) | assess | LOC/complexity fall back to `find`+`wc`; COCOMO estimate gets coarser |
-| `lizard` | assess --portfolio | complexity estimated from decision-keyword counts |
-| `glow` | all | markdown artifacts render as plain text |
-| `delta` | transform | side-by-side diffs fall back to `diff -y` |
-
-Include the platform's install one-liner for anything missing
-(`brew install scc`, `apt install cloc`, `pip install lizard`, …).
-
-## Check 3 — Build toolchain (smoke test, not just presence)
-
-Identify the compiler/interpreter for the detected legacy stack — e.g.
-GnuCOBOL (`cobc`) for COBOL, JDK + Maven/Gradle for Java, `cc`/`make` for
-C, `dotnet` for .NET. Then **prove it works on this codebase**: pick one
-representative source file and run a syntax-only compile
-(`cobc -fsyntax-only`, `javac`, `gcc -fsyntax-only`, …).
-
-A failed smoke test is the most valuable output of this command — report
-the actual error and diagnose it: missing copybook/include path, missing
-dialect flag (`-std=ibm` etc.), fixed vs free format, missing dependency
-jar. These are the errors that otherwise surface mid-`/modernize-transform`
-with much less context.
-
-If the user passed a `[target-stack]`, do the same for it: runtime,
-package manager, test framework (`mvn -v`, `npm -v`, `pytest --version`, …).
-
-## Check 4 — Source completeness
-
-The dependency map is only as good as what's in the tree. Check for the
-detected stack's equivalents of:
-
- **Referenced-but-missing includes** — copybooks (`COPY X` with no
-  `X.cpy`), headers, imports that resolve nowhere. Count and list the top
-  missing names.
- **Deployment/config descriptors** — JCL for batch COBOL, CICS CSD
-  definitions, `web.xml`/route configs, cron/scheduler definitions.
-  Without these, entry-point detection and the code↔storage join in
-  `/modernize-map` are guesswork.
- **Data definitions** — DDL, schemas, copybook record layouts, ORM
-  mappings.
- **Binary-only artifacts** — load modules, jars, DLLs with no matching
-  source. These become unmappable black boxes; flag them now.
-
-## Check 5 — Optional context
-
- **Production telemetry** — is an observability/APM MCP server connected,
-  or are batch job logs / runtime exports available? (Enables the runtime
-  overlay in `/modernize-assess` Step 4 and timing annotations in
-  `/modernize-map`.)
- **Version control history** — is `legacy/$1` under git with meaningful
-  history? (Change-frequency data sharpens risk ranking.)
-
-## Report
-
-Write `analysis/$1/PREFLIGHT.md`: a status table — one row per check,
-status ✅ / ⚠️ / ❌, what was found, and the fix for anything not green —
-followed by a **Ready / Ready-with-gaps / Not ready** verdict per command:
-
- `assess` + `map` + `extract-rules` — need Checks 1–2 green-ish and
-  Check 4's missing-include count low
- `brief` — needs only the three discovery artifacts; no tooling
- `transform` + `reimagine` — additionally need Check 3 green for the
-  **target** stack. A red legacy toolchain downgrades these to
-  Ready-with-gaps, not Not-ready: equivalence testing falls back to
-  recorded traces / golden-master fixtures instead of dual execution
-  (common and expected for CICS/IMS code that has no local runtime)
- `harden` — needs Check 2 plus any stack-specific SAST tooling found
-
-Print the table in the session too, and end with the single most
-important fix if anything is red.
--- a/plugins/code-modernization/commands/modernize-reimagine.md
+++ b/plugins/code-modernization/commands/modernize-reimagine.md
@@ -3,11 +3,7 @@ description: Multi-agent greenfield rebuild — extract specs from legacy, desig
 argument-hint: <system-dir> <target-vision>
 ---

-The first token of `$ARGUMENTS` is the system dir (`$1`); **everything
-after it is the target vision** — it is usually multiple words, so do not
-truncate it to one token. Below, `<vision>` means that full remainder.
-
-**Reimagine** `legacy/$1` as: <vision>
+**Reimagine** `legacy/$1` as: $2

 This is not a port — it's a rebuild from extracted intent. The legacy system
 becomes the *specification source*, not the structural template. This command
@@ -23,8 +19,7 @@ Spawn concurrently and show the user that all three are running:
 2. **legacy-analyst** — "Catalog every external interface of legacy/$1:
   inbound (screens, APIs, batch triggers, queues) and outbound (reports,
   files, downstream calls, DB writes). For each: name, direction, payload
-   shape, frequency/SLA if discernible. Mask any credential embedded in
-   endpoints or payload examples per your secret-handling rules."
+   shape, frequency/SLA if discernible."

 3. **legacy-analyst** — "Identify the core domain entities in legacy/$1 and
   their relationships. Return as an entity list + Mermaid erDiagram."
@@ -37,9 +32,6 @@ Collect results. Write `analysis/$1/AI_NATIVE_SPEC.md` containing:
 - **Non-functional requirements** inferred from legacy (batch windows, volumes)
 - **Behavior Contract** (the Given/When/Then rules — these are the acceptance tests)

-Credential values are masked everywhere in the spec; connection details
-appear as env-var placeholders (`${DATABASE_URL}`), never literals.
-
 ## Phase B — HITL checkpoint #1

 Present the spec summary. Ask the user **one focused question**: "Which of
@@ -48,21 +40,20 @@ should deliberately drop?" Wait for the answer. Record it in the spec.

 ## Phase C — Architecture (single agent, then critique)

-Design the target architecture for "<vision>":
+Design the target architecture for "$2":
 - Mermaid C4 Container diagram
 - Service boundaries with rationale (which rules/entities live where)
 - Technology choices with one-line justification each
 - Data migration approach from legacy stores

 Then spawn **architecture-critic**: "Review this proposed architecture for
-<vision> against the spec in analysis/$1/AI_NATIVE_SPEC.md. Identify over-engineering,
+$2 against the spec in analysis/$1/AI_NATIVE_SPEC.md. Identify over-engineering,
 missed requirements, scaling risks, and simpler alternatives." Incorporate
 the critique. Write the result to `analysis/$1/REIMAGINED_ARCHITECTURE.md`.

 ## Phase D — HITL checkpoint #2

-Present the architecture and **stop — scaffold nothing until the user
-explicitly approves** (use plan mode if the session supports it).
+Enter plan mode. Present the architecture. Wait for approval.

 ## Phase E — Parallel scaffolding

@@ -74,9 +65,7 @@ in parallel**:
 and AI_NATIVE_SPEC.md. Create: project skeleton, domain model, API stubs
 matching the interface contracts, and **executable acceptance tests** for every
 behavior-contract rule assigned to this service (mark unimplemented ones as
-expected-failure/skip with the rule ID). No credential literal from legacy
-code becomes a test fixture or config default — use fake same-shape values
-and env-var placeholders. Write to modernized/$1-reimagined/<service-name>/."
+expected-failure/skip with the rule ID). Write to modernized/$1-reimagined/<service-name>/."

 Show the agents' progress. When all complete, run the acceptance test suites
 and report: total tests, passing (scaffolded behavior), pending (rule IDs
@@ -88,9 +77,7 @@ Write `modernized/$1-reimagined/CLAUDE.md` — the persistent context file for
 the new system, containing: architecture summary, service responsibilities,
 where the spec lives, how to run tests, and the legacy→modern traceability
 map. This file IS the knowledge graph that future agents and engineers will
-load — and it gets committed: connection details and credentials appear
-only as env-var names with a pointer to where they're provisioned, never
-as values.
+load.

 Report: services scaffolded, acceptance tests defined, % behaviors with a
 home, location of all artifacts.
--- a/plugins/code-modernization/commands/modernize-status.md
+++ b/plugins/code-modernization/commands/modernize-status.md
@@ -1,54 +0,0 @@
---
-description: Where am I in the modernization workflow — artifact inventory, staleness, secrets hygiene, next step
-argument-hint: <system-dir>
---
-
-Report where the modernization of `$1` stands, in one screen. This is a
-read-only command — inspect, never modify.
-
-## 1 — Artifact inventory
-
-Check `analysis/$1/` and `modernized/$1*/` and build a table — one row per
-workflow stage, with the artifact's presence and modification time:
-
-| Stage | Artifacts |
-|---|---|
-| preflight | `PREFLIGHT.md` |
-| assess | `ASSESSMENT.md`, `ARCHITECTURE.mmd` |
-| map | `topology.json`, `TOPOLOGY.html`, `*.mmd`, `extract_topology.*` |
-| extract-rules | `BUSINESS_RULES.md`, `DATA_OBJECTS.md` |
-| brief | `MODERNIZATION_BRIEF.md` (note whether the approval block is signed) |
-| harden | `SECURITY_FINDINGS.md`, `security_remediation.patch` |
-| transform / reimagine | each `modernized/$1*/<module>/` dir — note test presence and whether `TRANSFORMATION_NOTES.md` exists |
-
-## 2 — Staleness
-
-Flag any artifact older than an upstream artifact it derives from:
-
- `MODERNIZATION_BRIEF.md` older than `ASSESSMENT.md`, `topology.json`,
-  or `BUSINESS_RULES.md` → the brief no longer reflects discovery;
-  recommend re-running `/modernize-brief`.
- `TOPOLOGY.html` older than `topology.json` → re-run the injection step
-  from `/modernize-map`.
- Any `TRANSFORMATION_NOTES.md` older than `BUSINESS_RULES.md` → the
-  module may not implement the latest rule set; list which.
-
-## 3 — Secrets hygiene
-
- Does `analysis/.gitignore` exist and cover `SECRETS.local.md` /
-  `*.local.patch`? (`git check-ignore` when in a git repo.)
- If `SECRETS.local.md` exists: confirm it is NOT tracked
-  (`git ls-files --error-unmatch`, expect failure) and has never been
-  committed (`git log --all --oneline -- <path>`, expect empty). If
-  either check fails, say so prominently and recommend rotation plus
-  history scrubbing.
-
-## 4 — Verdict
-
-End with three lines:
- **Where you are** — the furthest completed stage and roughly how much
-  of the system it covers (e.g. "mapped 100%, 2 of 14 modules
-  transformed").
- **What's stale** — or "nothing".
- **Next command** — the single most useful next step, with a one-line
-  reason.
--- a/plugins/code-modernization/commands/modernize-transform.md
+++ b/plugins/code-modernization/commands/modernize-transform.md
@@ -9,37 +9,10 @@ equivalence.
 This is a surgical, single-module transformation — one vertical slice of the
 strangler fig. Output goes to `modernized/$1/$2/`.

-## Step 0a — Toolchain check (fail fast on target, adapt on legacy)
-
-Verify the build environment **before** planning, not when the tests
-first run:
-
- **Target stack ($3) — required.** Runtime, package manager, and test
-  framework all respond (`java -version` + `mvn -v`, `node -v` + `npm -v`,
-  `python3 -V` + `pytest --version`, …). If any are missing, stop and
-  report what to install — the new code and its tests cannot run without
-  them, so a plan gate now would just defer the failure an hour. Suggest
-  `/modernize-preflight $1 $3` for the full readiness report.
- **Legacy stack — advisory, never a blocker.** Try a syntax-only compile
-  of the module being transformed (e.g. `cobc -fsyntax-only`). Legacy
-  code often *cannot* build locally by nature, not by misconfiguration —
-  CICS/IMS programs have no local translator, and the real runtime may be
-  a mainframe you don't have. A failed or impossible legacy compile does
-  **not** stop the transform; it changes the equivalence strategy:
-  - dual-execution proof is off the table — characterization tests
-    assert against **recorded traces / golden-master fixtures** (real
-    production outputs, captured reports/screens, SME-confirmed
-    examples) instead of live legacy runs
-  - say so explicitly in the Step 0b plan and later in
-    TRANSFORMATION_NOTES.md ("equivalence is trace-based; legacy was not
-    executable in this environment"), so reviewers know the strength of
-    the proof they're approving
-
-## Step 0b — Plan (HITL gate)
+## Step 0 — Plan (HITL gate)

 Read the source module and any business rules in `analysis/$1/BUSINESS_RULES.md`
-that reference it. Then present the plan and **stop — write no code until
-the user explicitly approves** (use plan mode if the session supports it):
+that reference it. Then **enter plan mode** and present:
 - Which source files are in scope
 - The target module structure (packages/classes/files you'll create)
 - Which business rules / behaviors this module implements
@@ -57,9 +30,7 @@ identify every observable behavior, and encode each as a test case with
 concrete input → expected output pairs derived from the legacy logic.
 Target framework: <appropriate for $3>. Write to
 `modernized/$1/$2/src/test/`. These tests define 'done' — the new code
-must pass all of them. Follow your secret-handling rules: no credential
-literal from legacy code becomes a fixture; substitute fake same-shape
-values and read anything genuinely live from environment variables."
+must pass all of them."

 Show the user the test file. Get a 👍 before proceeding.

@@ -97,10 +68,6 @@ Then show a visual diff of one representative behavior, legacy vs modern:
 ```bash
 delta --side-by-side <(sed -n '<lines>p' legacy/$1/<file>) modernized/$1/$2/src/main/<file>
 ```
-(Fall back to `diff -y --width=160` if `delta` isn't installed.) Never
-pick a credential-bearing line range for this diff, and mask any
-credential-like literal quoted in TRANSFORMATION_NOTES.md — the notes
-live in `modernized/` and get committed.

 ## Step 5 — Architecture review

--- a/plugins/security-guidance/.claude-plugin/plugin.json
+++ b/plugins/security-guidance/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
  "name": "security-guidance",
-  "version": "2.0.3",
+  "version": "2.0.0",
  "description": "Security review for Claude-generated code. Pattern-based warnings on edits, LLM-powered diff review on Stop, and an agentic commit reviewer that catches injection, XSS, SSRF, hardcoded secrets, and 25+ other vulnerability classes.",
  "author": {
    "name": "David Dworken",
--- a/plugins/security-guidance/hooks/_base.py
+++ b/plugins/security-guidance/hooks/_base.py
@@ -10,42 +10,15 @@ import os
 import threading
 from datetime import datetime

-def state_dir():
-    """Return the absolute path of the plugin's state directory.
-
-    Resolution precedence (highest first):
-      1. SECURITY_WARNINGS_STATE_DIR — plugin-specific override (existing)
-      2. CLAUDE_CONFIG_DIR/security  — CC's config-dir env var (#1868)
-      3. ~/.claude/security          — default fallback
-
-    Empty-string env vars are treated as not-set so a misconfigured shell
-    (`CLAUDE_CONFIG_DIR=` with no value) doesn't silently write to
-    /security at the filesystem root.
-
-    Returns a fully-expanded absolute path (no literal `~`) so subprocess
-    callers can pass it through to code that doesn't re-expand tildes.
-
-    Called per-invocation rather than cached at import time so test
-    monkeypatches of the env vars take effect — the plugin's hooks each
-    run as fresh subprocesses in production, so the per-call cost is
-    negligible compared to subprocess spawn.
-    """
-    explicit = os.environ.get("SECURITY_WARNINGS_STATE_DIR")
-    if explicit:
-        return os.path.expanduser(explicit)
-    cc_config = os.environ.get("CLAUDE_CONFIG_DIR")
-    if cc_config:
-        return os.path.expanduser(os.path.join(cc_config, "security"))
-    return os.path.expanduser("~/.claude/security")
-
-
 # Debug log file. Lives under the plugin state dir (default ~/.claude/security/)
 # rather than /tmp because /tmp is world-writable on multi-user hosts (TOCTOU /
 # symlink-attack surface, cross-user log leakage). Overridable per-process via
-# SECURITY_GUIDANCE_DEBUG_LOG, or per-state-dir via SECURITY_WARNINGS_STATE_DIR
-# (plugin-specific override) or CLAUDE_CONFIG_DIR (CC-wide config dir, #1868).
+# SECURITY_GUIDANCE_DEBUG_LOG, or per-state-dir via SECURITY_WARNINGS_STATE_DIR.
+_DEFAULT_STATE_DIR = os.path.expanduser(
+    os.environ.get("SECURITY_WARNINGS_STATE_DIR") or "~/.claude/security"
+)
 DEBUG_LOG_FILE = os.environ.get("SECURITY_GUIDANCE_DEBUG_LOG") or os.path.join(
-    state_dir(), "log.txt"
+    _DEFAULT_STATE_DIR, "log.txt"
 )
 # Cap the debug log so parallel-worker fleets don't fill disk. When the active
 # file exceeds this it's atomically rotated to <file>.1 (overwriting any prior
@@ -116,18 +89,7 @@ _PV = _read_plugin_version_int()
 # Emitted via _usage_metrics() into the existing emit_metrics() channel so
 # hook metrics rows carry per-invocation token/cost totals
 # alongside the existing skip_reason / vulns_found fields.
-_USAGE = {
-    "in": 0, "out": 0, "cr": 0, "cw": 0, "cost": 0.0, "n": 0,
-    # HTTP error visibility (#2098 visibility gap — see emit comment in
-    # _usage_metrics). Without this, API failures from `_call_claude` left
-    # zero fingerprint in telemetry: the call returns None, the caller's
-    # emit_metrics carries no api_calls field, and the failure is
-    # indistinguishable from "no review needed". The deprecation outage
-    # that broke every commit-review LLM call was invisible until users
-    # reported it manually.
-    "http_err_last": 0,    # most recent HTTP error code this invocation
-    "http_err_count": 0,   # total HTTP errors (4xx + 5xx + network)
-}
+_USAGE = {"in": 0, "out": 0, "cr": 0, "cw": 0, "cost": 0.0, "n": 0}
 _USAGE_LOCK = threading.Lock()

 # $/Mtok (input, output). Used only for the raw-HTTP path; the SDK path
@@ -177,55 +139,19 @@ def _record_usage(usage, model, cost_usd=None):
        _USAGE["n"] += 1


-def _record_http_error(status):
-    """Record an HTTP error from an LLM API call. `status` is the HTTP
-    status code (integer 400–599) or -1 for network/timeout errors. Stored
-    in `_USAGE["http_err_last"]` (most recent) and counted in
-    `_USAGE["http_err_count"]`. Snapshot via `_usage_metrics()` so every
-    subsequent `emit_metrics` includes the failure fingerprint.
-
-    Background: without this, the most recent example was the #2098
-    deprecation 400. Every hook fire's LLM call returned HTTP 400; the
-    plugin caught it and returned None; the emit_metrics carried no
-    api_calls field; aggregate dashboards looked normal. The failure
-    only became visible when a user manually reported errors out of
-    their debug log. With this field, a category-of-failure spike (4xx,
-    5xx, or -1 network) is queryable from BQ in real time.
-    """
-    try:
-        s = int(status)
-    except (TypeError, ValueError):
-        return
-    with _USAGE_LOCK:
-        _USAGE["http_err_last"] = s
-        _USAGE["http_err_count"] += 1
-
-
 def _usage_metrics():
    """Snapshot the accumulator as metric keys. Returns {} when no API calls
-    AND no HTTP errors were made so skip-path emits don't burn key budget.
-    cost_usd rounded to 1e-6 to keep the float finite/short for the zod
-    schema.
-
-    HTTP errors (`http_err_last`, `http_err_count`) emitted ONLY when
-    `http_err_count > 0` so successful calls don't pad every metrics row
-    with two zero fields.
-    """
+    were made so skip-path emits don't burn key budget. cost_usd rounded to
+    1e-6 to keep the float finite/short for the zod schema."""
    with _USAGE_LOCK:
-        if _USAGE["n"] == 0 and _USAGE["http_err_count"] == 0:
+        if _USAGE["n"] == 0:
            return {}
-        out = {}
-        if _USAGE["n"] > 0:
-            out.update({
-                "tok_in": _USAGE["in"],
-                "tok_out": _USAGE["out"],
-                "tok_cache_r": _USAGE["cr"],
-                "tok_cache_w": _USAGE["cw"],
-                "cost_usd": round(_USAGE["cost"], 6),
-                "api_calls": _USAGE["n"],
-            })
-        if _USAGE["http_err_count"] > 0:
-            out["http_err_last"] = _USAGE["http_err_last"]
-            out["http_err_count"] = _USAGE["http_err_count"]
-        return out
+        return {
+            "tok_in": _USAGE["in"],
+            "tok_out": _USAGE["out"],
+            "tok_cache_r": _USAGE["cr"],
+            "tok_cache_w": _USAGE["cw"],
+            "cost_usd": round(_USAGE["cost"], 6),
+            "api_calls": _USAGE["n"],
+        }

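Reviewer note: a minimal sketch of how the state-directory resolution changes in this file. The removed `state_dir()` checked `SECURITY_WARNINGS_STATE_DIR`, then `CLAUDE_CONFIG_DIR/security`, then `~/.claude/security`; the added `_DEFAULT_STATE_DIR` expression honours only the first and the default. The names `resolve_old` and `resolve_new` and the explicit `env` mapping are hypothetical, introduced here so the two behaviours can be compared without touching the real environment.

```python
import os


def resolve_old(env):
    """Precedence of the removed state_dir(): explicit override, then
    CLAUDE_CONFIG_DIR/security, then the default. Empty strings count as unset."""
    explicit = env.get("SECURITY_WARNINGS_STATE_DIR")
    if explicit:
        return os.path.expanduser(explicit)
    cc_config = env.get("CLAUDE_CONFIG_DIR")
    if cc_config:
        return os.path.expanduser(os.path.join(cc_config, "security"))
    return os.path.expanduser("~/.claude/security")


def resolve_new(env):
    """Precedence of the added _DEFAULT_STATE_DIR expression: explicit
    override or the default. CLAUDE_CONFIG_DIR is not consulted."""
    return os.path.expanduser(
        env.get("SECURITY_WARNINGS_STATE_DIR") or "~/.claude/security"
    )


if __name__ == "__main__":
    env = {"CLAUDE_CONFIG_DIR": "/srv/cc", "SECURITY_WARNINGS_STATE_DIR": ""}
    print(resolve_old(env))  # path under CLAUDE_CONFIG_DIR
    print(resolve_new(env))  # default path under the home directory
```

A second difference is not captured by the sketch: the added constant is evaluated once at import time, whereas the removed function was called per invocation so that later changes to the environment took effect.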
--- a/plugins/security-guidance/hooks/diffstate.py
+++ b/plugins/security-guidance/hooks/diffstate.py
@@ -355,9 +355,9 @@ def _list_untracked(cwd):
    the holdouts."""
    try:
        repo = _git_toplevel(cwd) or cwd
-        # core.quotePath=false comes from GIT_CMD globally (see gitutil.py).
        r = subprocess.run(
-            [*GIT_CMD, "ls-files", "--others", "--exclude-standard", "-z"],
+            [*GIT_CMD, "-c", "core.quotePath=false", "ls-files",
+             "--others", "--exclude-standard", "-z"],
            cwd=repo, capture_output=True, timeout=15,
        )
        if r.returncode != 0:
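Reviewer note: the call above asks `git ls-files` for NUL-terminated output (`-z`) and captures raw bytes. The parsing code that follows is outside this hunk, so the sketch below is only one plausible way to turn such output into path strings; `split_nul_paths` is a hypothetical helper, not a function from this plugin.

```python
def split_nul_paths(raw: bytes) -> list[str]:
    """Split NUL-terminated path output into strings.

    Decodes each entry as UTF-8 and replaces undecodable bytes rather
    than raising, so one odd filename cannot abort the whole listing.
    """
    return [p.decode("utf-8", errors="replace") for p in raw.split(b"\0") if p]


if __name__ == "__main__":
    sample = "src/a.py\0docs/read me.txt\0".encode("utf-8")
    print(split_nul_paths(sample))
```

Splitting on NUL rather than on newlines keeps paths that contain spaces or newlines intact.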
--- a/plugins/security-guidance/hooks/ensure_agent_sdk.py
+++ b/plugins/security-guidance/hooks/ensure_agent_sdk.py
@@ -23,12 +23,6 @@ import sys
 import time
 from pathlib import Path

-# Shared state-dir resolver: SECURITY_WARNINGS_STATE_DIR → CLAUDE_CONFIG_DIR/security
-# → ~/.claude/security. See _base.state_dir for resolution precedence. Re-aliased
-# here to match the existing local name (state_dir was already a local var in
-# main() and _maybe_emit_user_notice).
-from _base import state_dir as _resolve_state_dir
-
 # Outcome codes for the sdk_bootstrap metric. Values are stable for telemetry.
 NOOP_SYSTEM = 0      # claude_agent_sdk already importable in system python
 NOOP_VENV = 1        # venv already built and SDK imports from it
@@ -42,122 +36,6 @@ HOOK_PY_INCOMPATIBLE = 6  # hook interpreter is <3.10 — SDK syntax can't load
                          # here no matter how the venv was built. See #2071.


-# Phase + err-kind integer encoding for sdk_bootstrap_phase / sdk_bootstrap_err.
-#
-# Earlier versions emitted these as STRINGS (e.g. "pip", "dns_fail"). CC's
-# plugin-metrics pipeline silently drops plugin-emitted string values —
-# only `bool|finite-number` plugin metrics reach BigQuery. (CC-core
-# metrics like `subscription_type` are exempt because they're injected
-# downstream of plugin validation.) Confirmed empirically: 185K
-# BUILD_FAILED rows in BQ had `sdk_bootstrap_phase`/`sdk_bootstrap_err`
-# = NULL despite the Python code emitting them. This left ~28K
-# BUILD_FAILED sessions/day with no diagnostic split — flying blind on
-# the real failure modes (pip-no-match vs dns-fail vs ssl-verify etc.).
-#
-# Fix: encode as small integers per the maps below. Values are
-# APPEND-ONLY for telemetry stability. Reserve 99 as the "unknown /
-# uncategorized" bucket so an unmapped err_kind (e.g., a new exception
-# type) still emits a non-zero signal.
-SDK_BOOTSTRAP_PHASE_CODES = {
-    "pre":  1,  # pre-venv (state_dir.mkdir, sentinel open)
-    "venv": 2,  # python -m venv --clear
-    "pip":  3,  # pip install
-    "main": 4,  # uncaught exception above main()
-}
-SDK_BOOTSTRAP_ERR_CODES = {
-    "pip_no_match":         1,
-    "dns_fail":             2,
-    "conn_refused":         3,
-    "ssl_verify":           4,
-    "perm_denied":          5,
-    "no_pip":               6,
-    "disk_full":            7,
-    "proxy_auth":           8,
-    "stderr_timeout":       9,   # pip stderr containing "timeout"/"timed out"
-    "subprocess_timeout":   10,  # subprocess.TimeoutExpired (>120s)
-    # Venv-stage specific categories added after PR #2112 telemetry surfaced
-    # 2,406 phase=2/err=99 sessions in the first 3h of v2.0.1 — venv phase
-    # failing in ways the original pip-flavored patterns didn't catch. These
-    # all split out of what was previously collapsing to _uncategorized.
-    "venv_ensurepip_fail":  11,  # Debian/Ubuntu missing python3-venv;
-                                 # stderr mentions ensurepip non-zero exit
-                                 # or "ensurepip is not available"
-    "venv_path_too_long":   12,  # Windows MAX_PATH (260) or POSIX
-                                 # ENAMETOOLONG — venv writes deep paths
-                                 # under state_dir/agent-sdk-venv/Lib/...
-    "venv_no_module":       13,  # `python3 -m venv` itself missing — "No
-                                 # module named 'venv'" / "No module named venv"
-    "venv_already_exists":  14,  # Errno 17 / "file exists" — sentinel race
-                                 # past O_EXCL or stale dir survived --clear
-    "venv_setup_failed":    15,  # Generic "virtual environment was not
-                                 # created successfully" — catches the long
-                                 # tail of venv setup failures that don't
-                                 # match a more specific category above
-    # 16–98 reserved for future categories; APPEND-ONLY.
-    # 99 catches everything else (including "exc:<TypeName>" and "other:<tail>"
-    # — the original string is debug-loggable but the integer is what makes
-    # it to telemetry). For the "other:" tail, `sdk_bootstrap_stderr_sig`
-    # carries a bounded integer hash so we can still distinguish patterns
-    # in BQ aggregation.
-    "_uncategorized":       99,
-}
-
-
-def _encode_phase(s):
-    """Map err_phase string to its telemetry integer code, or 0 if unset.
-    Empty/None → 0 lets `if encoded:` cleanly skip emission. Per
-    SDK_BOOTSTRAP_PHASE_CODES, valid codes are 1-4."""
-    return SDK_BOOTSTRAP_PHASE_CODES.get((s or "").strip(), 0)
-
-
-def _encode_err_kind(s):
-    """Map err_kind string to its telemetry integer code, or 0 if unset.
-    Direct hits use the static map; "exc:<X>" and "other:<tail>" both
-    collapse to _uncategorized (99) — the raw string survives in debug
-    logs, only the integer reaches BQ."""
-    s = (s or "").strip()
-    if not s:
-        return 0
-    if s in SDK_BOOTSTRAP_ERR_CODES:
-        return SDK_BOOTSTRAP_ERR_CODES[s]
-    # Prefix matches for the catch-all categories
-    if s.startswith("exc:") or s.startswith("other:") or s == "other":
-        return SDK_BOOTSTRAP_ERR_CODES["_uncategorized"]
-    # Unknown string — still emit as uncategorized rather than dropping
-    return SDK_BOOTSTRAP_ERR_CODES["_uncategorized"]
-
-
-def _encode_stderr_sig(err_kind):
-    """Bounded integer hash of the stderr tail captured in "other:<tail>"
-    err_kinds. Lets us distinguish patterns INSIDE the _uncategorized
-    (code 99) bucket without unbounded cardinality.
-
-    Returns 0 for non-"other:" err_kinds (so the field auto-omits from
-    emit_metrics on categorized failures — see the emit block in main()).
-
-    Strategy: take the tail's first ~30 chars (post-lowercase, post-trim),
-    SHA-1, fold the first 2 bytes to 0–999. Different stderr messages
-    cluster into different buckets; same stderr always maps to the same
-    bucket. Cardinality is bounded at 1000, well below any "high
-    cardinality" alarm — and a real failure mode typically produces
-    near-identical stderr across thousands of machines, so 1000 buckets
-    is comfortably wide.
-
-    Why first ~30 chars: stderr like "ERROR: Command failed: <full
-    path>" varies the tail wildly (paths) but the categorization signal
-    is in the leading words. Dropping the suffix focuses the hash on
-    the discriminative part.
-    """
-    if not err_kind or not err_kind.startswith("other:"):
-        return 0
-    import hashlib
-    tail = err_kind[len("other:"):].strip().lower()[:30]
-    if not tail:
-        return 0
-    h = hashlib.sha1(tail.encode("utf-8", errors="replace")).digest()
-    return int.from_bytes(h[:2], "big") % 1000
-
-
 def _sdk_on_syspath() -> bool:
    # find_spec is ~10ms; actually importing the SDK pulls in
    # transitive deps and costs ~800ms — too heavy for a
@@ -212,7 +90,10 @@ def main() -> tuple[int, str, str]:
    if _sdk_on_syspath():
        return NOOP_SYSTEM, "", ""

-    state_dir = Path(_resolve_state_dir())
+    state_dir = Path(
+        os.environ.get("SECURITY_WARNINGS_STATE_DIR")
+        or os.path.expanduser("~/.claude/security")
+    )
    venv = state_dir / "agent-sdk-venv"
    # Windows venvs put the interpreter at Scripts\python.exe; POSIX uses bin/python.
    if sys.platform == "win32":
@@ -296,34 +177,7 @@ def main() -> tuple[int, str, str]:
        else:
            stderr_str = str(stderr_b)
        s = stderr_str.lower()
-        # Venv-specific patterns checked FIRST — they overlap with some pip
-        # patterns (e.g. "no module named ensurepip" could match no_pip OR
-        # venv_ensurepip_fail; the venv-stage interpretation is the right
-        # one when err_phase=="venv"). Order is venv-most-specific →
-        # pip-historical → generic.
-        if err_phase == "venv" and (
-            "ensurepip is not available" in s
-            or ("ensurepip" in s and "returned non-zero" in s)
-            or "the virtual environment was not created" in s and "ensurepip" in s
-        ):
-            err_kind = "venv_ensurepip_fail"
-        elif err_phase == "venv" and (
-            "[errno 36]" in s
-            or "file name too long" in s
-            or "path too long" in s
-        ):
-            err_kind = "venv_path_too_long"
-        elif err_phase == "venv" and (
-            "no module named venv" in s
-            or "no module named 'venv'" in s
-        ):
-            err_kind = "venv_no_module"
-        elif err_phase == "venv" and (
-            "[errno 17]" in s
-            or ("file exists" in s and "venv" in s)
-        ):
-            err_kind = "venv_already_exists"
-        elif "no matching distribution" in s or "could not find a version" in s:
+        if "no matching distribution" in s or "could not find a version" in s:
            err_kind = "pip_no_match"
        elif "name or service not known" in s or "name resolution" in s \
                or "nodename nor servname" in s or "temporary failure in name" in s:
@@ -342,15 +196,6 @@ def main() -> tuple[int, str, str]:
            err_kind = "proxy_auth"
        elif "timeout" in s or "timed out" in s:
            err_kind = "stderr_timeout"
-        elif err_phase == "venv" and (
-            "virtual environment was not created" in s
-            or "error: command" in s and "venv" in s
-        ):
-            # Generic venv-setup catch-all — matched AFTER the more specific
-            # venv patterns above so we don't shadow them, but BEFORE the
-            # other: fallback so generic venv setup failures get their own
-            # bucket instead of polluting the long-tail signature space.
-            err_kind = "venv_setup_failed"
        else:
            # First 60 chars of the last non-empty stderr line — bounded to
            # stay inside CC's metric value-length budget. Real failure modes
@@ -394,7 +239,10 @@ def _maybe_emit_user_notice(outcome: int, pv: int) -> str | None:
    if outcome != HOOK_PY_INCOMPATIBLE:
        return None
    try:
-        state_dir = Path(_resolve_state_dir())
+        state_dir = Path(
+            os.environ.get("SECURITY_WARNINGS_STATE_DIR")
+            or os.path.expanduser("~/.claude/security")
+        )
        marker = state_dir / f".agentic_unavailable_notice_v{pv or 0}"
        if marker.exists():
            return None
@@ -440,33 +288,21 @@ if __name__ == "__main__":
    # and takes the FIRST non-{"async":...} JSON line as the hook response;
    # its `metrics` key is forwarded to the hook metrics event on the
    # next attachments pass. Must be a single line — the registry splits on
-    # \n and json-parses each independently.
-    #
-    # IMPORTANT — values must be bool|finite-number. The validation comment
-    # has historically said "or short strings" but that was wrong: CC's
-    # plugin-metrics pipeline silently drops plugin-emitted string values.
-    # Stay inside the 10-key emit cap.
+    # \n and json-parses each independently. Values must be bool|number OR
+    # short strings (CC accepts string metric values if they're not
+    # null). Stay inside the 10-key emit cap.
    metrics: dict[str, object] = {
        "sdk_bootstrap": outcome,
        "sdk_bootstrap_ms": round((time.perf_counter() - t0) * 1000),
    }
    if err_kind:
-        # Encode phase + err_kind as integer codes (see
-        # SDK_BOOTSTRAP_PHASE_CODES / SDK_BOOTSTRAP_ERR_CODES). Earlier
-        # versions emitted these as strings and CC dropped them — restoring
-        # the diagnostic split that 28K BUILD_FAILED/day need to triage by
-        # root cause. err_phase defaults to "pre" when empty (pre-venv
-        # failure path, e.g. state_dir.mkdir perm-denied).
-        metrics["sdk_bootstrap_phase"] = _encode_phase(err_phase or "pre")
-        metrics["sdk_bootstrap_err"] = _encode_err_kind(err_kind)
-        # For "other:<tail>" (encoded err==99), emit a bounded integer
-        # hash of the stderr tail so BQ can distinguish patterns inside
-        # the _uncategorized bucket without unbounded cardinality. Zero
-        # when err_kind is categorized — the schema reader treats 0 as
-        # "no signal", matching the absence convention.
-        sig = _encode_stderr_sig(err_kind)
-        if sig:
-            metrics["sdk_bootstrap_stderr_sig"] = sig
+        # Truncate defensively; categorized values are <40 chars but the
+        # `other:<tail>` mode could be longer. err_phase may be empty for
+        # pre-venv failures (state_dir.mkdir perm-denied, sentinel O_EXCL
+        # raising a non-FileExistsError OSError) — emit as "pre" so the
+        # err_kind isn't silently dropped.
+        metrics["sdk_bootstrap_phase"] = (err_phase or "pre")[:16]
+        metrics["sdk_bootstrap_err"] = err_kind[:96]
    pv = _plugin_version_int()
    if pv:
        metrics["pv"] = pv
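Reviewer note: a small sketch of the metric shape produced by the added lines in this hunk, where the phase and error kind are emitted as truncated strings in place of the removed integer codes. `build_error_metrics` is a hypothetical wrapper used only to make the behaviour easy to exercise; the key names and slice lengths are taken from the diff.

```python
def build_error_metrics(err_phase: str, err_kind: str) -> dict[str, object]:
    """Mirror the added lines: string-valued metrics, truncated to fixed lengths."""
    metrics: dict[str, object] = {}
    if err_kind:
        # An empty phase means the failure happened before the venv step.
        metrics["sdk_bootstrap_phase"] = (err_phase or "pre")[:16]
        metrics["sdk_bootstrap_err"] = err_kind[:96]
    return metrics


if __name__ == "__main__":
    print(build_error_metrics("", "perm_denied"))
    print(build_error_metrics("pip", "other:" + "x" * 200))
```

The removed comment in this same hunk states that plugin-emitted string values were dropped before reaching BigQuery, while the added comment states that short strings are accepted. It may be worth confirming which holds for the pipeline version this commit targets.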
--- a/plugins/security-guidance/hooks/gitutil.py
+++ b/plugins/security-guidance/hooks/gitutil.py
@@ -26,34 +26,18 @@ GIT_CMD = [
    "git",
    "-c", "core.fsmonitor=false",
    "-c", "core.hooksPath=/dev/null",
-    # core.quotePath=false: emit raw UTF-8 in path-emitting commands instead
-    # of C-quoting non-ASCII bytes (default `"\\303\\201vila/..."` vs
-    # `Ávila/...`). Downstream parsers — both ours (parse_diff_into_files,
-    # extract_file_paths_from_diff) and Python stdlib (os.path.isabs,
-    # os.path.join) — expect raw paths and silently drop / mishandle the
-    # quoted form. Adding the flag globally to GIT_CMD covers every
-    # subprocess.run site that uses the splat — diff feeders, rev-parse
-    # path queries (--show-toplevel, --git-dir, --git-common-dir),
-    # reflog %gs subjects, ls-files, status, etc. — without per-site
-    # flag duplication. See #2082, #2099.
-    "-c", "core.quotePath=false",
 ]


 def _git_rev_parse_head(cwd):
    """Return the current HEAD SHA, or None if not a git repo / no commits."""
    try:
-        # See #2099: text=True on Windows cp1252 crashes the reader thread on
-        # any UTF-8 byte undefined in cp1252 (e.g. via a git error message
-        # referencing a non-ASCII filename in stderr). stdout is a SHA so it
-        # IS safe; stderr is not. capture_output=True with bytes-by-default
-        # never decodes, so the reader thread can't crash.
        result = subprocess.run(
            [*GIT_CMD, "rev-parse", "HEAD"],
-            cwd=cwd, capture_output=True, timeout=5
+            cwd=cwd, capture_output=True, text=True, timeout=5
        )
        if result.returncode == 0 and result.stdout.strip():
-            return result.stdout.decode("utf-8", errors="replace").strip()
+            return result.stdout.strip()
        return None
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None
@@ -68,17 +52,13 @@ def _find_git_index(cwd):
    Returns the absolute path to the index file, or None.
    """
    try:
-        # See #2099: stdout here is a PATH which can contain non-ASCII bytes
-        # (e.g. C:\אבטחה\repo\.git). text=True decodes via cp1252 strict on
-        # Windows → crashes the reader thread → returns stdout=None →
-        # caller does .strip() on None → AttributeError. Decode manually.
        result = subprocess.run(
            [*GIT_CMD, "rev-parse", "--git-dir"],
-            cwd=cwd, capture_output=True, timeout=5
+            cwd=cwd, capture_output=True, text=True, timeout=5
        )
        if result.returncode != 0:
            return None
-        git_dir = result.stdout.decode("utf-8", errors="replace").strip()
+        git_dir = result.stdout.strip()
        if not os.path.isabs(git_dir):
            git_dir = os.path.join(cwd, git_dir)
        index_path = os.path.join(git_dir, "index")
@@ -148,13 +128,9 @@ def _temp_index(cwd, untracked_paths=None):
        else:
            add_args = None
        if add_args:
-            # No stdout used here (only returncode matters), but text=True
-            # still spawns reader threads that decode stderr — git error
-            # messages can reference non-ASCII filenames and crash on
-            # cp1252. See #2099. Drop text=True so bytes stay raw.
            subprocess.run(
                [*GIT_CMD, "add", "--intent-to-add"] + add_args,
-                cwd=cwd, capture_output=True, timeout=10,
+                cwd=cwd, capture_output=True, text=True, timeout=10,
                env=env,
            )
        yield env
@@ -168,17 +144,11 @@ def _temp_index(cwd, untracked_paths=None):
 def _git_toplevel(cwd):
    """Absolute repo root for `cwd`, or None if not in a work tree."""
    try:
-        # See #2099: stdout is a PATH — `C:\אבטחה\repo` returned as UTF-8
-        # bytes by git. text=True would decode via cp1252 strict on Windows
-        # → reader-thread crash. Decode manually with errors="replace".
        r = subprocess.run(
            [*GIT_CMD, "rev-parse", "--show-toplevel"],
-            cwd=cwd, capture_output=True, timeout=5,
+            cwd=cwd, capture_output=True, text=True, timeout=5,
        )
-        if r.returncode != 0:
-            return None
-        path = r.stdout.decode("utf-8", errors="replace").strip()
-        return path if path else None
+        return r.stdout.strip() if r.returncode == 0 and r.stdout.strip() else None
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None

@@ -194,15 +164,13 @@ def _git_dir(repo_root):
    callers can degrade (push-sweep state is best-effort).
    """
    try:
-        # See #2099: stdout is a PATH (shared gitdir), may be non-ASCII.
-        # Decode bytes manually to avoid cp1252 reader-thread crash.
        r = subprocess.run(
            [*GIT_CMD, "rev-parse", "--git-common-dir"],
-            cwd=repo_root, capture_output=True, timeout=5,
+            cwd=repo_root, capture_output=True, text=True, timeout=5,
        )
        if r.returncode != 0:
            return None
-        d = r.stdout.decode("utf-8", errors="replace").strip()
+        d = r.stdout.strip()
        return d if os.path.isabs(d) else os.path.join(repo_root, d)
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None
@@ -211,15 +179,13 @@ def _git_dir(repo_root):
 def _git_rev_list_range(repo_root, base, head="HEAD"):
    """Shas in `base..head`, oldest→newest. Empty list on error."""
    try:
-        # See #2099: stdout is ASCII SHAs, but stderr can carry git error
-        # messages referencing non-ASCII filenames — keep bytes raw.
        r = subprocess.run(
            [*GIT_CMD, "rev-list", "--reverse", f"{base}..{head}"],
-            cwd=repo_root, capture_output=True, timeout=10,
+            cwd=repo_root, capture_output=True, text=True, timeout=10,
        )
        if r.returncode != 0:
            return []
-        return [s for s in r.stdout.decode("utf-8", errors="replace").strip().split("\n") if s]
+        return [s for s in r.stdout.strip().split("\n") if s]
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return []

@@ -233,12 +199,15 @@ def _git_diff_range(repo_root, base, head="HEAD"):
    them reviewed — otherwise unreviewed commits get permanently silenced.
    """
    try:
-        # GIT_CMD globally passes core.quotePath=false (see definition) so
-        # non-ASCII paths in `diff --git a/... b/...` headers come through as
-        # raw UTF-8, not C-quoted. Required by the downstream
-        # parse_diff_into_files / extract_file_paths_from_diff regex.
+        # core.quotePath=false makes git emit raw UTF-8 in `diff --git a/... b/...`
+        # headers instead of C-quoting non-ASCII path bytes (`"a/\303\201vila/..."`
+        # vs `a/Ávila/...`). The downstream `re.match(r'^a/(.+?) b/(.+)$', ...)`
+        # in parse_diff_into_files / extract_file_paths_from_diff matches the
+        # raw form only — quoted headers slip past and the entire file is
+        # silently dropped from review. See #2082 (sibling of #2056 / #2075).
        r = subprocess.run(
-            [*GIT_CMD, "diff", "-p", "--no-color", "--no-ext-diff", base, head],
+            [*GIT_CMD, "-c", "core.quotePath=false",
+             "diff", "-p", "--no-color", "--no-ext-diff", base, head],
            cwd=repo_root, capture_output=True, timeout=30,
        )
        if r.returncode != 0:
@@ -251,11 +220,9 @@ def _git_diff_range(repo_root, base, head="HEAD"):
 def _detect_main_branch(repo_root):
    for ref in ("origin/HEAD", "origin/main", "origin/master", "main", "master"):
        try:
-            # See #2099: stdout is a SHA but stderr can carry non-ASCII git
-            # warnings — keep bytes raw to avoid cp1252 reader-thread crash.
            r = subprocess.run(
                [*GIT_CMD, "rev-parse", "--verify", "-q", ref],
-                cwd=repo_root, capture_output=True, timeout=5,
+                cwd=repo_root, capture_output=True, text=True, timeout=5,
            )
            if r.returncode == 0 and r.stdout.strip():
                return ref
@@ -363,9 +330,8 @@ def _git_name_only(cwd, base, include_untracked=False):
    # result.stdout=None, and propagate AttributeError out of the helper.
    # Same fix shape as diffstate._list_untracked. See #2056.
    def _run(env):
-        # core.quotePath=false comes from GIT_CMD globally (see definition).
        result = subprocess.run(
-            [*GIT_CMD, "diff", "--name-only", "-z", base],
+            [*GIT_CMD, "-c", "core.quotePath=false", "diff", "--name-only", "-z", base],
            cwd=cwd, capture_output=True, timeout=30,
            env=env,
        )
@@ -402,9 +368,9 @@ def _git_status_porcelain(cwd):
    # sibling helpers — a non-ASCII path in the worktree would otherwise
    # crash the cp1252 reader thread on Windows. See #2056.
    try:
-        # core.quotePath=false comes from GIT_CMD globally (see definition).
        r = subprocess.run(
-            [*GIT_CMD, "status", "--porcelain=v1", "-uall", "-z"],
+            [*GIT_CMD, "-c", "core.quotePath=false", "status",
+             "--porcelain=v1", "-uall", "-z"],
            cwd=cwd, capture_output=True, timeout=30,
        )
        if r.returncode != 0:
@@ -444,12 +410,9 @@ def _is_ancestor(cwd, maybe_ancestor, descendant):
    """True if `maybe_ancestor` is reachable from `descendant` (i.e. HEAD
    moved forward via commit/merge, not sideways via checkout)."""
    try:
-        # See #2099: only returncode matters, but text=True spawns reader
-        # threads that decode stderr — git error messages can carry non-ASCII
-        # filenames. Drop text=True to keep bytes raw, avoid cp1252 crash.
        result = subprocess.run(
            [*GIT_CMD, "merge-base", "--is-ancestor", maybe_ancestor, descendant],
-            cwd=cwd, capture_output=True, timeout=5,
+            cwd=cwd, capture_output=True, text=True, timeout=5,
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
@@ -480,8 +443,11 @@ def get_git_diff(cwd, baseline_sha, full_context=False, paths=None, untracked_pa
        # change exists to fix.
        return ""

-    # core.quotePath=false comes from GIT_CMD globally (see definition).
-    cmd = [*GIT_CMD, "diff", "--no-color", "--no-ext-diff", baseline_sha] + (["--unified=99999"] if full_context else []) + pathspec
+    # core.quotePath=false: emit raw UTF-8 in `diff --git a/... b/...` headers
+    # so non-ASCII paths aren't C-quoted past the downstream parse_diff_into_files
+    # regex. See #2082 (sibling of #2056 / #2075).
+    cmd = [*GIT_CMD, "-c", "core.quotePath=false",
+           "diff", "--no-color", "--no-ext-diff", baseline_sha] + (["--unified=99999"] if full_context else []) + pathspec
    try:
        with _temp_index(cwd, untracked_paths) as env:
            # env is None when no index could be found (bare repo / not a
--- a/plugins/security-guidance/hooks/llm.py
+++ b/plugins/security-guidance/hooks/llm.py
@@ -27,7 +27,7 @@ from typing import Optional, Tuple, Dict, Any, List

 import extensibility
 import review_api
-from _base import debug_log, _record_usage, _record_http_error, _PV, PROVENANCE_TAG, state_dir as _resolve_state_dir  # noqa: F401
+from _base import debug_log, _record_usage, _PV, PROVENANCE_TAG  # noqa: F401
 from session_state import with_locked_state


@@ -355,7 +355,10 @@ def _call_claude_via_sdk(prompt, output_schema, *, max_tokens=16000, model=None)
        # Try the venv ensure_agent_sdk.py builds. Same fallback logic as
        # agentic_review() — duplicated here so the 3P path doesn't require
        # the agentic path to have run first.
-        _state_dir = _resolve_state_dir()
+        _state_dir = os.environ.get(
+            "SECURITY_WARNINGS_STATE_DIR",
+            os.path.expanduser("~/.claude/security"),
+        )
        _inject_agent_sdk_venv_into_syspath(_state_dir)
        try:
            import asyncio as _asyncio  # noqa: F811
@@ -368,7 +371,6 @@ def _call_claude_via_sdk(prompt, output_schema, *, max_tokens=16000, model=None)
        except Exception as e:
            debug_log(f"3P sdk-single-turn: SDK unavailable ({e})")
            _last_call_claude_http_error = -1
-            _record_http_error(-1)
            return None

    cli_path = os.environ.get("SG_AGENTIC_CLI_PATH") or None
@@ -426,7 +428,6 @@ def _call_claude_via_sdk(prompt, output_schema, *, max_tokens=16000, model=None)
    except _asyncio.TimeoutError:
        debug_log("3P sdk-single-turn: timeout after 60s")
        _last_call_claude_http_error = -1
-        _record_http_error(-1)
        return None
    except Exception as e:
        debug_log(f"3P sdk-single-turn: query failed ({e})")
@@ -435,7 +436,6 @@ def _call_claude_via_sdk(prompt, output_schema, *, max_tokens=16000, model=None)
            for _l in _captured_stderr[:20]:
                debug_log(f"  | {_l.rstrip()}")
        _last_call_claude_http_error = -1
-        _record_http_error(-1)
        return None


@@ -482,21 +482,10 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
        "max_tokens": max_tokens,
        "system": CLAUDE_CODE_SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": prompt}],
-        # API moved the structured-output schema from top-level `output_format`
-        # to `output_config.format` per
-        # https://platform.claude.com/docs/en/build-with-claude/structured-outputs.
-        # The old form "continues to work for a transition period" for some
-        # auth modes (API key + non-streaming), but is rejected with
-        # `invalid_request_error: output_format: This field is deprecated.
-        # Use 'output_config.format' instead.` for others (OAuth Bearer +
-        # newer CLI versions hit it consistently — reporter saw 462 errors
-        # in one day). See #2098.
-        "output_config": {
-            "format": {
-                "type": "json_schema",
-                "schema": output_schema,
-            },
-        },
+        "output_format": {
+            "type": "json_schema",
+            "schema": output_schema
+        }
    }
    if thinking_budget > 0:
        # Models trained on adaptive thinking (4.6+) reject the budget_tokens
@@ -504,10 +493,7 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
        # models (4.5 and earlier, all 3.x) reject adaptive. Pick by model.
        if _model_supports_adaptive_thinking(payload["model"]):
            payload["thinking"] = {"type": "adaptive"}
-            # Merge `effort` into the existing output_config dict (which
-            # now carries the `format` schema) rather than reassigning —
-            # otherwise the schema is silently overwritten. See #2098.
-            payload["output_config"]["effort"] = "high"
+            payload["output_config"] = {"effort": "high"}
        else:
            payload["thinking"] = {
                "type": "enabled",
@@ -545,7 +531,6 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
                error_body = e.read().decode("utf-8") if e.fp else ""
                debug_log(f"API error: {e.code} - {error_body[:200]}")
                _last_call_claude_http_error = e.code
-                _record_http_error(e.code)
                return None
        except (urllib.error.URLError, TimeoutError) as e:
            if attempt < 2:
@@ -555,7 +540,6 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
            else:
                debug_log(f"Request failed after retries: {e}")
                _last_call_claude_http_error = -1
-                _record_http_error(-1)
                return None

    if not response_data:
@@ -564,7 +548,6 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
        # call uses the token; record the 401 so callers don't see error=None.
        if _last_call_claude_http_error is None:
            _last_call_claude_http_error = 401
-            _record_http_error(401)
        return None

    # Find the text block (skip thinking blocks)
@@ -1162,7 +1145,10 @@ def agentic_review(
        # ~/.claude/security/ with the SDK installed; try that as a fallback
        # before giving up. The system import is attempted first so users
        # who DO have it never touch the venv.
-        _state_dir = _resolve_state_dir()
+        _state_dir = os.environ.get(
+            "SECURITY_WARNINGS_STATE_DIR",
+            os.path.expanduser("~/.claude/security"),
+        )
        _venv_tried = _inject_agent_sdk_venv_into_syspath(_state_dir)
        try:
            import asyncio as _asyncio  # noqa: F811
--- a/plugins/security-guidance/hooks/security_reminder_hook.py
+++ b/plugins/security-guidance/hooks/security_reminder_hook.py
@@ -82,7 +82,6 @@ from _base import (  # noqa: E402,F401
    PROVENANCE_TAG, PROVENANCE_BANNER,
    _read_plugin_version_int, _PV, _USAGE, _USAGE_LOCK,
    _PRICE_PER_MTOK, _PRICE_DEFAULT, _record_usage, _usage_metrics,
-    state_dir as _resolve_state_dir,
 )
 import extensibility  # noqa: E402
 from patterns import (  # noqa: E402,F401
@@ -221,34 +220,15 @@ def emit_metrics(
    task-notification one-liner. Must be in the same JSON line as the metrics
    because CC stops scanning stdout after the first {-prefixed line.

-    `additional_context` (asyncRewake findings): model-visible guidance text.
-    Delivery channel depends on `hook_event_name` because CC's hook-output
-    contract is NOT symmetric across events:
-
-      - PostToolUse (commit-review, push-sweep): surfaced via the modern
-        hookSpecificOutput.additionalContext protocol. `PostToolUse` is a
-        member of CC's hookSpecificOutput discriminated union
-        (coreSchemas.ts), so the JSON validates and metrics/rewakeSummary
-        are consumed. See #1375 / #1783 for why this replaced the legacy
-        stderr + exit(2) shape for PostToolUse.
-
-      - Stop / SubagentStop: there is NO `Stop` member in that union, so
-        emitting hookSpecificOutput{hookEventName:"Stop"} makes the whole
-        line fail isSyncHookJSONOutput validation — which on the asyncRewake
-        path silently drops metrics AND rewakeSummary, and (because the
-        legacy stderr write was removed) leaks the raw JSON to the model as
-        the rewake body. CC's asyncRewake delivery actually reads
-        `stderr || stdout` for the model-visible body and only scans stdout
-        JSON for metrics+rewakeSummary — it never reads additionalContext
-        on this path. So for Stop we use the documented clean pattern:
-        guidance on stderr, valid JSON (metrics + rewakeSummary +
-        top-level decision/reason) on stdout. The top-level decision:"block"
-        + reason also covers the sync-fallback path (single-shot `claude -p`,
-        where asyncRewake degrades to a sync Stop hook that reads
-        decision/reason). See #2159.
-
-    Empty/None additional_context emits neither channel (back-compat for
-    metrics-only callers).
+    `additional_context` (asyncRewake findings): model-visible guidance text
+    that CC surfaces via the modern hook-output protocol
+    (hookSpecificOutput.additionalContext) instead of the legacy stderr +
+    exit(2) pair. The caller passes the finding-explanation text it would
+    have written to stderr; the JSON channel carries it cleanly so CC's UI
+    shows the reason properly instead of "Permission denied with no reason".
+    See anthropics/claude-plugins-official#1375 and #1783. Empty/None
+    means no hookSpecificOutput field is emitted (preserves backward compat
+    for legacy emit-sites that only want metrics).

    `system_message` (optional, asyncRewake only): user-visible TUI message,
    distinct from rewakeSummary which is the task-notification one-liner.
@@ -256,9 +236,10 @@ def emit_metrics(
    surface; systemMessage adds a per-fire override when the static
    rewakeMessage isn't specific enough for the finding being shown.

-    `hook_event_name` (used only when additional_context is set): selects the
-    delivery channel above. Defaults to "PostToolUse" (commit-review and
-    push-sweep are the most common callers); handle_stop_hook passes "Stop".
+    `hook_event_name` (used only when additional_context is set): which event
+    the hookSpecificOutput attaches to. Defaults to "PostToolUse" since the
+    commit-review and push-sweep handlers are the most common callers;
+    handle_stop_hook explicitly passes "Stop".
    """
    head = {}
    if _PV and "pv" not in metrics:
@@ -270,23 +251,14 @@ def emit_metrics(
    if rewake_summary:
        out["rewakeSummary"] = rewake_summary
    if additional_context:
-        if hook_event_name in ("Stop", "SubagentStop"):
-            # Stop is NOT in CC's hookSpecificOutput union — emitting it there
-            # fails schema validation and drops metrics+rewakeSummary (#2159).
-            # Clean pattern: guidance on stderr (the asyncRewake body channel,
-            # delivered via `stderr || stdout`), top-level decision/reason for
-            # the sync-fallback path. stdout JSON stays valid so metrics +
-            # rewakeSummary survive.
-            sys.stderr.write(additional_context)
-            sys.stderr.flush()
-            out["decision"] = "block"
-            out["reason"] = additional_context
-        else:
-            # PostToolUse et al. — valid union member; modern protocol.
-            out["hookSpecificOutput"] = {
-                "hookEventName": hook_event_name,
-                "additionalContext": additional_context,
-            }
+        # Wrap in hookSpecificOutput per CC's modern hook-output contract.
+        # Drops the legacy `sys.stderr.write(...) + sys.exit(2)` shape that
+        # left CC's UI showing "denied with no reason" (#1783) and triggered
+        # "json output validation failed" on older CC versions (#1375).
+        out["hookSpecificOutput"] = {
+            "hookEventName": hook_event_name,
+            "additionalContext": additional_context,
+        }
    if system_message:
        out["systemMessage"] = system_message
    print(json.dumps(out), flush=True)
@@ -576,11 +548,7 @@ def handle_user_prompt_submit(input_data):
    elif sha:
        debug_log(f"Captured git baseline: {sha[:12]}")
    else:
-        # Show cwd so the next reporter can immediately see when this isn't
-        # actually "not a git repo" but a path-encoding / permissions / git
-        # invocation failure. See #2099.
-        debug_log(f"Failed to capture git baseline (cwd={cwd!r}) — not a git repo, "
-                  f"or git invocation failed (check log entries above)")
+        debug_log("Failed to capture git baseline (not a git repo?)")

    sys.exit(0)

@@ -887,30 +855,23 @@ def _detect_prev_upstream(repo_root, bash_output):
    # @{u}@{1} — only meaningful if an upstream is configured.
    for ref in ("@{u}@{1}", "@{push}@{1}"):
        try:
-            # See #2099: stdout is a SHA but stderr can carry non-ASCII git
-            # warnings — keep bytes raw to avoid cp1252 reader-thread crash.
            r = subprocess.run(
                [*GIT_CMD, "rev-parse", "--verify", "-q", ref],
-                cwd=repo_root, capture_output=True, timeout=5,
+                cwd=repo_root, capture_output=True, text=True, timeout=5,
            )
-            sha = r.stdout.decode("utf-8", errors="replace").strip()
-            if r.returncode == 0 and sha:
-                return sha
+            if r.returncode == 0 and r.stdout.strip():
+                return r.stdout.strip()
        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
            pass
    main = _detect_main_branch(repo_root)
    if main:
        try:
-            # See #2099: drop text=True; decode bytes manually so a
-            # cp1252-undefined byte in git's stderr doesn't crash the
-            # reader thread.
            r = subprocess.run(
                [*GIT_CMD, "merge-base", "HEAD", main],
-                cwd=repo_root, capture_output=True, timeout=5,
+                cwd=repo_root, capture_output=True, text=True, timeout=5,
            )
-            sha = r.stdout.decode("utf-8", errors="replace").strip()
-            if r.returncode == 0 and sha:
-                return sha
+            if r.returncode == 0 and r.stdout.strip():
+                return r.stdout.strip()
        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
            pass
    return None
@@ -1224,18 +1185,18 @@ def handle_commit_review_posttooluse(input_data):
            # core.quotePath=false: emit raw UTF-8 in `diff --git a/... b/...`
            # headers so non-ASCII paths aren't C-quoted past the downstream
            # parse_diff_into_files regex (sibling of #2056 / #2075). See #2082.
-            # core.quotePath=false comes from GIT_CMD globally (see gitutil.py).
            if pre_amend_sha:
                # Delta review: pre-amend → post-amend. `git diff` (not show)
                # so the output is a pure unified diff with no commit header.
                result = subprocess.run(
-                    [*GIT_CMD, "diff", "--no-color", "--no-ext-diff",
-                     pre_amend_sha, sha, "--"],
+                    [*GIT_CMD, "-c", "core.quotePath=false",
+                     "diff", "--no-color", "--no-ext-diff", pre_amend_sha, sha, "--"],
                    cwd=repo_root, capture_output=True, timeout=15
                )
            else:
                result = subprocess.run(
-                    [*GIT_CMD, "show", "-p", "--no-color", "--no-ext-diff", sha, "--"],
+                    [*GIT_CMD, "-c", "core.quotePath=false",
+                     "show", "-p", "--no-color", "--no-ext-diff", sha, "--"],
                    cwd=repo_root, capture_output=True, timeout=15
                )
        except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
@@ -1362,13 +1323,12 @@ def handle_commit_review_posttooluse(input_data):
    try:
        full_shas = []
        for s in shas:
-            # See #2099: drop text=True; decode manually for cp1252 safety.
            r = subprocess.run(
                [*GIT_CMD, "rev-parse", "--verify", "-q", s],
-                cwd=repo_root, capture_output=True, timeout=5,
+                cwd=repo_root, capture_output=True, text=True, timeout=5,
            )
            if r.returncode == 0:
-                full_shas.append(r.stdout.decode("utf-8", errors="replace").strip())
+                full_shas.append(r.stdout.strip())
        _append_reviewed_shas(repo_root, full_shas, vulns_found=len(vulns or []))
    except Exception:
        pass
@@ -1570,10 +1530,9 @@ def handle_push_sweep_posttooluse(input_data):
    # both.
    head = None
    try:
-        # See #2099: drop text=True; decode manually for cp1252 safety.
        r = subprocess.run([*GIT_CMD, "rev-parse", "HEAD"], cwd=repo_root,
-                           capture_output=True, timeout=5)
-        head = r.stdout.decode("utf-8", errors="replace").strip() if r.returncode == 0 else None
+                           capture_output=True, text=True, timeout=5)
+        head = r.stdout.strip() if r.returncode == 0 else None
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        pass
    push_section = _push_section(bash_output or "")
@@ -1603,15 +1562,14 @@ def handle_push_sweep_posttooluse(input_data):
        quiet_success = False
        if not (bash_output or "").strip() and not interrupted:
            try:
-                # See #2099: drop text=True; decode manually for cp1252 safety.
                r_cur = subprocess.run(
                    [*GIT_CMD, "rev-parse", "--verify", "-q", "@{u}"],
-                    cwd=repo_root, capture_output=True, timeout=5)
+                    cwd=repo_root, capture_output=True, text=True, timeout=5)
                r_prev = subprocess.run(
                    [*GIT_CMD, "rev-parse", "--verify", "-q", "@{u}@{1}"],
-                    cwd=repo_root, capture_output=True, timeout=5)
-                cur = r_cur.stdout.decode("utf-8", errors="replace").strip() if r_cur.returncode == 0 else ""
-                prev_u = r_prev.stdout.decode("utf-8", errors="replace").strip() if r_prev.returncode == 0 else ""
+                    cwd=repo_root, capture_output=True, text=True, timeout=5)
+                cur = r_cur.stdout.strip() if r_cur.returncode == 0 else ""
+                prev_u = r_prev.stdout.strip() if r_prev.returncode == 0 else ""
                quiet_success = bool(cur and prev_u and cur == head and prev_u != cur)
            except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
                pass
@@ -1625,12 +1583,11 @@ def handle_push_sweep_posttooluse(input_data):
        # reviewed-shas state.
        for local_ref in new_branch_matches:
            try:
-                # See #2099: drop text=True; decode manually for cp1252 safety.
                r = subprocess.run(
                    [*GIT_CMD, "rev-parse", "--verify", "-q", local_ref],
-                    cwd=repo_root, capture_output=True, timeout=5,
+                    cwd=repo_root, capture_output=True, text=True, timeout=5,
                )
-                local_sha = r.stdout.decode("utf-8", errors="replace").strip() if r.returncode == 0 else ""
+                local_sha = r.stdout.strip() if r.returncode == 0 else ""
            except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
                local_sha = ""
            if local_sha and local_sha != head:
@@ -2107,7 +2064,10 @@ def handle_stop_hook(input_data):
    })
    sys.exit(0)

-_SDK_BOOTSTRAP_THROTTLE = os.path.join(_resolve_state_dir(), ".sdk_bootstrap_spawned")
+_SDK_BOOTSTRAP_THROTTLE = os.path.join(
+    os.environ.get("SECURITY_WARNINGS_STATE_DIR")
+    or os.path.expanduser("~/.claude/security"),
+    ".sdk_bootstrap_spawned")

 def _maybe_bootstrap_agent_sdk_async():
    """Fire-and-forget SDK bootstrap, for remote-pod environments.
--- a/plugins/security-guidance/hooks/session_state.py
+++ b/plugins/security-guidance/hooks/session_state.py
@@ -19,7 +19,7 @@ import os
 import re
 from datetime import datetime

-from _base import debug_log, state_dir as _state_dir
+from _base import debug_log


 def _state_key(session_id):
@@ -36,20 +36,20 @@ def _state_key(session_id):

 def get_state_file(session_id):
    """Get session-specific state file path."""
-    state_dir = _state_dir()
+    state_dir = os.environ.get("SECURITY_WARNINGS_STATE_DIR", os.path.expanduser("~/.claude/security"))
    return os.path.join(state_dir, f"security_warnings_state_{_state_key(session_id)}.json")


 def get_lock_file(session_id):
    """Get session-specific lock file path."""
-    state_dir = _state_dir()
+    state_dir = os.environ.get("SECURITY_WARNINGS_STATE_DIR", os.path.expanduser("~/.claude/security"))
    return os.path.join(state_dir, f"security_warnings_state_{_state_key(session_id)}.lock")


 def cleanup_old_state_files():
    """Remove state files and lock files older than 30 days."""
    try:
-        state_dir = _state_dir()
+        state_dir = os.environ.get("SECURITY_WARNINGS_STATE_DIR", os.path.expanduser("~/.claude/security"))
        if not os.path.exists(state_dir):
            return

--- a/plugins/security-guidance/hooks/sg-python.sh
+++ b/plugins/security-guidance/hooks/sg-python.sh
@@ -22,17 +22,6 @@
 #        "${CLAUDE_PLUGIN_ROOT}/hooks/security_reminder_hook.py"
 set -e

-# Force UTF-8 for ALL Python filesystem + IO operations (PEP 540).
-# Without this, Windows Python defaults `locale.getpreferredencoding()` to
-# cp1252 — which makes `text=True` in subprocess.run / open() / json.load
-# crash the internal reader thread on any byte that's undefined in cp1252
-# (e.g. the 0x81 byte from ف, present in any path/filename with
-# Arabic/Hebrew/CJK characters). See #2056, #2099.
-#
-# No-op on macOS/Linux (already UTF-8). Must be set BEFORE Python starts —
-# changing it from inside the interpreter has no effect.
-export PYTHONUTF8=1
-
 # Git Bash / MSYS on Windows hands script paths to this shim in POSIX form
 # (`/c/Users/...`). When we exec a Windows `python.exe` (which we do on
 # Windows since `python3` is the Microsoft Store stub), python interprets the
commit 58e3dc5d45
Author: Bryan Thompson
Date:   2026-05-29 17:08:25 -05:00

    bump: fail the per-entry check-dispatch step when a dispatch fails

    The dispatch step logged each failed gh workflow run as a warning and exited 0, so a transient API error or rate limit could leave a per-entry bump PR missing a required check while the bump run still showed green. The composite action skips slugs with an open PR, so the stranded PR was never retried. Attempt every dispatch (one failure must not strand the other branches), record failures via a temp file (the while loop runs in a pipe subshell), then emit an error and exit non-zero if any dispatch failed, so the bump run goes red and the affected PR can be re-dispatched.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit f8310109cd
Author: Bryan Thompson
Date:   2026-05-29 16:52:06 -05:00

    bump: dispatch all three required checks per per-entry PR

    Bump PRs are opened with GITHUB_TOKEN, which doesn't fire on:pull_request (recursion guard). The per-entry cutover already dispatched scan-plugins.yml per branch to satisfy the `scan` required check, but `check` (Check MCP URLs) and `validate` (Validate Plugins) are also required on main and likewise never fired, leaving every bump PR BLOCKED on missing checks (observed on the batched #2079, which only cleared after a human-authored push re-fired the pull_request workflows).

    Fix: dispatch all three workflows per per-entry bump branch. Each runs its job unconditionally on workflow_dispatch, so the check run lands on the branch HEAD (= PR head) and satisfies the required check.

    - validate-plugins.yml: add workflow_dispatch trigger (check-mcp-urls.yml already had one). gh workflow run requires the trigger on the default branch; this lands together with the per-entry bump so main stays consistent.
    - bump-plugin-shas.yml: loop the dispatch over {scan-plugins,check-mcp-urls,validate-plugins}; tolerate a single transient dispatch failure (warn, don't abort) so one hiccup can't strand the rest of the batch.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 44ee67f099
Author: Bryan Thompson
Date:   2026-05-29 16:52:06 -05:00

    bump: re-pin to merged composite action SHA on -community main

    The pr-mode: per-entry input now lives on main of the bump-plugin-shas action (merged at e2019b2a). Update the pin and drop the now-stale header comment that tracked the feature branch.

commit 0f6558e96b
Author: Bryan Thompson
Date:   2026-05-29 16:52:06 -05:00

    bump: switch to per-entry PR mode (one PR per stale plugin)

    Replaces the single batched bump PR with one PR per stale plugin so a single failing plugin no longer blocks the rest. Pins to a feature branch of the bump-plugin-shas action that adds 'pr-mode: per-entry'; re-pin to the merge commit on the action's main when that lands.

    - pr-mode: per-entry → one PR per plugin on bump/<slug>
    - max_bumps default lowered 130 → 30 (per-entry scans cost more)
    - scan dispatch fanned out over pr-urls JSON (one per per-entry branch)
    - header comments updated for per-entry semantics
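
The dispatch behaviour described in the commit messages above (attempt every dispatch, record each failure, then fail the run if any dispatch failed) can be sketched in Python. This is only an illustration of the control flow, not the workflow's actual shell step: `dispatch_all`, `exit_code`, and `gh_dispatch` are hypothetical names, the error message format is an assumption, and the real step records failures in a temp file because its `while` loop runs in a pipe subshell, which a single Python process does not need.

```python
import subprocess
import sys
from typing import Callable, Iterable, List, Tuple

# Workflow file names as listed in the commit message; everything else
# in this sketch is hypothetical.
WORKFLOWS = ("scan-plugins.yml", "check-mcp-urls.yml", "validate-plugins.yml")


def gh_dispatch(workflow: str, branch: str) -> bool:
    """Dispatch one workflow on one branch via the GitHub CLI."""
    result = subprocess.run(
        ["gh", "workflow", "run", workflow, "--ref", branch],
        capture_output=True,
    )
    return result.returncode == 0


def dispatch_all(
    branches: Iterable[str],
    dispatch: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Attempt every (branch, workflow) pair and return the failed pairs.

    A failure never aborts the loop, so one transient error cannot
    strand the remaining branches.
    """
    failures: List[Tuple[str, str]] = []
    for branch in branches:
        for workflow in WORKFLOWS:
            try:
                ok = dispatch(workflow, branch)
            except Exception:
                ok = False
            if not ok:
                failures.append((branch, workflow))
    return failures


def exit_code(failures: List[Tuple[str, str]]) -> int:
    """Report each failure and return non-zero if there was at least one."""
    for branch, workflow in failures:
        print(f"ERROR: dispatch of {workflow} failed for {branch}", file=sys.stderr)
    return 1 if failures else 0
```

A caller would end with something like `sys.exit(exit_code(dispatch_all(branches, gh_dispatch)))`, so the run goes red only after every dispatch has been tried.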