code-modernization: legacy toolchain is advisory, not a transform blocker

Legacy code often cannot build locally by nature: CICS/IMS programs have no local translator, and the real runtime may be a mainframe the user doesn't have. Stopping transform on a failed legacy smoke compile would block exactly those systems.

- transform Step 0a: the target toolchain remains required (its tests cannot run without it); a failed or impossible legacy compile no longer stops the run. The equivalence strategy switches to recorded traces / golden-master fixtures, and that downgrade is stated in the plan and in TRANSFORMATION_NOTES.md so reviewers know the strength of the proof
- preflight: a red legacy toolchain now yields Ready-with-gaps for transform/reimagine instead of Not-ready

code-modernization: limit marketplace.json change to the one description line
2026-06-10 18:23:36 +00:00 · 2026-06-09 08:48:05 -07:00 · 2026-06-09 08:48:04 -07:00
24 changed files with 1997 additions and 338 deletions
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
--- a/.github/workflows/bump-plugin-shas.yml
+++ b/.github/workflows/bump-plugin-shas.yml
@@ -2,25 +2,24 @@ name: Bump Plugin SHAs

 # Nightly sweep: for each external entry whose upstream HEAD has moved past
 # its pinned SHA, validate at the new SHA with `claude plugin validate`
-# inline, then open one PR with all passing bumps. Each run force-resets the
-# bump/plugin-shas branch, so a previous night's unmerged PR is replaced (and
-# its review state discarded) — review and merge same-day to avoid churn.
+# inline, then open one PR per bumped plugin on branch `bump/<slug>`.
+# Failing entries stay isolated in their own PR; passing bumps merge
+# independently.
 #
 # Bot-free — uses the default GITHUB_TOKEN. PRs opened with GITHUB_TOKEN don't
-# trigger on:pull_request workflows, so the policy scan (`Scan Plugins`, a
-# required status check on main) would never run and the bump PR could never
-# merge. workflow_dispatch is exempt from that recursion guard, so we dispatch
-# the scan ourselves on the bump branch after the PR is opened. The check run
-# lands on the branch HEAD — the same SHA as the PR head — and satisfies the
-# required check.
+# trigger on:pull_request workflows, so the required status checks on main
+# (`scan` from Scan Plugins, `check` from Check MCP URLs, `validate` from
+# Validate Plugins) would never run and the bump PR could never merge.
+# workflow_dispatch is exempt from that recursion guard, so we dispatch all
+# three ourselves against each per-entry bump branch after its PR is opened.
+# Each check run lands on the branch HEAD — the same SHA as the PR head — and
+# satisfies the corresponding required check. (Each of those workflows runs
+# its job unconditionally on workflow_dispatch, so a dispatch always reports.)
 #
-# max-bumps is set above the external-entry count so a single run can clear
-# any backlog. The cost-control mechanisms are downstream:
-#   - scan-plugins.yml caches verdicts by (plugin, sha) so an unchanged SHA
-#     is never re-scanned across nightly force-resets.
-#   - revert-failed-bumps.yml drops policy-failing entries from the bump PR
-#     so one bad upstream can't block the rest.
-# See those files for details.
+# max-bumps caps the per-night work for cost control. Per-entry scans are
+# more expensive than a single batched scan, so the cap is conservative.
+# The composite action skips entries that already have an open bump PR, so
+# re-dispatches don't pile up duplicate work.

 on:
  schedule:
@@ -30,12 +29,12 @@ on:
      max_bumps:
        description: Cap on plugins bumped this run
        required: false
-        default: '130'
+        default: '30'

 permissions:
  contents: write
  pull-requests: write
-  actions: write  # gh workflow run scan-plugins.yml on the bump branch
+  actions: write  # gh workflow run {scan-plugins,check-mcp-urls,validate-plugins}.yml per bump branch

 concurrency:
  group: bump-plugin-shas
@@ -43,8 +42,8 @@ concurrency:
 jobs:
  bump:
    runs-on: ubuntu-latest
-    # Per-bump cost is ~2s (ls-remote + shallow clone + validate); 130 entries
-    # is ~5 min. The 60 min ceiling absorbs slow upstreams without letting a
+    # Per-bump cost is ~2s (ls-remote + shallow clone + validate); 30 entries
+    # is ~1-2 min. The 60 min ceiling absorbs slow upstreams without letting a
    # pathological run consume the default 360 min budget.
    timeout-minutes: 60
    steps:
@@ -52,18 +51,44 @@ jobs:

      # createCommitOnBranch-based bump so commits are signed by GitHub and
      # satisfy the org-level required_signatures ruleset on main.
-      - uses: anthropics/claude-plugins-community/.github/actions/bump-plugin-shas@c41c6911de0afffd2bc5cd8b21fb1e06444ee13b
+      - uses: anthropics/claude-plugins-community/.github/actions/bump-plugin-shas@e2019b2a01f11aa1484c53540b1cfab5eebbc299
        id: bump
        with:
          marketplace-path: .claude-plugin/marketplace.json
-          max-bumps: ${{ inputs.max_bumps || '130' }}
+          max-bumps: ${{ inputs.max_bumps || '30' }}
+          pr-mode: per-entry
          claude-cli-version: latest

-      # `bump/plugin-shas` is the action's default `pr-branch`. The scan diffs
-      # the branch against origin/main (the action's base-ref fallback when
-      # there's no pull_request event) and scans only the bumped entries.
-      - name: Dispatch policy scan on bump branch
-        if: steps.bump.outputs.pr-url != ''
+      # Per-entry fan-out: dispatch the three required checks against each bump
+      # branch. `pr-urls` is a JSON array of {name, old_sha, new_sha, branch,
+      # pr_url} entries emitted by the composite action when pr-mode is
+      # per-entry. All three (scan / check / validate) are required on main and
+      # none fire on the GITHUB_TOKEN-opened PR, so each must be dispatched.
+      # A single failed dispatch (transient API error / rate limit) must not
+      # strand the remaining branches, so we attempt every dispatch, then fail
+      # the step if any failed: a missing required check would otherwise leave
+      # its bump PR silently blocked behind a green run, and the composite
+      # action skips slugs with an open PR so it would never be retried.
+      - name: Dispatch required checks per per-entry PR
+        if: steps.bump.outputs.pr-urls != '' && steps.bump.outputs.pr-urls != '[]'
        env:
          GH_TOKEN: ${{ github.token }}
-        run: gh workflow run scan-plugins.yml --ref bump/plugin-shas
+          PR_URLS: ${{ steps.bump.outputs.pr-urls }}
+        run: |
+          set -euo pipefail
+          dispatch_failures="$(mktemp)"
+          jq -c '.[]' <<<"$PR_URLS" | while read -r entry; do
+            branch=$(jq -r '.branch' <<<"$entry")
+            name=$(jq -r '.name' <<<"$entry")
+            for wf in scan-plugins check-mcp-urls validate-plugins; do
+              echo "Dispatching ${wf}.yml against $branch ($name)"
+              if ! gh workflow run "${wf}.yml" --ref "$branch"; then
+                echo "::error::Failed to dispatch ${wf}.yml against $branch ($name) — required check will be missing; re-dispatch with: gh workflow run ${wf}.yml --ref $branch"
+                echo "${wf} ${branch}" >> "$dispatch_failures"
+              fi
+            done
+          done
+          if [ -s "$dispatch_failures" ]; then
+            echo "::error::$(wc -l < "$dispatch_failures" | tr -d ' ') required-check dispatch(es) failed; the affected bump PR(s) are blocked until re-dispatched (see annotations above)."
+            exit 1
+          fi
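
Reviewer sketch (not part of the change): the dispatch step above follows an "attempt everything, then fail once" pattern. The shell version records failures in a temp file, most likely because the `while` loop sits on the right side of a pipe and runs in a subshell, so a plain variable would not survive it. The same control flow in Python might look like the following. The function names and the injectable `dispatch` parameter are assumptions made for illustration; only the `gh workflow run <file> --ref <branch>` invocation and the three workflow names come from the diff.

```python
import subprocess

# Workflow file stems taken from the diff above.
WORKFLOWS = ("scan-plugins", "check-mcp-urls", "validate-plugins")


def gh_dispatch(workflow, branch):
    """Dispatch one workflow via the gh CLI; return True on success."""
    result = subprocess.run(
        ["gh", "workflow", "run", f"{workflow}.yml", "--ref", branch],
        check=False,  # a failure must not abort the remaining dispatches
    )
    return result.returncode == 0


def dispatch_all(entries, dispatch=gh_dispatch):
    """Try every (workflow, branch) pair and return the pairs that failed.

    `entries` mirrors the `pr-urls` JSON array: dicts with a `branch` key.
    `dispatch` is injectable (hypothetical) so the loop can be tested
    without calling the real CLI.
    """
    failures = []
    for entry in entries:
        for workflow in WORKFLOWS:
            if not dispatch(workflow, entry["branch"]):
                failures.append((workflow, entry["branch"]))
    return failures
```

A caller would exit non-zero when the returned list is non-empty, which matches the final `exit 1` in the step: every branch gets its chance first, and the run still goes red if any required check is missing.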
--- a/.github/workflows/validate-plugins.yml
+++ b/.github/workflows/validate-plugins.yml
@@ -12,6 +12,14 @@ on:
    branches: [main]
    paths:
      - '.claude-plugin/**'
+  # `validate` is a required status check on main. Bump PRs are opened with
+  # GITHUB_TOKEN, which doesn't fire on:pull_request (recursion guard), so the
+  # path-filtered trigger above never reports on them and the PR would be
+  # blocked forever. The bump workflow dispatches this against each per-entry
+  # bump branch instead; the check run lands on the branch HEAD (= PR head)
+  # and satisfies the required check. The validate job runs unconditionally,
+  # so a dispatch always reports.
+  workflow_dispatch:

 permissions:
  contents: read
--- a/plugins/code-modernization/.claude-plugin/plugin.json
+++ b/plugins/code-modernization/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
  "name": "code-modernization",
-  "description": "Modernize legacy codebases (COBOL, legacy Java/C++, monolith web apps) with a structured assess → map → extract-rules → brief → reimagine/transform → harden workflow and specialist review agents",
+  "description": "Modernize legacy codebases (COBOL, legacy Java/C++, monolith web apps) with a structured preflight / assess / map / extract-rules / brief / reimagine / transform / harden workflow, an interactive topology viewer, and specialist review agents",
  "author": {
    "name": "Anthropic",
    "email": "support@anthropic.com"
--- a/plugins/code-modernization/README.md
+++ b/plugins/code-modernization/README.md
@@ -7,7 +7,7 @@ A structured workflow and set of specialist agents for modernizing legacy codeba
 Legacy modernization fails most often not because the target technology is wrong, but because teams skip steps: they transform code before understanding it, reimagine architecture before extracting business rules, or ship without a harness that would catch behavior drift. This plugin enforces a sequence:

 ```
-assess → map → extract-rules → brief → reimagine | transform → harden
+preflight → assess → map → extract-rules → brief → reimagine | transform → harden
 ```

 The discovery commands (`assess`, `map`, `extract-rules`) build artifacts under `analysis/<system>/`. The `brief` command synthesizes them into an approval gate. The build commands (`reimagine`, `transform`) write new code under `modernized/`. The `harden` command audits the legacy system and produces a reviewable remediation patch. Each step has a dedicated slash command, and specialist agents (legacy analyst, business rules extractor, architecture critic, security auditor, test engineer) are invoked from within those commands — or directly — to keep the work honest.
@@ -20,25 +20,36 @@ Commands take a `<system-dir>` argument and assume the system being modernized l
 mkdir -p legacy && ln -s /path/to/your/legacy/codebase legacy/billing
 ```

-## Optional tooling
+## What to give Claude

-`/modernize-assess` works best with [`scc`](https://github.com/boyter/scc) (LOC + complexity + COCOMO) or [`cloc`](https://github.com/AlDanial/cloc), and falls back to `find`/`wc` if neither is installed. Portfolio mode also benefits from [`lizard`](https://github.com/terryyin/lizard) (cyclomatic complexity). The commands degrade gracefully without them, but the metrics will be coarser.
+The commands degrade gracefully, but each of these makes the output meaningfully better — run `/modernize-preflight <system-dir>` to check all of them at once and get a readiness report:
+
+- **Analysis tools**: [`scc`](https://github.com/boyter/scc) (LOC + complexity + COCOMO) or [`cloc`](https://github.com/AlDanial/cloc); [`lizard`](https://github.com/terryyin/lizard) for portfolio mode. Without them, metrics fall back to `find`/`wc` and get coarser.
+- **A working build toolchain** for the legacy stack (e.g. GnuCOBOL for COBOL) — required before `/modernize-transform` can prove behavioral equivalence, and verified by preflight with a real smoke compile against your code.
+- **The whole system in the tree**: deployment descriptors (JCL, CICS definitions, route configs), copybooks/includes, and DDL/schemas. Entry-point detection and data lineage in `/modernize-map` are guesswork without them.
+- **Production telemetry** (optional): an observability MCP server or batch job logs enable the runtime overlay in `/modernize-assess` and timing annotations on critical paths.

 ## Commands

 The commands are designed to be run in order, but each produces a standalone artifact so you can stop, review, and resume.

+### `/modernize-preflight <system-dir> [target-stack]`
+Environment readiness check, meant to run first: detects the legacy stack, checks analysis tooling, **smoke-compiles a real source file** with the legacy toolchain (the errors this surfaces — missing copybooks, wrong dialect flags — are the ones that otherwise appear mid-transform), inventories missing includes / deployment descriptors / binary-only artifacts, and probes for telemetry. Produces `analysis/<system>/PREFLIGHT.md` with a per-command Ready / Ready-with-gaps / Not-ready verdict.
+
 ### `/modernize-assess <system-dir>`  — or — `/modernize-assess --portfolio <parent-dir>`
 Inventory the legacy codebase: languages, line counts, complexity, build system, integrations, technical debt, security posture, documentation gaps, and a COCOMO-derived effort estimate. Produces `analysis/<system>/ASSESSMENT.md` and `analysis/<system>/ARCHITECTURE.mmd`. Spawns `legacy-analyst` (×2) and `security-auditor` in parallel for deep reads. With `--portfolio`, sweeps every subdirectory of a parent directory and writes a sequencing heat-map to `analysis/portfolio.html`.

 ### `/modernize-map <system-dir>`
-Build a dependency and topology map of the **legacy** system: program/module call graph, data lineage (programs ↔ data stores), entry points, dead-end candidates, and one traced critical-path business flow. Writes a re-runnable extraction script and produces `analysis/<system>/topology.json` (machine-readable), `analysis/<system>/TOPOLOGY.html` (rendered Mermaid + architect observations), and standalone `call-graph.mmd`, `data-lineage.mmd`, and `critical-path.mmd`.
+
+![Interactive topology map of AWS CardDemo — domains as containers, modules sized by lines of code, dependency edges colored by kind, entry points ringed](assets/topology-viewer-screenshot.jpg)
+
+Build a dependency and topology map of the **legacy** system: program/module call graph, data lineage (programs ↔ data stores), entry points, dead-end candidates, and 2–4 traced business flows each anchored to a persona (the claimant, the operator, the auditor — not the maintainer). Writes a re-runnable extraction script and produces `analysis/<system>/topology.json` plus `analysis/<system>/TOPOLOGY.html` — an **interactive zoomable map** (circle-pack of domains/modules sized by LOC, dependency edges with per-kind toggles, search, click-for-details sidebar, and a walkthrough mode that plays each persona flow as a numbered path with a plain-language narrative). Built from a template shipped with the plugin, so it works on systems far too dense for a static diagram. Small domain-level `call-graph.mmd`, `data-lineage.mmd`, and `critical-path.mmd` are still exported for docs and PRs.

 ### `/modernize-extract-rules <system-dir> [module-pattern]`
 Mine the business rules embedded in the legacy code — calculations, validations, eligibility, state transitions, policies — into Given/When/Then "Rule Cards" with `file:line` citations and confidence ratings. Spawns three `business-rules-extractor` agents in parallel (calculations, validations, lifecycle). Produces `analysis/<system>/BUSINESS_RULES.md` and `analysis/<system>/DATA_OBJECTS.md`.

 ### `/modernize-brief <system-dir> [target-stack]`
-Synthesize the discovery artifacts into a phased **Modernization Brief** — the single document a steering committee approves and engineering executes: target architecture, strangler-fig phase plan with entry/exit criteria, behavior contract, validation strategy, open questions, and an approval block. Reads `ASSESSMENT.md`, `TOPOLOGY.html`, and `BUSINESS_RULES.md` and **stops if any are missing** — run the discovery commands first. Produces `analysis/<system>/MODERNIZATION_BRIEF.md` and enters plan mode as a human-in-the-loop gate.
+Synthesize the discovery artifacts into a phased **Modernization Brief** — the single document a steering committee approves and engineering executes: target architecture, strangler-fig phase plan with entry/exit criteria, persona-based business walkthroughs (the section non-technical approvers actually read), behavior contract, validation strategy, open questions, and an approval block. Reads `ASSESSMENT.md`, `TOPOLOGY.html`, and `BUSINESS_RULES.md` and **stops if any are missing** — run the discovery commands first. Produces `analysis/<system>/MODERNIZATION_BRIEF.md` and enters plan mode as a human-in-the-loop gate.

 ### `/modernize-reimagine <system-dir> <target-vision>`
 Greenfield rebuild from extracted intent rather than a structural port. Mines a spec (`analysis/<system>/AI_NATIVE_SPEC.md`), designs a target architecture and has it adversarially reviewed (`analysis/<system>/REIMAGINED_ARCHITECTURE.md`), then **scaffolds services with executable acceptance tests** under `modernized/<system>-reimagined/` and writes a `CLAUDE.md` knowledge handoff for the new system. Two human-in-the-loop checkpoints. Spawns `business-rules-extractor`, `legacy-analyst` (×2), `architecture-critic`, and general-purpose scaffolding agents.
@@ -46,6 +57,9 @@ Greenfield rebuild from extracted intent rather than a structural port. Mines a
 ### `/modernize-transform <system-dir> <module> <target-stack>`
 Surgical, single-module strangler-fig rewrite. Plans first (HITL gate), then writes characterization tests via `test-engineer`, then an idiomatic target implementation under `modernized/<system>/<module>/`, proves equivalence by running the tests, and produces `TRANSFORMATION_NOTES.md` mapping legacy → modern with deliberate deviations called out. Reviewed by `architecture-critic`.

+### `/modernize-status <system-dir>`
+Read-only progress report: artifact inventory with timestamps per workflow stage, staleness flags (e.g. a brief older than the assessment it was built from), secrets-hygiene checks (quarantine file gitignored and never committed), and the single most useful next command. Run it anytime you come back to a modernization after a break.
+
 ### `/modernize-harden <system-dir>`
 Security hardening pass on the **legacy** system: OWASP/CWE scan, dependency CVEs, secrets, injection. Spawns `security-auditor`. Produces `analysis/<system>/SECURITY_FINDINGS.md` ranked Critical / High / Medium / Low and a reviewed `analysis/<system>/security_remediation.patch` with minimal fixes for the Critical/High findings. The patch is reviewed by a second `security-auditor` pass before you see it. **Never edits `legacy/`** — you review and apply the patch yourself when ready, then re-run to verify. Useful as a pre-modernization step when the legacy system will keep running in production during the migration.

@@ -81,17 +95,21 @@ This plugin ships commands and agents, but modernization projects benefit from a
      "Edit(modernized/**)"
    ],
    "deny": [
-      "Edit(legacy/**)"
+      "Edit(legacy/**)",
+      "Write(legacy/**)"
    ]
  }
 }
 ```

-Adjust `legacy/` and `modernized/` to match your actual layout. The key invariants: `Edit` under `legacy/` is denied, and writes are scoped to `analysis/` (for documents) and `modernized/` (for the new code). Every command in this plugin respects this — `/modernize-harden` writes a patch to `analysis/` rather than editing `legacy/` in place.
+Adjust `legacy/` and `modernized/` to match your actual layout. The key invariants: `Edit`/`Write` under `legacy/` are denied, and writes are scoped to `analysis/` (for documents) and `modernized/` (for the new code). Note this guards the file tools — shell commands that mutate files (`sed -i`, `git apply`) still go through the normal Bash permission prompt, so review those prompts with the same invariant in mind. Every command in this plugin respects this — `/modernize-harden` writes a patch to `analysis/` rather than editing `legacy/` in place.

 ## Typical Workflow

 ```bash
+# 0. Check the environment is ready (tools, toolchain, source completeness)
+/modernize-preflight billing
+
 # 1. Inventory the legacy system (or sweep a portfolio of them)
 /modernize-assess billing

@@ -112,6 +130,9 @@ Adjust `legacy/` and `modernized/` to match your actual layout. The key invarian

 # 6. Security-harden the legacy system that's still in production
 /modernize-harden billing
+
+# Anytime: where am I, what's stale, what's next
+/modernize-status billing
 ```

 ## License
--- a/plugins/code-modernization/assets/topology-viewer-screenshot.jpg
+++ b/plugins/code-modernization/assets/topology-viewer-screenshot.jpg
--- a/plugins/code-modernization/assets/topology-viewer.html
+++ b/plugins/code-modernization/assets/topology-viewer.html
--- a/plugins/code-modernization/commands/modernize-brief.md
+++ b/plugins/code-modernization/commands/modernize-brief.md
@@ -8,10 +8,19 @@ single document a steering committee approves and engineering executes.

 Target stack: `$2` (if blank, recommend one based on the assessment findings).

-Read `analysis/$1/ASSESSMENT.md`, `analysis/$1/TOPOLOGY.html` (and the `.mmd`
-files alongside it), and `analysis/$1/BUSINESS_RULES.md` first. If any are
-missing, say so and stop — they come from `/modernize-assess`, `/modernize-map`,
-and `/modernize-extract-rules` respectively. Run those first.
+Read `analysis/$1/ASSESSMENT.md`, `analysis/$1/topology.json` (plus the
+`.mmd` files alongside it — do NOT read `TOPOLOGY.html`, it's an
+interactive viewer with the data minified inside), and
+`analysis/$1/BUSINESS_RULES.md` first. If any are missing, say so and
+stop — they come from `/modernize-assess`, `/modernize-map`, and
+`/modernize-extract-rules` respectively. Run those first.
+
+**Staleness check:** compare modification times. If any input is newer
+than an existing `MODERNIZATION_BRIEF.md`, the brief is being justifiably
+regenerated; but if an existing brief is newer than all inputs and the
+user re-ran this command anyway, ask what changed. Either way, note the
+input timestamps in the brief's header so reviewers can see what it was
+built from.

 ## The Brief

@@ -31,28 +40,38 @@ fewest-dependencies first. For each phase:
 - Scope (which legacy modules, which target services)
 - Entry criteria (what must be true to start)
 - Exit criteria (what tests/metrics prove it's done)
-- Estimated effort (person-weeks, derived from COCOMO + complexity data)
+- Estimated effort (person-months, same unit as the assessment's COCOMO
+  figure — convert deliberately if you present weeks)
 - Risk level + top 2 risks + mitigation

 Render the phases as a Mermaid `gantt` chart.

-### 4. Behavior Contract
+### 4. Business Walkthroughs
+For each persona flow in `analysis/$1/topology.json` (`flows` — produced
+by `/modernize-map`), a short narrative table: persona, what happens in
+business language, which legacy modules implement it today, and which
+phase from §3 replaces each. This is the section non-technical approvers
+actually read — it connects "Phase 2" to "what happens when a customer
+files a claim". If topology.json has no flows, derive 2–3 walkthroughs
+from the entry points and say they need SME confirmation.
+
+### 5. Behavior Contract
 List the **P0 rules** from BUSINESS_RULES.md (the ones tagged `Priority: P0` —
 money, regulatory, data integrity) that MUST be proven equivalent before any
 phase ships. These become the regression suite. Flag any P0 rule with
 Confidence < High as a blocker requiring SME confirmation before its phase
 starts.

-### 5. Validation Strategy
+### 6. Validation Strategy
 State which combination applies: characterization tests, contract tests,
 parallel-run / dual-execution diff, property-based tests, manual UAT.
 Justify per phase.

-### 6. Open Questions
+### 7. Open Questions
 Anything requiring human/SME decision before Phase 1 starts. Each as a
 checkbox the approver must tick.

-### 7. Approval Block
+### 8. Approval Block
 ```
 Approved by: ________________  Date: __________
 Approval covers: Phase 1 only | Full plan
@@ -60,6 +79,7 @@ Approval covers: Phase 1 only | Full plan

 ## Present

-Enter **plan mode** and present a summary of the brief. Do NOT proceed to any
-transformation until the user explicitly approves. This gate is the
-human-in-the-loop control point.
+Present a summary of the brief and **stop — write nothing further until
+the user explicitly approves** (use plan mode if the session supports
+it). This gate is the human-in-the-loop control point; "no objection" is
+not approval.
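
Reviewer sketch (not part of the change): the staleness check added to this command compares modification times of the inputs against an existing `MODERNIZATION_BRIEF.md`. One possible reading of that rule is below. The function name and the three return labels are hypothetical, and treating an equal timestamp as "brief is current" is an assumption the command text does not settle.

```python
from pathlib import Path


def staleness(brief: Path, inputs: list[Path]) -> str:
    """Classify a brief run by comparing file modification times.

    Returns (hypothetical labels):
      "fresh-run"  - no brief exists yet
      "regenerate" - at least one input is newer than the brief
      "ask-user"   - the brief is at least as new as every input
    """
    if not brief.exists():
        return "fresh-run"
    brief_mtime = brief.stat().st_mtime
    if any(p.stat().st_mtime > brief_mtime for p in inputs):
        return "regenerate"
    return "ask-user"
```

In the "ask-user" case the command text asks what changed before regenerating; in both cases the input timestamps would still be recorded in the brief's header.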
--- a/plugins/code-modernization/commands/modernize-map.md
+++ b/plugins/code-modernization/commands/modernize-map.md
@@ -55,50 +55,124 @@ re-run and audited. Have it write a machine-readable
 `analysis/$1/topology.json` and print a human summary. Run it; show the
 summary (cap at ~200 lines for very large estates).

-## Render
+`topology.json` must follow this schema — it feeds the interactive viewer:

-From the extracted data, generate **three Mermaid diagrams** and write them
-to `analysis/$1/TOPOLOGY.html` as a self-contained page that renders in any
-browser.
-
-The HTML page must use: dark `#1e1e1e` background, `#d4d4d4` text,
-`#cc785c` for `<h2>`/accents, `system-ui` font, all CSS **inline** (no
-external stylesheets). Load Mermaid from a CDN in `<head>`:
-
-```html
-<script type="module">
-  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
-  mermaid.initialize({ startOnLoad: true, theme: 'dark' });
-</script>
+```json
+{
+  "system": "<display name>",
+  "root": {
+    "id": "sys", "name": "<system>", "kind": "system",
+    "children": [
+      { "id": "dom:<domain>", "name": "<Domain>", "kind": "domain",
+        "children": [
+          { "id": "<MODULE>", "name": "<MODULE>", "kind": "module",
+            "language": "cobol", "loc": 1234, "file": "src/MODULE.cbl" }
+        ] },
+      { "id": "dom:data", "name": "Data stores", "kind": "domain",
+        "children": [
+          { "id": "ds:<NAME>", "name": "<NAME>", "kind": "datastore" }
+        ] }
+    ]
+  },
+  "edges": [
+    { "source": "<id>", "target": "<id>", "kind": "call" }
+  ],
+  "entryPoints": ["<id>", "..."],
+  "deadEnds": ["<id>", "..."],
+  "observations": ["<architect observation>", "..."],
+  "flows": [
+    { "name": "<business flow>", "persona": "<who experiences it>",
+      "description": "<one sentence, plain language>",
+      "steps": [
+        { "label": "<business-language step>", "nodes": ["<id>", "<id>"] }
+      ] }
+  ]
+}
 ```

-Each diagram goes in a `<pre class="mermaid">...</pre>` block. Do **not**
-wrap diagrams in markdown ` ``` ` fences inside the HTML.
+- Group leaf modules under `domain` containers (use the domains from
+  `/modernize-assess` if available). Leaf kinds: `module`, `datastore`,
+  `job`, `screen`. `loc` drives circle size — include it for modules.
+- Edge kinds: `call` (direct), `dispatch` (dynamic/router), `read`,
+  `write`. Every edge endpoint must be a leaf id that exists in the tree.
+- `deadEnds`: the dead-end candidates from the extraction, rendered with
+  a dashed outline in the viewer. Apply the suppression rules above —
+  anything that could be the target of an unresolved dynamic call does
+  NOT belong here; record that uncertainty in `observations` instead.
+- **Datastore ids and names must be logical identifiers** — DD name,
+  dataset name, table/schema name, at most host:port. If the resolved
+  config value is a URL or DSN, strip userinfo and credential query
+  params before it goes anywhere in topology.json: the file gets
+  committed and the viewer displays names verbatim. Never copy raw
+  config values into `observations`.
+- `observations`: 3–7 architect observations — tight coupling clusters,
+  single points of failure, service-extraction candidates, data stores
+  with too many writers, dispatch targets the extraction could not
+  resolve.
+- `flows` is the **persona walkthrough** section — see below.

-1. **`graph TD` — Module call graph.** Cluster by domain (use `subgraph`).
-   Highlight entry points in a distinct style. Cap at ~40 nodes — if larger,
-   show domain-level with one expanded domain.
+## Persona flows

-2. **`graph LR` — Data lineage.** Programs → data stores.
-   Mark read vs write edges.
+Trace **2–4 end-to-end business flows**, each anchored to a persona —
+the people who experience the system, not the people who maintain it
+(e.g. for a benefits system: the claimant, the caseworker, the auditor;
+for billing: the customer, the billing operator). For each flow:

-3. **`flowchart TD` — Critical path.** Trace ONE end-to-end business flow
-   (e.g., "monthly billing run" or "process payment") through every program
-   and data store it touches, in execution order. If production telemetry is
-   available (see `/modernize-assess` Step 4), annotate each step with its
-   p50/p99 wall-clock.
+- `name` + one-sentence `description` in plain business language —
+  something a steering committee member relates to ("a claimant files a
+  weekly claim"), not a data-flow label ("CLM batch ingest").
+- `steps`: 3–8 steps, each with a business-language `label` and the
+  `nodes` (programs + data stores) that implement that step, in
+  execution order.

-Also export the three diagrams as standalone `.mmd` files for re-use:
-`analysis/$1/call-graph.mmd`, `analysis/$1/data-lineage.mmd`,
-`analysis/$1/critical-path.mmd`.
+This is the bridge between the technical map and non-technical
+stakeholders: the same diagram answers "which program does X" for
+engineers and "what happens when someone files a claim" for everyone else.

-## Annotate
+## Render

-Below each `<pre class="mermaid">` block in TOPOLOGY.html, add a `<ul>`
-with 3-5 **architect observations**: tight coupling clusters, single
-points of failure, candidates for service extraction, data stores
-touched by too many writers.
+`analysis/$1/TOPOLOGY.html` is an **interactive map**: a zoomable
+circle-pack of the whole system (domains as containers, modules sized by
+LOC) with dependency edges, search, per-node detail sidebar, edge-kind
+toggles, and a flow-walkthrough mode that plays each persona flow as a
+numbered path. Build it from the template that ships with this plugin —
+do not hand-write the viewer:
+
+```bash
+python3 - "${CLAUDE_PLUGIN_ROOT}/assets/topology-viewer.html" analysis/$1 <<'EOF'
+import json, sys
+tpl_path, out_dir = sys.argv[1], sys.argv[2]
+tpl = open(tpl_path).read()
+marker = "/*__TOPOLOGY_DATA__*/ null"
+assert marker in tpl, f"injection marker not found in {tpl_path}"
+data = json.dumps(json.load(open(f"{out_dir}/topology.json")))
+open(f"{out_dir}/TOPOLOGY.html", "w").write(
+    tpl.replace(marker, "/*__TOPOLOGY_DATA__*/ " + data))
+print(f"wrote {out_dir}/TOPOLOGY.html")
+EOF
+```
+
+The viewer is fully self-contained (the d3 subset it needs is inlined in
+the template) — it works offline and on air-gapped networks. If the
+`python3` invocation fails to find the template,
+`${CLAUDE_PLUGIN_ROOT}` was not substituted — report that rather than
+hand-writing a viewer.
+
+Mermaid stays for **small, exportable** diagrams. Generate standalone
+`.mmd` files for reuse in docs and PRs — but keep each under ~40 edges;
+collapse to domain level if the full graph is bigger (dense Mermaid
+becomes unreadable, which is exactly what the interactive map is for):
+
+- `analysis/$1/call-graph.mmd` — domain-level `graph TD`, entry points
+  highlighted
+- `analysis/$1/data-lineage.mmd` — `graph LR`, programs → data stores,
+  read vs write marked
+- `analysis/$1/critical-path.mmd` — `flowchart TD` of the primary flow
+  from `flows`, annotated with p50/p99 wall-clock if telemetry is
+  available (see `/modernize-assess` Step 4)

 ## Present

-Tell the user to open `analysis/$1/TOPOLOGY.html` in a browser.
+Tell the user to open `analysis/$1/TOPOLOGY.html` in a browser, and to
+try: search for a module, click it to see its connections, and pick a
+persona flow from the walkthrough dropdown.
--- a/plugins/code-modernization/commands/modernize-preflight.md
+++ b/plugins/code-modernization/commands/modernize-preflight.md
@@ -0,0 +1,98 @@
+---
+description: Environment readiness check — analysis tools, build toolchain, source completeness, telemetry access
+argument-hint: <system-dir> [target-stack]
+---
+
+Check whether this environment is ready to analyze — and eventually
+transform — `legacy/$1`, and tell the user exactly what to fix before the
+other commands run into it. Modernization sessions fail late and
+confusingly when this isn't done: assessment metrics silently degrade
+without analysis tools, characterization tests can't run without a build
+toolchain, and dependency maps come out wrong when half the source isn't
+in the tree.
+
+Run every check even when an early one fails — the point is one complete
+readiness report, not the first error.
+
+## Check 1 — Detect the stack
+
+Fingerprint `legacy/$1` from file extensions and manifests: languages,
+build system, deployment/config descriptors. This drives which checks
+below apply. Report what was detected and the rough file split.
+
+## Check 2 — Analysis tooling
+
+For each, check availability (`command -v`) and report version, what it's
+used for, and what degrades without it:
+
+| Tool | Used by | Without it |
+|---|---|---|
+| `scc` (or `cloc`) | assess | LOC/complexity fall back to `find`+`wc`; COCOMO estimate gets coarser |
+| `lizard` | assess --portfolio | complexity estimated from decision-keyword counts |
+| `glow` | all | markdown artifacts render as plain text |
+| `delta` | transform | side-by-side diffs fall back to `diff -y` |
+
+Include the platform's install one-liner for anything missing
+(`brew install scc`, `apt install cloc`, `pip install lizard`, …).
+
+## Check 3 — Build toolchain (smoke test, not just presence)
+
+Identify the compiler/interpreter for the detected legacy stack — e.g.
+GnuCOBOL (`cobc`) for COBOL, JDK + Maven/Gradle for Java, `cc`/`make` for
+C, `dotnet` for .NET. Then **prove it works on this codebase**: pick one
+representative source file and run a syntax-only compile
+(`cobc -fsyntax-only`, `javac`, `gcc -fsyntax-only`, …).
+
+A failed smoke test is the most valuable output of this command — report
+the actual error and diagnose it: missing copybook/include path, missing
+dialect flag (`-std=ibm` etc.), fixed vs free format, missing dependency
+jar. These are the errors that otherwise surface mid-`/modernize-transform`
+with much less context.
+
+If the user passed a `[target-stack]`, do the same for it: runtime,
+package manager, test framework (`mvn -v`, `npm -v`, `pytest --version`, …).
+
+## Check 4 — Source completeness
+
+The dependency map is only as good as what's in the tree. Check for the
+detected stack's equivalents of:
+
+- **Referenced-but-missing includes** — copybooks (`COPY X` with no
+  `X.cpy`), headers, imports that resolve nowhere. Count and list the top
+  missing names.
+- **Deployment/config descriptors** — JCL for batch COBOL, CICS CSD
+  definitions, `web.xml`/route configs, cron/scheduler definitions.
+  Without these, entry-point detection and the code↔storage join in
+  `/modernize-map` are guesswork.
+- **Data definitions** — DDL, schemas, copybook record layouts, ORM
+  mappings.
+- **Binary-only artifacts** — load modules, jars, DLLs with no matching
+  source. These become unmappable black boxes; flag them now.
+
+## Check 5 — Optional context
+
+- **Production telemetry** — is an observability/APM MCP server connected,
+  or are batch job logs / runtime exports available? (Enables the runtime
+  overlay in `/modernize-assess` Step 4 and timing annotations in
+  `/modernize-map`.)
+- **Version control history** — is `legacy/$1` under git with meaningful
+  history? (Change-frequency data sharpens risk ranking.)
+
+## Report
+
+Write `analysis/$1/PREFLIGHT.md`: a status table — one row per check,
+status ✅ / ⚠️ / ❌, what was found, and the fix for anything not green —
+followed by a **Ready / Ready-with-gaps / Not ready** verdict per command:
+
+- `assess` + `map` + `extract-rules` — need Checks 1–2 green-ish and
+  Check 4's missing-include count low
+- `brief` — needs only the three discovery artifacts; no tooling
+- `transform` + `reimagine` — additionally need Check 3 green for the
+  **target** stack. A red legacy toolchain downgrades these to
+  Ready-with-gaps, not Not-ready: equivalence testing falls back to
+  recorded traces / golden-master fixtures instead of dual execution
+  (common and expected for CICS/IMS code that has no local runtime)
+- `harden` — needs Check 2 plus any stack-specific SAST tooling found
+
+Print the table in the session too, and end with the single most
+important fix if anything is red.
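
Reviewer sketch for Check 2 above: a minimal Python illustration of probing every tool and reporting what degrades, without stopping at the first miss. It is not part of the plugin. `probe_tools` and the `TOOLS` mapping are hypothetical names, `shutil.which` stands in for `command -v`, and the `cloc` alternative to `scc` is left out for brevity.

```python
import shutil

# Tool -> what degrades without it (condensed from the Check 2 table).
TOOLS = {
    "scc": "LOC/complexity fall back to find+wc",
    "lizard": "complexity estimated from decision-keyword counts",
    "glow": "markdown artifacts render as plain text",
    "delta": "side-by-side diffs fall back to diff -y",
}


def probe_tools(tools=TOOLS, which=shutil.which):
    """Return one row per tool; every probe runs even if an earlier one fails."""
    rows = []
    for name, degraded in tools.items():
        path = which(name)  # None when the tool is not on PATH
        rows.append({
            "tool": name,
            "found": path is not None,
            "path": path,
            "without_it": None if path else degraded,
        })
    return rows


if __name__ == "__main__":
    for row in probe_tools():
        print(row)
```

Passing `which` as a parameter keeps the probe testable on machines where none of the tools are installed.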
--- a/plugins/code-modernization/commands/modernize-reimagine.md
+++ b/plugins/code-modernization/commands/modernize-reimagine.md
@@ -3,7 +3,11 @@ description: Multi-agent greenfield rebuild — extract specs from legacy, desig
 argument-hint: <system-dir> <target-vision>
 ---

-**Reimagine** `legacy/$1` as: $2
+The first token of `$ARGUMENTS` is the system dir (`$1`); **everything
+after it is the target vision** — it is usually multiple words, so do not
+truncate it to one token. Below, `<vision>` means that full remainder.
+
+**Reimagine** `legacy/$1` as: <vision>

 This is not a port — it's a rebuild from extracted intent. The legacy system
 becomes the *specification source*, not the structural template. This command
@@ -19,7 +23,8 @@ Spawn concurrently and show the user that all three are running:
 2. **legacy-analyst** — "Catalog every external interface of legacy/$1:
   inbound (screens, APIs, batch triggers, queues) and outbound (reports,
   files, downstream calls, DB writes). For each: name, direction, payload
-   shape, frequency/SLA if discernible."
+   shape, frequency/SLA if discernible. Mask any credential embedded in
+   endpoints or payload examples per your secret-handling rules."

 3. **legacy-analyst** — "Identify the core domain entities in legacy/$1 and
   their relationships. Return as an entity list + Mermaid erDiagram."
@@ -32,6 +37,9 @@ Collect results. Write `analysis/$1/AI_NATIVE_SPEC.md` containing:
 - **Non-functional requirements** inferred from legacy (batch windows, volumes)
 - **Behavior Contract** (the Given/When/Then rules — these are the acceptance tests)

+Credential values are masked everywhere in the spec; connection details
+appear as env-var placeholders (`${DATABASE_URL}`), never literals.
+
 ## Phase B — HITL checkpoint #1

 Present the spec summary. Ask the user **one focused question**: "Which of
@@ -40,20 +48,21 @@ should deliberately drop?" Wait for the answer. Record it in the spec.

 ## Phase C — Architecture (single agent, then critique)

-Design the target architecture for "$2":
+Design the target architecture for "<vision>":
 - Mermaid C4 Container diagram
 - Service boundaries with rationale (which rules/entities live where)
 - Technology choices with one-line justification each
 - Data migration approach from legacy stores

 Then spawn **architecture-critic**: "Review this proposed architecture for
-$2 against the spec in analysis/$1/AI_NATIVE_SPEC.md. Identify over-engineering,
+<vision> against the spec in analysis/$1/AI_NATIVE_SPEC.md. Identify over-engineering,
 missed requirements, scaling risks, and simpler alternatives." Incorporate
 the critique. Write the result to `analysis/$1/REIMAGINED_ARCHITECTURE.md`.

 ## Phase D — HITL checkpoint #2

-Enter plan mode. Present the architecture. Wait for approval.
+Present the architecture and **stop — scaffold nothing until the user
+explicitly approves** (use plan mode if the session supports it).

 ## Phase E — Parallel scaffolding

@@ -65,7 +74,9 @@ in parallel**:
 and AI_NATIVE_SPEC.md. Create: project skeleton, domain model, API stubs
 matching the interface contracts, and **executable acceptance tests** for every
 behavior-contract rule assigned to this service (mark unimplemented ones as
-expected-failure/skip with the rule ID). Write to modernized/$1-reimagined/<service-name>/."
+expected-failure/skip with the rule ID). No credential literal from legacy
+code becomes a test fixture or config default — use fake same-shape values
+and env-var placeholders. Write to modernized/$1-reimagined/<service-name>/."

 Show the agents' progress. When all complete, run the acceptance test suites
 and report: total tests, passing (scaffolded behavior), pending (rule IDs
@@ -77,7 +88,9 @@ Write `modernized/$1-reimagined/CLAUDE.md` — the persistent context file for
 the new system, containing: architecture summary, service responsibilities,
 where the spec lives, how to run tests, and the legacy→modern traceability
 map. This file IS the knowledge graph that future agents and engineers will
-load.
+load — and it gets committed: connection details and credentials appear
+only as env-var names with a pointer to where they're provisioned, never
+as values.

 Report: services scaffolded, acceptance tests defined, % behaviors with a
 home, location of all artifacts.
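
Reviewer sketch of the `$ARGUMENTS` rule introduced at the top of this file: the first token is the system dir and the full remainder is the vision. This is an illustration only. `split_arguments` is a hypothetical helper, and the command itself is a prompt, not Python.

```python
def split_arguments(arguments: str):
    """Split an argument string into (system_dir, vision).

    Only the first whitespace run is used as a separator, so a
    multi-word vision is returned whole rather than truncated.
    """
    parts = arguments.strip().split(None, 1)
    if not parts:
        raise ValueError("expected <system-dir> <target-vision>")
    system_dir = parts[0]
    vision = parts[1].strip() if len(parts) > 1 else ""
    return system_dir, vision


if __name__ == "__main__":
    print(split_arguments("billing event-driven services on Kubernetes"))
```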
--- a/plugins/code-modernization/commands/modernize-status.md
+++ b/plugins/code-modernization/commands/modernize-status.md
@@ -0,0 +1,54 @@
+---
+description: Where am I in the modernization workflow — artifact inventory, staleness, secrets hygiene, next step
+argument-hint: <system-dir>
+---
+
+Report where the modernization of `$1` stands, in one screen. This is a
+read-only command — inspect, never modify.
+
+## 1 — Artifact inventory
+
+Check `analysis/$1/` and `modernized/$1*/` and build a table — one row per
+workflow stage, with the artifact's presence and modification time:
+
+| Stage | Artifacts |
+|---|---|
+| preflight | `PREFLIGHT.md` |
+| assess | `ASSESSMENT.md`, `ARCHITECTURE.mmd` |
+| map | `topology.json`, `TOPOLOGY.html`, `*.mmd`, `extract_topology.*` |
+| extract-rules | `BUSINESS_RULES.md`, `DATA_OBJECTS.md` |
+| brief | `MODERNIZATION_BRIEF.md` (note whether the approval block is signed) |
+| harden | `SECURITY_FINDINGS.md`, `security_remediation.patch` |
+| transform / reimagine | each `modernized/$1*/<module>/` dir — note test presence and whether `TRANSFORMATION_NOTES.md` exists |
+
+## 2 — Staleness
+
+Flag any artifact older than an upstream artifact it derives from:
+
+- `MODERNIZATION_BRIEF.md` older than `ASSESSMENT.md`, `topology.json`,
+  or `BUSINESS_RULES.md` → the brief no longer reflects discovery;
+  recommend re-running `/modernize-brief`.
+- `TOPOLOGY.html` older than `topology.json` → re-run the injection step
+  from `/modernize-map`.
+- Any `TRANSFORMATION_NOTES.md` older than `BUSINESS_RULES.md` → the
+  module may not implement the latest rule set; list which.
+
+## 3 — Secrets hygiene
+
+- Does `analysis/.gitignore` exist and cover `SECRETS.local.md` /
+  `*.local.patch`? (`git check-ignore` when in a git repo.)
+- If `SECRETS.local.md` exists: confirm it is NOT tracked
+  (`git ls-files --error-unmatch`, expect failure) and has never been
+  committed (`git log --all --oneline -- <path>`, expect empty). If
+  either check fails, say so prominently and recommend rotation plus
+  history scrubbing.
+
+## 4 — Verdict
+
+End with three lines:
+- **Where you are** — the furthest completed stage and roughly how much
+  of the system it covers (e.g. "mapped 100%, 2 of 14 modules
+  transformed").
+- **What's stale** — or "nothing".
+- **Next command** — the single most useful next step, with a one-line
+  reason.
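
Reviewer sketch of the staleness rule in section 2 above, assuming modification times are the only signal. It is read-only, like the command. `stale_artifacts` and `DERIVES_FROM` are hypothetical names, and only two of the listed relationships are encoded.

```python
import os

# Artifact -> upstream artifacts it derives from (subset of the Staleness list).
DERIVES_FROM = {
    "MODERNIZATION_BRIEF.md": ["ASSESSMENT.md", "topology.json", "BUSINESS_RULES.md"],
    "TOPOLOGY.html": ["topology.json"],
}


def stale_artifacts(analysis_dir, derives_from=DERIVES_FROM):
    """Return [(artifact, [newer upstreams])] for artifacts older than a source."""
    stale = []
    for artifact, upstreams in derives_from.items():
        artifact_path = os.path.join(analysis_dir, artifact)
        if not os.path.exists(artifact_path):
            continue  # a missing artifact is an inventory gap, not staleness
        artifact_mtime = os.path.getmtime(artifact_path)
        newer = []
        for upstream in upstreams:
            upstream_path = os.path.join(analysis_dir, upstream)
            if (os.path.exists(upstream_path)
                    and os.path.getmtime(upstream_path) > artifact_mtime):
                newer.append(upstream)
        if newer:
            stale.append((artifact, newer))
    return stale
```

Modification times are a coarse proxy: a fresh checkout can reset them, so a real check might prefer commit timestamps when the directory is under git.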
--- a/plugins/code-modernization/commands/modernize-transform.md
+++ b/plugins/code-modernization/commands/modernize-transform.md
@@ -9,10 +9,37 @@ equivalence.
 This is a surgical, single-module transformation — one vertical slice of the
 strangler fig. Output goes to `modernized/$1/$2/`.

-## Step 0 — Plan (HITL gate)
+## Step 0a — Toolchain check (fail fast on target, adapt on legacy)
+
+Verify the build environment **before** planning, not when the tests
+first run:
+
+- **Target stack ($3) — required.** Runtime, package manager, and test
+  framework all respond (`java -version` + `mvn -v`, `node -v` + `npm -v`,
+  `python3 -V` + `pytest --version`, …). If any are missing, stop and
+  report what to install — the new code and its tests cannot run without
+  them, so a plan gate now would just defer the failure an hour. Suggest
+  `/modernize-preflight $1 $3` for the full readiness report.
+- **Legacy stack — advisory, never a blocker.** Try a syntax-only compile
+  of the module being transformed (e.g. `cobc -fsyntax-only`). Legacy
+  code often *cannot* build locally by nature, not by misconfiguration —
+  CICS/IMS programs have no local translator, and the real runtime may be
+  a mainframe you don't have. A failed or impossible legacy compile does
+  **not** stop the transform; it changes the equivalence strategy:
+  - dual-execution proof is off the table — characterization tests
+    assert against **recorded traces / golden-master fixtures** (real
+    production outputs, captured reports/screens, SME-confirmed
+    examples) instead of live legacy runs
+  - say so explicitly in the Step 0b plan and later in
+    TRANSFORMATION_NOTES.md ("equivalence is trace-based; legacy was not
+    executable in this environment"), so reviewers know the strength of
+    the proof they're approving
+
+## Step 0b — Plan (HITL gate)

 Read the source module and any business rules in `analysis/$1/BUSINESS_RULES.md`
-that reference it. Then **enter plan mode** and present:
+that reference it. Then present the plan and **stop — write no code until
+the user explicitly approves** (use plan mode if the session supports it):
 - Which source files are in scope
 - The target module structure (packages/classes/files you'll create)
 - Which business rules / behaviors this module implements
@@ -30,7 +57,9 @@ identify every observable behavior, and encode each as a test case with
 concrete input → expected output pairs derived from the legacy logic.
 Target framework: <appropriate for $3>. Write to
 `modernized/$1/$2/src/test/`. These tests define 'done' — the new code
-must pass all of them."
+must pass all of them. Follow your secret-handling rules: no credential
+literal from legacy code becomes a fixture; substitute fake same-shape
+values and read anything genuinely live from environment variables."

 Show the user the test file. Get a 👍 before proceeding.

@@ -68,6 +97,10 @@ Then show a visual diff of one representative behavior, legacy vs modern:
 ```bash
 delta --side-by-side <(sed -n '<lines>p' legacy/$1/<file>) modernized/$1/$2/src/main/<file>
 ```
+(Fall back to `diff -y --width=160` if `delta` isn't installed.) Never
+pick a credential-bearing line range for this diff, and mask any
+credential-like literal quoted in TRANSFORMATION_NOTES.md — the notes
+live in `modernized/` and get committed.

 ## Step 5 — Architecture review

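
Reviewer sketch of the Step 0a decision: the target toolchain gates the run, while the legacy compile only selects the equivalence strategy. `plan_equivalence` and the strategy labels are hypothetical. Treating a successful syntax-only compile as enough for dual execution is a simplifying assumption, since a module can compile and still lack a runnable environment.

```python
TRACE_NOTE = ("equivalence is trace-based; legacy was not executable "
              "in this environment")


def plan_equivalence(target_ok: bool, legacy_compiles: bool) -> dict:
    """Decide whether to proceed and which equivalence proof applies."""
    if not target_ok:
        # Required: the new code and its tests cannot run without it.
        return {"proceed": False, "strategy": None,
                "note": "install the target toolchain, then re-run"}
    if legacy_compiles:
        return {"proceed": True, "strategy": "dual-execution", "note": None}
    # Advisory only: downgrade the proof and state it for reviewers.
    return {"proceed": True, "strategy": "trace-based", "note": TRACE_NOTE}


if __name__ == "__main__":
    print(plan_equivalence(target_ok=True, legacy_compiles=False))
```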
--- a/plugins/security-guidance/.claude-plugin/plugin.json
+++ b/plugins/security-guidance/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
  "name": "security-guidance",
-  "version": "2.0.0",
+  "version": "2.0.3",
  "description": "Security review for Claude-generated code. Pattern-based warnings on edits, LLM-powered diff review on Stop, and an agentic commit reviewer that catches injection, XSS, SSRF, hardcoded secrets, and 25+ other vulnerability classes.",
  "author": {
    "name": "David Dworken",
--- a/plugins/security-guidance/hooks/_base.py
+++ b/plugins/security-guidance/hooks/_base.py
@@ -10,15 +10,42 @@ import os
 import threading
 from datetime import datetime

+def state_dir():
+    """Return the absolute path of the plugin's state directory.
+
+    Resolution precedence (highest first):
+      1. SECURITY_WARNINGS_STATE_DIR — plugin-specific override (existing)
+      2. CLAUDE_CONFIG_DIR/security  — CC's config-dir env var (#1868)
+      3. ~/.claude/security          — default fallback
+
+    Empty-string env vars are treated as not-set so a misconfigured shell
+    (`CLAUDE_CONFIG_DIR=` with no value) doesn't silently write to
+    /security at the filesystem root.
+
+    Returns a fully-expanded absolute path (no literal `~`) so subprocess
+    callers can pass it through to code that doesn't re-expand tildes.
+
+    Called per-invocation rather than cached at import time so test
+    monkeypatches of the env vars take effect — the plugin's hooks each
+    run as fresh subprocesses in production, so the per-call cost is
+    negligible compared to subprocess spawn.
+    """
+    explicit = os.environ.get("SECURITY_WARNINGS_STATE_DIR")
+    if explicit:
+        return os.path.expanduser(explicit)
+    cc_config = os.environ.get("CLAUDE_CONFIG_DIR")
+    if cc_config:
+        return os.path.expanduser(os.path.join(cc_config, "security"))
+    return os.path.expanduser("~/.claude/security")
+
+
 # Debug log file. Lives under the plugin state dir (default ~/.claude/security/)
 # rather than /tmp because /tmp is world-writable on multi-user hosts (TOCTOU /
 # symlink-attack surface, cross-user log leakage). Overridable per-process via
-# SECURITY_GUIDANCE_DEBUG_LOG, or per-state-dir via SECURITY_WARNINGS_STATE_DIR.
-_DEFAULT_STATE_DIR = os.path.expanduser(
-    os.environ.get("SECURITY_WARNINGS_STATE_DIR") or "~/.claude/security"
-)
+# SECURITY_GUIDANCE_DEBUG_LOG, or per-state-dir via SECURITY_WARNINGS_STATE_DIR
+# (plugin-specific override) or CLAUDE_CONFIG_DIR (CC-wide config dir, #1868).
 DEBUG_LOG_FILE = os.environ.get("SECURITY_GUIDANCE_DEBUG_LOG") or os.path.join(
-    _DEFAULT_STATE_DIR, "log.txt"
+    state_dir(), "log.txt"
 )
 # Cap the debug log so parallel-worker fleets don't fill disk. When the active
 # file exceeds this it's atomically rotated to <file>.1 (overwriting any prior
@@ -89,7 +116,18 @@ _PV = _read_plugin_version_int()
 # Emitted via _usage_metrics() into the existing emit_metrics() channel so
 # hook metrics rows carry per-invocation token/cost totals
 # alongside the existing skip_reason / vulns_found fields.
-_USAGE = {"in": 0, "out": 0, "cr": 0, "cw": 0, "cost": 0.0, "n": 0}
+_USAGE = {
+    "in": 0, "out": 0, "cr": 0, "cw": 0, "cost": 0.0, "n": 0,
+    # HTTP error visibility (#2098 visibility gap — see emit comment in
+    # _usage_metrics). Without this, API failures from `_call_claude` left
+    # zero fingerprint in telemetry: the call returns None, the caller's
+    # emit_metrics carries no api_calls field, and the failure is
+    # indistinguishable from "no review needed". The deprecation outage
+    # that broke every commit-review LLM call was invisible until users
+    # reported it manually.
+    "http_err_last": 0,    # most recent HTTP error code this invocation
+    "http_err_count": 0,   # total HTTP errors (4xx + 5xx + network)
+}
 _USAGE_LOCK = threading.Lock()

 # $/Mtok (input, output). Used only for the raw-HTTP path; the SDK path
@@ -139,19 +177,55 @@ def _record_usage(usage, model, cost_usd=None):
        _USAGE["n"] += 1


+def _record_http_error(status):
+    """Record an HTTP error from an LLM API call. `status` is the HTTP
+    status code (integer 400–599) or -1 for network/timeout errors. Stored
+    in `_USAGE["http_err_last"]` (most recent) and counted in
+    `_USAGE["http_err_count"]`. Snapshot via `_usage_metrics()` so every
+    subsequent `emit_metrics` includes the failure fingerprint.
+
+    Background: the most recent failure that went unseen without this
+    was the #2098 deprecation 400. Every hook fire's LLM call returned
+    HTTP 400; the plugin caught it and returned None; emit_metrics
+    carried no api_calls field; aggregate dashboards looked normal. The
+    failure only became visible when a user manually reported errors
+    from their debug log. With these fields, a category-of-failure spike
+    (4xx, 5xx, or -1 network) is queryable from BQ in real time.
+    """
+    try:
+        s = int(status)
+    except (TypeError, ValueError):
+        return
+    with _USAGE_LOCK:
+        _USAGE["http_err_last"] = s
+        _USAGE["http_err_count"] += 1
+
+
 def _usage_metrics():
    """Snapshot the accumulator as metric keys. Returns {} when no API calls
-    were made so skip-path emits don't burn key budget. cost_usd rounded to
-    1e-6 to keep the float finite/short for the zod schema."""
-    with _USAGE_LOCK:
-        if _USAGE["n"] == 0:
-            return {}
-        return {
-            "tok_in": _USAGE["in"],
-            "tok_out": _USAGE["out"],
-            "tok_cache_r": _USAGE["cr"],
-            "tok_cache_w": _USAGE["cw"],
-            "cost_usd": round(_USAGE["cost"], 6),
-            "api_calls": _USAGE["n"],
-        }
+    were made AND no HTTP errors were recorded, so skip-path emits don't
+    burn key budget. cost_usd is rounded to 1e-6 to keep the float
+    finite/short for the zod schema.
+
+    The HTTP error fields (`http_err_last`, `http_err_count`) are emitted
+    ONLY when `http_err_count > 0`, so successful calls don't pad every
+    metrics row with two zero fields.
+    """
+    with _USAGE_LOCK:
+        if _USAGE["n"] == 0 and _USAGE["http_err_count"] == 0:
+            return {}
+        out = {}
+        if _USAGE["n"] > 0:
+            out.update({
+                "tok_in": _USAGE["in"],
+                "tok_out": _USAGE["out"],
+                "tok_cache_r": _USAGE["cr"],
+                "tok_cache_w": _USAGE["cw"],
+                "cost_usd": round(_USAGE["cost"], 6),
+                "api_calls": _USAGE["n"],
+            })
+        if _USAGE["http_err_count"] > 0:
+            out["http_err_last"] = _USAGE["http_err_last"]
+            out["http_err_count"] = _USAGE["http_err_count"]
+        return out

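
Reviewer sketch of the emission rule in `_usage_metrics()` above, reduced to a pure function so the three outcomes are easy to see. `snapshot` is a hypothetical stand-in: it takes the accumulator as an argument, omits the lock, and leaves out the token fields for brevity.

```python
def snapshot(usage):
    """Mirror of the emission rule: {} on the skip path, else only live fields."""
    if usage["n"] == 0 and usage["http_err_count"] == 0:
        return {}  # no calls and no errors: emit nothing
    out = {}
    if usage["n"] > 0:
        out["api_calls"] = usage["n"]
        out["cost_usd"] = round(usage["cost"], 6)
    if usage["http_err_count"] > 0:
        out["http_err_last"] = usage["http_err_last"]
        out["http_err_count"] = usage["http_err_count"]
    return out


# Three illustrative accumulator states (values are made up).
idle = {"n": 0, "cost": 0.0, "http_err_last": 0, "http_err_count": 0}
failed = {"n": 0, "cost": 0.0, "http_err_last": 400, "http_err_count": 3}
ok = {"n": 2, "cost": 0.0123456789, "http_err_last": 0, "http_err_count": 0}

if __name__ == "__main__":
    for state in (idle, failed, ok):
        print(snapshot(state))
```

The `failed` case is the one the change exists for: before it, zero successful calls always produced `{}`, so an outage looked identical to "no review needed".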
--- a/plugins/security-guidance/hooks/diffstate.py
+++ b/plugins/security-guidance/hooks/diffstate.py
@@ -138,7 +138,17 @@ def restore_unreviewed_stop_state(session_id, paths, baseline_sha):


 def get_baseline_file_content(session_id, file_path, cwd):
-    """Get the content of a file at the baseline SHA. Returns None if unavailable."""
+    """Get the content of a file at the baseline SHA. Returns None if unavailable.
+
+    Decode the file content as UTF-8 with errors="replace" rather than using
+    text=True: source files in user repos can be latin-1 / cp1252 / shift-jis
+    / etc., and on Windows text=True would decode via locale.getpreferredencoding()
+    in strict mode and raise UnicodeDecodeError in the subprocess reader
+    thread — leaving result.stdout=None and propagating AttributeError when
+    the caller tries to use it. Same class as the existing migrations at
+    security_reminder_hook.py:540 (reflog subjects) and :1115 (commit
+    diffs); this helper was missed in that pass. See
+    anthropics/claude-plugins-official#2056."""
    baseline_sha = load_baseline_sha(session_id)
    if not baseline_sha:
        return None
@@ -151,12 +161,12 @@ def get_baseline_file_content(session_id, file_path, cwd):
            return None
        result = subprocess.run(
            [*GIT_CMD, "show", f"{baseline_sha}:{rel_path}"],
-            cwd=cwd, capture_output=True, text=True, timeout=5
+            cwd=cwd, capture_output=True, timeout=5
        )
        if result.returncode == 0:
-            return result.stdout
+            return (result.stdout or b"").decode("utf-8", errors="replace")
        return None
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError):
        return None


@@ -173,11 +183,16 @@ def capture_git_baseline(cwd):
    and `compute_v2_review_set` subtracts that set so pre-existing untracked
    files are not reviewed as Claude-authored.
    """
+    # stdout is a SHA so text=True is safe on stdout, but a non-ASCII
+    # filename in `git stash create`'s STDERR warning (e.g. a worktree
+    # with `Ávila_report.txt` triggers a quotePath/locale warning) would
+    # trip the stderr reader thread on Windows cp1252. Capture both streams
+    # as bytes and decode leniently, for symmetry with _list_untracked. See #2056.
    try:
        # Check if HEAD exists (i.e., repo has at least one commit)
        head_check = subprocess.run(
            [*GIT_CMD, "rev-parse", "HEAD"],
-            cwd=cwd, capture_output=True, text=True, timeout=5
+            cwd=cwd, capture_output=True, timeout=5
        )
        if head_check.returncode != 0:
            # No commits yet — skip review rather than creating commits in the user's repo
@@ -186,20 +201,20 @@ def capture_git_baseline(cwd):

        result = subprocess.run(
            [*GIT_CMD, "stash", "create"],
-            cwd=cwd, capture_output=True, text=True, timeout=15
+            cwd=cwd, capture_output=True, timeout=15
        )
-        sha = result.stdout.strip()
+        sha = (result.stdout or b"").decode("utf-8", errors="replace").strip()
        if sha:
            return sha

        # Working tree is clean — stash create returns empty. Use HEAD.
        result = subprocess.run(
            [*GIT_CMD, "rev-parse", "HEAD"],
-            cwd=cwd, capture_output=True, text=True, timeout=5
+            cwd=cwd, capture_output=True, timeout=5
        )
-        sha = result.stdout.strip()
+        sha = (result.stdout or b"").decode("utf-8", errors="replace").strip()
        return sha if sha else None
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
        debug_log(f"Failed to capture git baseline: {e}")
        return None

@@ -323,19 +338,35 @@ def _list_untracked(cwd):
    mtime is captured so an in-place edit during the turn is still reviewed.

    Uses ls-files (not status) for the UPS path: the index diff isn't needed,
-    and ls-files --others only walks the worktree against .gitignore."""
+    and ls-files --others only walks the worktree against .gitignore.
+
+    Decodes stdout/stderr as UTF-8 with errors="replace" instead of using
+    text=True. With core.quotePath=false git emits raw UTF-8 bytes for
+    non-ASCII filenames; text=True decodes via locale.getpreferredencoding()
+    in strict mode — on Windows that's cp1252 with several undefined bytes
+    (0x81/0x8D/0x8F/0x90/0x9D), all of which appear in UTF-8 encodings of
+    common accented capitals (Á Í Ï Ð Ý) and most CJK/emoji codepoints.
+    A non-ASCII filename in the worktree crashed the subprocess reader
+    thread, left r.stdout=None, and propagated AttributeError out of the
+    helper — silently losing the baseline snapshot every UserPromptSubmit.
+    See anthropics/claude-plugins-official#2056. The sibling helpers in
+    gitutil.py already follow the lenient pattern; this function and
+    capture_git_baseline / _git_name_only / _git_status_porcelain were
+    the holdouts."""
    try:
        repo = _git_toplevel(cwd) or cwd
+        # core.quotePath=false comes from GIT_CMD globally (see gitutil.py).
        r = subprocess.run(
-            [*GIT_CMD, "-c", "core.quotePath=false", "ls-files",
-             "--others", "--exclude-standard", "-z"],
-            cwd=repo, capture_output=True, text=True, timeout=15,
+            [*GIT_CMD, "ls-files", "--others", "--exclude-standard", "-z"],
+            cwd=repo, capture_output=True, timeout=15,
        )
        if r.returncode != 0:
-            debug_log(f"_list_untracked rc={r.returncode}: {r.stderr[:200]}")
+            stderr_str = (r.stderr or b"").decode("utf-8", errors="replace")
+            debug_log(f"_list_untracked rc={r.returncode}: {stderr_str[:200]}")
            return {}
+        stdout = (r.stdout or b"").decode("utf-8", errors="replace")
        out = {}
-        for p in r.stdout.split("\0"):
+        for p in stdout.split("\0"):
            if not p:
                continue
            try:
@@ -346,7 +377,9 @@ def _list_untracked(cwd):
                debug_log(f"_list_untracked: capped at {UNTRACKED_BASELINE_CAP}")
                break
        return out
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
+        # ValueError guards against any future strict-decode regression
+        # so the helper degrades to {} instead of crashing the hook.
        debug_log(f"_list_untracked error: {e}")
        return {}

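
Reviewer sketch of the failure mode these changes address, using the filename from the comment in `capture_git_baseline`. It simulates the NUL-separated output of `git ls-files -z` with a literal byte string instead of running git. `strict_cp1252` and `lenient_utf8` are hypothetical helper names.

```python
# Simulated `git ls-files -z` output: raw UTF-8 bytes, NUL-separated.
raw = "Ávila_report.txt\0notes.txt\0".encode("utf-8")


def strict_cp1252(data: bytes):
    """What a strict cp1252 decode does with UTF-8 bytes (as text=True would
    on a Windows cp1252 locale): 'Á' is 0xC3 0x81, and 0x81 is undefined."""
    try:
        return data.decode("cp1252")
    except UnicodeDecodeError:
        return None


def lenient_utf8(data):
    """The pattern adopted in the diff: tolerate None and undecodable bytes."""
    return (data or b"").decode("utf-8", errors="replace")


paths = [p for p in lenient_utf8(raw).split("\0") if p]

if __name__ == "__main__":
    print(strict_cp1252(raw))  # strict decode fails on this input
    print(paths)
```

With `errors="replace"`, an undecodable byte becomes U+FFFD, so a path in some other encoding may come back slightly mangled, but the helper still returns.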
--- a/plugins/security-guidance/hooks/ensure_agent_sdk.py
+++ b/plugins/security-guidance/hooks/ensure_agent_sdk.py
@@ -23,6 +23,12 @@ import sys
 import time
 from pathlib import Path

+# Shared state-dir resolver: SECURITY_WARNINGS_STATE_DIR → CLAUDE_CONFIG_DIR/security
+# → ~/.claude/security. See _base.state_dir for resolution precedence. Re-aliased
+# here to match the existing local name (state_dir was already a local var in
+# main() and _maybe_emit_user_notice).
+from _base import state_dir as _resolve_state_dir
+
 # Outcome codes for the sdk_bootstrap metric. Values are stable for telemetry.
 NOOP_SYSTEM = 0      # claude_agent_sdk already importable in system python
 NOOP_VENV = 1        # venv already built and SDK imports from it
@@ -36,6 +42,122 @@ HOOK_PY_INCOMPATIBLE = 6  # hook interpreter is <3.10 — SDK syntax can't load
                          # here no matter how the venv was built. See #2071.


+# Phase + err-kind integer encoding for sdk_bootstrap_phase / sdk_bootstrap_err.
+#
+# Earlier versions emitted these as STRINGS (e.g. "pip", "dns_fail"). CC's
+# plugin-metrics pipeline silently drops plugin-emitted string values —
+# only `bool|finite-number` plugin metrics reach BigQuery. (CC-core
+# metrics like `subscription_type` are exempt because they're injected
+# downstream of plugin validation.) Confirmed empirically: 185K
+# BUILD_FAILED rows in BQ had `sdk_bootstrap_phase`/`sdk_bootstrap_err`
+# = NULL despite the Python code emitting them. This left ~28K
+# BUILD_FAILED sessions/day with no diagnostic split — flying blind on
+# the real failure modes (pip-no-match vs dns-fail vs ssl-verify etc.).
+#
+# Fix: encode as small integers per the maps below. Values are
+# APPEND-ONLY for telemetry stability. Reserve 99 as the "unknown /
+# uncategorized" bucket so an unmapped err_kind (e.g., a new exception
+# type) still emits a non-zero signal.
+SDK_BOOTSTRAP_PHASE_CODES = {
+    "pre":  1,  # pre-venv (state_dir.mkdir, sentinel open)
+    "venv": 2,  # python -m venv --clear
+    "pip":  3,  # pip install
+    "main": 4,  # uncaught exception above main()
+}
+SDK_BOOTSTRAP_ERR_CODES = {
+    "pip_no_match":         1,
+    "dns_fail":             2,
+    "conn_refused":         3,
+    "ssl_verify":           4,
+    "perm_denied":          5,
+    "no_pip":               6,
+    "disk_full":            7,
+    "proxy_auth":           8,
+    "stderr_timeout":       9,   # pip stderr containing "timeout"/"timed out"
+    "subprocess_timeout":   10,  # subprocess.TimeoutExpired (>120s)
+    # Venv-stage specific categories added after PR #2112 telemetry surfaced
+    # 2,406 phase=2/err=99 sessions in the first 3h of v2.0.1 — venv phase
+    # failing in ways the original pip-flavored patterns didn't catch. These
+    # all split out of what was previously collapsing to _uncategorized.
+    "venv_ensurepip_fail":  11,  # Debian/Ubuntu missing python3-venv;
+                                 # stderr mentions ensurepip non-zero exit
+                                 # or "ensurepip is not available"
+    "venv_path_too_long":   12,  # Windows MAX_PATH (260) or POSIX
+                                 # ENAMETOOLONG — venv writes deep paths
+                                 # under state_dir/agent-sdk-venv/Lib/...
+    "venv_no_module":       13,  # `python3 -m venv` itself missing — "No
+                                 # module named 'venv'" / "No module named venv"
+    "venv_already_exists":  14,  # Errno 17 / "file exists" — sentinel race
+                                 # past O_EXCL or stale dir survived --clear
+    "venv_setup_failed":    15,  # Generic "virtual environment was not
+                                 # created successfully" — catches the long
+                                 # tail of venv setup failures that don't
+                                 # match a more specific category above
+    # 16–98 reserved for future categories; APPEND-ONLY.
+    # 99 catches everything else (including "exc:<TypeName>" and "other:<tail>"
+    # — the original string is debug-loggable but the integer is what makes
+    # it to telemetry). For the "other:" tail, `sdk_bootstrap_stderr_sig`
+    # carries a bounded integer hash so we can still distinguish patterns
+    # in BQ aggregation.
+    "_uncategorized":       99,
+}
+
+
+def _encode_phase(s):
+    """Map err_phase string to its telemetry integer code, or 0 if unset.
+    Empty/None → 0 lets `if encoded:` cleanly skip emission. Per
+    SDK_BOOTSTRAP_PHASE_CODES, valid codes are 1-4."""
+    return SDK_BOOTSTRAP_PHASE_CODES.get((s or "").strip(), 0)
+
+
+def _encode_err_kind(s):
+    """Map err_kind string to its telemetry integer code, or 0 if unset.
+    Direct hits use the static map; "exc:<X>" and "other:<tail>" both
+    collapse to _uncategorized (99) — the raw string survives in debug
+    logs, only the integer reaches BQ."""
+    s = (s or "").strip()
+    if not s:
+        return 0
+    if s in SDK_BOOTSTRAP_ERR_CODES:
+        return SDK_BOOTSTRAP_ERR_CODES[s]
+    # Prefix matches for the catch-all categories
+    # "exc:<X>", "other:<tail>", bare "other", and any unknown string all
+    # map to _uncategorized: emit a non-zero code rather than dropping the
+    # signal.
+    return SDK_BOOTSTRAP_ERR_CODES["_uncategorized"]
+
+
+def _encode_stderr_sig(err_kind):
+    """Bounded integer hash of the stderr tail captured in "other:<tail>"
+    err_kinds. Lets us distinguish patterns INSIDE the _uncategorized
+    (code 99) bucket without unbounded cardinality.
+
+    Returns 0 for non-"other:" err_kinds (so the field auto-omits from
+    emit_metrics on categorized failures — see the emit block in main()).
+
+    Strategy: take the tail's first 30 chars (post-trim, post-lowercase),
+    SHA-1, fold the first 2 bytes to 0–999. The same stderr always maps
+    to the same bucket; different messages usually land in different
+    ones. Cardinality is bounded at 1000, and a real failure mode
+    typically produces near-identical stderr across thousands of
+    machines, so 1000 buckets is comfortably wide. A tail that hashes
+    to 0 is indistinguishable from "no signal" and is not emitted.
+
+    Why first ~30 chars: stderr like "ERROR: Command failed: <full
+    path>" varies the tail wildly (paths) but the categorization signal
+    is in the leading words. Dropping the suffix focuses the hash on
+    the discriminative part.
+    """
+    if not err_kind or not err_kind.startswith("other:"):
+        return 0
+    import hashlib
+    tail = err_kind[len("other:"):].strip().lower()[:30]
+    if not tail:
+        return 0
+    h = hashlib.sha1(tail.encode("utf-8", errors="replace")).digest()
+    return int.from_bytes(h[:2], "big") % 1000
+
+
 def _sdk_on_syspath() -> bool:
    # find_spec is ~10ms; actually importing the SDK pulls in
    # transitive deps and costs ~800ms — too heavy for a
@@ -90,10 +212,7 @@ def main() -> tuple[int, str, str]:
    if _sdk_on_syspath():
        return NOOP_SYSTEM, "", ""

-    state_dir = Path(
-        os.environ.get("SECURITY_WARNINGS_STATE_DIR")
-        or os.path.expanduser("~/.claude/security")
-    )
+    state_dir = Path(_resolve_state_dir())
    venv = state_dir / "agent-sdk-venv"
    # Windows venvs put the interpreter at Scripts\python.exe; POSIX uses bin/python.
    if sys.platform == "win32":
@@ -177,7 +296,34 @@ def main() -> tuple[int, str, str]:
        else:
            stderr_str = str(stderr_b)
        s = stderr_str.lower()
-        if "no matching distribution" in s or "could not find a version" in s:
+        # Venv-specific patterns checked FIRST — they overlap with some pip
+        # patterns (e.g. "no module named ensurepip" could match no_pip OR
+        # venv_ensurepip_fail; the venv-stage interpretation is the right
+        # one when err_phase=="venv"). Order is venv-most-specific →
+        # pip-historical → generic.
+        if err_phase == "venv" and (
+            "ensurepip is not available" in s
+            or ("ensurepip" in s and "returned non-zero" in s)
+            or ("the virtual environment was not created" in s and "ensurepip" in s)
+        ):
+            err_kind = "venv_ensurepip_fail"
+        elif err_phase == "venv" and (
+            "[errno 36]" in s
+            or "file name too long" in s
+            or "path too long" in s
+        ):
+            err_kind = "venv_path_too_long"
+        elif err_phase == "venv" and (
+            "no module named venv" in s
+            or "no module named 'venv'" in s
+        ):
+            err_kind = "venv_no_module"
+        elif err_phase == "venv" and (
+            "[errno 17]" in s
+            or ("file exists" in s and "venv" in s)
+        ):
+            err_kind = "venv_already_exists"
+        elif "no matching distribution" in s or "could not find a version" in s:
            err_kind = "pip_no_match"
        elif "name or service not known" in s or "name resolution" in s \
                or "nodename nor servname" in s or "temporary failure in name" in s:
@@ -196,6 +342,15 @@ def main() -> tuple[int, str, str]:
            err_kind = "proxy_auth"
        elif "timeout" in s or "timed out" in s:
            err_kind = "stderr_timeout"
+        elif err_phase == "venv" and (
+            "virtual environment was not created" in s
+            or ("error: command" in s and "venv" in s)
+        ):
+            # Generic venv-setup catch-all — matched AFTER the more specific
+            # venv patterns above so we don't shadow them, but BEFORE the
+            # other: fallback so generic venv setup failures get their own
+            # bucket instead of polluting the long-tail signature space.
+            err_kind = "venv_setup_failed"
        else:
            # First 60 chars of the last non-empty stderr line — bounded to
            # stay inside CC's metric value-length budget. Real failure modes
@@ -239,10 +394,7 @@ def _maybe_emit_user_notice(outcome: int, pv: int) -> str | None:
    if outcome != HOOK_PY_INCOMPATIBLE:
        return None
    try:
-        state_dir = Path(
-            os.environ.get("SECURITY_WARNINGS_STATE_DIR")
-            or os.path.expanduser("~/.claude/security")
-        )
+        state_dir = Path(_resolve_state_dir())
        marker = state_dir / f".agentic_unavailable_notice_v{pv or 0}"
        if marker.exists():
            return None
@@ -288,21 +440,33 @@ if __name__ == "__main__":
    # and takes the FIRST non-{"async":...} JSON line as the hook response;
    # its `metrics` key is forwarded to the hook metrics event on the
    # next attachments pass. Must be a single line — the registry splits on
-    # \n and json-parses each independently. Values must be bool|number OR
-    # short strings (CC accepts string metric values if they're not
-    # null). Stay inside the 10-key emit cap.
+    # \n and json-parses each independently.
+    #
+    # IMPORTANT — values must be bool|finite-number. The validation comment
+    # has historically said "or short strings" but that was wrong: CC's
+    # plugin-metrics pipeline silently drops plugin-emitted string values.
+    # Stay inside the 10-key emit cap.
    metrics: dict[str, object] = {
        "sdk_bootstrap": outcome,
        "sdk_bootstrap_ms": round((time.perf_counter() - t0) * 1000),
    }
    if err_kind:
-        # Truncate defensively; categorized values are <40 chars but the
-        # `other:<tail>` mode could be longer. err_phase may be empty for
-        # pre-venv failures (state_dir.mkdir perm-denied, sentinel O_EXCL
-        # raising a non-FileExistsError OSError) — emit as "pre" so the
-        # err_kind isn't silently dropped.
-        metrics["sdk_bootstrap_phase"] = (err_phase or "pre")[:16]
-        metrics["sdk_bootstrap_err"] = err_kind[:96]
+        # Encode phase + err_kind as integer codes (see
+        # SDK_BOOTSTRAP_PHASE_CODES / SDK_BOOTSTRAP_ERR_CODES). Earlier
+        # versions emitted these as strings and CC dropped them; integers
+        # restore the diagnostic split needed to triage the 28K
+        # BUILD_FAILED/day by root cause. err_phase defaults to "pre" when
+        # empty (pre-venv failure path, e.g. state_dir.mkdir perm-denied).
+        metrics["sdk_bootstrap_phase"] = _encode_phase(err_phase or "pre")
+        metrics["sdk_bootstrap_err"] = _encode_err_kind(err_kind)
+        # For "other:<tail>" (encoded err==99), emit a bounded integer
+        # hash of the stderr tail so BQ can distinguish patterns inside
+        # the _uncategorized bucket without unbounded cardinality. Zero
+        # when err_kind is categorized — the schema reader treats 0 as
+        # "no signal", matching the absence convention.
+        sig = _encode_stderr_sig(err_kind)
+        if sig:
+            metrics["sdk_bootstrap_stderr_sig"] = sig
    pv = _plugin_version_int()
    if pv:
        metrics["pv"] = pv
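The integer encoding added to ensure_agent_sdk.py above can be exercised in isolation. The following is a minimal sketch, not the plugin's code: `ERR_CODES` is a trimmed, illustrative copy of `SDK_BOOTSTRAP_ERR_CODES`, and `encode_err_kind`, `stderr_sig`, and `build_metrics` are hypothetical names that mirror `_encode_err_kind`, `_encode_stderr_sig`, and the emit block in the diff.

```python
import hashlib

# Trimmed, illustrative copy of the code table (assumption: the real table
# is SDK_BOOTSTRAP_ERR_CODES from the diff above).
ERR_CODES = {"pip_no_match": 1, "dns_fail": 2, "_uncategorized": 99}


def encode_err_kind(kind):
    """Return the integer code for kind, 0 if unset, 99 if unmapped."""
    kind = (kind or "").strip()
    if not kind:
        return 0
    return ERR_CODES.get(kind, ERR_CODES["_uncategorized"])


def stderr_sig(kind):
    """Bounded hash (0-999) of the first 30 chars of an "other:<tail>" kind."""
    if not kind or not kind.startswith("other:"):
        return 0
    tail = kind[len("other:"):].strip().lower()[:30]
    if not tail:
        return 0
    digest = hashlib.sha1(tail.encode("utf-8", errors="replace")).digest()
    return int.from_bytes(digest[:2], "big") % 1000


def build_metrics(kind):
    """Emit only numeric values; omit the signature when it is 0."""
    metrics = {"sdk_bootstrap_err": encode_err_kind(kind)}
    sig = stderr_sig(kind)
    if sig:
        metrics["sdk_bootstrap_stderr_sig"] = sig
    return metrics
```

Because only the first 30 characters of the tail are hashed, two failures that differ only in a trailing path share a bucket, which is the behaviour the docstring describes.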
--- a/plugins/security-guidance/hooks/gitutil.py
+++ b/plugins/security-guidance/hooks/gitutil.py
@@ -26,18 +26,34 @@ GIT_CMD = [
    "git",
    "-c", "core.fsmonitor=false",
    "-c", "core.hooksPath=/dev/null",
+    # core.quotePath=false: emit raw UTF-8 in path-emitting commands instead
+    # of C-quoting non-ASCII bytes (default `"\303\201vila/..."` vs
+    # `Ávila/...`). Downstream parsers — both ours (parse_diff_into_files,
+    # extract_file_paths_from_diff) and Python stdlib (os.path.isabs,
+    # os.path.join) — expect raw paths and silently drop / mishandle the
+    # quoted form. Adding the flag globally to GIT_CMD covers every
+    # subprocess.run site that uses the splat — diff feeders, rev-parse
+    # path queries (--show-toplevel, --git-dir, --git-common-dir),
+    # reflog %gs subjects, ls-files, status, etc. — without per-site
+    # flag duplication. See #2082, #2099.
+    "-c", "core.quotePath=false",
 ]


 def _git_rev_parse_head(cwd):
    """Return the current HEAD SHA, or None if not a git repo / no commits."""
    try:
+        # See #2099: text=True on Windows cp1252 crashes the reader thread on
+        # any UTF-8 byte undefined in cp1252 (e.g. via a git error message
+        # referencing a non-ASCII filename in stderr). stdout is a SHA so it
+        # IS safe; stderr is not. capture_output=True with bytes-by-default
+        # never decodes, so the reader thread can't crash.
        result = subprocess.run(
            [*GIT_CMD, "rev-parse", "HEAD"],
-            cwd=cwd, capture_output=True, text=True, timeout=5
+            cwd=cwd, capture_output=True, timeout=5
        )
        if result.returncode == 0 and result.stdout.strip():
-            return result.stdout.strip()
+            return result.stdout.decode("utf-8", errors="replace").strip()
        return None
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None
@@ -52,13 +68,17 @@ def _find_git_index(cwd):
    Returns the absolute path to the index file, or None.
    """
    try:
+        # See #2099: stdout here is a PATH which can contain non-ASCII bytes
+        # (e.g. C:\אבטחה\repo\.git). text=True decodes via cp1252 strict on
+        # Windows → crashes the reader thread → returns stdout=None →
+        # caller does .strip() on None → AttributeError. Decode manually.
        result = subprocess.run(
            [*GIT_CMD, "rev-parse", "--git-dir"],
-            cwd=cwd, capture_output=True, text=True, timeout=5
+            cwd=cwd, capture_output=True, timeout=5
        )
        if result.returncode != 0:
            return None
-        git_dir = result.stdout.strip()
+        git_dir = result.stdout.decode("utf-8", errors="replace").strip()
        if not os.path.isabs(git_dir):
            git_dir = os.path.join(cwd, git_dir)
        index_path = os.path.join(git_dir, "index")
@@ -128,9 +148,13 @@ def _temp_index(cwd, untracked_paths=None):
        else:
            add_args = None
        if add_args:
+            # No stdout used here (only returncode matters), but text=True
+            # still spawns reader threads that decode stderr — git error
+            # messages can reference non-ASCII filenames and crash on
+            # cp1252. See #2099. Drop text=True so bytes stay raw.
            subprocess.run(
                [*GIT_CMD, "add", "--intent-to-add"] + add_args,
-                cwd=cwd, capture_output=True, text=True, timeout=10,
+                cwd=cwd, capture_output=True, timeout=10,
                env=env,
            )
        yield env
@@ -144,11 +168,17 @@ def _temp_index(cwd, untracked_paths=None):
 def _git_toplevel(cwd):
    """Absolute repo root for `cwd`, or None if not in a work tree."""
    try:
+        # See #2099: stdout is a PATH — `C:\אבטחה\repo` returned as UTF-8
+        # bytes by git. text=True would decode via cp1252 strict on Windows
+        # → reader-thread crash. Decode manually with errors="replace".
        r = subprocess.run(
            [*GIT_CMD, "rev-parse", "--show-toplevel"],
-            cwd=cwd, capture_output=True, text=True, timeout=5,
+            cwd=cwd, capture_output=True, timeout=5,
        )
-        return r.stdout.strip() if r.returncode == 0 and r.stdout.strip() else None
+        if r.returncode != 0:
+            return None
+        path = r.stdout.decode("utf-8", errors="replace").strip()
+        return path if path else None
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None

@@ -164,13 +194,15 @@ def _git_dir(repo_root):
    callers can degrade (push-sweep state is best-effort).
    """
    try:
+        # See #2099: stdout is a PATH (shared gitdir), may be non-ASCII.
+        # Decode bytes manually to avoid cp1252 reader-thread crash.
        r = subprocess.run(
            [*GIT_CMD, "rev-parse", "--git-common-dir"],
-            cwd=repo_root, capture_output=True, text=True, timeout=5,
+            cwd=repo_root, capture_output=True, timeout=5,
        )
        if r.returncode != 0:
            return None
-        d = r.stdout.strip()
+        d = r.stdout.decode("utf-8", errors="replace").strip()
        return d if os.path.isabs(d) else os.path.join(repo_root, d)
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None
@@ -179,13 +211,15 @@ def _git_dir(repo_root):
 def _git_rev_list_range(repo_root, base, head="HEAD"):
    """Shas in `base..head`, oldest→newest. Empty list on error."""
    try:
+        # See #2099: stdout is ASCII SHAs, but stderr can carry git error
+        # messages referencing non-ASCII filenames — keep bytes raw.
        r = subprocess.run(
            [*GIT_CMD, "rev-list", "--reverse", f"{base}..{head}"],
-            cwd=repo_root, capture_output=True, text=True, timeout=10,
+            cwd=repo_root, capture_output=True, timeout=10,
        )
        if r.returncode != 0:
            return []
-        return [s for s in r.stdout.strip().split("\n") if s]
+        return [s for s in r.stdout.decode("utf-8", errors="replace").strip().split("\n") if s]
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return []

@@ -199,6 +233,10 @@ def _git_diff_range(repo_root, base, head="HEAD"):
    them reviewed — otherwise unreviewed commits get permanently silenced.
    """
    try:
+        # GIT_CMD globally passes core.quotePath=false (see definition) so
+        # non-ASCII paths in `diff --git a/... b/...` headers come through as
+        # raw UTF-8, not C-quoted. Required by the downstream
+        # parse_diff_into_files / extract_file_paths_from_diff regex.
        r = subprocess.run(
            [*GIT_CMD, "diff", "-p", "--no-color", "--no-ext-diff", base, head],
            cwd=repo_root, capture_output=True, timeout=30,
@@ -213,9 +251,11 @@ def _git_diff_range(repo_root, base, head="HEAD"):
 def _detect_main_branch(repo_root):
    for ref in ("origin/HEAD", "origin/main", "origin/master", "main", "master"):
        try:
+            # See #2099: stdout is a SHA but stderr can carry non-ASCII git
+            # warnings — keep bytes raw to avoid cp1252 reader-thread crash.
            r = subprocess.run(
                [*GIT_CMD, "rev-parse", "--verify", "-q", ref],
-                cwd=repo_root, capture_output=True, text=True, timeout=5,
+                cwd=repo_root, capture_output=True, timeout=5,
            )
            if r.returncode == 0 and r.stdout.strip():
                return ref
@@ -259,19 +299,29 @@ def _git_reflog_recent_commits(repo_root, max_age_s=120, max_n=5):
        # %gs (the reflog subject) is `commit: <commit-msg first line>` and can
        # contain `|`; put it LAST so split("|", 2) leaves it intact. %H is
        # hex and %ct is integer, so the first two fields are delimiter-safe.
+        #
+        # Bytes + decode utf-8/replace: %gs embeds commit-message subjects
+        # which git stores as raw bytes — commits can be authored in
+        # latin-1 / cp1252 / shift-jis etc., and text=True would raise
+        # UnicodeDecodeError in the subprocess reader thread on Windows
+        # cp1252 (subprocess.run returns r.stdout=None, then
+        # r.stdout.splitlines() AttributeErrors). Mirrors the existing
+        # migration at security_reminder_hook.py:540 — same pattern was
+        # missed here. See anthropics/claude-plugins-official#2056.
        r = subprocess.run(
            [*GIT_CMD, "log", "-g", "-n", str(max_n),
             "--format=%H|%ct|%gs", "HEAD"],
-            cwd=repo_root, capture_output=True, text=True, timeout=5,
+            cwd=repo_root, capture_output=True, timeout=5,
        )
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError):
        return [], 0
    if r.returncode != 0:
        return [], 0
+    stdout = (r.stdout or b"").decode("utf-8", errors="replace")
    import time as _time
    now = int(_time.time())
    fresh, stale = [], 0
-    for idx, line in enumerate(r.stdout.splitlines()):
+    for idx, line in enumerate(stdout.splitlines()):
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
@@ -306,23 +356,32 @@ def _git_name_only(cwd, base, include_untracked=False):
    must distinguish None (error → don't trust as a filter) from set()
    (genuinely nothing changed). `-c core.quotePath=false -z` keeps non-ASCII
    and space-containing paths intact."""
+    # Decode stdout/stderr as UTF-8 with errors="replace" instead of using
+    # text=True. core.quotePath=false makes git emit raw UTF-8 for non-ASCII
+    # paths, and text=True on Windows decodes via cp1252 strict — a non-ASCII
+    # changed path would crash the subprocess reader thread, leave
+    # result.stdout=None, and propagate AttributeError out of the helper.
+    # Same fix shape as diffstate._list_untracked. See #2056.
    def _run(env):
+        # core.quotePath=false comes from GIT_CMD globally (see definition).
        result = subprocess.run(
-            [*GIT_CMD, "-c", "core.quotePath=false", "diff", "--name-only", "-z", base],
-            cwd=cwd, capture_output=True, text=True, timeout=30,
+            [*GIT_CMD, "diff", "--name-only", "-z", base],
+            cwd=cwd, capture_output=True, timeout=30,
            env=env,
        )
        if result.returncode != 0:
-            debug_log(f"_git_name_only({base!r}) rc={result.returncode}: {result.stderr[:200]}")
+            stderr_str = (result.stderr or b"").decode("utf-8", errors="replace")
+            debug_log(f"_git_name_only({base!r}) rc={result.returncode}: {stderr_str[:200]}")
            return None
-        return {p for p in result.stdout.split("\0") if p}
+        stdout = (result.stdout or b"").decode("utf-8", errors="replace")
+        return {p for p in stdout.split("\0") if p}

    try:
        if not include_untracked:
            return _run(None)
        with _temp_index(cwd) as env:
            return _run(env)
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
        debug_log(f"_git_name_only({base!r}) error: {e}")
        return None

@@ -339,17 +398,22 @@ def _git_status_porcelain(cwd):
    collapses to `dir/`). Required so the untracked set subtracts cleanly
    against the UPS-time `_list_untracked` snapshot, which uses ls-files and
    therefore always lists individual files."""
+    # Lenient decode: same UTF-8 + errors="replace" pattern as the
+    # sibling helpers — a non-ASCII path in the worktree would otherwise
+    # crash the cp1252 reader thread on Windows. See #2056.
    try:
+        # core.quotePath=false comes from GIT_CMD globally (see definition).
        r = subprocess.run(
-            [*GIT_CMD, "-c", "core.quotePath=false", "status",
-             "--porcelain=v1", "-uall", "-z"],
-            cwd=cwd, capture_output=True, text=True, timeout=30,
+            [*GIT_CMD, "status", "--porcelain=v1", "-uall", "-z"],
+            cwd=cwd, capture_output=True, timeout=30,
        )
        if r.returncode != 0:
-            debug_log(f"_git_status_porcelain rc={r.returncode}: {r.stderr[:200]}")
+            stderr_str = (r.stderr or b"").decode("utf-8", errors="replace")
+            debug_log(f"_git_status_porcelain rc={r.returncode}: {stderr_str[:200]}")
            return None, None
        tracked, untracked = set(), set()
-        entries = r.stdout.split("\0")
+        stdout = (r.stdout or b"").decode("utf-8", errors="replace")
+        entries = stdout.split("\0")
        i = 0
        while i < len(entries):
            e = entries[i]
@@ -368,7 +432,9 @@ def _git_status_porcelain(cwd):
                    i += 1
            i += 1
        return tracked, untracked
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
+        # ValueError guards against any future strict-decode regression
+        # so the helper degrades to (None, None) instead of crashing.
        debug_log(f"_git_status_porcelain error: {e}")
        return None, None

@@ -378,9 +444,12 @@ def _is_ancestor(cwd, maybe_ancestor, descendant):
    """True if `maybe_ancestor` is reachable from `descendant` (i.e. HEAD
    moved forward via commit/merge, not sideways via checkout)."""
    try:
+        # See #2099: only returncode matters, but text=True spawns reader
+        # threads that decode stderr — git error messages can carry non-ASCII
+        # filenames. Drop text=True to keep bytes raw, avoid cp1252 crash.
        result = subprocess.run(
            [*GIT_CMD, "merge-base", "--is-ancestor", maybe_ancestor, descendant],
-            cwd=cwd, capture_output=True, text=True, timeout=5,
+            cwd=cwd, capture_output=True, timeout=5,
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
@@ -411,6 +480,7 @@ def get_git_diff(cwd, baseline_sha, full_context=False, paths=None, untracked_pa
        # change exists to fix.
        return ""

+    # core.quotePath=false comes from GIT_CMD globally (see definition).
    cmd = [*GIT_CMD, "diff", "--no-color", "--no-ext-diff", baseline_sha] + (["--unified=99999"] if full_context else []) + pathspec
    try:
        with _temp_index(cwd, untracked_paths) as env:
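The gitutil.py hunks above all apply one pattern: capture raw bytes, then decode as UTF-8 with `errors="replace"` instead of passing `text=True`. A minimal sketch of that pattern follows; `decode_lenient` and `run_lenient` are hypothetical helpers for illustration and do not exist in the plugin.

```python
import subprocess
import sys


def decode_lenient(raw):
    """Decode captured bytes as UTF-8, replacing undecodable bytes."""
    return (raw or b"").decode("utf-8", errors="replace")


def run_lenient(cmd, cwd=None, timeout=5):
    """Run cmd and return (returncode, stdout, stderr) with both streams as str.

    Returns None if the command cannot be started or times out.
    """
    try:
        # No text=True: subprocess hands back bytes and never decodes them,
        # so the locale codec (e.g. cp1252 on Windows) is never involved.
        r = subprocess.run(cmd, cwd=cwd, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None
    return r.returncode, decode_lenient(r.stdout), decode_lenient(r.stderr)
```

A caller would use it as `run_lenient(["git", "rev-parse", "--show-toplevel"], cwd=repo)` and strip the returned stdout; invalid bytes show up as U+FFFD rather than raising.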
--- a/plugins/security-guidance/hooks/hooks.json
+++ b/plugins/security-guidance/hooks/hooks.json
@@ -49,6 +49,30 @@
            "asyncRewake": true,
            "rewakeMessage": "Background security review of pushed commits not yet reviewed — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
            "rewakeSummary": "Push security review found issues"
+          },
+          {
+            "type": "command",
+            "command": "bash \"${CLAUDE_PLUGIN_ROOT}/hooks/sg-python.sh\" \"${CLAUDE_PLUGIN_ROOT}/hooks/security_reminder_hook.py\"",
+            "if": "Bash(gt create:*)",
+            "asyncRewake": true,
+            "rewakeMessage": "Background security review of commit — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
+            "rewakeSummary": "Commit security review found issues"
+          },
+          {
+            "type": "command",
+            "command": "bash \"${CLAUDE_PLUGIN_ROOT}/hooks/sg-python.sh\" \"${CLAUDE_PLUGIN_ROOT}/hooks/security_reminder_hook.py\"",
+            "if": "Bash(gt modify:*)",
+            "asyncRewake": true,
+            "rewakeMessage": "Background security review of commit — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
+            "rewakeSummary": "Commit security review found issues"
+          },
+          {
+            "type": "command",
+            "command": "bash \"${CLAUDE_PLUGIN_ROOT}/hooks/sg-python.sh\" \"${CLAUDE_PLUGIN_ROOT}/hooks/security_reminder_hook.py\"",
+            "if": "Bash(gt submit:*)",
+            "asyncRewake": true,
+            "rewakeMessage": "Background security review of pushed commits not yet reviewed — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
+            "rewakeSummary": "Push security review found issues"
          }
        ],
        "matcher": "Bash"
--- a/plugins/security-guidance/hooks/llm.py
+++ b/plugins/security-guidance/hooks/llm.py
@@ -27,7 +27,7 @@ from typing import Optional, Tuple, Dict, Any, List

 import extensibility
 import review_api
-from _base import debug_log, _record_usage, _PV, PROVENANCE_TAG  # noqa: F401
+from _base import debug_log, _record_usage, _record_http_error, _PV, PROVENANCE_TAG, state_dir as _resolve_state_dir  # noqa: F401
 from session_state import with_locked_state


@@ -355,10 +355,7 @@ def _call_claude_via_sdk(prompt, output_schema, *, max_tokens=16000, model=None)
        # Try the venv ensure_agent_sdk.py builds. Same fallback logic as
        # agentic_review() — duplicated here so the 3P path doesn't require
        # the agentic path to have run first.
-        _state_dir = os.environ.get(
-            "SECURITY_WARNINGS_STATE_DIR",
-            os.path.expanduser("~/.claude/security"),
-        )
+        _state_dir = _resolve_state_dir()
        _inject_agent_sdk_venv_into_syspath(_state_dir)
        try:
            import asyncio as _asyncio  # noqa: F811
@@ -371,6 +368,7 @@ def _call_claude_via_sdk(prompt, output_schema, *, max_tokens=16000, model=None)
        except Exception as e:
            debug_log(f"3P sdk-single-turn: SDK unavailable ({e})")
            _last_call_claude_http_error = -1
+            _record_http_error(-1)
            return None

    cli_path = os.environ.get("SG_AGENTIC_CLI_PATH") or None
@@ -428,6 +426,7 @@ def _call_claude_via_sdk(prompt, output_schema, *, max_tokens=16000, model=None)
    except _asyncio.TimeoutError:
        debug_log("3P sdk-single-turn: timeout after 60s")
        _last_call_claude_http_error = -1
+        _record_http_error(-1)
        return None
    except Exception as e:
        debug_log(f"3P sdk-single-turn: query failed ({e})")
@@ -436,6 +435,7 @@ def _call_claude_via_sdk(prompt, output_schema, *, max_tokens=16000, model=None)
            for _l in _captured_stderr[:20]:
                debug_log(f"  | {_l.rstrip()}")
        _last_call_claude_http_error = -1
+        _record_http_error(-1)
        return None


@@ -482,10 +482,21 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
        "max_tokens": max_tokens,
        "system": CLAUDE_CODE_SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": prompt}],
-        "output_format": {
-            "type": "json_schema",
-            "schema": output_schema
-        }
+        # API moved the structured-output schema from top-level `output_format`
+        # to `output_config.format` per
+        # https://platform.claude.com/docs/en/build-with-claude/structured-outputs.
+        # The old form "continues to work for a transition period" for some
+        # auth modes (API key + non-streaming), but is rejected with
+        # `invalid_request_error: output_format: This field is deprecated.
+        # Use 'output_config.format' instead.` for others (OAuth Bearer +
+        # newer CLI versions hit it consistently — reporter saw 462 errors
+        # in one day). See #2098.
+        "output_config": {
+            "format": {
+                "type": "json_schema",
+                "schema": output_schema,
+            },
+        },
    }
    if thinking_budget > 0:
        # Models trained on adaptive thinking (4.6+) reject the budget_tokens
@@ -493,7 +504,10 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
        # models (4.5 and earlier, all 3.x) reject adaptive. Pick by model.
        if _model_supports_adaptive_thinking(payload["model"]):
            payload["thinking"] = {"type": "adaptive"}
-            payload["output_config"] = {"effort": "high"}
+            # Merge `effort` into the existing output_config dict (which
+            # now carries the `format` schema) rather than reassigning —
+            # otherwise the schema is silently overwritten. See #2098.
+            payload["output_config"]["effort"] = "high"
        else:
            payload["thinking"] = {
                "type": "enabled",
@@ -531,6 +545,7 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
                error_body = e.read().decode("utf-8") if e.fp else ""
                debug_log(f"API error: {e.code} - {error_body[:200]}")
                _last_call_claude_http_error = e.code
+                _record_http_error(e.code)
                return None
        except (urllib.error.URLError, TimeoutError) as e:
            if attempt < 2:
@@ -540,6 +555,7 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
            else:
                debug_log(f"Request failed after retries: {e}")
                _last_call_claude_http_error = -1
+                _record_http_error(-1)
                return None

    if not response_data:
@@ -548,6 +564,7 @@ def _call_claude(prompt, output_schema, thinking_budget=10000, max_tokens=16000,
        # call uses the token; record the 401 so callers don't see error=None.
        if _last_call_claude_http_error is None:
            _last_call_claude_http_error = 401
+            _record_http_error(401)
        return None

    # Find the text block (skip thinking blocks)
@@ -1145,10 +1162,7 @@ def agentic_review(
        # ~/.claude/security/ with the SDK installed; try that as a fallback
        # before giving up. The system import is attempted first so users
        # who DO have it never touch the venv.
-        _state_dir = os.environ.get(
-            "SECURITY_WARNINGS_STATE_DIR",
-            os.path.expanduser("~/.claude/security"),
-        )
+        _state_dir = _resolve_state_dir()
        _venv_tried = _inject_agent_sdk_venv_into_syspath(_state_dir)
        try:
            import asyncio as _asyncio  # noqa: F811
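The `output_config` change in `_call_claude` above depends on merging `effort` into the dict that already holds `format`. A minimal sketch of that payload construction, under stated assumptions: `build_payload` and its `adaptive` flag are hypothetical (the diff selects the branch with `_model_supports_adaptive_thinking`), and the non-adaptive branch is assumed to carry `budget_tokens`, since that part of the hunk is cut off.

```python
def build_payload(model, prompt, output_schema, *, adaptive,
                  thinking_budget=10000, max_tokens=16000):
    """Build a request body with the schema under output_config.format."""
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {
            "format": {"type": "json_schema", "schema": output_schema},
        },
    }
    if thinking_budget > 0:
        if adaptive:
            payload["thinking"] = {"type": "adaptive"}
            # Merge, do not reassign: `payload["output_config"] = {...}`
            # here would discard the schema set above.
            payload["output_config"]["effort"] = "high"
        else:
            # Assumed shape for the non-adaptive branch.
            payload["thinking"] = {
                "type": "enabled",
                "budget_tokens": thinking_budget,
            }
    return payload
```

With `adaptive=True` the resulting `output_config` holds both `format` and `effort`, which is the case the original reassignment would have broken once the schema moved under `output_config`.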
--- a/plugins/security-guidance/hooks/patterns.py
+++ b/plugins/security-guidance/hooks/patterns.py
@@ -94,6 +94,9 @@ Only use exec() if you absolutely need shell features and the input is guarantee
    },
    {
        "ruleName": "new_function_injection",
+        # JS-only construct: gate to JS/TS files so docs/.md and other prose
+        # mentioning "new Function" don't trip the warning.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": ["new Function"],
        "reminder": "\u26a0\ufe0f Security Warning: Using new Function() with string interpolation is a CODE INJECTION vulnerability. If any variable is concatenated or interpolated into the function body string, an attacker controlling that variable can execute arbitrary code. Use safe alternatives: for property access use obj[key] or array.reduce((o, k) => o[k], root); for computation use a safe expression parser. NEVER interpolate untrusted strings into new Function() bodies.",
    },
@@ -107,16 +110,24 @@ Only use exec() if you absolutely need shell features and the input is guarantee
    },
    {
        "ruleName": "react_dangerously_set_html",
+        # JS/TS-only (React); gate so .md docs / .py / .go files don't trip.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": ["dangerouslySetInnerHTML"],
        "reminder": "⚠️ Security Warning: dangerouslySetInnerHTML can lead to XSS vulnerabilities if used with untrusted content. Ensure all content is properly sanitized using an HTML sanitizer library like DOMPurify, or use safe alternatives.",
    },
    {
        "ruleName": "document_write_xss",
+        # Browser DOM API: only meaningful in JS/TS source.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": ["document.write"],
        "reminder": "⚠️ Security Warning: document.write() can be exploited for XSS attacks and has performance issues. Use DOM manipulation methods like createElement() and appendChild() instead.",
    },
    {
        "ruleName": "innerHTML_xss",
+        # Browser DOM API: only meaningful in JS/TS source. Closes FPs like
+        # docs/example HTML, playground/self-contained skills that hardcode
+        # innerHTML strings with zero user input (#410).
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": [".innerHTML =", ".innerHTML="],
        "reminder": "⚠️ Security Warning: Setting innerHTML with untrusted content can lead to XSS vulnerabilities. Use textContent for plain text or safe DOM methods for HTML content. If you need HTML support, consider using an HTML sanitizer library such as DOMPurify.",
    },
@@ -217,11 +228,15 @@ Additionally, validate user inputs:
    },
    {
        "ruleName": "outerHTML_xss",
+        # Browser DOM API: only meaningful in JS/TS source.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": [".outerHTML =", ".outerHTML="],
        "reminder": "⚠️ Security Warning: Use textContent or sanitize with DOMPurify. outerHTML assignment is an XSS sink equivalent to innerHTML.",
    },
    {
        "ruleName": "insertAdjacentHTML_xss",
+        # Browser DOM API: only meaningful in JS/TS source.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": [".insertAdjacentHTML("],
        "reminder": "⚠️ Security Warning: Use insertAdjacentText() or sanitize with DOMPurify. insertAdjacentHTML is an XSS sink.",
    },
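
The `path_filter` entries added in this file rely on `str.endswith` accepting a tuple of suffixes. The following is a minimal sketch of that gating, not the plugin's actual matcher: `_JS_EXTS` is defined outside this diff, so the tuple contents below are an assumption, and `rule_applies` is a hypothetical helper introduced only for illustration.

```python
# Hypothetical stand-in for _JS_EXTS; the real tuple is defined elsewhere.
_JS_EXTS = (".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs")

RULE = {
    "ruleName": "innerHTML_xss",
    "path_filter": lambda p: p.endswith(_JS_EXTS),
    "substrings": [".innerHTML =", ".innerHTML="],
}


def rule_applies(rule, path, content):
    """Return True when the path passes the rule's filter and a substring matches."""
    path_filter = rule.get("path_filter")
    # Rules without a path_filter apply to every file (assumed behaviour).
    if path_filter is not None and not path_filter(path):
        return False
    return any(s in content for s in rule["substrings"])


snippet = "el.innerHTML = userInput;"
print(rule_applies(RULE, "src/app.ts", snippet))     # JS/TS source: rule fires
print(rule_applies(RULE, "docs/guide.md", snippet))  # prose file: rule is skipped
```

Note that `str.endswith` is case-sensitive, so under this sketch a path such as `App.TS` would be skipped unless the path is lowercased first.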
--- a/plugins/security-guidance/hooks/security_reminder_hook.py
+++ b/plugins/security-guidance/hooks/security_reminder_hook.py
@@ -82,6 +82,7 @@ from _base import (  # noqa: E402,F401
    PROVENANCE_TAG, PROVENANCE_BANNER,
    _read_plugin_version_int, _PV, _USAGE, _USAGE_LOCK,
    _PRICE_PER_MTOK, _PRICE_DEFAULT, _record_usage, _usage_metrics,
+    state_dir as _resolve_state_dir,
 )
 import extensibility  # noqa: E402
 from patterns import (  # noqa: E402,F401
@@ -190,7 +191,13 @@ CONTINUATION_SUFFIX = (
    "response."
 )

-def emit_metrics(metrics, rewake_summary=None):
+def emit_metrics(
+    metrics,
+    rewake_summary=None,
+    additional_context=None,
+    system_message=None,
+    hook_event_name="PostToolUse",
+):
    """
    Write a SyncHookJSONOutput line to stdout for Claude Code to pick up.
    For asyncRewake (Stop) hooks, CC scans stdout for the first {-prefixed line
@@ -213,6 +220,45 @@ def emit_metrics(metrics, rewake_summary=None):
    rewakeSummary in hooks.json, shown to the user in the terminal as the
    task-notification one-liner. Must be in the same JSON line as the metrics
    because CC stops scanning stdout after the first {-prefixed line.
+
+    `additional_context` (asyncRewake findings): model-visible guidance text.
+    Delivery channel depends on `hook_event_name` because CC's hook-output
+    contract is NOT symmetric across events:
+
+      - PostToolUse (commit-review, push-sweep): surfaced via the modern
+        hookSpecificOutput.additionalContext protocol. `PostToolUse` is a
+        member of CC's hookSpecificOutput discriminated union
+        (coreSchemas.ts), so the JSON validates and metrics/rewakeSummary
+        are consumed. See #1375 / #1783 for why this replaced the legacy
+        stderr + exit(2) shape for PostToolUse.
+
+      - Stop / SubagentStop: there is NO `Stop` member in that union, so
+        emitting hookSpecificOutput{hookEventName:"Stop"} makes the whole
+        line fail isSyncHookJSONOutput validation — which on the asyncRewake
+        path silently drops metrics AND rewakeSummary, and (because the
+        legacy stderr write was removed) leaks the raw JSON to the model as
+        the rewake body. CC's asyncRewake delivery actually reads
+        `stderr || stdout` for the model-visible body and only scans stdout
+        JSON for metrics+rewakeSummary — it never reads additionalContext
+        on this path. So for Stop we use the documented clean pattern:
+        guidance on stderr, valid JSON (metrics + rewakeSummary +
+        top-level decision/reason) on stdout. The top-level decision:"block"
+        + reason also covers the sync-fallback path (single-shot `claude -p`,
+        where asyncRewake degrades to a sync Stop hook that reads
+        decision/reason). See #2159.
+
+    Empty/None additional_context emits neither channel (back-compat for
+    metrics-only callers).
+
+    `system_message` (optional, asyncRewake only): user-visible TUI message,
+    distinct from rewakeSummary which is the task-notification one-liner.
+    Use sparingly — the rewakeMessage in hooks.json is the primary user
+    surface; systemMessage adds a per-fire override when the static
+    rewakeMessage isn't specific enough for the finding being shown.
+
+    `hook_event_name` (used only when additional_context is set): selects the
+    delivery channel above. Defaults to "PostToolUse" (commit-review and
+    push-sweep are the most common callers); handle_stop_hook passes "Stop".
    """
    head = {}
    if _PV and "pv" not in metrics:
@@ -223,6 +269,26 @@ def emit_metrics(metrics, rewake_summary=None):
    out = {"metrics": metrics}
    if rewake_summary:
        out["rewakeSummary"] = rewake_summary
+    if additional_context:
+        if hook_event_name in ("Stop", "SubagentStop"):
+            # Stop is NOT in CC's hookSpecificOutput union — emitting it there
+            # fails schema validation and drops metrics+rewakeSummary (#2159).
+            # Clean pattern: guidance on stderr (the asyncRewake body channel,
+            # delivered via `stderr || stdout`), top-level decision/reason for
+            # the sync-fallback path. stdout JSON stays valid so metrics +
+            # rewakeSummary survive.
+            sys.stderr.write(additional_context)
+            sys.stderr.flush()
+            out["decision"] = "block"
+            out["reason"] = additional_context
+        else:
+            # PostToolUse et al. — valid union member; modern protocol.
+            out["hookSpecificOutput"] = {
+                "hookEventName": hook_event_name,
+                "additionalContext": additional_context,
+            }
+    if system_message:
+        out["systemMessage"] = system_message
    print(json.dumps(out), flush=True)

 # =====================================================================
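
The two delivery shapes described in the `emit_metrics` docstring can be compared side by side. This is a simplified restatement of the branching in the hunk above, under the assumption that returning the stdout line and stderr text (instead of writing them) is acceptable for illustration; `build_hook_output` is a hypothetical name, not a function in the plugin.

```python
import json


def build_hook_output(metrics, rewake_summary=None, additional_context=None,
                      system_message=None, hook_event_name="PostToolUse"):
    """Return (stdout_json_line, stderr_text) without performing any I/O."""
    out = {"metrics": metrics}
    stderr_text = ""
    if rewake_summary:
        out["rewakeSummary"] = rewake_summary
    if additional_context:
        if hook_event_name in ("Stop", "SubagentStop"):
            # Stop: guidance on stderr, top-level decision/reason on stdout.
            stderr_text = additional_context
            out["decision"] = "block"
            out["reason"] = additional_context
        else:
            # PostToolUse and similar: hookSpecificOutput.additionalContext.
            out["hookSpecificOutput"] = {
                "hookEventName": hook_event_name,
                "additionalContext": additional_context,
            }
    if system_message:
        out["systemMessage"] = system_message
    return json.dumps(out), stderr_text


post_line, post_err = build_hook_output({"vulns_found": 1}, "1 finding", "fix X")
stop_line, stop_err = build_hook_output({"vulns_found": 1}, "1 finding", "fix X",
                                        hook_event_name="Stop")
print(post_line)
print(stop_line)
```

In both cases everything lands in one JSON line, which matters because the docstring states that CC stops scanning stdout after the first `{`-prefixed line.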
@@ -510,7 +576,11 @@ def handle_user_prompt_submit(input_data):
    elif sha:
        debug_log(f"Captured git baseline: {sha[:12]}")
    else:
-        debug_log("Failed to capture git baseline (not a git repo?)")
+        # Show cwd so the next reporter can immediately see when this isn't
+        # actually "not a git repo" but a path-encoding / permissions / git
+        # invocation failure. See #2099.
+        debug_log(f"Failed to capture git baseline (cwd={cwd!r}) — not a git repo, "
+                  f"or git invocation failed (check log entries above)")

    sys.exit(0)

@@ -594,8 +664,29 @@ _COMMIT_SHA_RE = re.compile(r'^\[[^\]]*?\b([0-9a-f]{7,40})\]', re.MULTILINE)
 # detection — it does NOT tolerate `git -c k=v commit` global options, which
 # keeps this hook aligned with CC's commit attribution on what counts as a
 # commit.
-_GIT_COMMIT_RE = re.compile(r'\bgit\s+commit(?:\s|$)')
-_GIT_AMEND_RE = re.compile(r'\s--amend\b')
+#
+# Also matches `gt create` and `gt modify` — Graphite's stacked-PR wrapper
+# around git. `gt create` produces a new commit (mapped to git commit
+# semantics); `gt modify` amends the current commit (mapped to git commit
+# --amend, also flagged by _GIT_AMEND_RE below). The hooks.json matcher
+# widening for `gt create:*` / `gt modify:*` / `gt submit:*` ships in the
+# same change set — without that widening this regex change is dead code
+# because the hook subprocess never spawns for gt invocations. See #2048.
+_GIT_COMMIT_RE = re.compile(
+    # `git -C <path>` and `git -c key=val` global options are allowed between
+    # `git` and `commit` (mirrors the long-standing tolerance in
+    # _GIT_PUSH_RE). Without this, `git -C /repo commit` is silently dropped
+    # by the handler — see #2089's secondary finding. The gt branch has no
+    # global-option layer to worry about.
+    r'\bgit(?:\s+-[Cc]\s+\S+|\s+--\S+=\S+)*\s+commit\b'
+    r'|\bgt\s+(?:create|modify)\b'
+)
+# Match either the `--amend` flag (with the leading whitespace boundary
+# preserved from the original) OR `gt modify` which is semantically an
+# amend. The handler treats matches as "find the pre-amend SHA via reflog
+# and diff against THAT, not against the post-amend HEAD's parent" — same
+# code path for both git --amend and gt modify.
+_GIT_AMEND_RE = re.compile(r'(?:\s--amend\b|\bgt\s+modify\b)')

 # Rolling-window cap on LLM commit-review calls. See atomic_check_rate_limit
 # docstring for the rationale that motivated the switch from a lifetime cap.
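
To make the widened matching concrete, the two patterns from the hunk above can be exercised directly. The regexes are copied verbatim from this diff; `classify` is a hypothetical helper used only for the demonstration and does not reflect how the handler itself is structured.

```python
import re

_GIT_COMMIT_RE = re.compile(
    r'\bgit(?:\s+-[Cc]\s+\S+|\s+--\S+=\S+)*\s+commit\b'
    r'|\bgt\s+(?:create|modify)\b'
)
_GIT_AMEND_RE = re.compile(r'(?:\s--amend\b|\bgt\s+modify\b)')


def classify(command):
    """Return 'amend', 'commit', or None for a shell command string."""
    if not _GIT_COMMIT_RE.search(command):
        return None
    return "amend" if _GIT_AMEND_RE.search(command) else "commit"


for cmd in (
    "git commit -m 'fix'",
    "git -C /repo commit -m 'fix'",
    "git -c user.name=ci commit --amend --no-edit",
    "gt create -m 'feature'",
    "gt modify",
    "git push origin main",
    "git commit-tree HEAD^{tree}",
):
    print(f"{cmd!r}: {classify(cmd)}")
```

One behaviour worth checking in review: the previous pattern ended in `(?:\s|$)`, while the new one ends in `\b`. A word boundary also sits between `commit` and `-`, so plumbing commands such as `git commit-tree` now match as well.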
@@ -624,8 +715,13 @@ COMMIT_REVIEW_RATE_WINDOW_S = int(
 # entry would buy minimal extra coverage (sessions that push only via gh) at
 # the cost of an extra python spawn on every `... && gh pr create` compound
 # (the common case). Those sessions are caught on their next standalone `git push`.
+# Matches `git push` (with optional `-c k=v` / `-C path` global options
+# CC's hooks.json matcher doesn't tolerate) OR `gt submit` — Graphite's
+# stacked-PR push command. gt submit forwards to `git push` internally,
+# but the bash hook fires on Claude's top-level command so we need to
+# recognize gt submit at the matcher level. See #2048.
 _GIT_PUSH_RE = re.compile(
-    r'\bgit(?:\s+-[cC]\s+\S+|\s+--\S+=\S+)*\s+push\b'
+    r'(?:\bgit(?:\s+-[cC]\s+\S+|\s+--\S+=\S+)*\s+push\b|\bgt\s+submit\b)'
 )

 # `git push` stdout: "abc1234..def5678  branch -> branch" (or `+abc..def` on
@@ -791,23 +887,30 @@ def _detect_prev_upstream(repo_root, bash_output):
    # @{u}@{1} — only meaningful if an upstream is configured.
    for ref in ("@{u}@{1}", "@{push}@{1}"):
        try:
+            # See #2099: stdout is a SHA but stderr can carry non-ASCII git
+            # warnings — keep bytes raw to avoid cp1252 reader-thread crash.
            r = subprocess.run(
                [*GIT_CMD, "rev-parse", "--verify", "-q", ref],
-                cwd=repo_root, capture_output=True, text=True, timeout=5,
+                cwd=repo_root, capture_output=True, timeout=5,
            )
-            if r.returncode == 0 and r.stdout.strip():
-                return r.stdout.strip()
+            sha = r.stdout.decode("utf-8", errors="replace").strip()
+            if r.returncode == 0 and sha:
+                return sha
        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
            pass
    main = _detect_main_branch(repo_root)
    if main:
        try:
+            # See #2099: drop text=True; decode bytes manually so a
+            # cp1252-undefined byte in git's stderr doesn't crash the
+            # reader thread.
            r = subprocess.run(
                [*GIT_CMD, "merge-base", "HEAD", main],
-                cwd=repo_root, capture_output=True, text=True, timeout=5,
+                cwd=repo_root, capture_output=True, timeout=5,
            )
-            if r.returncode == 0 and r.stdout.strip():
-                return r.stdout.strip()
+            sha = r.stdout.decode("utf-8", errors="replace").strip()
+            if r.returncode == 0 and sha:
+                return sha
        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
            pass
    return None
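
The pattern repeated across these hunks (drop `text=True`, capture bytes, decode stdout manually) can be sketched in isolation. This is an illustrative helper under stated assumptions, not code from the plugin: `run_stdout_text` is a hypothetical name, and a child Python process stands in for git so the example runs without a repository.

```python
import subprocess
import sys


def run_stdout_text(cmd, cwd=None, timeout=5):
    """Run cmd capturing raw bytes; return decoded stdout, or None on failure."""
    try:
        r = subprocess.run(cmd, cwd=cwd, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None
    if r.returncode != 0:
        return None
    # Decode manually: undecodable bytes become U+FFFD instead of raising.
    return r.stdout.decode("utf-8", errors="replace").strip() or None


# The child writes a SHA-like line to stdout and, to stderr, the UTF-8 bytes
# of U+0641 (D9 81). Byte 0x81 is undefined in cp1252.
child = (
    "import sys;"
    "sys.stdout.buffer.write(b'abc1234\\n');"
    "sys.stderr.buffer.write(b'\\xd9\\x81')"
)
print(run_stdout_text([sys.executable, "-c", child]))
```

Because stderr is never decoded here, its contents cannot raise, whatever the locale encoding is. Stdout is decoded with `errors="replace"`, so even an unexpected byte yields a replacement character rather than an exception.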
@@ -1118,11 +1221,16 @@ def handle_commit_review_posttooluse(input_data):
    resolved = 0
    for sha in shas:
        try:
+            # GIT_CMD sets core.quotePath=false globally (see gitutil.py), so
+            # `diff --git a/... b/...` headers carry raw UTF-8 and non-ASCII
+            # paths aren't C-quoted past the downstream parse_diff_into_files
+            # regex (sibling of #2056 / #2075). See #2082.
            if pre_amend_sha:
                # Delta review: pre-amend → post-amend. `git diff` (not show)
                # so the output is a pure unified diff with no commit header.
                result = subprocess.run(
-                    [*GIT_CMD, "diff", "--no-color", "--no-ext-diff", pre_amend_sha, sha, "--"],
+                    [*GIT_CMD, "diff", "--no-color", "--no-ext-diff",
+                     pre_amend_sha, sha, "--"],
                    cwd=repo_root, capture_output=True, timeout=15
                )
            else:
@@ -1254,12 +1362,13 @@ def handle_commit_review_posttooluse(input_data):
    try:
        full_shas = []
        for s in shas:
+            # See #2099: drop text=True; decode manually for cp1252 safety.
            r = subprocess.run(
                [*GIT_CMD, "rev-parse", "--verify", "-q", s],
-                cwd=repo_root, capture_output=True, text=True, timeout=5,
+                cwd=repo_root, capture_output=True, timeout=5,
            )
            if r.returncode == 0:
-                full_shas.append(r.stdout.strip())
+                full_shas.append(r.stdout.decode("utf-8", errors="replace").strip())
        _append_reviewed_shas(repo_root, full_shas, vulns_found=len(vulns or []))
    except Exception:
        pass
@@ -1361,18 +1470,26 @@ def handle_commit_review_posttooluse(input_data):
        if s in sev:
            sev[s] += 1

+    # Rebuild guidance from new_vulns only — concrete_guidance from the LLM
+    # still lists deduped entries. Pass via additional_context so CC surfaces
+    # the reason via hookSpecificOutput.additionalContext instead of empty
+    # stdout (#1783) / stderr-only "json output validation failed" (#1375).
+    _commit_guidance = (PROVENANCE_BANNER + "\n\n"
+                        + _format_vulns_guidance(new_vulns)
+                        + CONTINUATION_SUFFIX + "\n")
    emit_metrics({
        "vulns_found": len(new_vulns), **_base, **_agentic_m,
        "critical_count": sev["critical"], "high_count": sev["high"],
        "files_reviewed": len(diff_files), "review_ms": review_ms,
        **({"deduped": n_deduped} if n_deduped else {}),
-    }, rewake_summary=_format_vulns_summary(new_vulns, prefix="Commit security review found"))
+    }, rewake_summary=_format_vulns_summary(new_vulns, prefix="Commit security review found"),
+       additional_context=_commit_guidance,
+       hook_event_name="PostToolUse")

-    # Rebuild guidance from new_vulns only — concrete_guidance from the LLM
-    # still lists deduped entries.
-    sys.stderr.write(PROVENANCE_BANNER + "\n\n"
-                     + _format_vulns_guidance(new_vulns)
-                     + CONTINUATION_SUFFIX + "\n")
+    # exit(2) is preserved per the asyncRewake protocol — it's what CC
+    # uses as the "force fix" signal that triggers the rewakeMessage flow.
+    # The stderr.write was removed; additional_context above now carries
+    # the same text via the modern JSON channel. See #1358/#1375/#1783.
    sys.exit(2)

 def handle_push_sweep_posttooluse(input_data):
@@ -1453,9 +1570,10 @@ def handle_push_sweep_posttooluse(input_data):
    # both.
    head = None
    try:
+        # See #2099: drop text=True; decode manually for cp1252 safety.
        r = subprocess.run([*GIT_CMD, "rev-parse", "HEAD"], cwd=repo_root,
-                           capture_output=True, text=True, timeout=5)
-        head = r.stdout.strip() if r.returncode == 0 else None
+                           capture_output=True, timeout=5)
+        head = r.stdout.decode("utf-8", errors="replace").strip() if r.returncode == 0 else None
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        pass
    push_section = _push_section(bash_output or "")
@@ -1485,14 +1603,15 @@ def handle_push_sweep_posttooluse(input_data):
        quiet_success = False
        if not (bash_output or "").strip() and not interrupted:
            try:
+                # See #2099: drop text=True; decode manually for cp1252 safety.
                r_cur = subprocess.run(
                    [*GIT_CMD, "rev-parse", "--verify", "-q", "@{u}"],
-                    cwd=repo_root, capture_output=True, text=True, timeout=5)
+                    cwd=repo_root, capture_output=True, timeout=5)
                r_prev = subprocess.run(
                    [*GIT_CMD, "rev-parse", "--verify", "-q", "@{u}@{1}"],
-                    cwd=repo_root, capture_output=True, text=True, timeout=5)
-                cur = r_cur.stdout.strip() if r_cur.returncode == 0 else ""
-                prev_u = r_prev.stdout.strip() if r_prev.returncode == 0 else ""
+                    cwd=repo_root, capture_output=True, timeout=5)
+                cur = r_cur.stdout.decode("utf-8", errors="replace").strip() if r_cur.returncode == 0 else ""
+                prev_u = r_prev.stdout.decode("utf-8", errors="replace").strip() if r_prev.returncode == 0 else ""
                quiet_success = bool(cur and prev_u and cur == head and prev_u != cur)
            except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
                pass
@@ -1506,11 +1625,12 @@ def handle_push_sweep_posttooluse(input_data):
        # reviewed-shas state.
        for local_ref in new_branch_matches:
            try:
+                # See #2099: drop text=True; decode manually for cp1252 safety.
                r = subprocess.run(
                    [*GIT_CMD, "rev-parse", "--verify", "-q", local_ref],
-                    cwd=repo_root, capture_output=True, text=True, timeout=5,
+                    cwd=repo_root, capture_output=True, timeout=5,
                )
-                local_sha = r.stdout.strip() if r.returncode == 0 else ""
+                local_sha = r.stdout.decode("utf-8", errors="replace").strip() if r.returncode == 0 else ""
            except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
                local_sha = ""
            if local_sha and local_sha != head:
@@ -1629,17 +1749,23 @@ def handle_push_sweep_posttooluse(input_data):
    # Metrics — keep within the 10-key cap; agentic sub-metrics are dropped
    # here in favour of the push-sweep funnel keys (telemetry can join on session_id
    # to the per-commit fires for agentic detail). rewake_summary must ride
-    # this line (CC reads only the first {-prefixed stdout line); it's a
-    # no-op when new_vulns is empty since we exit 0 below.
-    emit_metrics({
+    # this line (CC reads only the first {-prefixed stdout line); the emit
+    # is deferred to the two exit points below so the with-vulns path can
+    # also pass additional_context in the same JSON line (#1375/#1783) —
+    # the by-design "CC keeps only the first JSON line" constraint means
+    # we can't emit twice. Builds the shared metrics dict here; vulns path
+    # adds additional_context, no-vulns path emits as-is.
+    _push_metrics = {
        **_base, "pushed": len(push_range), "unreviewed": len(tail),
        "prefix_advanced": prefix_advanced, "vulns_found": len(new_vulns),
        "files_reviewed": len(diff_files), "review_ms": review_ms,
        **({"deduped": n_deduped} if n_deduped else {}),
-    }, rewake_summary=_format_vulns_summary(new_vulns, prefix="Push security review found"))
+    }
+    _push_rewake_summary = _format_vulns_summary(new_vulns, prefix="Push security review found")

    if not new_vulns:
        debug_log("Push sweep: no new findings")
+        emit_metrics(_push_metrics, rewake_summary=_push_rewake_summary)
        sys.exit(0)

    # First-push of a big branch can surface many findings at once across
@@ -1692,9 +1818,14 @@ def handle_push_sweep_posttooluse(input_data):
        guidance = _format_vulns_guidance(reported) or ""
    else:
        guidance = concrete_guidance or _format_vulns_guidance(reported) or ""
-    sys.stderr.write(
-        PROVENANCE_BANNER + "\n\n" + guidance + CONTINUATION_SUFFIX + "\n"
-    )
+    # Emit metrics + additional_context together — single JSON line is the
+    # contract CC's hook parser expects. exit(2) preserved as the asyncRewake
+    # "force fix" trigger (see comment near handle_commit_review_posttooluse).
+    # See #1358 / #1375 / #1783.
+    emit_metrics(_push_metrics, rewake_summary=_push_rewake_summary,
+                 additional_context=(PROVENANCE_BANNER + "\n\n"
+                                     + guidance + CONTINUATION_SUFFIX + "\n"),
+                 hook_event_name="PostToolUse")
    sys.exit(2)

 def handle_stop_hook(input_data):
@@ -1927,6 +2058,11 @@ def handle_stop_hook(input_data):
        # untracked_baseline_n is the signal for whether the UPS-time
        # untracked-snapshot capture actually ran.
        sweep_trimmed = {k: v for k, v in sweep.items() if k != "warn_unresolved_mask"}
+        # Pass guidance via additional_context. For Stop, emit_metrics writes
+        # it to stderr and sets top-level decision/reason, because Stop is not
+        # in CC's hookSpecificOutput union (#2159); the stdout JSON stays valid
+        # so metrics + rewakeSummary survive. exit(2) preserved as the
+        # asyncRewake "force fix" signal. See #1358 / #1375 / #1783.
        emit_metrics({
            "vulns_found": len(vulns),
            "untracked_baseline_n": len(untracked_at_baseline),
@@ -1940,10 +2076,10 @@ def handle_stop_hook(input_data):
            **({"diff_truncated": llm._last_review_truncated_bytes}
               if llm._last_review_truncated_bytes else {}),
            **sweep_trimmed,
-        }, rewake_summary=_format_vulns_summary(vulns))
-
-        # Exit code 2 with stderr forces Claude to continue and fix
-        sys.stderr.write(PROVENANCE_BANNER + "\n\n" + concrete_guidance + CONTINUATION_SUFFIX + "\n")
+        }, rewake_summary=_format_vulns_summary(vulns),
+           additional_context=(PROVENANCE_BANNER + "\n\n"
+                               + concrete_guidance + CONTINUATION_SUFFIX + "\n"),
+           hook_event_name="Stop")
        sys.exit(2)

    if llm._last_call_claude_http_error is not None:
@@ -1971,10 +2107,7 @@ def handle_stop_hook(input_data):
    })
    sys.exit(0)

-_SDK_BOOTSTRAP_THROTTLE = os.path.join(
-    os.environ.get("SECURITY_WARNINGS_STATE_DIR")
-    or os.path.expanduser("~/.claude/security"),
-    ".sdk_bootstrap_spawned")
+_SDK_BOOTSTRAP_THROTTLE = os.path.join(_resolve_state_dir(), ".sdk_bootstrap_spawned")

 def _maybe_bootstrap_agent_sdk_async():
    """Fire-and-forget SDK bootstrap, for remote-pod environments.
--- a/plugins/security-guidance/hooks/session_state.py
+++ b/plugins/security-guidance/hooks/session_state.py
@@ -19,7 +19,7 @@ import os
 import re
 from datetime import datetime

-from _base import debug_log
+from _base import debug_log, state_dir as _state_dir


 def _state_key(session_id):
@@ -36,20 +36,20 @@ def _state_key(session_id):

 def get_state_file(session_id):
    """Get session-specific state file path."""
-    state_dir = os.environ.get("SECURITY_WARNINGS_STATE_DIR", os.path.expanduser("~/.claude/security"))
+    state_dir = _state_dir()
    return os.path.join(state_dir, f"security_warnings_state_{_state_key(session_id)}.json")


 def get_lock_file(session_id):
    """Get session-specific lock file path."""
-    state_dir = os.environ.get("SECURITY_WARNINGS_STATE_DIR", os.path.expanduser("~/.claude/security"))
+    state_dir = _state_dir()
    return os.path.join(state_dir, f"security_warnings_state_{_state_key(session_id)}.lock")


 def cleanup_old_state_files():
    """Remove state files and lock files older than 30 days."""
    try:
-        state_dir = os.environ.get("SECURITY_WARNINGS_STATE_DIR", os.path.expanduser("~/.claude/security"))
+        state_dir = _state_dir()
        if not os.path.exists(state_dir):
            return
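
The call sites in this change now share one resolver imported from `_base`, whose body is not shown in this part of the diff. The following is a minimal sketch of what such a helper might look like, assuming it keeps the environment-variable override and the default path seen in the removed lines; the fallback semantics chosen here are an assumption.

```python
import os


def state_dir():
    """Resolve the state directory: env override first, then the default."""
    # `or` (rather than a .get default) makes an empty env value fall back too.
    return (os.environ.get("SECURITY_WARNINGS_STATE_DIR")
            or os.path.expanduser("~/.claude/security"))


def get_state_file(session_key):
    """Build a session-specific state file path under the resolved directory."""
    return os.path.join(state_dir(), f"security_warnings_state_{session_key}.json")


os.environ["SECURITY_WARNINGS_STATE_DIR"] = "/tmp/sg-state"
print(get_state_file("abc"))
```

The removed call sites were not identical: some used `os.environ.get(key, default)`, which keeps an empty-string value, while the bootstrap throttle path used `os.environ.get(key) or default`, which falls back on it. A shared helper has to pick one of the two behaviours.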

--- a/plugins/security-guidance/hooks/sg-python.sh
+++ b/plugins/security-guidance/hooks/sg-python.sh
@@ -22,6 +22,17 @@
 #        "${CLAUDE_PLUGIN_ROOT}/hooks/security_reminder_hook.py"
 set -e

+# Force UTF-8 for ALL Python filesystem + IO operations (PEP 540).
+# Without this, Windows Python defaults `locale.getpreferredencoding()` to
+# cp1252, so `text=True` in subprocess.run crashes its reader thread, and
+# text-mode open() / json.load raise UnicodeDecodeError, on any byte that
+# is undefined in cp1252 (e.g. the 0x81 continuation byte of ف, U+0641,
+# UTF-8 D9 81; common in Arabic/Hebrew/CJK paths). See #2056, #2099.
+#
+# No-op on macOS/Linux (already UTF-8). Must be set BEFORE Python starts —
+# changing it from inside the interpreter has no effect.
+export PYTHONUTF8=1
+
 # Git Bash / MSYS on Windows hands script paths to this shim in POSIX form
 # (`/c/Users/...`). When we exec a Windows `python.exe` (which we do on
 # Windows since `python3` is the Microsoft Store stub), python interprets the