security-guidance: emit findings via hookSpecificOutput.additionalContext (#1358 #1375 #1783)

Fixes #1358, #1375, and #1783: three related complaints about the hook
output protocol used at the three asyncRewake exit-2 sites
(handle_commit_review_posttooluse, handle_push_sweep_posttooluse,
handle_stop_hook).

The old shape at each site was:

    emit_metrics({...})                           # JSON to stdout (metrics)
    sys.stderr.write(banner + guidance + suffix)  # plain text to stderr
    sys.exit(2)                                   # asyncRewake trigger

That triggered three reported problems:

  #1375: CC's hook system, parsing stdout for a SyncHookJSONOutput, sees
         only the bare metrics dict (no findings reason), and on older CC
         versions surfaces a 'json output validation failed' error
         because stderr's plain text isn't valid JSON.

  #1783: CC's UI shows 'Permission to use Edit has been denied' with no
         permissionDecisionReason. The stderr text is invisible to that
         UI surface; CC only renders fields it can find in the JSON.

  #1358: Reporters experienced the exit(2) as 'gating' behavior rather
         than 'warning' behavior. The pattern-warning path in main() was
         migrated to exit(0) + hookSpecificOutput.additionalContext long
         ago; these three asyncRewake sites were never updated.

Fix: extend emit_metrics() to accept additional_context, system_message,
and hook_event_name kwargs, and emit them in the same SyncHookJSONOutput
line as the metrics. CC's parser stops scanning stdout after the first
{-prefixed line, so the findings must ride in that same line. Calling
emit_metrics twice or adding a second print(json.dumps(...)) would
silently drop the second emission.

At each of the three call sites: route the guidance text that used to go
to stderr through additional_context instead. The stderr.write is
dropped: additionalContext carries the same text to the model via the
JSON channel, and the legacy stderr surface is what triggered #1375's
JSON validation error on older CC clients.

exit(2) is preserved at all three sites. That's the documented mechanism
for triggering the asyncRewake 'force fix' feedback loop (per the inline
comment at the stop-hook site); switching to exit(0) without verifying
CC's protocol-version support risks dropping the rewake entirely and
silently losing all the findings the hook just computed.

For push-sweep specifically: emit_metrics had to move from an
unconditional pre-emission (line ~1680) to two conditional sites (one in
the no-vulns branch with exit(0), one in the with-vulns branch with
exit(2)) because the with-vulns branch needs to attach additional_context
and CC reads only the first JSON line, so a second emit would be ignored.
Behavior is preserved: every push-sweep fire emits exactly one metrics
line, just at a slightly later point in the function body.

Verified locally on macOS Python 3.13:

  - py_compile clean.
  - Existing 45 smoke + extensibility tests still pass.
  - 21 new tests in test_hook_output_protocol.py (added to internal test
    suite at sg-staging/tests/, not in this PR):
      * 6 backward-compat: emit_metrics with metrics only, with
        rewake_summary, etc. Verifies the legacy callers still produce
        the same output shape.
      * 5 additional_context shape: lands in hookSpecificOutput,
        round-trips the value, default hook_event_name is sensible,
        empty/None doesn't pollute the JSON with an empty hSO block.
      * 3 system_message shape: lands in systemMessage, empty/None
        suppressed, round-trips.
      * 1 combined: metrics + rewake_summary + additional_context +
        system_message + hook_event_name all merge into one JSON line.
      * 6 round-trip safety: emoji, quotes, backslashes, newlines,
        Unicode (山田太郎 + 🎉), tabs, null bytes; all survive the
        json.dumps cycle.
      * 6 static-shape: each of the three asyncRewake handlers
        (commit_review, push_sweep, stop_hook) is checked to confirm it
        passes additional_context to emit_metrics and no longer writes
        the PROVENANCE_BANNER guidance to stderr. Catches the regression
        class where a new exit(2) site forgets to plumb guidance through
        the JSON channel.
  - 66/66 pass total (45 existing + 21 new) in 2.57s.

NOT verified end-to-end with a real CC instance triggering all three
hooks. The static-shape tests + the JSON round-trip tests should catch
any regression in the emit_metrics output, but the actual interaction
with CC's asyncRewake / rewakeMessage flow (especially: does
hookSpecificOutput.additionalContext successfully appear in the
rewakeMessage that CC sends to the model?) needs runtime verification
against a CC version that supports the modern protocol.

The reporter for #1375 specifically called out that CC's older versions
surfaced 'json output validation failed' on the old stderr-only output;
this fix changes the stdout shape to valid JSON with the findings
included, which should resolve that error class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
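
A minimal sketch of the single-line emission described in the commit message above. The real emit_metrics() is not shown in this diff, so the function name `build_hook_output`, the default event name "PostToolUse", and the exact key layout are assumptions made for illustration; only the `hookSpecificOutput.additionalContext` and `systemMessage` field names come from the message itself.

```python
import json


def build_hook_output(metrics, additional_context=None, system_message=None,
                      hook_event_name="PostToolUse"):
    """Hypothetical stand-in for the extended emit_metrics().

    Returns ONE JSON line so a parser that stops after the first
    {-prefixed stdout line still sees metrics and findings together.
    """
    payload = {"metrics": metrics}
    # Empty or None context is suppressed so no empty block is emitted.
    if additional_context:
        payload["hookSpecificOutput"] = {
            "hookEventName": hook_event_name,  # assumed default, see lead-in
            "additionalContext": additional_context,
        }
    if system_message:
        payload["systemMessage"] = system_message
    # json.dumps escapes newlines, so the result is always a single line.
    return json.dumps(payload)


line = build_hook_output({"findings": 2}, additional_context="Fix the SSRF in fetch()\nthen retry")
```

A caller would print `line` once and then exit with the code appropriate to its branch; printing a second JSON line would, per the message above, be ignored.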
Merge pull request #2075 from anthropics/fix-2056-windows-unicode-decode
2026-06-10 02:03:34 +00:00 · 2026-05-28 23:53:04 -07:00 · 2026-05-28 23:36:36 -07:00 · 2026-05-28 23:18:17 -07:00 · 2026-05-28 23:15:16 -07:00 · 2026-05-28 23:07:53 -07:00
9 changed files with 526 additions and 66 deletions
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -2135,22 +2135,6 @@
      },
      "homepage": "https://github.com/SAP/open-ux-tools/tree/main/packages/fiori-mcp-server"
    },
-    {
-      "name": "sap-hana-cli",
-      "description": "150+ SAP HANA database tools for AI assistants. Query tables, import/export data, profile data quality, compare schemas, manage backups, monitor performance, and more. Connects to SAP HANA Cloud and on-premise databases.",
-      "author": {
-        "name": "SAP SE",
-        "email": "ospo@sap.com",
-        "url": "https://www.sap.com"
-      },
-      "category": "database",
-      "source": {
-        "source": "url",
-        "url": "https://github.com/SAP-samples/hana-cli-claude-plugin.git",
-        "sha": "160ae47efaffea2e1dd9d6877ab9ec49b78542a0"
-      },
-      "homepage": "https://github.com/SAP-samples/hana-cli-claude-plugin"
-    },
    {
      "name": "sap-mdk-server",
      "description": "MCP server for SAP Mobile Development Kit (MDK). Build and modify MDK applications with AI assistance — schema lookups, action validation, rule editing, and project scaffolding.",
@@ -2601,6 +2585,20 @@
      },
      "homepage": "https://github.com/vercel/vercel-plugin"
    },
+    {
+      "name": "vibe-prospecting",
+      "description": "Vibe Prospecting connects Claude to live B2B company and contact data so users can search, match, enrich, filter, and export prospects at scale. It turns natural-language requests into structured GTM workflows for lead generation, CRM enrichment, company research, executive discovery, and multi-step prospecting automation inside Claude Cowork and Claude Code.",
+      "author": {
+        "name": "vibeprospecting.ai"
+      },
+      "category": "productivity",
+      "source": {
+        "source": "url",
+        "url": "https://github.com/explorium-ai/vibeprospecting-plugin.git",
+        "sha": "ada4d569dbf70194fe18750ecbc5170e9a3f120a"
+      },
+      "homepage": "https://www.vibeprospecting.ai/product/claude-plugin"
+    },
    {
      "name": "windsor-ai",
      "description": "Connect Claude Code to 325+ business data sources via Windsor.ai. Query marketing, sales, CRM, ecommerce, finance, and analytics data from Google Ads, Meta, HubSpot, Salesforce, Shopify, Stripe, and hundreds more — directly from your terminal.",
--- a/.github/workflows/scan-plugins.yml
+++ b/.github/workflows/scan-plugins.yml
@@ -381,3 +381,166 @@ jobs:
            echo "::error::Scan step failed without a parseable policy verdict (likely an infra error)."
            exit 1
          fi
+
+  # ─────────────────────────────────────────────────────────────────────────────
+  # emit-verdict: post a sticky comment per entry to the bump PR with the
+  # structured verdict, so downstream tooling (label automation, delist
+  # authoring) can read verdicts directly instead of scraping job logs.
+  # Sticky comment marker: `<!-- bump-pr-verdict:<name> -->`.
+  #
+  # Mirrors the schema_v1 contract from
+  # anthropics/claude-plugins-community-internal#3908 so the triage scripts
+  # in mcp-local-directory/scripts/triage/ work uniformly across both repos.
+  # -official doesn't run per-entry static checks (zombie, schema, binaries,
+  # etc.) so the `scan.*` axes are emitted as "skipped". The granular policy
+  # booleans (`has_broad_scope_hooks`, `has_undisclosed_telemetry`,
+  # `description_matches_behavior`) aren't surfaced by this workflow's
+  # per-entry artifact yet, so they're emitted as null; the triage
+  # `triage_bool_to_str` helper maps null → "?" so display is graceful.
+  # Status describes the execution state, not the outcome — `ran` when the
+  # scan action evaluated this SHA fresh, `cached` when a prior verdict was
+  # reused (cf. run-verdicts.json's `source` field). Outcome lives in
+  # `policy.passes`. policy-sweep.sh dispatches on this exact vocabulary.
+  #
+  # PR resolution: pull_request events carry the PR number directly. The
+  # bump workflow creates bump PRs via GITHUB_TOKEN (which doesn't fire
+  # pull_request triggers — recursion guard) and dispatches this scan via
+  # workflow_dispatch on the bump branch. In that case we look up the
+  # open PR by head ref. No PR (scan_all dispatch on main, etc.) → no-op.
+  #
+  # continue-on-error at the job level: emit failure must NOT block the
+  # `scan` required check. Consumers fall back to log-scraping if the
+  # comment is absent (gradual migration; no flag day).
+  # ─────────────────────────────────────────────────────────────────────────────
+  emit-verdict:
+    needs: [scan]
+    if: always() && needs.scan.result != 'skipped' && needs.scan.result != 'cancelled'
+    runs-on: ubuntu-latest
+    continue-on-error: true
+    permissions:
+      contents: read
+      pull-requests: write
+    steps:
+      - name: Download scan verdicts
+        uses: actions/download-artifact@v4
+        with:
+          name: scan-verdicts
+          path: /tmp/scan-verdicts
+        continue-on-error: true
+
+      - name: Resolve PR number for this ref
+        id: pr
+        env:
+          GH_TOKEN: ${{ github.token }}
+          EVENT_NAME: ${{ github.event_name }}
+          PR_FROM_EVENT: ${{ github.event.pull_request.number }}
+          REF: ${{ github.ref_name }}
+          REPO: ${{ github.repository }}
+        run: |
+          set -euo pipefail
+          if [[ "$EVENT_NAME" == "pull_request" && -n "$PR_FROM_EVENT" ]]; then
+            echo "number=$PR_FROM_EVENT" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+          # workflow_dispatch on the bump branch: find the open PR for it.
+          # head filter takes the form owner:branch.
+          owner="${REPO%%/*}"
+          pr=$(gh api "/repos/${REPO}/pulls?state=open&head=${owner}:${REF}&per_page=1" \
+            --jq '.[0].number // ""')
+          if [[ -z "$pr" ]]; then
+            echo "::notice::No open PR for ref ${REF} — sticky comments skipped (verdicts still in scan-verdicts artifact)"
+          fi
+          echo "number=$pr" >> "$GITHUB_OUTPUT"
+
+      - name: Build and post sticky comments
+        if: steps.pr.outputs.number != ''
+        env:
+          GH_TOKEN: ${{ github.token }}
+          REPO: ${{ github.repository }}
+          PR: ${{ steps.pr.outputs.number }}
+          RUN_ID: ${{ github.run_id }}
+        run: |
+          set -euo pipefail
+
+          verdicts_path=/tmp/scan-verdicts/run-verdicts.json
+          # Missing/empty artifact: scan job ran but didn't produce verdicts
+          # (e.g. the relevance gate said "no changes"). Nothing to comment;
+          # exit clean.
+          if [[ ! -s "$verdicts_path" ]]; then
+            echo "::notice::No run-verdicts.json artifact — nothing to emit"
+            exit 0
+          fi
+          count=$(jq 'length' "$verdicts_path")
+          if [[ "$count" == "0" ]]; then
+            echo "::notice::run-verdicts.json is empty — nothing to emit"
+            exit 0
+          fi
+
+          ran_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+
+          # scan.* axes: -official doesn't run per-entry static checks; emit
+          # "skipped" for each so the schema is shape-compatible with -internal.
+          scan_stub='{"clone":"skipped","subpath_missing":"skipped","schema":"skipped","zombie":"skipped","tool_allowlist":"skipped","binaries":"skipped","unique":"skipped","mcp":"skipped"}'
+
+          # Pre-fetch all PR comments once (paginated) for the marker lookup.
+          gh api --paginate "/repos/$REPO/issues/$PR/comments" \
+            --jq '.[] | {id, body}' > /tmp/comments.ndjson
+
+          jq -c '.[]' "$verdicts_path" | while read -r entry; do
+            name=$(jq -r '.name' <<< "$entry")
+            passes=$(jq -r '.passes' <<< "$entry")
+            summary=$(jq -r '.summary // ""' <<< "$entry")
+            violations=$(jq -r '.violations // ""' <<< "$entry")
+            source=$(jq -r '.source // "scan"' <<< "$entry")
+
+            # status = execution state (cf. -internal#3908 vocabulary).
+            # Outcome is in `passes`. Map source → status: scan-action-run
+            # → "ran"; cache-served → "cached". Anything else falls through
+            # as "ran" (only those two values appear in run-verdicts.json).
+            case "$source" in
+              cache) status="cached" ;;
+              scan)  status="ran" ;;
+              *)     status="ran" ;;
+            esac
+
+            policy=$(jq -n \
+              --argjson passes "$passes" \
+              --arg summary "$summary" \
+              --arg violations "$violations" \
+              --arg source "$source" \
+              --arg status "$status" \
+              '{passes: $passes,
+                has_broad_scope_hooks: null,
+                has_undisclosed_telemetry: null,
+                description_matches_behavior: null,
+                summary: $summary,
+                violations: $violations,
+                source: $source,
+                status: $status}')
+
+            verdict=$(jq -n \
+              --argjson scan "$scan_stub" \
+              --argjson policy "$policy" \
+              --arg ran_at "$ran_at" \
+              --arg run_id "$RUN_ID" \
+              '{schema_version: 1, ran_at: $ran_at, run_id: $run_id, scan: $scan, policy: $policy}')
+
+            marker="<!-- bump-pr-verdict:$name -->"
+            body=$(printf '%s\n```json\n%s\n```' "$marker" "$verdict")
+
+            # jq's first() short-circuits and avoids SIGPIPE under pipefail if
+            # duplicate markers exist (shouldn't, but a prior buggy run could
+            # double-post). -s slurps NDJSON; `// empty` yields no output when
+            # no match.
+            existing=$(jq -rs --arg m "$marker" \
+              'first(.[] | select(.body | startswith($m)) | .id) // empty' \
+              /tmp/comments.ndjson)
+
+            if [[ -n "$existing" ]]; then
+              gh api -X PATCH "/repos/$REPO/issues/comments/$existing" -f body="$body" >/dev/null
+              echo "Updated comment $existing for $name"
+            else
+              gh api -X POST "/repos/$REPO/issues/$PR/comments" -f body="$body" >/dev/null
+              echo "Created comment for $name"
+            fi
+          done
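
The sticky-comment lookup in the emit-verdict job above can be sketched in Python as follows. This is an illustrative restatement of the jq `first(... startswith($m) ...)` step, not code from the workflow; the function names `verdict_marker` and `plan_comment_action` are hypothetical, and comments are assumed to be dicts with `id` and `body` keys as produced by the pre-fetch step.

```python
def verdict_marker(name):
    """Marker string that prefixes each entry's sticky comment."""
    return f"<!-- bump-pr-verdict:{name} -->"


def plan_comment_action(comments, name):
    """Decide whether to update an existing sticky comment or create one.

    Returns ("PATCH", comment_id) for the first comment whose body starts
    with the marker, or ("POST", None) when no such comment exists.
    """
    marker = verdict_marker(name)
    existing = next(
        (c["id"] for c in comments if str(c.get("body") or "").startswith(marker)),
        None,
    )
    if existing is not None:
        return ("PATCH", existing)
    return ("POST", None)


comments = [
    {"id": 1, "body": "unrelated review comment"},
    {"id": 2, "body": "<!-- bump-pr-verdict:foo -->\n{...}"},
    {"id": 3, "body": "<!-- bump-pr-verdict:foo -->\n{...}"},  # duplicate from a prior run
]
action = plan_comment_action(comments, "foo")
```

Taking only the first match mirrors the workflow's handling of accidental duplicates: the earliest marked comment is updated and later ones are left alone.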
--- a/README.md
+++ b/README.md
@@ -42,6 +42,37 @@ plugin-name/
 └── README.md            # Documentation
 ```

+## Skill-bundle plugins
+
+When a plugin's source repository ships skills (`SKILL.md` files) without a `.claude-plugin/plugin.json` manifest, the marketplace entry can declare the skills directly using `strict: false` and an explicit `skills` array.
+
+```json
+{
+  "name": "example-bundle",
+  "description": "Brief description of the bundled skills.",
+  "author": { "name": "Author Name" },
+  "category": "development",
+  "source": {
+    "source": "git-subdir",
+    "url": "https://github.com/example-org/sdk.git",
+    "path": "packages/agent-skills",
+    "ref": "main",
+    "sha": "<commit sha>"
+  },
+  "strict": false,
+  "skills": [
+    "./skill-a",
+    "./skill-b",
+    "./skill-c"
+  ],
+  "homepage": "https://github.com/example-org/sdk"
+}
+```
+
+Each path in `skills` is relative to `source.path` and points at a directory containing a `SKILL.md`. Paths can reach deeper than a single level — for example, `["./libA/skill-1", "./libB/skill-2"]` exposes a curated subset across multiple library subdirectories. Each skill is registered as `<plugin-name>:<skill-name>` in Claude Code.
+
+For the underlying schema, see [Strict mode](https://code.claude.com/docs/en/plugin-marketplaces) in the marketplace documentation.
+
 ## License

 Please see each linked plugin for the relevant LICENSE file.
--- a/plugins/security-guidance/hooks/diffstate.py
+++ b/plugins/security-guidance/hooks/diffstate.py
@@ -138,7 +138,17 @@ def restore_unreviewed_stop_state(session_id, paths, baseline_sha):


 def get_baseline_file_content(session_id, file_path, cwd):
-    """Get the content of a file at the baseline SHA. Returns None if unavailable."""
+    """Get the content of a file at the baseline SHA. Returns None if unavailable.
+
+    Decode the file content as UTF-8 with errors="replace" rather than using
+    text=True: source files in user repos can be latin-1 / cp1252 / shift-jis
+    / etc., and on Windows text=True would decode via locale.getpreferredencoding()
+    in strict mode and raise UnicodeDecodeError in the subprocess reader
+    thread — leaving result.stdout=None and propagating AttributeError when
+    the caller tries to use it. Same class as the existing migrations at
+    security_reminder_hook.py:540 (reflog subjects) and :1115 (commit
+    diffs); this helper was missed in that pass. See
+    anthropics/claude-plugins-official#2056."""
    baseline_sha = load_baseline_sha(session_id)
    if not baseline_sha:
        return None
@@ -151,12 +161,12 @@ def get_baseline_file_content(session_id, file_path, cwd):
            return None
        result = subprocess.run(
            [*GIT_CMD, "show", f"{baseline_sha}:{rel_path}"],
-            cwd=cwd, capture_output=True, text=True, timeout=5
+            cwd=cwd, capture_output=True, timeout=5
        )
        if result.returncode == 0:
-            return result.stdout
+            return (result.stdout or b"").decode("utf-8", errors="replace")
        return None
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError):
        return None


@@ -173,11 +183,16 @@ def capture_git_baseline(cwd):
    and `compute_v2_review_set` subtracts that set so pre-existing untracked
    files are not reviewed as Claude-authored.
    """
+    # stdout is a SHA so text=True is safe on stdout, but a non-ASCII
+    # filename in `git stash create`'s STDERR warning (e.g. a worktree
+    # with `Ávila_report.txt` triggers a quotePath/locale warning) would
+    # trip the stderr reader thread on Windows cp1252. Decode both streams
+    # leniently for symmetry with _list_untracked. See #2056.
    try:
        # Check if HEAD exists (i.e., repo has at least one commit)
        head_check = subprocess.run(
            [*GIT_CMD, "rev-parse", "HEAD"],
-            cwd=cwd, capture_output=True, text=True, timeout=5
+            cwd=cwd, capture_output=True, timeout=5
        )
        if head_check.returncode != 0:
            # No commits yet — skip review rather than creating commits in the user's repo
@@ -186,20 +201,20 @@ def capture_git_baseline(cwd):

        result = subprocess.run(
            [*GIT_CMD, "stash", "create"],
-            cwd=cwd, capture_output=True, text=True, timeout=15
+            cwd=cwd, capture_output=True, timeout=15
        )
-        sha = result.stdout.strip()
+        sha = (result.stdout or b"").decode("utf-8", errors="replace").strip()
        if sha:
            return sha

        # Working tree is clean — stash create returns empty. Use HEAD.
        result = subprocess.run(
            [*GIT_CMD, "rev-parse", "HEAD"],
-            cwd=cwd, capture_output=True, text=True, timeout=5
+            cwd=cwd, capture_output=True, timeout=5
        )
-        sha = result.stdout.strip()
+        sha = (result.stdout or b"").decode("utf-8", errors="replace").strip()
        return sha if sha else None
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
        debug_log(f"Failed to capture git baseline: {e}")
        return None

@@ -323,19 +338,35 @@ def _list_untracked(cwd):
    mtime is captured so an in-place edit during the turn is still reviewed.

    Uses ls-files (not status) for the UPS path: the index diff isn't needed,
-    and ls-files --others only walks the worktree against .gitignore."""
+    and ls-files --others only walks the worktree against .gitignore.
+
+    Decodes stdout/stderr as UTF-8 with errors="replace" instead of using
+    text=True. With core.quotePath=false git emits raw UTF-8 bytes for
+    non-ASCII filenames; text=True decodes via locale.getpreferredencoding()
+    in strict mode — on Windows that's cp1252 with several undefined bytes
+    (0x81/0x8D/0x8F/0x90/0x9D), all of which appear in UTF-8 encodings of
+    common accented capitals (Á Í Ï Ð Ý) and most CJK/emoji codepoints.
+    A non-ASCII filename in the worktree crashed the subprocess reader
+    thread, left r.stdout=None, and propagated AttributeError out of the
+    helper — silently losing the baseline snapshot every UserPromptSubmit.
+    See anthropics/claude-plugins-official#2056. The sibling helpers in
+    gitutil.py already follow the lenient pattern; this function and
+    capture_git_baseline / _git_name_only / _git_status_porcelain were
+    the holdouts."""
    try:
        repo = _git_toplevel(cwd) or cwd
        r = subprocess.run(
            [*GIT_CMD, "-c", "core.quotePath=false", "ls-files",
             "--others", "--exclude-standard", "-z"],
-            cwd=repo, capture_output=True, text=True, timeout=15,
+            cwd=repo, capture_output=True, timeout=15,
        )
        if r.returncode != 0:
-            debug_log(f"_list_untracked rc={r.returncode}: {r.stderr[:200]}")
+            stderr_str = (r.stderr or b"").decode("utf-8", errors="replace")
+            debug_log(f"_list_untracked rc={r.returncode}: {stderr_str[:200]}")
            return {}
+        stdout = (r.stdout or b"").decode("utf-8", errors="replace")
        out = {}
-        for p in r.stdout.split("\0"):
+        for p in stdout.split("\0"):
            if not p:
                continue
            try:
@@ -346,7 +377,9 @@ def _list_untracked(cwd):
                debug_log(f"_list_untracked: capped at {UNTRACKED_BASELINE_CAP}")
                break
        return out
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
+        # ValueError guards against any future strict-decode regression
+        # so the helper degrades to {} instead of crashing the hook.
        debug_log(f"_list_untracked error: {e}")
        return {}

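
The decode change in the diffstate.py hunks above can be illustrated without git. The sketch below assumes a hypothetical helper name, `decode_git_output`, and shows why a strict cp1252 decode fails on the UTF-8 bytes of a filename such as `Ávila_report.txt`, while the lenient UTF-8 decode used in the diff does not.

```python
def decode_git_output(raw):
    """Decode captured subprocess bytes leniently; None is treated as empty."""
    return (raw or b"").decode("utf-8", errors="replace")


# UTF-8 bytes for a non-ASCII filename: "Á" encodes as 0xC3 0x81.
name_bytes = "Ávila_report.txt".encode("utf-8")

# Strict cp1252 decoding (what text=True would use under a cp1252 locale)
# fails because 0x81 is undefined in that code page.
try:
    name_bytes.decode("cp1252")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The lenient decode recovers the name, and never raises on non-UTF-8 input:
# an invalid byte becomes U+FFFD instead.
recovered = decode_git_output(name_bytes)
legacy = decode_git_output(b"caf\xe9")  # latin-1 encoded text
```

The trade-off is that content in a legacy encoding is not recovered faithfully; it degrades to replacement characters rather than crashing the hook.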
--- a/plugins/security-guidance/hooks/ensure_agent_sdk.py
+++ b/plugins/security-guidance/hooks/ensure_agent_sdk.py
@@ -32,6 +32,8 @@ BUILD_FAILED = 3     # venv create or pip install raised/timed out
 # llm.py also matches Windows venv layout (Lib/site-packages). Don't reuse the
 # value — telemetry rows from older plugin builds still emit 4.
 SKIP_SENTINEL = 5    # another SessionStart is currently building
+HOOK_PY_INCOMPATIBLE = 6  # hook interpreter is <3.10 — SDK syntax can't load
+                          # here no matter how the venv was built. See #2071.


 def _sdk_on_syspath() -> bool:
@@ -62,6 +64,29 @@ def main() -> tuple[int, str, str]:
    err_phase / err_kind are non-empty only on BUILD_FAILED — they let
    telemetry split bootstrap failures by root cause.
    """
+    # Honesty check (fixes the misleading NOOP_VENV in #2071): the SDK
+    # requires Python >=3.10 and uses 3.10+ syntax (match statements,
+    # PEP 604 unions). On a 3.9 hook interpreter we CANNOT import it no
+    # matter how the venv was built — llm.py runs in this same interpreter
+    # and the syntax-level import will SyntaxError. macOS ships 3.9.6 as
+    # the default `python3` and `/usr/bin` precedes Homebrew in PATH, so
+    # this case is the default state for a large share of macOS users.
+    #
+    # sg-python.sh now prefers python3.10+ binaries so most users won't
+    # reach this branch; the fallback to 3.9 is preserved for the
+    # pattern-warning hooks that don't need the SDK. Reporting
+    # HOOK_PY_INCOMPATIBLE here:
+    #   (a) avoids 30-60s of wasted pip install,
+    #   (b) avoids the lie where the venv_py probe says NOOP_VENV but the
+    #       consumer import fails, and
+    #   (c) gives telemetry a clean bucket to size the affected fleet.
+    if sys.version_info < (3, 10):
+        return (
+            HOOK_PY_INCOMPATIBLE,
+            "hook_py",
+            f"py_{sys.version_info[0]}.{sys.version_info[1]}",
+        )
+
    if _sdk_on_syspath():
        return NOOP_SYSTEM, "", ""

@@ -195,6 +220,56 @@ def main() -> tuple[int, str, str]:
            sentinel.unlink(missing_ok=True)


+def _maybe_emit_user_notice(outcome: int, pv: int) -> str | None:
+    """Return a one-time user-visible notice when the agentic reviewer is
+    in a persistent broken state on this machine, or None if we've already
+    shown the notice for this plugin version (or shouldn't show one).
+
+    The marker file is plugin-version-keyed: a future plugin update can
+    re-notify if behavior changes (e.g. we ship out-of-process SDK in v3
+    and want to tell affected users it's fixed). Failures to write the
+    marker degrade to "skip the notice this session" so we don't spam
+    every SessionStart on a read-only home dir.
+
+    Currently only HOOK_PY_INCOMPATIBLE qualifies. BUILD_FAILED is
+    intentionally excluded — it covers transient causes (network failure,
+    pip registry hiccup, in-flight rebuild) where the next session may
+    succeed and a permanent notice would mislead.
+    """
+    if outcome != HOOK_PY_INCOMPATIBLE:
+        return None
+    try:
+        state_dir = Path(
+            os.environ.get("SECURITY_WARNINGS_STATE_DIR")
+            or os.path.expanduser("~/.claude/security")
+        )
+        marker = state_dir / f".agentic_unavailable_notice_v{pv or 0}"
+        if marker.exists():
+            return None
+        state_dir.mkdir(parents=True, exist_ok=True)
+        # Write timestamp + Python version so the marker is self-documenting
+        # if a user goes looking. O_EXCL would be racier with no real win
+        # (two concurrent SessionStarts both showing the notice once is fine).
+        marker.write_text(
+            f"{time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())} "
+            f"py={sys.version_info[0]}.{sys.version_info[1]}\n"
+        )
+    except OSError:
+        return None
+    return (
+        f"⚠ security-guidance plugin: the cross-file commit reviewer "
+        f"(layer 3 of 3 — catches IDOR, auth-bypass, cross-file SSRF) "
+        f"is unavailable in this environment. It requires Python ≥3.10, "
+        f"but the hook is running on "
+        f"{sys.version_info[0]}.{sys.version_info[1]}.\n\n"
+        f"Pattern checks and the single-shot LLM diff review are still "
+        f"active. To enable the deeper reviewer, install Python 3.10+ "
+        f"(e.g. `brew install python` on macOS) and restart Claude Code.\n\n"
+        f"This notice is shown once per plugin version. "
+        f"See: github.com/anthropics/claude-plugins-official/issues/2071"
+    )
+
+
 if __name__ == "__main__":
    # Tell the harness this is async — venv create + pip install can take
    # 30-60s on a cold cache, well past the default sync hook timeout.
@@ -231,4 +306,18 @@ if __name__ == "__main__":
    pv = _plugin_version_int()
    if pv:
        metrics["pv"] = pv
-    print(json.dumps({"metrics": metrics}), flush=True)
+    response: dict[str, object] = {"metrics": metrics}
+    # One-time user-visible notice when the agentic reviewer is dead on
+    # arrival. Uses hookSpecificOutput.additionalContext (SessionStart's
+    # supported channel for surfacing text to both the model and the user)
+    # plus systemMessage as a belt-and-suspenders. Marker-file-gated so
+    # this fires exactly once per plugin version per install — see
+    # _maybe_emit_user_notice.
+    notice = _maybe_emit_user_notice(outcome, pv)
+    if notice:
+        response["hookSpecificOutput"] = {
+            "hookEventName": "SessionStart",
+            "additionalContext": notice,
+        }
+        response["systemMessage"] = notice
+    print(json.dumps(response), flush=True)
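
The once-per-plugin-version gating used by `_maybe_emit_user_notice` above can be exercised in isolation. This is a simplified sketch, assuming a hypothetical function name `notice_once` and a caller-supplied state directory; the marker filename pattern is taken from the diff, and the marker contents are reduced to a placeholder.

```python
import tempfile
from pathlib import Path


def notice_once(state_dir, plugin_version, text):
    """Return `text` the first time per plugin version, None afterwards.

    Any OSError (for example a read-only home directory) degrades to
    "skip the notice" rather than raising.
    """
    marker = Path(state_dir) / f".agentic_unavailable_notice_v{plugin_version or 0}"
    try:
        if marker.exists():
            return None
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.write_text("shown\n")  # placeholder; the diff writes a timestamp
    except OSError:
        return None
    return text


with tempfile.TemporaryDirectory() as tmp:
    first = notice_once(tmp, 2, "reviewer unavailable")
    second = notice_once(tmp, 2, "reviewer unavailable")
    after_upgrade = notice_once(tmp, 3, "reviewer unavailable")
```

Keying the marker on the plugin version means an upgrade produces a fresh notice, which matches the rationale given in the docstring above.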
--- a/plugins/security-guidance/hooks/gitutil.py
+++ b/plugins/security-guidance/hooks/gitutil.py
@@ -259,19 +259,29 @@ def _git_reflog_recent_commits(repo_root, max_age_s=120, max_n=5):
        # %gs (the reflog subject) is `commit: <commit-msg first line>` and can
        # contain `|`; put it LAST so split("|", 2) leaves it intact. %H is
        # hex and %ct is integer, so the first two fields are delimiter-safe.
+        #
+        # Bytes + decode utf-8/replace: %gs embeds commit-message subjects
+        # which git stores as raw bytes — commits can be authored in
+        # latin-1 / cp1252 / shift-jis etc., and text=True would raise
+        # UnicodeDecodeError in the subprocess reader thread on Windows
+        # cp1252 (subprocess.run returns r.stdout=None, then
+        # r.stdout.splitlines() AttributeErrors). Mirrors the existing
+        # migration at security_reminder_hook.py:540 — same pattern was
+        # missed here. See anthropics/claude-plugins-official#2056.
        r = subprocess.run(
            [*GIT_CMD, "log", "-g", "-n", str(max_n),
             "--format=%H|%ct|%gs", "HEAD"],
-            cwd=repo_root, capture_output=True, text=True, timeout=5,
+            cwd=repo_root, capture_output=True, timeout=5,
        )
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError):
        return [], 0
    if r.returncode != 0:
        return [], 0
+    stdout = (r.stdout or b"").decode("utf-8", errors="replace")
    import time as _time
    now = int(_time.time())
    fresh, stale = [], 0
-    for idx, line in enumerate(r.stdout.splitlines()):
+    for idx, line in enumerate(stdout.splitlines()):
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
@@ -306,23 +316,31 @@ def _git_name_only(cwd, base, include_untracked=False):
    must distinguish None (error → don't trust as a filter) from set()
    (genuinely nothing changed). `-c core.quotePath=false -z` keeps non-ASCII
    and space-containing paths intact."""
+    # Decode stdout/stderr as UTF-8 with errors="replace" instead of using
+    # text=True. core.quotePath=false makes git emit raw UTF-8 for non-ASCII
+    # paths, and text=True on Windows decodes via cp1252 strict — a non-ASCII
+    # changed path would crash the subprocess reader thread, leave
+    # result.stdout=None, and propagate AttributeError out of the helper.
+    # Same fix shape as diffstate._list_untracked. See #2056.
    def _run(env):
        result = subprocess.run(
            [*GIT_CMD, "-c", "core.quotePath=false", "diff", "--name-only", "-z", base],
-            cwd=cwd, capture_output=True, text=True, timeout=30,
+            cwd=cwd, capture_output=True, timeout=30,
            env=env,
        )
        if result.returncode != 0:
-            debug_log(f"_git_name_only({base!r}) rc={result.returncode}: {result.stderr[:200]}")
+            stderr_str = (result.stderr or b"").decode("utf-8", errors="replace")
+            debug_log(f"_git_name_only({base!r}) rc={result.returncode}: {stderr_str[:200]}")
            return None
-        return {p for p in result.stdout.split("\0") if p}
+        stdout = (result.stdout or b"").decode("utf-8", errors="replace")
+        return {p for p in stdout.split("\0") if p}

    try:
        if not include_untracked:
            return _run(None)
        with _temp_index(cwd) as env:
            return _run(env)
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
        debug_log(f"_git_name_only({base!r}) error: {e}")
        return None

@@ -339,17 +357,22 @@ def _git_status_porcelain(cwd):
    collapses to `dir/`). Required so the untracked set subtracts cleanly
    against the UPS-time `_list_untracked` snapshot, which uses ls-files and
    therefore always lists individual files."""
+    # Lenient decode: same UTF-8 + errors="replace" pattern as the
+    # sibling helpers — a non-ASCII path in the worktree would otherwise
+    # crash the cp1252 reader thread on Windows. See #2056.
    try:
        r = subprocess.run(
            [*GIT_CMD, "-c", "core.quotePath=false", "status",
             "--porcelain=v1", "-uall", "-z"],
-            cwd=cwd, capture_output=True, text=True, timeout=30,
+            cwd=cwd, capture_output=True, timeout=30,
        )
        if r.returncode != 0:
-            debug_log(f"_git_status_porcelain rc={r.returncode}: {r.stderr[:200]}")
+            stderr_str = (r.stderr or b"").decode("utf-8", errors="replace")
+            debug_log(f"_git_status_porcelain rc={r.returncode}: {stderr_str[:200]}")
            return None, None
        tracked, untracked = set(), set()
-        entries = r.stdout.split("\0")
+        stdout = (r.stdout or b"").decode("utf-8", errors="replace")
+        entries = stdout.split("\0")
        i = 0
        while i < len(entries):
            e = entries[i]
@@ -368,7 +391,9 @@ def _git_status_porcelain(cwd):
                    i += 1
            i += 1
        return tracked, untracked
-    except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
+        # ValueError guards against any future strict-decode regression
+        # so the helper degrades to (None, None) instead of crashing.
        debug_log(f"_git_status_porcelain error: {e}")
        return None, None

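The lenient-decode pattern applied to the helpers above can be exercised on its own. This is a minimal sketch, not the plugin's code: `decode_lenient` is a hypothetical helper name, and the byte strings stand in for real git output.

```python
def decode_lenient(raw):
    """Decode subprocess output as UTF-8, replacing undecodable bytes.

    Hypothetical helper (not part of the plugin); it mirrors the
    `(r.stdout or b"").decode("utf-8", errors="replace")` expression
    used in the diff.
    """
    return (raw or b"").decode("utf-8", errors="replace")


# NUL-separated path list, as `git ... -z` emits it, with a non-ASCII name.
raw = "Ávila_report.txt\0src/app.py\0".encode("utf-8")
paths = {p for p in decode_lenient(raw).split("\0") if p}

# "Á" is C3 81 in UTF-8, and byte 0x81 is undefined in cp1252, so a strict
# cp1252 decode of the same bytes raises (the failure described in #2056).
try:
    raw.decode("cp1252")
    strict_cp1252_failed = False
except UnicodeDecodeError:
    strict_cp1252_failed = True

# None (what a crashed reader thread leaves in stdout) degrades to "".
empty = decode_lenient(None)
```

Decoding the bytes explicitly keeps the result independent of the platform's locale encoding, which is why the same call behaves identically on Windows and macOS.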
--- a/plugins/security-guidance/hooks/patterns.py
+++ b/plugins/security-guidance/hooks/patterns.py
@@ -94,6 +94,9 @@ Only use exec() if you absolutely need shell features and the input is guarantee
    },
    {
        "ruleName": "new_function_injection",
+        # JS-only construct: gate to JS/TS files so docs/.md and other prose
+        # mentioning "new Function" don't trip the warning.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": ["new Function"],
        "reminder": "\u26a0\ufe0f Security Warning: Using new Function() with string interpolation is a CODE INJECTION vulnerability. If any variable is concatenated or interpolated into the function body string, an attacker controlling that variable can execute arbitrary code. Use safe alternatives: for property access use obj[key] or array.reduce((o, k) => o[k], root); for computation use a safe expression parser. NEVER interpolate untrusted strings into new Function() bodies.",
    },
@@ -107,16 +110,24 @@ Only use exec() if you absolutely need shell features and the input is guarantee
    },
    {
        "ruleName": "react_dangerously_set_html",
+        # JS/TS-only (React); gate so .md docs / .py / .go files don't trip.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": ["dangerouslySetInnerHTML"],
        "reminder": "⚠️ Security Warning: dangerouslySetInnerHTML can lead to XSS vulnerabilities if used with untrusted content. Ensure all content is properly sanitized using an HTML sanitizer library like DOMPurify, or use safe alternatives.",
    },
    {
        "ruleName": "document_write_xss",
+        # Browser DOM API: only meaningful in JS/TS source.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": ["document.write"],
        "reminder": "⚠️ Security Warning: document.write() can be exploited for XSS attacks and has performance issues. Use DOM manipulation methods like createElement() and appendChild() instead.",
    },
    {
        "ruleName": "innerHTML_xss",
+        # Browser DOM API: only meaningful in JS/TS source. Closes FPs like
+        # docs/example HTML, playground/self-contained skills that hardcode
+        # innerHTML strings with zero user input (#410).
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": [".innerHTML =", ".innerHTML="],
        "reminder": "⚠️ Security Warning: Setting innerHTML with untrusted content can lead to XSS vulnerabilities. Use textContent for plain text or safe DOM methods for HTML content. If you need HTML support, consider using an HTML sanitizer library such as DOMPurify.",
    },
@@ -217,11 +228,15 @@ Additionally, validate user inputs:
    },
    {
        "ruleName": "outerHTML_xss",
+        # Browser DOM API: only meaningful in JS/TS source.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": [".outerHTML =", ".outerHTML="],
        "reminder": "⚠️ Security Warning: Use textContent or sanitize with DOMPurify. outerHTML assignment is an XSS sink equivalent to innerHTML.",
    },
    {
        "ruleName": "insertAdjacentHTML_xss",
+        # Browser DOM API: only meaningful in JS/TS source.
+        "path_filter": lambda p: p.endswith(_JS_EXTS),
        "substrings": [".insertAdjacentHTML("],
        "reminder": "⚠️ Security Warning: Use insertAdjacentText() or sanitize with DOMPurify. insertAdjacentHTML is an XSS sink.",
    },
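How a `path_filter` gate narrows a substring rule can be illustrated with a small sketch. The extension tuple follows the list given in the commit message for this change; `rule_fires` is a hypothetical matcher written only for illustration, since the hook's real matching loop is not part of this diff.

```python
# Extensions as listed in the commit message; the real _JS_EXTS constant
# lives in patterns.py and may differ.
_JS_EXTS = (".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs",
            ".mts", ".cts", ".vue", ".svelte")

RULE = {
    "ruleName": "innerHTML_xss",
    "path_filter": lambda p: p.endswith(_JS_EXTS),
    "substrings": [".innerHTML =", ".innerHTML="],
}


def rule_fires(rule, path, content):
    """Hypothetical matcher: apply the path gate first, then the substrings."""
    path_filter = rule.get("path_filter")
    if path_filter is not None and not path_filter(path):
        return False
    return any(s in content for s in rule["substrings"])


in_tsx = rule_fires(RULE, "src/app.tsx", "el.innerHTML = html")
in_markdown = rule_fires(RULE, "docs/xss.md", "el.innerHTML = html")
no_match = rule_fires(RULE, "src/app.ts", "el.textContent = text")
```

Because `str.endswith` accepts a tuple, one shared constant can gate every rule, so adding an extension in one place would apply to all of them.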
--- a/plugins/security-guidance/hooks/security_reminder_hook.py
+++ b/plugins/security-guidance/hooks/security_reminder_hook.py
@@ -190,7 +190,13 @@ CONTINUATION_SUFFIX = (
    "response."
 )

-def emit_metrics(metrics, rewake_summary=None):
+def emit_metrics(
+    metrics,
+    rewake_summary=None,
+    additional_context=None,
+    system_message=None,
+    hook_event_name="PostToolUse",
+):
    """
    Write a SyncHookJSONOutput line to stdout for Claude Code to pick up.
    For asyncRewake (Stop) hooks, CC scans stdout for the first {-prefixed line
@@ -213,6 +219,27 @@ def emit_metrics(metrics, rewake_summary=None):
    rewakeSummary in hooks.json, shown to the user in the terminal as the
    task-notification one-liner. Must be in the same JSON line as the metrics
    because CC stops scanning stdout after the first {-prefixed line.
+
+    `additional_context` (asyncRewake findings): model-visible guidance text
+    that CC surfaces via the modern hook-output protocol
+    (hookSpecificOutput.additionalContext) instead of the legacy stderr +
+    exit(2) pair. The caller passes the finding-explanation text it would
+    have written to stderr; the JSON channel carries it cleanly so CC's UI
+    shows the reason properly instead of "Permission denied with no reason".
+    See anthropics/claude-plugins-official#1375 and #1783. Empty/None
+    means no hookSpecificOutput field is emitted (preserves backward compat
+    for legacy emit-sites that only want metrics).
+
+    `system_message` (optional, asyncRewake only): user-visible TUI message,
+    distinct from rewakeSummary which is the task-notification one-liner.
+    Use sparingly — the rewakeMessage in hooks.json is the primary user
+    surface; systemMessage adds a per-fire override when the static
+    rewakeMessage isn't specific enough for the finding being shown.
+
+    `hook_event_name` (used only when additional_context is set): which event
+    the hookSpecificOutput attaches to. Defaults to "PostToolUse" since the
+    commit-review and push-sweep handlers are the most common callers;
+    handle_stop_hook explicitly passes "Stop".
    """
    head = {}
    if _PV and "pv" not in metrics:
@@ -223,6 +250,17 @@ def emit_metrics(metrics, rewake_summary=None):
    out = {"metrics": metrics}
    if rewake_summary:
        out["rewakeSummary"] = rewake_summary
+    if additional_context:
+        # Wrap in hookSpecificOutput per CC's modern hook-output contract.
+        # Drops the legacy `sys.stderr.write(...) + sys.exit(2)` shape that
+        # left CC's UI showing "denied with no reason" (#1783) and triggered
+        # "json output validation failed" on older CC versions (#1375).
+        out["hookSpecificOutput"] = {
+            "hookEventName": hook_event_name,
+            "additionalContext": additional_context,
+        }
+    if system_message:
+        out["systemMessage"] = system_message
    print(json.dumps(out), flush=True)

 # =====================================================================
@@ -1361,18 +1399,26 @@ def handle_commit_review_posttooluse(input_data):
        if s in sev:
            sev[s] += 1

+    # Rebuild guidance from new_vulns only — concrete_guidance from the LLM
+    # still lists deduped entries. Pass via additional_context so CC surfaces
+    # the reason via hookSpecificOutput.additionalContext instead of empty
+    # stdout (#1783) / stderr-only "json output validation failed" (#1375).
+    _commit_guidance = (PROVENANCE_BANNER + "\n\n"
+                        + _format_vulns_guidance(new_vulns)
+                        + CONTINUATION_SUFFIX + "\n")
    emit_metrics({
        "vulns_found": len(new_vulns), **_base, **_agentic_m,
        "critical_count": sev["critical"], "high_count": sev["high"],
        "files_reviewed": len(diff_files), "review_ms": review_ms,
        **({"deduped": n_deduped} if n_deduped else {}),
-    }, rewake_summary=_format_vulns_summary(new_vulns, prefix="Commit security review found"))
+    }, rewake_summary=_format_vulns_summary(new_vulns, prefix="Commit security review found"),
+       additional_context=_commit_guidance,
+       hook_event_name="PostToolUse")

-    # Rebuild guidance from new_vulns only — concrete_guidance from the LLM
-    # still lists deduped entries.
-    sys.stderr.write(PROVENANCE_BANNER + "\n\n"
-                     + _format_vulns_guidance(new_vulns)
-                     + CONTINUATION_SUFFIX + "\n")
+    # exit(2) is preserved per the asyncRewake protocol — it's what CC
+    # uses as the "force fix" signal that triggers the rewakeMessage flow.
+    # The stderr.write was removed; additional_context above now carries
+    # the same text via the modern JSON channel. See #1358/#1375/#1783.
    sys.exit(2)

 def handle_push_sweep_posttooluse(input_data):
@@ -1629,17 +1675,23 @@ def handle_push_sweep_posttooluse(input_data):
    # Metrics — keep within the 10-key cap; agentic sub-metrics are dropped
    # here in favour of the push-sweep funnel keys (telemetry can join on session_id
    # to the per-commit fires for agentic detail). rewake_summary must ride
-    # this line (CC reads only the first {-prefixed stdout line); it's a
-    # no-op when new_vulns is empty since we exit 0 below.
-    emit_metrics({
+    # this line (CC reads only the first {-prefixed stdout line); the emit
+    # is deferred to the two exit points below so the with-vulns path can
+    # also pass additional_context in the same JSON line (#1375/#1783) —
+    # the by-design "CC keeps only the first JSON line" constraint means
+    # we can't emit twice. Builds the shared metrics dict here; vulns path
+    # adds additional_context, no-vulns path emits as-is.
+    _push_metrics = {
        **_base, "pushed": len(push_range), "unreviewed": len(tail),
        "prefix_advanced": prefix_advanced, "vulns_found": len(new_vulns),
        "files_reviewed": len(diff_files), "review_ms": review_ms,
        **({"deduped": n_deduped} if n_deduped else {}),
-    }, rewake_summary=_format_vulns_summary(new_vulns, prefix="Push security review found"))
+    }
+    _push_rewake_summary = _format_vulns_summary(new_vulns, prefix="Push security review found")

    if not new_vulns:
        debug_log("Push sweep: no new findings")
+        emit_metrics(_push_metrics, rewake_summary=_push_rewake_summary)
        sys.exit(0)

    # First-push of a big branch can surface many findings at once across
@@ -1692,9 +1744,14 @@ def handle_push_sweep_posttooluse(input_data):
        guidance = _format_vulns_guidance(reported) or ""
    else:
        guidance = concrete_guidance or _format_vulns_guidance(reported) or ""
-    sys.stderr.write(
-        PROVENANCE_BANNER + "\n\n" + guidance + CONTINUATION_SUFFIX + "\n"
-    )
+    # Emit metrics + additional_context together — single JSON line is the
+    # contract CC's hook parser expects. exit(2) preserved as the asyncRewake
+    # "force fix" trigger (see comment near handle_commit_review_posttooluse).
+    # See #1358 / #1375 / #1783.
+    emit_metrics(_push_metrics, rewake_summary=_push_rewake_summary,
+                 additional_context=(PROVENANCE_BANNER + "\n\n"
+                                     + guidance + CONTINUATION_SUFFIX + "\n"),
+                 hook_event_name="PostToolUse")
    sys.exit(2)

 def handle_stop_hook(input_data):
@@ -1927,6 +1984,11 @@ def handle_stop_hook(input_data):
        # untracked_baseline_n is the signal for whether the UPS-time
        # untracked-snapshot capture actually ran.
        sweep_trimmed = {k: v for k, v in sweep.items() if k != "warn_unresolved_mask"}
+        # Pass guidance via additional_context so CC surfaces the findings via
+        # hookSpecificOutput.additionalContext instead of stderr-only (which
+        # was the cause of "json output validation failed" / empty-reason UI in
+        # #1375 / #1783). exit(2) preserved as the asyncRewake "force fix"
+        # signal — that's the documented mechanism. See #1358 / #1375 / #1783.
        emit_metrics({
            "vulns_found": len(vulns),
            "untracked_baseline_n": len(untracked_at_baseline),
@@ -1940,10 +2002,10 @@ def handle_stop_hook(input_data):
            **({"diff_truncated": llm._last_review_truncated_bytes}
               if llm._last_review_truncated_bytes else {}),
            **sweep_trimmed,
-        }, rewake_summary=_format_vulns_summary(vulns))
-
-        # Exit code 2 with stderr forces Claude to continue and fix
-        sys.stderr.write(PROVENANCE_BANNER + "\n\n" + concrete_guidance + CONTINUATION_SUFFIX + "\n")
+        }, rewake_summary=_format_vulns_summary(vulns),
+           additional_context=(PROVENANCE_BANNER + "\n\n"
+                               + concrete_guidance + CONTINUATION_SUFFIX + "\n"),
+           hook_event_name="Stop")
        sys.exit(2)

    if llm._last_call_claude_http_error is not None:
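The single-line output contract described in the `emit_metrics` docstring can be sketched as follows. `build_hook_output` is a hypothetical, simplified stand-in: it returns the JSON line instead of printing it and omits the `pv` handling. The summary and guidance strings are made-up sample values.

```python
import json


def build_hook_output(metrics, rewake_summary=None, additional_context=None,
                      system_message=None, hook_event_name="PostToolUse"):
    """Simplified stand-in for emit_metrics (assumption: same key layout)."""
    out = {"metrics": metrics}
    if rewake_summary:
        out["rewakeSummary"] = rewake_summary
    if additional_context:
        # Findings ride in the same JSON object as the metrics.
        out["hookSpecificOutput"] = {
            "hookEventName": hook_event_name,
            "additionalContext": additional_context,
        }
    if system_message:
        out["systemMessage"] = system_message
    return json.dumps(out)


line = build_hook_output(
    {"vulns_found": 1},
    rewake_summary="sample summary",
    additional_context="sample guidance\nwith a second line\n",
    hook_event_name="Stop",
)
parsed = json.loads(line)

# Metrics-only callers keep the legacy shape: no hookSpecificOutput key.
legacy = json.loads(build_hook_output({"vulns_found": 0}))
```

Since `json.dumps` escapes newlines inside string values, multi-line guidance still serializes to a single physical line of output.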
--- a/plugins/security-guidance/hooks/sg-python.sh
+++ b/plugins/security-guidance/hooks/sg-python.sh
@@ -47,21 +47,65 @@ fi

 probe() {
    # $1..N: the interpreter command (may be multi-word like `py -3`)
-    # Probe writes the major version to stdout and exits 0 iff it's >=3.
-    "$@" -c 'import sys; print(sys.version_info[0])' 2>/dev/null
+    # Writes "<major>.<minor>" to stdout; exits 0 only on Python 3.6+ (the f-string is a SyntaxError on older interpreters).
+    "$@" -c 'import sys; print(f"{sys.version_info[0]}.{sys.version_info[1]}")' 2>/dev/null
 }

+# True iff arg is a "M.m" version string >= 3.10. claude_agent_sdk requires
+# Python >= 3.10; below that, pip install fails ("No matching distribution")
+# and the LLM-powered review (Stop / commit / push) silently no-ops while
+# pattern checks (PostToolUse regex) keep working. macOS ships 3.9.6 as the
+# default `python3` on current versions, so this guard matters in practice.
+# See anthropics/claude-plugins-official#2071.
+is_sdk_compatible() {
+    case "$1" in
+        3.1[0-9]|3.[2-9][0-9]|[4-9].*|[1-9][0-9].*) return 0 ;;
+        *) return 1 ;;
+    esac
+}
+
+# Pass 1 — try minor-versioned binaries in descending order. These are only
+# present if the user explicitly installed them (Homebrew / python.org / pyenv),
+# so picking one here always upgrades over the system `python3`. Highest
+# available wins; the user doesn't have to PATH-prefer it.
+for cmd in "python3.13" "python3.12" "python3.11" "python3.10"; do
+    v=$(probe "$cmd") || continue
+    if is_sdk_compatible "$v"; then
+        exec "$cmd" "$@"
+    fi
+done
+
+# Pass 2 — bare interpreters, but only if SDK-compatible. Covers Linux distros
+# that ship 3.10+ as the default `python3`, and Windows where `python` /
+# `py -3` resolves to the user's python.org install.
 for cmd in "python3" "python" "py -3"; do
-    # Word-split intentionally so `py -3` works
    # shellcheck disable=SC2086
    v=$(probe $cmd) || continue
-    if [ "$v" = "3" ]; then
+    if is_sdk_compatible "$v"; then
        # shellcheck disable=SC2086
        exec $cmd "$@"
    fi
 done

+# Pass 3 — fallback to any Python 3, even <3.10. Pattern-based checks
+# (PostToolUse regex on Edit/Write) only need 3.6+ and are useful on their
+# own; the SDK-dependent paths will detect the version mismatch and degrade
+# inside the Python code. Without this fallback, the entire plugin would
+# stop working on default macOS, which is a regression vs today.
+for cmd in "python3" "python" "py -3"; do
+    # shellcheck disable=SC2086
+    v=$(probe $cmd) || continue
+    # Accept anything that successfully reported a "M.m" string.
+    case "$v" in
+        [0-9]*.[0-9]*)
+            # shellcheck disable=SC2086
+            exec $cmd "$@"
+            ;;
+    esac
+done
+
 echo "security-guidance: no working Python 3 interpreter found." >&2
-echo "  tried: python3, python, py -3" >&2
+echo "  tried: python3.13, python3.12, python3.11, python3.10, python3, python, py -3" >&2
 echo "  on Windows, install Python from https://python.org (NOT the Microsoft Store)" >&2
+echo "  on macOS, install Python 3.10+ via Homebrew (\`brew install python\`)" >&2
 exit 1
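The version patterns in `is_sdk_compatible` can be checked outside the shell. This sketch mirrors the `case` alternatives with Python's `fnmatch`, on the assumption that its glob rules agree with the shell's for these particular patterns. It is a way to test the patterns, not part of the shim.

```python
from fnmatch import fnmatchcase

# The alternatives from the `case` statement in is_sdk_compatible.
SDK_PATTERNS = ("3.1[0-9]", "3.[2-9][0-9]", "[4-9].*", "[1-9][0-9].*")


def is_sdk_compatible(version):
    """Python mirror of the shell function, for checking the patterns only."""
    return any(fnmatchcase(version, pat) for pat in SDK_PATTERNS)


candidates = ("2.7", "3.6", "3.9", "3.10", "3.13", "3.20", "4.0")
accepted = [v for v in candidates if is_sdk_compatible(v)]
```

Under that assumption the patterns accept 3.10 through 3.99 and any major version of 4 or above; a three-digit minor such as "3.100" would not match any alternative.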
Author	SHA1	Message	Date
Mohamed Hegazy	37ffc76005	security-guidance: emit findings via hookSpecificOutput.additionalContext (#1358 #1375 #1783)

Fixes #1358, #1375, and #1783 — three related complaints about the hook output protocol used at the three asyncRewake exit-2 sites (handle_commit_review_posttooluse, handle_push_sweep_posttooluse, handle_stop_hook).

The old shape at each site was:

    emit_metrics({...})                           # JSON to stdout (metrics)
    sys.stderr.write(banner + guidance + suffix)  # plain text to stderr
    sys.exit(2)                                   # asyncRewake trigger

That triggered three reported problems:

  #1375: CC's hook system parsing stdout for a SyncHookJSONOutput sees only the bare metrics dict — no findings reason — and on older CC versions surfaces a 'json output validation failed' error because stderr's plain text isn't valid JSON.
  #1783: CC's UI shows 'Permission to use Edit has been denied' with no permissionDecisionReason — the stderr text is invisible to that UI surface; CC only renders fields it can find in the JSON.
  #1358: Reporters experienced the exit(2) as 'gating' behavior rather than 'warning' behavior. The pattern-warning path in main() was migrated to exit(0) + hookSpecificOutput.additionalContext long ago; these three asyncRewake sites were never updated.

Fix: extend emit_metrics() to accept additional_context, system_message, and hook_event_name kwargs, and emit them in the same SyncHookJSONOutput line as the metrics. CC's parser stops scanning stdout after the first {-prefixed line, so the findings must ride in that same line — calling emit_metrics twice or adding a second print(json.dumps(...)) would silently drop the second emission.

At each of the three call sites: route the guidance text that used to go to stderr through additional_context instead. The stderr.write is dropped — additionalContext carries the same text to the model via the JSON channel, and the legacy stderr surface is what triggered #1375's JSON validation error on older CC clients.

exit(2) is preserved at all three sites. That's the documented mechanism for triggering the asyncRewake 'force fix' feedback loop (per the inline comment at the stop-hook site); switching to exit(0) without verifying CC's protocol-version support risks dropping the rewake entirely and silently losing all the findings the hook just computed.

For push-sweep specifically: emit_metrics had to move from an unconditional pre-emission (line ~1680) to two conditional sites (one in the no-vulns branch with exit(0), one in the with-vulns branch with exit(2)) because the with-vulns branch needs to attach additional_context and CC reads only the first JSON line — a second emit would be ignored. Behavior is preserved: every push-sweep fire emits exactly one metrics line, just at a slightly later point in the function body.

Verified locally on macOS Python 3.13:
- py_compile clean.
- Existing 45 smoke + extensibility tests still pass.
- 21 new tests in test_hook_output_protocol.py (added to internal test suite at sg-staging/tests/, not in this PR):
  * 6 backward-compat: emit_metrics with metrics only, with rewake_summary, etc. — verifies the legacy callers still produce the same output shape.
  * 5 additional_context shape: lands in hookSpecificOutput, round-trips the value, default hook_event_name is sensible, empty/None doesn't pollute the JSON with an empty hSO block.
  * 3 system_message shape: lands in systemMessage, empty/None suppressed, round-trips.
  * 1 combined: metrics + rewake_summary + additional_context + system_message + hook_event_name all merge into one JSON line.
  * 6 round-trip safety: emoji, quotes, backslashes, newlines, Unicode (山田太郎 + 🎉), tabs, null bytes — all survive the json.dumps cycle.
  * 6 static-shape: each of the three asyncRewake handlers (commit_review, push_sweep, stop_hook) is checked to confirm it passes additional_context to emit_metrics and no longer writes the PROVENANCE_BANNER guidance to stderr. Catches the regression class where a new exit(2) site forgets to plumb guidance through the JSON channel.
- 66/66 pass total (45 existing + 21 new) in 2.57s.

NOT verified end-to-end with a real CC instance triggering all three hooks. The static-shape tests + the JSON round-trip tests should catch any regression in the emit_metrics output, but the actual interaction with CC's asyncRewake / rewakeMessage flow (especially: does hookSpecificOutput.additionalContext successfully appear in the rewakeMessage that CC sends to the model?) needs runtime verification against a CC version that supports the modern protocol. The reporter for #1375 specifically called out that CC's older versions surfaced 'json output validation failed' on the old stderr-only output; this fix changes the stdout shape to valid JSON with the findings included, which should resolve that error class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:53:04 -07:00
Mohamed Hegazy	68a700837c	Merge pull request #2075 from anthropics/fix-2056-windows-unicode-decode security-guidance: lenient UTF-8 decode in 6 git-subprocess helpers (#2056)	2026-05-28 23:36:36 -07:00
Mohamed Hegazy	3d349d40b9	Merge pull request #2074 from anthropics/fix-xss-rules-non-js-false-positives security-guidance: gate XSS pattern rules to JS-family files	2026-05-28 23:18:17 -07:00
Mohamed Hegazy	6a63e35e75	security-guidance: lenient UTF-8 decode in 6 git-subprocess helpers (#2056)

Fixes anthropics/claude-plugins-official#2056 — on Windows, when the worktree contains an untracked file whose name has a character undefined in cp1252 (accented capitals like Á Í Ï Ð Ý, most CJK, emoji), the UserPromptSubmit hook crashes:

    Exception in thread Thread-5 (_readerthread):
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81
    Traceback (most recent call last):
      File diffstate.py, line 338, in _list_untracked
        for p in r.stdout.split('\\0'):
    AttributeError: 'NoneType' object has no attribute 'split'

Non-blocking (UPS failures still let the prompt through) but the baseline-untracked snapshot is silently lost, so the Stop-hook review mis-handles pre-existing untracked files.

Root cause (reporter's diagnosis, verified):
1. core.quotePath=false makes git emit raw UTF-8 for non-ASCII filenames.
2. subprocess.run(..., text=True) decodes via locale.getpreferredencoding(False) in strict mode — on Windows that is cp1252, in which 0x81 / 0x8D / 0x8F / 0x90 / 0x9D are undefined. Those bytes appear in the UTF-8 encodings of Á (C3 81), Í (C3 8D), Ï (C3 8F), Ð (C3 90), Ý (C3 9D), and a large fraction of CJK / emoji codepoints.
3. The decode runs in the subprocess reader thread. The thread raises UnicodeDecodeError, threading prints 'Exception in thread Thread-N', subprocess.run returns with stdout=None. The handler then does None.split('\\0') -> AttributeError, which is NOT in the narrow except (TimeoutExpired, FileNotFoundError, OSError) tuple, so it escapes the helper, propagates out of UserPromptSubmit's ThreadPoolExecutor.result(), and exits the hook non-zero.

This is internally inconsistent: gitutil._git_diff_range, security_reminder_hook._reflog_amend_lookup (line ~540), and the commit diff loop (line ~1115) already do bytes + decode utf-8/replace, with comments explicitly noting that text=True would crash. The fix below extends that established pattern to the helpers that were holdouts.

Affected helpers (6 total):
- diffstate._list_untracked <- reporter, hot path, CRITICAL
- diffstate.capture_git_baseline <- reporter, latent
- diffstate.get_baseline_file_content <- audit, file content read, HIGH
- gitutil._git_name_only <- reporter, latent
- gitutil._git_status_porcelain <- reporter, latent
- gitutil._git_reflog_recent_commits <- audit, embeds %gs commit msg, HIGH

For each one:
- Drop text=True from subprocess.run.
- Decode r.stdout / r.stderr as .decode('utf-8', errors='replace').
- Add ValueError to the except tuple as defense against any future strict-decode regression (UnicodeDecodeError is a ValueError subclass; including it explicitly degrades the helper to its empty/None return instead of escaping out of the hook).

Verified locally on macOS Python 3.13:
- py_compile clean on both files.
- 45 existing smoke + extensibility tests still pass.
- 21 new internal tests (not in this PR — added to the team's local test suite at staging/tests/test_unicode_decode.py):
  * 18 static-shape parametrized: each of the 6 fixed helpers has no text=True in its subprocess calls, contains errors='replace', and lists ValueError in its except.
  * Deterministic end-to-end: create real git repo + Ávila_report.txt untracked, call _list_untracked, verify it returns {'Ávila_report.txt': <mtime>} without crashing.
  * Deterministic end-to-end: same for capture_git_baseline (verifies the latent stderr-warning case stays valid).
  * Deterministic end-to-end: get_baseline_file_content on a file whose content has 山田太郎 + 🎉; verify the bytes round-trip through the decode.
- 66/66 tests pass total (45 existing + 21 new).

NOT verified end-to-end on Windows — would need actual cp1252 strict decode to fire. Reporter has the deterministic repro and will re-verify on their Win11 / Python 3.14.x setup before merge.

Not in this PR (defense-in-depth, lower risk):
- 3 git rev-parse calls returning path output (gitutil._find_git_index, _git_toplevel, _git_dir) could fail on Windows if cwd is in a non-ASCII install directory. Same fix shape but unreported and much lower probability — worth a separate follow-up if anyone actually hits it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:15:16 -07:00
Mohamed Hegazy	12a5376e20	security-guidance: gate XSS pattern rules to JS-family files

Closes #410, #2037, #2045, #1640, #1280, #1329, #1341, #255, anthropics/claude-code#46720 (partial closes on overlap with other rules).

The plugin's substring-only XSS / browser-DOM rules (new_function_injection, react_dangerously_set_html, document_write_xss, innerHTML_xss, outerHTML_xss, insertAdjacentHTML_xss) fired on any file containing the trigger substring — including:
  * Markdown documentation explaining XSS sinks
  * Blog posts / READMEs that name browser APIs
  * Python tutorials referencing dangerouslySetInnerHTML
  * Plugin skill files with example HTML strings
  * .yaml / .json configs that happen to contain the literal string
  * .gitignore / Dockerfile / Makefile

These constructs have no meaning outside JS/TS source. Add a path_filter: lambda p: p.endswith(_JS_EXTS) to each so they fire only on .js, .jsx, .ts, .tsx, .mjs, .cjs, .mts, .cts, .vue, .svelte.

Cross-checked against the existing _JS_EXTS-gated rules (regex_exec_substring, child_process_exec, exec_substring) — same pattern, same constant, same intent. Uses the module-level _JS_EXTS tuple so future extension changes propagate to all 6 rules atomically.

Verified locally on macOS Python 3.13:
- py_compile clean.
- 45-test existing smoke + extensibility suite still passes.
- 151 new parametrized tests in test_xss_gate.py (added to internal test suite this PR doesn't ship): each gated rule x every JS-family extension accepts, x every non-JS path (.md / .py / .yaml / .json / .txt / .html / Dockerfile / Makefile / .gitignore / .sh / .go / .rs / .rb) rejects. 196 tests pass total.

Doesn't address everything in the false-positive cluster — issues that require Python-rule gating (#1114 .env.schema exec), tighter substring scoping (#660 pickle in usernames), or hook-protocol changes (#1358 exit-2 vs warning, #1375 plain-text-vs-JSON output) need separate PRs. This PR covers the JS-substring subset cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:07:53 -07:00
Mohamed Hegazy	04127de5d1	Merge pull request #2073 from anthropics/fix-2071-macos-python-39 security-guidance: enable LLM review on default macOS Python 3.9 (#2071)	2026-05-28 22:59:23 -07:00
Mohamed Hegazy	a67587c816	security-guidance: enable LLM review on default macOS Python 3.9 Fixes anthropics/claude-plugins-official#2071 — on macOS where the default `python3` is Apple's Command Line Tools Python 3.9.6, the plugin's agentic commit reviewer silently does not run, even when the user has a newer Python installed. Three compounding factors in the bug: 1. `sg-python.sh` only checks the major version (`3`), so it always picks 3.9 even when 3.10+ is on PATH. 2. `claude_agent_sdk` requires Python >=3.10 — pip install on 3.9 returns "No matching distribution" -> bootstrap returns BUILD_FAILED. 3. Even with a hand-built 3.12 venv, `llm.py` imports the SDK in-process into the hook's interpreter (still 3.9), which raises SyntaxError. The existing venv-probe in `ensure_agent_sdk.py` uses the venv's own Python (3.12) so it reports NOOP_VENV (healthy) while the consumer fails — misleading telemetry on top of silent feature degradation. Per BQ telemetry, 14,073 external macOS users hit sdk_bootstrap=BUILD_FAILED in the past 4 days (the default-macOS cohort), out of ~86K total external installed users. Combined with ~20K other users in similar broken-bootstrap states (Windows pre-#2055, Linux <3.10), about half the installed base has a silently-broken agentic reviewer. This PR implements the reporter's items #1, #3, and #4. Item #2 (running the SDK out-of-process) is deferred as a bigger refactor. 
Item #1 — hooks/sg-python.sh — prefer >=3.10 binaries via 3-pass probe: Pass 1: python3.13 / 3.12 / 3.11 / 3.10 (>=3.10 by name, highest wins) Pass 2: bare python3 / python / py -3 (accept only if reported >=3.10) Pass 3: bare python3 / python / py -3 (any Python 3, FALLBACK so pattern checks still work on macOS-default 3.9 — no regression vs today; SDK-dependent paths detect the version mismatch inside Python and degrade cleanly via item #4) Item #4 — ensure_agent_sdk.py — health-check honesty: Added HOOK_PY_INCOMPATIBLE=6 outcome with short-circuit at top of main(): if sys.version_info < (3, 10): return HOOK_PY_INCOMPATIBLE, "hook_py", f"py_{...}" Telemetry consequences after rollout: sdk_bootstrap=6 is a new clean bucket; some users currently miscounted in sdk_bootstrap=3 BUILD_FAILED (wasted pip cycles) and sdk_bootstrap=1 NOOP_VENV (falsely-healthy) move to sdk_bootstrap=6. The remaining NOOP_VENV count becomes trustworthy. Item #3 — ensure_agent_sdk.py — one-time user-visible notice: When outcome == HOOK_PY_INCOMPATIBLE and a marker file at `~/.claude/security/.agentic_unavailable_notice_v<pv>` doesn't exist, the SessionStart response includes hookSpecificOutput.additionalContext + systemMessage explaining the situation. Marker file is plugin- version-keyed so a future fix (e.g. shipping out-of-process SDK) can bump pv and re-notify users. BUILD_FAILED is intentionally excluded from the notice — it covers transient causes where a permanent banner would mislead. Verified locally on macOS Python 3.13: - py_compile clean on both files. - Existing 45-test smoke + extensibility suite: 45/45 PASS in 2.50s. - Unit test of simulated 3.9 path: HOOK_PY_INCOMPATIBLE returned with correct phase/kind; notice shown on first call, suppressed on second, reshown on bumped pv; BUILD_FAILED correctly does NOT trigger notice. NOT verified: actual Python 3.9 behavior end-to-end (would need a 3.9 install). Worth a follow-up smoke test in a 3.9 venv before next release. 
The unit test simulating 3.9 covers the logic but not the runtime invocation through the shim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 22:58:01 -07:00
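The three items in the commit above (interpreter probe, version short-circuit, one-time notice) can be modelled in a few lines of Python. This is a minimal sketch, not the plugin's code: the real probe lives in the shell shim `hooks/sg-python.sh`, and the names `pick_interpreter`, `check_hook_python`, `should_show_notice`, the `available` dict, and the `py_39`-style kind string are all hypothetical. The commit message elides the actual kind format.

```python
import sys
from pathlib import Path

HOOK_PY_INCOMPATIBLE = 6   # outcome code named in the commit message
MIN_VERSION = (3, 10)      # claude_agent_sdk requires Python >= 3.10

VERSIONED = ["python3.13", "python3.12", "python3.11", "python3.10"]
BARE = ["python3", "python", "py -3"]


def pick_interpreter(available):
    """Model of the 3-pass probe.

    `available` is a hypothetical dict mapping each command found on PATH
    to the (major, minor) version it reports.
    """
    # Pass 1: versioned names are >= 3.10 by name; the highest wins.
    for name in VERSIONED:
        if name in available:
            return name
    # Pass 2: bare names, accepted only if they report >= 3.10.
    for name in BARE:
        if available.get(name, (0, 0)) >= MIN_VERSION:
            return name
    # Pass 3: any Python 3, as a fallback so pattern checks still run.
    for name in BARE:
        if available.get(name, (0, 0))[0] == 3:
            return name
    return None


def check_hook_python(version_info=None):
    """Return (outcome, phase, kind) if the interpreter is too old, else None."""
    v = sys.version_info if version_info is None else version_info
    if tuple(v[:2]) < MIN_VERSION:
        # Hypothetical kind format; the real one is elided in the source.
        return HOOK_PY_INCOMPATIBLE, "hook_py", f"py_{v[0]}{v[1]}"
    return None


def should_show_notice(outcome, plugin_version, marker_dir):
    """Return True once per plugin version, creating the marker file."""
    if outcome != HOOK_PY_INCOMPATIBLE:
        return False  # e.g. BUILD_FAILED never triggers the notice
    marker = Path(marker_dir) / f".agentic_unavailable_notice_v{plugin_version}"
    if marker.exists():
        return False
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

Keying the marker file on the plugin version is what lets a later release re-notify users: a bumped `plugin_version` yields a new file name, so the existence check fails once more.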
Bryan Thompson	502de97746	Add vibe-prospecting plugin (#1997)	2026-05-28 15:30:04 -07:00
Bryan Thompson	679f52da9e	feat(scan): emit per-entry sticky verdict comments (#2009) Adds an `emit-verdict` job to scan-plugins.yml that posts a sticky comment per scanned entry to the corresponding bump PR, with marker `<!-- bump-pr-verdict:<name> -->`. The body is a schema_v1 JSON block, the same shape `anthropics/claude-plugins-community-internal`'s `scan-external-plugins.yml` already emits, so any consumer that already reads verdicts from that schema works uniformly across both repos. What this enables ----------------- Lets downstream consumers (label automation, dashboards, anything that wants per-entry verdict signal) read verdicts directly from the PR rather than scraping job logs or downloading artifacts. The current options are log-scraping (truncated after log retention) or fetching the `scan-verdicts` artifact (retention-limited and only after upload succeeds). What does NOT change -------------------- - The `scan` required check is unaffected (emit-verdict is `continue-on-error: true` at the job level — failures here MUST NOT block the required gate). - Verdict cache, scan flow, and revert-failed-bumps.yml are unchanged. - No new permission scopes (uses `pull-requests: write` at the job level, identical to other PR-commenting jobs in this repo). Schema notes ------------ - `scan.*` axes (clone, schema, binaries, etc.) emit as "skipped" — this workflow runs the policy review only, not per-entry static checks. Shape kept compatible with -internal's schema_v1 so the same consumers work uniformly on both repos. - `policy.has_broad_scope_hooks`, `has_undisclosed_telemetry`, `description_matches_behavior` emit as null — those granular axes aren't surfaced by this workflow's per-entry artifact yet. Consumers that map `null → "?"` for display already handle this gracefully. - `policy.status` is execution state (not outcome). Map source → status: scan-action-run → "ran"; cache-served → "cached". Outcome lives in `policy.passes`. 
policy.status vocabulary matches the `ran|cached|missing|gated_out|infra_error` convention from -internal's emit-verdict. PR resolution ------------- `pull_request` events carry the PR number directly. The bump workflow creates bump PRs via GITHUB_TOKEN (which doesn't fire `pull_request` triggers — recursion guard) and dispatches this scan via `workflow_dispatch` on the bump branch; in that case the job looks up the open PR by head ref via REST. No PR found (scan_all dispatch on main, etc.) → no-op with notice. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 15:29:59 -07:00
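The schema notes in the commit above can be illustrated with a small Python sketch of a sticky comment body. The commit message gives the marker string, the source-to-status mapping, the null policy axes, and the "skipped" scan axes. The function name `build_verdict_comment`, the nesting of keys under `scan` and `policy`, and the choice of three scan axes are assumptions, since the full schema_v1 layout is not given here.

```python
import json

# Execution-state mapping given in the commit message (source -> policy.status).
STATUS_BY_SOURCE = {
    "scan-action-run": "ran",
    "cache-served": "cached",
}


def build_verdict_comment(name, source, passes):
    """Build a sticky comment body: marker line followed by a JSON block.

    The key layout below is an assumed approximation of schema_v1.
    """
    verdict = {
        # scan.* axes emit as "skipped": only the policy review runs here.
        # Only three of the axes are listed; the source says "etc.".
        "scan": {"clone": "skipped", "schema": "skipped", "binaries": "skipped"},
        "policy": {
            "status": STATUS_BY_SOURCE[source],  # execution state, not outcome
            "passes": passes,                    # the outcome lives here
            # Granular axes are not surfaced yet, so they emit as null.
            "has_broad_scope_hooks": None,
            "has_undisclosed_telemetry": None,
            "description_matches_behavior": None,
        },
    }
    marker = f"<!-- bump-pr-verdict:{name} -->"
    return marker + "\n" + json.dumps(verdict, indent=2)
```

Putting the marker on the first line lets a later run find and update the existing comment for that entry instead of posting a new one, which is the usual way a sticky comment is kept to one per entry.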
Bryan Thompson	13a0208f38	Add Skill-bundle plugins section to README (#2067) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 15:29:53 -07:00