security-guidance: detect Graphite (gt) commands as commit/push events (#2048 )

Fixes anthropics/claude-plugins-official#2048 — teams using Graphite for stacked PRs (`gt create` / `gt modify` / `gt submit`) never get the commit/push agentic review because the hook matcher only catches literal `git commit` / `git push` Bash calls. gt shells out to git as a subprocess, but the hook fires on Claude's top-level tool call, which is `gt create` — not the `git commit` invocation inside the gt subprocess that Claude Code never observes. Per-edit pattern checks and end-of-turn Stop review still fire (those don't depend on detecting the commit command), so the silent-coverage- gap is bounded to the deepest review layer for Graphite users. Still: that's exactly the layer designed to catch IDOR / auth-bypass / cross-file SSRF, so the gap matters. Semantic mapping (per the reporter): gt create -> commit (like `git commit`) gt modify -> commit + amend (like `git commit --amend`) gt submit -> push (like `git push`) Changes: 1. hooks/hooks.json: extend two PostToolUse `if` matchers. "Bash(git commit:*)" -> "Bash(git commit:*)|Bash(gt create:*)|Bash(gt modify:*)" "Bash(git push:*)" -> "Bash(git push:*)|Bash(gt submit:*)" Without this, the hook subprocess never spawns for gt invocations and the Python regex changes below are dead code. 2. hooks/security_reminder_hook.py: extend three regexes that classify the bash command line. _GIT_COMMIT_RE: now also matches `gt create` and `gt modify`. Used at 4 sites (handler gate, multi-commit count, prompt detection, event classification). Compound commands like `gt create -am a && gt submit` now correctly trigger both the commit and push paths. _GIT_AMEND_RE: now also matches `gt modify` (semantically an amend). The amend code path uses reflog to find the pre-amend SHA and diff against THAT instead of HEAD~1 — same code path now applies to `gt modify`. _GIT_PUSH_RE: now also matches `gt submit`. Tolerates the same `git -C path` / `git -c k=v` global options as before for the git form; gt has its own flag layer that doesn't conflict. Verified locally on macOS Python 3.13: - JSON valid (hooks.json roundtrips). - Existing 45 smoke + extensibility tests still pass. - 76 new tests in test_gt_graphite_workflow.py (added to internal test suite this PR doesn't ship — kept in sg-staging tests/ until we have a story for shipping plugin tests publicly): * 16 parametrized commit-match: native git commit variants + all gt create / gt modify variants from the reporter's repro. * 11 parametrized commit-reject: gt submit, gt log, gtoolkit (word-boundary), agt create, etc. * 9 parametrized amend-match: git commit --amend variants + gt modify variants + chained git+gt. * 7 parametrized amend-reject: regular git commit, gt create, gt submit, echo'd substring noise. * 11 parametrized push-match: git push variants + gt submit variants + chained. * 12 parametrized push-reject: git commit, gt log, gt fetch, gt down, gt restack, gh pr create, agt submit. * 3 compound-command class tests: git+gt mixtures trigger both paths; gt modify chained with gt submit triggers amend + push. * 3 commit-invocation-count tests: gt commands contribute to the multi-commit-detection findall count. * 2 hooks.json static config tests: read the JSON, verify the commit and push `if` clauses include the gt cases. Catches the easy regression where someone updates the Python regex but forgets to widen the matcher. - 121/121 pass total (45 existing + 76 new) in 2.50s. NOT verified end-to-end with a real `gt` install. Reporter has the deterministic Graphite workflow and offered to retest. The regex + matcher widening is a clean superset — current git-only matching still works (verified by the 45-test smoke suite that uses `git commit` / `git push` exclusively), and the new gt cases are pure additions. Not in this PR: - `gt prev` / `gt next` / `gt up` / `gt down` etc. — pure navigation, no commit / push side effect. - `gt restack` — could in principle rewrite commits (so the plugin's reviewed-shas cache becomes stale), but it doesn't create reviewable new content. Out of scope. - `gh pr create` — already explicitly NOT a separate matcher per the existing comment in _GIT_PUSH_RE (gh invokes git push as a child process; the bash hook only sees the top-level `gh pr create`). Same architectural issue as gt but with a different cost/benefit per the existing comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-10 18:23:36 +00:00 · 2026-05-28 23:33:14 -07:00
4 changed files with 29 additions and 144 deletions
--- a/plugins/security-guidance/hooks/ensure_agent_sdk.py
+++ b/plugins/security-guidance/hooks/ensure_agent_sdk.py
@@ -32,8 +32,6 @@ BUILD_FAILED = 3     # venv create or pip install raised/timed out
 # llm.py also matches Windows venv layout (Lib/site-packages). Don't reuse the
 # value — telemetry rows from older plugin builds still emit 4.
 SKIP_SENTINEL = 5    # another SessionStart is currently building
-HOOK_PY_INCOMPATIBLE = 6  # hook interpreter is <3.10 — SDK syntax can't load
-                          # here no matter how the venv was built. See #2071.


 def _sdk_on_syspath() -> bool:
@@ -64,29 +62,6 @@ def main() -> tuple[int, str, str]:
    err_phase / err_kind are non-empty only on BUILD_FAILED — they let
    telemetry split bootstrap failures by root cause.
    """
-    # Honesty check (fixes the misleading NOOP_VENV in #2071): the SDK
-    # requires Python >=3.10 and uses 3.10+ syntax (match statements,
-    # PEP 604 unions). On a 3.9 hook interpreter we CANNOT import it no
-    # matter how the venv was built — llm.py runs in this same interpreter
-    # and the syntax-level import will SyntaxError. macOS ships 3.9.6 as
-    # the default `python3` and `/usr/bin` precedes Homebrew in PATH, so
-    # this case is the default state for a large share of macOS users.
-    #
-    # sg-python.sh now prefers python3.10+ binaries so most users won't
-    # reach this branch; the fallback to 3.9 is preserved for the
-    # pattern-warning hooks that don't need the SDK. Reporting
-    # HOOK_PY_INCOMPATIBLE here:
-    #   (a) avoids 30-60s of wasted pip install,
-    #   (b) avoids the lie where the venv_py probe says NOOP_VENV but the
-    #       consumer import fails, and
-    #   (c) gives telemetry a clean bucket to size the affected fleet.
-    if sys.version_info < (3, 10):
-        return (
-            HOOK_PY_INCOMPATIBLE,
-            "hook_py",
-            f"py_{sys.version_info[0]}.{sys.version_info[1]}",
-        )
-
    if _sdk_on_syspath():
        return NOOP_SYSTEM, "", ""

@@ -220,56 +195,6 @@ def main() -> tuple[int, str, str]:
            sentinel.unlink(missing_ok=True)


-def _maybe_emit_user_notice(outcome: int, pv: int) -> str | None:
-    """Return a one-time user-visible notice when the agentic reviewer is
-    in a persistent broken state on this machine, or None if we've already
-    shown the notice for this plugin version (or shouldn't show one).
-
-    The marker file is plugin-version-keyed: a future plugin update can
-    re-notify if behavior changes (e.g. we ship out-of-process SDK in v3
-    and want to tell affected users it's fixed). Failures to write the
-    marker degrade to "skip the notice this session" so we don't spam
-    every SessionStart on a read-only home dir.
-
-    Currently only HOOK_PY_INCOMPATIBLE qualifies. BUILD_FAILED is
-    intentionally excluded — it covers transient causes (network failure,
-    pip registry hiccup, in-flight rebuild) where the next session may
-    succeed and a permanent notice would mislead.
-    """
-    if outcome != HOOK_PY_INCOMPATIBLE:
-        return None
-    try:
-        state_dir = Path(
-            os.environ.get("SECURITY_WARNINGS_STATE_DIR")
-            or os.path.expanduser("~/.claude/security")
-        )
-        marker = state_dir / f".agentic_unavailable_notice_v{pv or 0}"
-        if marker.exists():
-            return None
-        state_dir.mkdir(parents=True, exist_ok=True)
-        # Write timestamp + Python version so the marker is self-documenting
-        # if a user goes looking. O_EXCL would be racier with no real win
-        # (two concurrent SessionStarts both showing the notice once is fine).
-        marker.write_text(
-            f"{time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())} "
-            f"py={sys.version_info[0]}.{sys.version_info[1]}\n"
-        )
-    except OSError:
-        return None
-    return (
-        f"⚠ security-guidance plugin: the cross-file commit reviewer "
-        f"(layer 3 of 3 — catches IDOR, auth-bypass, cross-file SSRF) "
-        f"is unavailable in this environment. It requires Python ≥3.10, "
-        f"but the hook is running on "
-        f"{sys.version_info[0]}.{sys.version_info[1]}.\n\n"
-        f"Pattern checks and the single-shot LLM diff review are still "
-        f"active. To enable the deeper reviewer, install Python 3.10+ "
-        f"(e.g. `brew install python` on macOS) and restart Claude Code.\n\n"
-        f"This notice is shown once per plugin version. "
-        f"See: github.com/anthropics/claude-plugins-official/issues/2071"
-    )
-
-
 if __name__ == "__main__":
    # Tell the harness this is async — venv create + pip install can take
    # 30-60s on a cold cache, well past the default sync hook timeout.
@@ -306,18 +231,4 @@ if __name__ == "__main__":
    pv = _plugin_version_int()
    if pv:
        metrics["pv"] = pv
-    response: dict[str, object] = {"metrics": metrics}
-    # One-time user-visible notice when the agentic reviewer is dead on
-    # arrival. Uses hookSpecificOutput.additionalContext (SessionStart's
-    # supported channel for surfacing text to both the model and the user)
-    # plus systemMessage as a belt-and-suspenders. Marker-file-gated so
-    # this fires exactly once per plugin version per install — see
-    # _maybe_emit_user_notice.
-    notice = _maybe_emit_user_notice(outcome, pv)
-    if notice:
-        response["hookSpecificOutput"] = {
-            "hookEventName": "SessionStart",
-            "additionalContext": notice,
-        }
-        response["systemMessage"] = notice
-    print(json.dumps(response), flush=True)
+    print(json.dumps({"metrics": metrics}), flush=True)
--- a/plugins/security-guidance/hooks/hooks.json
+++ b/plugins/security-guidance/hooks/hooks.json
@@ -37,7 +37,7 @@
          {
            "type": "command",
            "command": "bash \"${CLAUDE_PLUGIN_ROOT}/hooks/sg-python.sh\" \"${CLAUDE_PLUGIN_ROOT}/hooks/security_reminder_hook.py\"",
-            "if": "Bash(git commit:*)",
+            "if": "Bash(git commit:*)|Bash(gt create:*)|Bash(gt modify:*)",
            "asyncRewake": true,
            "rewakeMessage": "Background security review of commit — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
            "rewakeSummary": "Commit security review found issues"
@@ -45,7 +45,7 @@
          {
            "type": "command",
            "command": "bash \"${CLAUDE_PLUGIN_ROOT}/hooks/sg-python.sh\" \"${CLAUDE_PLUGIN_ROOT}/hooks/security_reminder_hook.py\"",
-            "if": "Bash(git push:*)",
+            "if": "Bash(git push:*)|Bash(gt submit:*)",
            "asyncRewake": true,
            "rewakeMessage": "Background security review of pushed commits not yet reviewed — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
            "rewakeSummary": "Push security review found issues"
--- a/plugins/security-guidance/hooks/security_reminder_hook.py
+++ b/plugins/security-guidance/hooks/security_reminder_hook.py
@@ -594,8 +594,21 @@ _COMMIT_SHA_RE = re.compile(r'^\[[^\]]*?\b([0-9a-f]{7,40})\]', re.MULTILINE)
 # detection — it does NOT tolerate `git -c k=v commit` global options, which
 # keeps this hook aligned with CC's commit attribution on what counts as a
 # commit.
-_GIT_COMMIT_RE = re.compile(r'\bgit\s+commit(?:\s|$)')
-_GIT_AMEND_RE = re.compile(r'\s--amend\b')
+#
+# Also matches `gt create` and `gt modify` — Graphite's stacked-PR wrapper
+# around git. `gt create` produces a new commit (mapped to git commit
+# semantics); `gt modify` amends the current commit (mapped to git commit
+# --amend, also flagged by _GIT_AMEND_RE below). The hooks.json matcher
+# widening for `gt create:*` / `gt modify:*` / `gt submit:*` ships in the
+# same change set — without that widening this regex change is dead code
+# because the hook subprocess never spawns for gt invocations. See #2048.
+_GIT_COMMIT_RE = re.compile(r'\b(?:git\s+commit|gt\s+(?:create|modify))(?:\s|$)')
+# Match either the `--amend` flag (with the leading whitespace boundary
+# preserved from the original) OR `gt modify` which is semantically an
+# amend. The handler treats matches as "find the pre-amend SHA via reflog
+# and diff against THAT, not against the post-amend HEAD's parent" — same
+# code path for both git --amend and gt modify.
+_GIT_AMEND_RE = re.compile(r'(?:\s--amend\b|\bgt\s+modify\b)')

 # Rolling-window cap on LLM commit-review calls. See atomic_check_rate_limit
 # docstring for the rationale that motivated the switch from a lifetime cap.
@@ -624,8 +637,13 @@ COMMIT_REVIEW_RATE_WINDOW_S = int(
 # entry would buy minimal extra coverage (sessions that push only via gh) at
 # the cost of an extra python spawn on every `... && gh pr create` compound
 # (the common case). Those sessions are caught on their next standalone `git push`.
+# Matches `git push` (with optional `-c k=v` / `-C path` global options
+# CC's hooks.json matcher doesn't tolerate) OR `gt submit` — Graphite's
+# stacked-PR push command. gt submit forwards to `git push` internally,
+# but the bash hook fires on Claude's top-level command so we need to
+# recognize gt submit at the matcher level. See #2048.
 _GIT_PUSH_RE = re.compile(
-    r'\bgit(?:\s+-[cC]\s+\S+|\s+--\S+=\S+)*\s+push\b'
+    r'(?:\bgit(?:\s+-[cC]\s+\S+|\s+--\S+=\S+)*\s+push\b|\bgt\s+submit\b)'
 )

 # `git push` stdout: "abc1234..def5678  branch -> branch" (or `+abc..def` on
--- a/plugins/security-guidance/hooks/sg-python.sh
+++ b/plugins/security-guidance/hooks/sg-python.sh
@@ -47,65 +47,21 @@ fi

 probe() {
    # $1..N: the interpreter command (may be multi-word like `py -3`)
-    # Writes "<major>.<minor>" to stdout and exits 0 iff at least Python 3.
-    "$@" -c 'import sys; print(f"{sys.version_info[0]}.{sys.version_info[1]}")' 2>/dev/null
+    # Probe writes the major version to stdout and exits 0 iff it's >=3.
+    "$@" -c 'import sys; print(sys.version_info[0])' 2>/dev/null
 }

-# True iff arg is a "M.m" version string >= 3.10. claude_agent_sdk requires
-# Python >= 3.10; below that, pip install fails ("No matching distribution")
-# and the LLM-powered review (Stop / commit / push) silently no-ops while
-# pattern checks (PostToolUse regex) keep working. macOS ships 3.9.6 as the
-# default `python3` on current versions, so this guard matters in practice.
-# See anthropics/claude-plugins-official#2071.
-is_sdk_compatible() {
-    case "$1" in
-        3.1[0-9]|3.[2-9][0-9]|[4-9].*|[1-9][0-9].*) return 0 ;;
-        *) return 1 ;;
-    esac
-}
-
-# Pass 1 — try minor-versioned binaries in descending order. These are only
-# present if the user explicitly installed them (Homebrew / python.org / pyenv),
-# so picking one here always upgrades over the system `python3`. Highest
-# available wins; the user doesn't have to PATH-prefer it.
-for cmd in "python3.13" "python3.12" "python3.11" "python3.10"; do
-    v=$(probe "$cmd") || continue
-    if is_sdk_compatible "$v"; then
-        exec "$cmd" "$@"
-    fi
-done
-
-# Pass 2 — bare interpreters, but only if SDK-compatible. Covers Linux distros
-# that ship 3.10+ as the default `python3`, and Windows where `python` /
-# `py -3` resolves to the user's python.org install.
 for cmd in "python3" "python" "py -3"; do
+    # Word-split intentionally so `py -3` works
    # shellcheck disable=SC2086
    v=$(probe $cmd) || continue
-    if is_sdk_compatible "$v"; then
+    if [ "$v" = "3" ]; then
        # shellcheck disable=SC2086
        exec $cmd "$@"
    fi
 done

-# Pass 3 — fallback to any Python 3, even <3.10. Pattern-based checks
-# (PostToolUse regex on Edit/Write) only need 3.6+ and are useful on their
-# own; the SDK-dependent paths will detect the version mismatch and degrade
-# inside the Python code. Without this fallback, the entire plugin would
-# stop working on default macOS, which is a regression vs today.
-for cmd in "python3" "python" "py -3"; do
-    # shellcheck disable=SC2086
-    v=$(probe $cmd) || continue
-    # Accept anything that successfully reported a "M.m" string.
-    case "$v" in
-        [0-9]*.[0-9]*)
-            # shellcheck disable=SC2086
-            exec $cmd "$@"
-            ;;
-    esac
-done
-
 echo "security-guidance: no working Python 3 interpreter found." >&2
-echo "  tried: python3.13, python3.12, python3.11, python3.10, python3, python, py -3" >&2
+echo "  tried: python3, python, py -3" >&2
 echo "  on Windows, install Python from https://python.org (NOT the Microsoft Store)" >&2
-echo "  on macOS, install Python 3.10+ via Homebrew (\`brew install python\`)" >&2
 exit 1