Files
claude-plugins-official/plugins/security-guidance/hooks/sg-python.sh
Mohamed Hegazy 1ecf3d1bac security-guidance: purge text=True from subprocess.run + bake PYTHONUTF8=1 (#2099)
URGENT WINDOWS FIX. Sibling of #2056 / PR #2075 but covering 14 more
sites that PR #2075 missed.

The bug class: on Windows with cp1252 default encoding (typical
en-US locale), `subprocess.run(..., text=True)` decodes child stdout
AND stderr via `locale.getpreferredencoding()`. When git emits a
UTF-8 byte that's undefined in cp1252 (e.g. `0x81` from ف, present
in any path/filename/branch ref/commit message containing
Arabic/Hebrew/CJK), Python's internal `_readerthread` raises
UnicodeDecodeError. The thread crash is silent in Python 3.13+ (only
printed to stderr), but `subprocess.run` returns `stdout=None` and
the caller AttributeErrors on `.strip()`. The user sees a misleading
"WinError 267" or similar catch-all message instead of the real
decode failure.

PR #2075 fixed 6 specific helpers in `diffstate.py` / `gitutil.py`.
This commit covers the 14 survivors. Plus a defense-in-depth belt:
`PYTHONUTF8=1` exported by sg-python.sh.

This commit:

1. sg-python.sh: `export PYTHONUTF8=1` (PEP 540). No-op on
   macOS/Linux (already UTF-8). On Windows, makes Python's
   `locale.getpreferredencoding()` return UTF-8 instead of cp1252 —
   so even if a future regression slips in text=True, the decode
   succeeds. Must be set BEFORE Python starts; changing it from
   inside the interpreter has no effect.

2. gitutil.py: convert 8 subprocess.run sites from
   `capture_output=True, text=True` to `capture_output=True` +
   manual `r.stdout.decode("utf-8", errors="replace")`:
     - _git_rev_parse_head           (stdout = SHA, stderr risk)
     - _find_git_index               (stdout = PATH, primary bug site)
     - _temp_index git add           (returncode only, stderr risk)
     - _git_toplevel                 (stdout = PATH, primary bug site)
     - _git_dir                      (stdout = PATH, primary bug site)
     - _git_rev_list_range           (stdout = SHAs, stderr risk)
     - _detect_main_branch           (stdout = ref, stderr risk)
     - merge-base --is-ancestor      (returncode only, stderr risk)

3. security_reminder_hook.py: convert 6 subprocess.run sites
   (rev-parse @{u}/@{u}@{1}/local_ref, merge-base, HEAD lookup,
   reflog SHA resolution) — same pattern.

4. security_reminder_hook.py: fix the misleading log line in
   handle_user_prompt_submit. Was:
     debug_log("Failed to capture git baseline (not a git repo?)")
   Now includes the cwd in the message so the next reporter doesn't
   waste an hour grepping for the real WinError, per reporter's
   secondary finding.

Verified locally on macOS Python 3.13:

  - py_compile clean on all modified files.
  - bash -n sg-python.sh clean.
  - sg-python.sh actually propagates PYTHONUTF8=1 to child Python
    (verified via probe — sys.flags.utf8_mode=1).
  - Existing 353 tests still pass — 0 regression.
  - 25 new tests in test_2099_subprocess_text_true.py:
      * 10 static-shape catchers (one per hooks/*.py file). Any
        future PR that reintroduces text=True OR encoding= in
        subprocess.run fails this check at PR time. Single source
        of truth for the regression class.
      * 2 sg-python.sh verifiers (literal export + actual
        propagation to child Python).
      * 5 macOS end-to-end against a real git repo containing
        non-cp1252 content (`ف.py` filename): _git_toplevel,
        _git_dir, _find_git_index, _git_rev_parse_head,
        _git_rev_list_range all return clean values without
        AttributeError / UnicodeDecodeError.
      * 7 round-trip bytes-decode pattern verifiers (parametrized
        over Arabic ف, Hebrew א, Japanese 案, raw 0x81, multiple
        cp1252-undefined bytes, real-world git diff headers).
      * 1 sanity check that cp1252 strict DOES raise on 0x81
        (proves the test environment can catch the bug class).

  - Full suite: 378/378 pass in 5.56s.

  - End-to-end tmux smoke test driving real claude 2.1.145 CLI:
    Made a git commit via Bash tool call. All 4 hooks fired through
    the fixed plugin path:
      11:28:16.730  Hook called with args: …/plugin/hooks/security_reminder_hook.py
      11:28:16.734  Processing: hook_event=UserPromptSubmit
      11:28:16.825  Captured git baseline: 445f7f213256
      11:28:19.923  Hook called with args: …
      11:28:19.923  Processing: hook_event=PostToolUse, tool=Bash
      11:28:19.971  Commit review: detected git commit in command
      11:28:20.020  Commit review: 1/1 sha(s) resolved, 1 files
      11:28:26.415  Hook called with args: …
      11:28:26.416  Processing: hook_event=Stop
      11:28:26.550  Stop hook: empty review set

    Confirms: PYTHONUTF8=1 export doesn't break anything; converted
    helpers (_git_rev_parse_head, _git_toplevel, _git_dir,
    _find_git_index) run end-to-end without issue on the happy path.

NOT verified end-to-end on Windows with actual non-cp1252 content
in path/filename/stderr. The static-shape catcher pins the
regression class permanently. Reporter's PYTHONUTF8=1 workaround
empirically proves the encoding-mode fix works for the affected
scenario; this commit just bakes it in.

Closes #2099.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 11:29:07 -07:00

123 lines
5.3 KiB
Bash
Executable File

#!/usr/bin/env bash
# Find a working Python 3 interpreter and exec the hook with it.
#
# On Windows + Git Bash, `python3` typically resolves to the Microsoft Store
# stub at C:\Users\<user>\AppData\Local\Microsoft\WindowsApps\python3, which
# exits 49 silently in non-TTY subprocess context (a known Microsoft Store
# stub behavior). This shim
# probes each candidate with `-c ""` and skips any that fails, so the Store
# stub falls through to the real python.org install (`python` in Git Bash) or
# the `py -3` launcher.
#
# Order:
# 1. python3 — canonical on macOS/Linux; the Store stub fails the probe.
# 2. python — python.org installs on Windows; some Linux distros (RHEL 7
# EOL'd 2024-06) point this at Python 2, but `-c ""` succeeds
# on Python 2 too — guard with a version check.
# 3. py -3 — Windows Python launcher.
#
# Args after the shim path are passed straight through to the chosen
# interpreter, so the hooks.json invocation is:
# bash "${CLAUDE_PLUGIN_ROOT}/hooks/sg-python.sh" \
# "${CLAUDE_PLUGIN_ROOT}/hooks/security_reminder_hook.py"
set -e
# Force UTF-8 for ALL Python filesystem + IO operations (PEP 540).
# Without this, Windows Python defaults `locale.getpreferredencoding()` to
# cp1252 — which makes `text=True` in subprocess.run / open() / json.load
# crash the internal reader thread on any byte that's undefined in cp1252
# (e.g. the 0x81 byte from ف, present in any path/filename with
# Arabic/Hebrew/CJK characters). See #2056, #2099.
#
# No-op on macOS/Linux (already UTF-8). Must be set BEFORE Python starts —
# changing it from inside the interpreter has no effect.
export PYTHONUTF8=1
# Git Bash / MSYS on Windows hands script paths to this shim in POSIX form
# (`/c/Users/...`). When we exec a Windows `python.exe` (which we do on
# Windows since `python3` is the Microsoft Store stub), python interprets the
# leading `/` as the root of the current drive — e.g. `/c/Users/...` becomes
# `C:\c\Users\...` or `D:\c\Users\...` (whichever drive the shell is on),
# fails with ENOENT, and every Edit/Write/MultiEdit tool use blocks until the
# session restarts. See anthropics/claude-plugins-official#2043.
#
# Fix: convert absolute path args to native Windows form via `cygpath -w`
# before exec. `cygpath` is a Git Bash builtin; it's absent on macOS/Linux,
# where the `command -v` guard makes this a no-op. `cygpath -w` is idempotent
# for already-Windows paths so the rare mixed-form case is safe.
if command -v cygpath >/dev/null 2>&1; then
converted=()
for a in "$@"; do
case "$a" in
/*) converted+=("$(cygpath -w "$a")") ;;
*) converted+=("$a") ;;
esac
done
set -- "${converted[@]}"
fi
probe() {
# $1..N: the interpreter command (may be multi-word like `py -3`)
# Writes "<major>.<minor>" to stdout and exits 0 iff at least Python 3.
"$@" -c 'import sys; print(f"{sys.version_info[0]}.{sys.version_info[1]}")' 2>/dev/null
}
# True iff arg is a "M.m" version string >= 3.10. claude_agent_sdk requires
# Python >= 3.10; below that, pip install fails ("No matching distribution")
# and the LLM-powered review (Stop / commit / push) silently no-ops while
# pattern checks (PostToolUse regex) keep working. macOS ships 3.9.6 as the
# default `python3` on current versions, so this guard matters in practice.
# See anthropics/claude-plugins-official#2071.
is_sdk_compatible() {
case "$1" in
3.1[0-9]|3.[2-9][0-9]|[4-9].*|[1-9][0-9].*) return 0 ;;
*) return 1 ;;
esac
}
# Pass 1 — try minor-versioned binaries in descending order. These are only
# present if the user explicitly installed them (Homebrew / python.org / pyenv),
# so picking one here always upgrades over the system `python3`. Highest
# available wins; the user doesn't have to PATH-prefer it.
for cmd in "python3.13" "python3.12" "python3.11" "python3.10"; do
v=$(probe "$cmd") || continue
if is_sdk_compatible "$v"; then
exec "$cmd" "$@"
fi
done
# Pass 2 — bare interpreters, but only if SDK-compatible. Covers Linux distros
# that ship 3.10+ as the default `python3`, and Windows where `python` /
# `py -3` resolves to the user's python.org install.
for cmd in "python3" "python" "py -3"; do
# shellcheck disable=SC2086
v=$(probe $cmd) || continue
if is_sdk_compatible "$v"; then
# shellcheck disable=SC2086
exec $cmd "$@"
fi
done
# Pass 3 — fallback to any Python 3, even <3.10. Pattern-based checks
# (PostToolUse regex on Edit/Write) only need 3.6+ and are useful on their
# own; the SDK-dependent paths will detect the version mismatch and degrade
# inside the Python code. Without this fallback, the entire plugin would
# stop working on default macOS, which is a regression vs today.
for cmd in "python3" "python" "py -3"; do
# shellcheck disable=SC2086
v=$(probe $cmd) || continue
# Accept anything that successfully reported a "M.m" string.
case "$v" in
[0-9]*.[0-9]*)
# shellcheck disable=SC2086
exec $cmd "$@"
;;
esac
done
echo "security-guidance: no working Python 3 interpreter found." >&2
echo " tried: python3.13, python3.12, python3.11, python3.10, python3, python, py -3" >&2
echo " on Windows, install Python from https://python.org (NOT the Microsoft Store)" >&2
echo " on macOS, install Python 3.10+ via Homebrew (\`brew install python\`)" >&2
exit 1