Compare commits

...

4 Commits

Author SHA1 Message Date
Mohamed Hegazy
6a63e35e75 security-guidance: lenient UTF-8 decode in 6 git-subprocess helpers (#2056)
Fixes anthropics/claude-plugins-official#2056 — on Windows, when the
worktree contains an untracked file whose name has a character undefined
in cp1252 (accented capitals like Á Í Ï Ð Ý, most CJK, emoji), the
UserPromptSubmit hook crashes:

  Exception in thread Thread-5 (_readerthread):
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81
  Traceback (most recent call last):
    File diffstate.py, line 338, in _list_untracked
      for p in r.stdout.split('\\0'):
  AttributeError: 'NoneType' object has no attribute 'split'

Non-blocking (UPS failures still let the prompt through) but the
baseline-untracked snapshot is silently lost, so the Stop-hook review
mis-handles pre-existing untracked files.

Root cause (reporter's diagnosis, verified):

1. core.quotePath=false makes git emit raw UTF-8 for non-ASCII filenames.
2. subprocess.run(..., text=True) decodes via
   locale.getpreferredencoding(False) in strict mode — on Windows that
   is cp1252, in which 0x81 / 0x8D / 0x8F / 0x90 / 0x9D are undefined.
   Those bytes appear in the UTF-8 encodings of Á (C3 81), Í (C3 8D),
   Ï (C3 8F), Ð (C3 90), Ý (C3 9D), and a large fraction of CJK / emoji
   codepoints.
3. The decode runs in the subprocess reader thread. The thread raises
   UnicodeDecodeError, threading prints 'Exception in thread Thread-N',
   subprocess.run returns with stdout=None. The handler then does
   None.split('\\0') -> AttributeError, which is NOT in the narrow
   except (TimeoutExpired, FileNotFoundError, OSError) tuple, so it
   escapes the helper, propagates out of UserPromptSubmit's
   ThreadPoolExecutor.result(), and exits the hook non-zero.

This is internally inconsistent: gitutil._git_diff_range,
security_reminder_hook._reflog_amend_lookup (line ~540), and the commit
diff loop (line ~1115) already do bytes + decode utf-8/replace, with
comments explicitly noting that text=True would crash. The fix below
extends that established pattern to the helpers that were holdouts.

Affected helpers (6 total):

  - diffstate._list_untracked            <- reporter, hot path, CRITICAL
  - diffstate.capture_git_baseline       <- reporter, latent
  - diffstate.get_baseline_file_content  <- audit, file content read, HIGH
  - gitutil._git_name_only                <- reporter, latent
  - gitutil._git_status_porcelain         <- reporter, latent
  - gitutil._git_reflog_recent_commits    <- audit, embeds %gs commit msg, HIGH

For each one:

  - Drop text=True from subprocess.run.
  - Decode r.stdout / r.stderr as .decode('utf-8', errors='replace').
  - Add ValueError to the except tuple as defense against any future
    strict-decode regression (UnicodeDecodeError is a ValueError
    subclass; including it explicitly degrades the helper to its
    empty/None return instead of escaping out of the hook).

Verified locally on macOS Python 3.13:

  - py_compile clean on both files.
  - 45 existing smoke + extensibility tests still pass.
  - 21 new internal tests (not in this PR — added to the team's local
    test suite at staging/tests/test_unicode_decode.py):
      * 18 static-shape parametrized: each of the 6 fixed helpers has
        no text=True in its subprocess calls, contains errors='replace',
        and lists ValueError in its except.
      * Deterministic end-to-end: create real git repo + Ávila_report.txt
        untracked, call _list_untracked, verify it returns
        {'Ávila_report.txt': <mtime>} without crashing.
      * Deterministic end-to-end: same for capture_git_baseline (verifies
        the latent stderr-warning case stays valid).
      * Deterministic end-to-end: get_baseline_file_content on a file
        whose content has 山田太郎 + 🎉; verify the bytes round-trip
        through the decode.
  - 66/66 tests pass total (45 existing + 21 new).

NOT verified end-to-end on Windows — would need actual cp1252 strict
decode to fire. Reporter has the deterministic repro and will
re-verify on their Win11 / Python 3.14.x setup before merge.

Not in this PR (defense-in-depth, lower risk):

  - 3 git rev-parse calls returning path output (gitutil._find_git_index,
    _git_toplevel, _git_dir) could fail on Windows if cwd is in a
    non-ASCII install directory. Same fix shape but unreported and
    much lower probability — worth a separate follow-up if anyone
    actually hits it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 23:15:16 -07:00
Bryan Thompson
502de97746 Add vibe-prospecting plugin (#1997) 2026-05-28 15:30:04 -07:00
Bryan Thompson
679f52da9e feat(scan): emit per-entry sticky verdict comments (#2009)
Adds an `emit-verdict` job to scan-plugins.yml that posts a sticky
comment per scanned entry to the corresponding bump PR, with marker
`<!-- bump-pr-verdict:<name> -->`. The body is a schema_v1 JSON block,
the same shape `anthropics/claude-plugins-community-internal`'s
`scan-external-plugins.yml` already emits, so any consumer that already
reads verdicts from that schema works uniformly across both repos.

What this enables
-----------------

Lets downstream consumers (label automation, dashboards, anything that
wants per-entry verdict signal) read verdicts directly from the PR
rather than scraping job logs or downloading artifacts. The current
options are log-scraping (truncated after log retention) or fetching
the `scan-verdicts` artifact (retention-limited and only after upload
succeeds).

What does NOT change
--------------------

- The `scan` required check is unaffected (emit-verdict is
  `continue-on-error: true` at the job level — failures here MUST NOT
  block the required gate).
- Verdict cache, scan flow, and revert-failed-bumps.yml are unchanged.
- No new permission scopes (uses `pull-requests: write` at the job
  level, identical to other PR-commenting jobs in this repo).

Schema notes
------------

- `scan.*` axes (clone, schema, binaries, etc.) emit as "skipped" —
  this workflow runs the policy review only, not per-entry static
  checks. Shape kept compatible with -internal's schema_v1 so the
  same consumers work uniformly on both repos.
- `policy.has_broad_scope_hooks`, `has_undisclosed_telemetry`,
  `description_matches_behavior` emit as null — those granular axes
  aren't surfaced by this workflow's per-entry artifact yet. Consumers
  that map `null → "?"` for display already handle this gracefully.
- `policy.status` is execution state (not outcome). Map source →
  status: scan-action-run → "ran"; cache-served → "cached". Outcome
  lives in `policy.passes`. policy.status vocabulary matches the
  `ran|cached|missing|gated_out|infra_error` convention from
  -internal's emit-verdict.

PR resolution
-------------

`pull_request` events carry the PR number directly. The bump workflow
creates bump PRs via GITHUB_TOKEN (which doesn't fire `pull_request`
triggers — recursion guard) and dispatches this scan via
`workflow_dispatch` on the bump branch; in that case the job looks up
the open PR by head ref via REST. No PR found (scan_all dispatch on
main, etc.) → no-op with notice.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 15:29:59 -07:00
Bryan Thompson
13a0208f38 Add Skill-bundle plugins section to README (#2067)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 15:29:53 -07:00
5 changed files with 292 additions and 26 deletions

View File

@@ -2585,6 +2585,20 @@
},
"homepage": "https://github.com/vercel/vercel-plugin"
},
{
"name": "vibe-prospecting",
"description": "Vibe Prospecting connects Claude to live B2B company and contact data so users can search, match, enrich, filter, and export prospects at scale. It turns natural-language requests into structured GTM workflows for lead generation, CRM enrichment, company research, executive discovery, and multi-step prospecting automation inside Claude Cowork and Claude Code.",
"author": {
"name": "vibeprospecting.ai"
},
"category": "productivity",
"source": {
"source": "url",
"url": "https://github.com/explorium-ai/vibeprospecting-plugin.git",
"sha": "ada4d569dbf70194fe18750ecbc5170e9a3f120a"
},
"homepage": "https://www.vibeprospecting.ai/product/claude-plugin"
},
{
"name": "windsor-ai",
"description": "Connect Claude Code to 325+ business data sources via Windsor.ai. Query marketing, sales, CRM, ecommerce, finance, and analytics data from Google Ads, Meta, HubSpot, Salesforce, Shopify, Stripe, and hundreds more — directly from your terminal.",

View File

@@ -381,3 +381,166 @@ jobs:
echo "::error::Scan step failed without a parseable policy verdict (likely an infra error)."
exit 1
fi
# ─────────────────────────────────────────────────────────────────────────────
# emit-verdict: post a sticky comment per entry to the bump PR with the
# structured verdict, so downstream tooling (label automation, delist
# authoring) can read verdicts directly instead of scraping job logs.
# Sticky comment marker: `<!-- bump-pr-verdict:<name> -->`.
#
# Mirrors the schema_v1 contract from
# anthropics/claude-plugins-community-internal#3908 so the triage scripts
# in mcp-local-directory/scripts/triage/ work uniformly across both repos.
# -official doesn't run per-entry static checks (zombie, schema, binaries,
# etc.) so the `scan.*` axes are emitted as "skipped". The granular policy
# booleans (`has_broad_scope_hooks`, `has_undisclosed_telemetry`,
# `description_matches_behavior`) aren't surfaced by this workflow's
# per-entry artifact yet, so they're emitted as null; the triage
# `triage_bool_to_str` helper maps null → "?" so display is graceful.
# Status describes the execution state, not the outcome — `ran` when the
# scan action evaluated this SHA fresh, `cached` when a prior verdict was
# reused (cf. run-verdicts.json's `source` field). Outcome lives in
# `policy.passes`. policy-sweep.sh dispatches on this exact vocabulary.
#
# PR resolution: pull_request events carry the PR number directly. The
# bump workflow creates bump PRs via GITHUB_TOKEN (which doesn't fire
# pull_request triggers — recursion guard) and dispatches this scan via
# workflow_dispatch on the bump branch. In that case we look up the
# open PR by head ref. No PR (scan_all dispatch on main, etc.) → no-op.
#
# continue-on-error at the job level: emit failure must NOT block the
# `scan` required check. Consumers fall back to log-scraping if the
# comment is absent (gradual migration; no flag day).
# ─────────────────────────────────────────────────────────────────────────────
emit-verdict:
needs: [scan]
if: always() && needs.scan.result != 'skipped' && needs.scan.result != 'cancelled'
runs-on: ubuntu-latest
continue-on-error: true
permissions:
contents: read
pull-requests: write
steps:
- name: Download scan verdicts
uses: actions/download-artifact@v4
with:
name: scan-verdicts
path: /tmp/scan-verdicts
continue-on-error: true
- name: Resolve PR number for this ref
id: pr
env:
GH_TOKEN: ${{ github.token }}
EVENT_NAME: ${{ github.event_name }}
PR_FROM_EVENT: ${{ github.event.pull_request.number }}
REF: ${{ github.ref_name }}
REPO: ${{ github.repository }}
run: |
set -euo pipefail
if [[ "$EVENT_NAME" == "pull_request" && -n "$PR_FROM_EVENT" ]]; then
echo "number=$PR_FROM_EVENT" >> "$GITHUB_OUTPUT"
exit 0
fi
# workflow_dispatch on the bump branch: find the open PR for it.
# head filter takes the form owner:branch.
owner="${REPO%%/*}"
pr=$(gh api "/repos/${REPO}/pulls?state=open&head=${owner}:${REF}&per_page=1" \
--jq '.[0].number // ""')
if [[ -z "$pr" ]]; then
echo "::notice::No open PR for ref ${REF} — sticky comments skipped (verdicts still in scan-verdicts artifact)"
fi
echo "number=$pr" >> "$GITHUB_OUTPUT"
- name: Build and post sticky comments
if: steps.pr.outputs.number != ''
env:
GH_TOKEN: ${{ github.token }}
REPO: ${{ github.repository }}
PR: ${{ steps.pr.outputs.number }}
RUN_ID: ${{ github.run_id }}
run: |
set -euo pipefail
verdicts_path=/tmp/scan-verdicts/run-verdicts.json
# Missing/empty artifact: scan job ran but didn't produce verdicts
# (e.g. the relevance gate said "no changes"). Nothing to comment;
# exit clean.
if [[ ! -s "$verdicts_path" ]]; then
echo "::notice::No run-verdicts.json artifact — nothing to emit"
exit 0
fi
count=$(jq 'length' "$verdicts_path")
if [[ "$count" == "0" ]]; then
echo "::notice::run-verdicts.json is empty — nothing to emit"
exit 0
fi
ran_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# scan.* axes: -official doesn't run per-entry static checks; emit
# "skipped" for each so the schema is shape-compatible with -internal.
scan_stub='{"clone":"skipped","subpath_missing":"skipped","schema":"skipped","zombie":"skipped","tool_allowlist":"skipped","binaries":"skipped","unique":"skipped","mcp":"skipped"}'
# Pre-fetch all PR comments once (paginated) for the marker lookup.
gh api --paginate "/repos/$REPO/issues/$PR/comments" \
--jq '.[] | {id, body}' > /tmp/comments.ndjson
jq -c '.[]' "$verdicts_path" | while read -r entry; do
name=$(jq -r '.name' <<< "$entry")
passes=$(jq -r '.passes' <<< "$entry")
summary=$(jq -r '.summary // ""' <<< "$entry")
violations=$(jq -r '.violations // ""' <<< "$entry")
source=$(jq -r '.source // "scan"' <<< "$entry")
# status = execution state (cf. -internal#3908 vocabulary).
# Outcome is in `passes`. Map source → status: scan-action-run
# → "ran"; cache-served → "cached". Anything else falls through
# as "ran" (only those two values appear in run-verdicts.json).
case "$source" in
cache) status="cached" ;;
scan) status="ran" ;;
*) status="ran" ;;
esac
policy=$(jq -n \
--argjson passes "$passes" \
--arg summary "$summary" \
--arg violations "$violations" \
--arg source "$source" \
--arg status "$status" \
'{passes: $passes,
has_broad_scope_hooks: null,
has_undisclosed_telemetry: null,
description_matches_behavior: null,
summary: $summary,
violations: $violations,
source: $source,
status: $status}')
verdict=$(jq -n \
--argjson scan "$scan_stub" \
--argjson policy "$policy" \
--arg ran_at "$ran_at" \
--arg run_id "$RUN_ID" \
'{schema_version: 1, ran_at: $ran_at, run_id: $run_id, scan: $scan, policy: $policy}')
marker="<!-- bump-pr-verdict:$name -->"
body=$(printf '%s\n```json\n%s\n```' "$marker" "$verdict")
# jq's first() short-circuits and avoids SIGPIPE under pipefail if
# duplicate markers exist (shouldn't, but a prior buggy run could
# double-post). -s slurps NDJSON; `// empty` yields no output when
# no match.
existing=$(jq -rs --arg m "$marker" \
'first(.[] | select(.body | startswith($m)) | .id) // empty' \
/tmp/comments.ndjson)
if [[ -n "$existing" ]]; then
gh api -X PATCH "/repos/$REPO/issues/comments/$existing" -f body="$body" >/dev/null
echo "Updated comment $existing for $name"
else
gh api -X POST "/repos/$REPO/issues/$PR/comments" -f body="$body" >/dev/null
echo "Created comment for $name"
fi
done

View File

@@ -42,6 +42,37 @@ plugin-name/
└── README.md # Documentation
```
## Skill-bundle plugins
When a plugin's source repository ships skills (`SKILL.md` files) without a `.claude-plugin/plugin.json` manifest, the marketplace entry can declare the skills directly using `strict: false` and an explicit `skills` array.
```json
{
"name": "example-bundle",
"description": "Brief description of the bundled skills.",
"author": { "name": "Author Name" },
"category": "development",
"source": {
"source": "git-subdir",
"url": "https://github.com/example-org/sdk.git",
"path": "packages/agent-skills",
"ref": "main",
"sha": "<commit sha>"
},
"strict": false,
"skills": [
"./skill-a",
"./skill-b",
"./skill-c"
],
"homepage": "https://github.com/example-org/sdk"
}
```
Each path in `skills` is relative to `source.path` and points at a directory containing a `SKILL.md`. Paths can reach deeper than a single level — for example, `["./libA/skill-1", "./libB/skill-2"]` exposes a curated subset across multiple library subdirectories. Each skill is registered as `<plugin-name>:<skill-name>` in Claude Code.
For the underlying schema, see [Strict mode](https://code.claude.com/docs/en/plugin-marketplaces) in the marketplace documentation.
## License
Please see each linked plugin for the relevant LICENSE file.

View File

@@ -138,7 +138,17 @@ def restore_unreviewed_stop_state(session_id, paths, baseline_sha):
def get_baseline_file_content(session_id, file_path, cwd):
"""Get the content of a file at the baseline SHA. Returns None if unavailable."""
"""Get the content of a file at the baseline SHA. Returns None if unavailable.
Decode the file content as UTF-8 with errors="replace" rather than using
text=True: source files in user repos can be latin-1 / cp1252 / shift-jis
/ etc., and on Windows text=True would decode via locale.getpreferredencoding()
in strict mode and raise UnicodeDecodeError in the subprocess reader
thread — leaving result.stdout=None and propagating AttributeError when
the caller tries to use it. Same class as the existing migrations at
security_reminder_hook.py:540 (reflog subjects) and :1115 (commit
diffs); this helper was missed in that pass. See
anthropics/claude-plugins-official#2056."""
baseline_sha = load_baseline_sha(session_id)
if not baseline_sha:
return None
@@ -151,12 +161,12 @@ def get_baseline_file_content(session_id, file_path, cwd):
return None
result = subprocess.run(
[*GIT_CMD, "show", f"{baseline_sha}:{rel_path}"],
cwd=cwd, capture_output=True, text=True, timeout=5
cwd=cwd, capture_output=True, timeout=5
)
if result.returncode == 0:
return result.stdout
return (result.stdout or b"").decode("utf-8", errors="replace")
return None
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError):
return None
@@ -173,11 +183,16 @@ def capture_git_baseline(cwd):
and `compute_v2_review_set` subtracts that set so pre-existing untracked
files are not reviewed as Claude-authored.
"""
# stdout is a SHA so text=True is safe on stdout, but a non-ASCII
# filename in `git stash create`'s STDERR warning (e.g. a worktree
# with `Ávila_report.txt` triggers a quotePath/locale warning) would
# trip the stderr reader thread on Windows cp1252. Decode both streams
# leniently for symmetry with _list_untracked. See #2056.
try:
# Check if HEAD exists (i.e., repo has at least one commit)
head_check = subprocess.run(
[*GIT_CMD, "rev-parse", "HEAD"],
cwd=cwd, capture_output=True, text=True, timeout=5
cwd=cwd, capture_output=True, timeout=5
)
if head_check.returncode != 0:
# No commits yet — skip review rather than creating commits in the user's repo
@@ -186,20 +201,20 @@ def capture_git_baseline(cwd):
result = subprocess.run(
[*GIT_CMD, "stash", "create"],
cwd=cwd, capture_output=True, text=True, timeout=15
cwd=cwd, capture_output=True, timeout=15
)
sha = result.stdout.strip()
sha = (result.stdout or b"").decode("utf-8", errors="replace").strip()
if sha:
return sha
# Working tree is clean — stash create returns empty. Use HEAD.
result = subprocess.run(
[*GIT_CMD, "rev-parse", "HEAD"],
cwd=cwd, capture_output=True, text=True, timeout=5
cwd=cwd, capture_output=True, timeout=5
)
sha = result.stdout.strip()
sha = (result.stdout or b"").decode("utf-8", errors="replace").strip()
return sha if sha else None
except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
debug_log(f"Failed to capture git baseline: {e}")
return None
@@ -323,19 +338,35 @@ def _list_untracked(cwd):
mtime is captured so an in-place edit during the turn is still reviewed.
Uses ls-files (not status) for the UPS path: the index diff isn't needed,
and ls-files --others only walks the worktree against .gitignore."""
and ls-files --others only walks the worktree against .gitignore.
Decodes stdout/stderr as UTF-8 with errors="replace" instead of using
text=True. With core.quotePath=false git emits raw UTF-8 bytes for
non-ASCII filenames; text=True decodes via locale.getpreferredencoding()
in strict mode — on Windows that's cp1252 with several undefined bytes
(0x81/0x8D/0x8F/0x90/0x9D), all of which appear in UTF-8 encodings of
common accented capitals (Á Í Ï Ð Ý) and most CJK/emoji codepoints.
A non-ASCII filename in the worktree crashed the subprocess reader
thread, left r.stdout=None, and propagated AttributeError out of the
helper — silently losing the baseline snapshot every UserPromptSubmit.
See anthropics/claude-plugins-official#2056. The sibling helpers in
gitutil.py already follow the lenient pattern; this function and
capture_git_baseline / _git_name_only / _git_status_porcelain were
the holdouts."""
try:
repo = _git_toplevel(cwd) or cwd
r = subprocess.run(
[*GIT_CMD, "-c", "core.quotePath=false", "ls-files",
"--others", "--exclude-standard", "-z"],
cwd=repo, capture_output=True, text=True, timeout=15,
cwd=repo, capture_output=True, timeout=15,
)
if r.returncode != 0:
debug_log(f"_list_untracked rc={r.returncode}: {r.stderr[:200]}")
stderr_str = (r.stderr or b"").decode("utf-8", errors="replace")
debug_log(f"_list_untracked rc={r.returncode}: {stderr_str[:200]}")
return {}
stdout = (r.stdout or b"").decode("utf-8", errors="replace")
out = {}
for p in r.stdout.split("\0"):
for p in stdout.split("\0"):
if not p:
continue
try:
@@ -346,7 +377,9 @@ def _list_untracked(cwd):
debug_log(f"_list_untracked: capped at {UNTRACKED_BASELINE_CAP}")
break
return out
except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
# ValueError guards against any future strict-decode regression
# so the helper degrades to {} instead of crashing the hook.
debug_log(f"_list_untracked error: {e}")
return {}

View File

@@ -259,19 +259,29 @@ def _git_reflog_recent_commits(repo_root, max_age_s=120, max_n=5):
# %gs (the reflog subject) is `commit: <commit-msg first line>` and can
# contain `|`; put it LAST so split("|", 2) leaves it intact. %H is
# hex and %ct is integer, so the first two fields are delimiter-safe.
#
# Bytes + decode utf-8/replace: %gs embeds commit-message subjects
# which git stores as raw bytes — commits can be authored in
# latin-1 / cp1252 / shift-jis etc., and text=True would raise
# UnicodeDecodeError in the subprocess reader thread on Windows
# cp1252 (subprocess.run returns r.stdout=None, then
# r.stdout.splitlines() AttributeErrors). Mirrors the existing
# migration at security_reminder_hook.py:540 — same pattern was
# missed here. See anthropics/claude-plugins-official#2056.
r = subprocess.run(
[*GIT_CMD, "log", "-g", "-n", str(max_n),
"--format=%H|%ct|%gs", "HEAD"],
cwd=repo_root, capture_output=True, text=True, timeout=5,
cwd=repo_root, capture_output=True, timeout=5,
)
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError):
return [], 0
if r.returncode != 0:
return [], 0
stdout = (r.stdout or b"").decode("utf-8", errors="replace")
import time as _time
now = int(_time.time())
fresh, stale = [], 0
for idx, line in enumerate(r.stdout.splitlines()):
for idx, line in enumerate(stdout.splitlines()):
parts = line.split("|", 2)
if len(parts) != 3:
continue
@@ -306,23 +316,31 @@ def _git_name_only(cwd, base, include_untracked=False):
must distinguish None (error → don't trust as a filter) from set()
(genuinely nothing changed). `-c core.quotePath=false -z` keeps non-ASCII
and space-containing paths intact."""
# Decode stdout/stderr as UTF-8 with errors="replace" instead of using
# text=True. core.quotePath=false makes git emit raw UTF-8 for non-ASCII
# paths, and text=True on Windows decodes via cp1252 strict — a non-ASCII
# changed path would crash the subprocess reader thread, leave
# result.stdout=None, and propagate AttributeError out of the helper.
# Same fix shape as diffstate._list_untracked. See #2056.
def _run(env):
result = subprocess.run(
[*GIT_CMD, "-c", "core.quotePath=false", "diff", "--name-only", "-z", base],
cwd=cwd, capture_output=True, text=True, timeout=30,
cwd=cwd, capture_output=True, timeout=30,
env=env,
)
if result.returncode != 0:
debug_log(f"_git_name_only({base!r}) rc={result.returncode}: {result.stderr[:200]}")
stderr_str = (result.stderr or b"").decode("utf-8", errors="replace")
debug_log(f"_git_name_only({base!r}) rc={result.returncode}: {stderr_str[:200]}")
return None
return {p for p in result.stdout.split("\0") if p}
stdout = (result.stdout or b"").decode("utf-8", errors="replace")
return {p for p in stdout.split("\0") if p}
try:
if not include_untracked:
return _run(None)
with _temp_index(cwd) as env:
return _run(env)
except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
debug_log(f"_git_name_only({base!r}) error: {e}")
return None
@@ -339,17 +357,22 @@ def _git_status_porcelain(cwd):
collapses to `dir/`). Required so the untracked set subtracts cleanly
against the UPS-time `_list_untracked` snapshot, which uses ls-files and
therefore always lists individual files."""
# Lenient decode: same UTF-8 + errors="replace" pattern as the
# sibling helpers — a non-ASCII path in the worktree would otherwise
# crash the cp1252 reader thread on Windows. See #2056.
try:
r = subprocess.run(
[*GIT_CMD, "-c", "core.quotePath=false", "status",
"--porcelain=v1", "-uall", "-z"],
cwd=cwd, capture_output=True, text=True, timeout=30,
cwd=cwd, capture_output=True, timeout=30,
)
if r.returncode != 0:
debug_log(f"_git_status_porcelain rc={r.returncode}: {r.stderr[:200]}")
stderr_str = (r.stderr or b"").decode("utf-8", errors="replace")
debug_log(f"_git_status_porcelain rc={r.returncode}: {stderr_str[:200]}")
return None, None
tracked, untracked = set(), set()
entries = r.stdout.split("\0")
stdout = (r.stdout or b"").decode("utf-8", errors="replace")
entries = stdout.split("\0")
i = 0
while i < len(entries):
e = entries[i]
@@ -368,7 +391,9 @@ def _git_status_porcelain(cwd):
i += 1
i += 1
return tracked, untracked
except (subprocess.TimeoutExpired, FileNotFoundError, OSError) as e:
except (subprocess.TimeoutExpired, FileNotFoundError, OSError, ValueError) as e:
# ValueError guards against any future strict-decode regression
# so the helper degrades to (None, None) instead of crashing.
debug_log(f"_git_status_porcelain error: {e}")
return None, None