Legacy code often cannot build locally by nature — CICS/IMS programs
have no local translator and the real runtime may be a mainframe the
user doesn't have. Stopping transform on a failed legacy smoke compile
would block exactly those systems.
- transform Step 0a: the target toolchain remains required (its tests
cannot run without it); a failed or impossible legacy compile no
longer stops the run — the equivalence strategy switches to recorded
traces / golden-master fixtures, and that downgrade is stated in the
plan and in TRANSFORMATION_NOTES.md so reviewers know the strength of
the proof
- preflight: a red legacy toolchain now yields Ready-with-gaps for
transform/reimagine instead of Not-ready
The previous commit round-tripped the catalog through a JSON serializer,
which escaped non-ASCII characters in every other plugin's description.
Restore the file from main and change only the code-modernization entry.
Viewer (assets/topology-viewer.html):
- inline a minified d3 subset (hierarchy/pack, zoom, selection,
interpolateZoom, ease; ISC license) instead of loading from a CDN —
the page is now fully self-contained and works on air-gapped networks
- handle duplicate node ids (unique-suffix; edges bind to the first
occurrence) and store parent references directly, fixing
level-of-detail and selection corruption with messy generated data
- share one reveal rule between drawing, edge culling, and hit-testing
so edges no longer draw into collapsed containers
- pre-bucket edges by kind and keep a per-node adjacency map; the
hover/selection pass no longer scans every edge each frame
- cancel in-flight fly-to animations when a new one starts; clamp
fly-to zoom to the zoom extent; derive max zoom from the smallest
leaf so deep estates stay reachable
- render dead-end candidates (new deadEnds field) with a dashed
outline and a sidebar badge
- clicking a node during a flow walkthrough exits the walkthrough;
search results clear on selection and Escape; surrogate-safe label
truncation; clearer stats line; explicit empty-topology message
Commands:
- new /modernize-status: read-only progress report — artifact inventory
with timestamps, staleness flags, secrets-hygiene checks, next step
- map: deadEnds in the topology schema; datastore names must be logical
identifiers with credentials stripped from URLs/DSNs
- brief: read topology.json + .mmd files (not the interactive HTML);
staleness check against inputs; effort unit aligned to person-months
- transform: secret-safe characterization-test prompt; diff -y fallback
when delta is missing; credential-safe diff selection
- reimagine: target vision is everything after the first argument (was
silently truncated to one word); masking rules in spec/scaffold/
handoff prompts
- brief/transform/reimagine: human-approval gates phrased as explicit
stop-and-wait instead of 'enter plan mode'
- preflight: delta in the tool table; brief added to the verdict list
- README: preflight/status in the workflow; legacy/ deny list also
covers Write; plugin + marketplace descriptions updated
Fixes from an adversarial review of the new viewer:
- pin d3 to 7.9.0 and load it via dynamic import with an explicit
error panel when the CDN is unreachable (previously a blocked CDN
produced a silent dark page — a real concern for restricted networks)
- coerce ids/names/loc at intake: a single missing name or non-numeric
loc previously threw inside the render loop or propagated NaN through
the pack layout, blanking the canvas with no error
- normalize flows/steps/edges defensively (null entries, missing steps,
numeric ids vs string lookups)
- mirror the level-of-detail reveal rule in the hit test so clicks
can't select nodes that aren't drawn
- scope the Escape shortcut so clearing the search box doesn't reset
the viewport; set zoom clickDistance(4) so trackpad jitter doesn't
swallow selection clicks
- round canvas backing-store size (fractional devicePixelRatio caused
a reallocation every frame on 125%/150% display scaling)
- modernize-map: use braced ${CLAUDE_PLUGIN_ROOT} so substitution
actually happens, assert the injection marker exists in the template,
and correct the documented failure mode
modernize-map previously rendered the call graph and data lineage as
static Mermaid diagrams, which become unreadable once a node has ~10+
edges — exactly the shape of real legacy systems. It now builds an
interactive viewer from a shipped template (assets/topology-viewer.html):
a zoomable circle-pack of domains/modules sized by LOC, rendered to
canvas with level-of-detail reveal, dependency edges with per-kind
toggles, search with fly-to, a per-node detail sidebar, and a flow
walkthrough mode. Small domain-level .mmd exports remain for docs.
- topology.json now has a documented schema (hierarchy + edges + entry
points + observations + flows) consumed by the viewer
- map traces 2-4 business flows anchored to personas (claimant,
operator, auditor), each step in plain business language mapped to
the modules that implement it; the viewer plays them as numbered
paths
- brief gains a Business Walkthroughs section connecting each persona
flow to the phase that replaces it
- new modernize-preflight command: detects the stack, checks analysis
tooling, smoke-compiles a real source file with the legacy toolchain,
inventories missing copybooks/descriptors/binary-only artifacts, and
writes a per-command readiness verdict
- transform now verifies legacy + target toolchains before its plan
gate instead of failing at test time
- README: commands updated, optional-tooling section reframed as 'what
to give Claude'
Refreshes the marketplace listing description for the AWS Startup Advisor
plugin at the maintainer's request.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds an additional marketplace entry named `vanta` pointing to the same
source as the existing `vanta-mcp-plugin` entry (VantaInc/vanta-mcp-plugin
at the pinned SHA `345d86b5`). No code change — same upstream the existing
entry already uses; this exposes the plugin under the developer's canonical
plugin name (`vanta`, per the repo's plugin.json) alongside the existing slug.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes#2159. The Stop hook emits feedback via
`hookSpecificOutput: {hookEventName: "Stop", additionalContext}`, but
`Stop`/`SubagentStop` are NOT members of CC's `hookSpecificOutput`
discriminated union (coreSchemas.ts — valid members are PreToolUse,
PostToolUse, UserPromptSubmit, SessionStart, etc.). So the emitted shape
violates CC's documented hook-output schema.
Impact is CC-version-dependent — important nuance, established empirically:
- Reporter (array0224-cloud) on CLI 2.1.150 / 2.1.152: CC rejects the
Stop feedback; the block/reason never reaches the model, so the
auto-rewake/fix loop is lost. (Detection still runs + logs.)
- On CLI 2.1.160 (current) the asyncRewake completion path is lenient:
its gate is `isSyncHookJSONOutput` (hooks.ts) which is just
`!(json.async === true)` — NOT a strict schema parse. So the invalid
hookSpecificOutput is tolerated: metrics + rewakeSummary are still
consumed and the model still receives the findings. I could NOT
reproduce the rejection on 2.1.160, and BQ confirms Stop-path
vulns_found metrics are recorded normally (~21k with-vuln fires / 3d),
i.e. NOT dropped. (An earlier draft of this message claimed metrics
were dropped — that was wrong; corrected after checking telemetry +
repro'ing the old plugin on 2.1.160.)
So this is defensive schema-correctness: the plugin should emit output
that conforms to CC's documented union regardless of how strictly a given
CC version validates it. The reporter's environment validates strictly;
relying on the current version's leniency is fragile.
Fix (CC's documented asyncRewake "clean pattern" — hooks.ts: "error text
on stderr, JSON on stdout"):
- For Stop/SubagentStop, emit_metrics writes guidance to stderr (the
asyncRewake body channel CC delivers via `stderr || stdout`) and sets
top-level `decision: "block"` + `reason` (valid SyncHookJSONOutput
fields; also the documented sync Stop-hook contract for the `-p`
fallback). It does NOT emit a Stop hookSpecificOutput.
- PostToolUse (commit-review, push-sweep) is unchanged — valid union
member, keeps the modern hookSpecificOutput protocol.
Verified locally on macOS Python 3.13:
- py_compile clean.
- 11 new tests (test_2159_stop_hook_schema.py) pin the contract: Stop
output carries no hookSpecificOutput, uses top-level decision/reason,
writes guidance to stderr; the emitted hookEventName (when present) is
a valid union member. 2 existing tests that asserted the buggy
Stop->hookSpecificOutput shape were corrected. Full suite 464/464
pass + 2 skipped.
- END-TO-END in /tmux on CLI 2.1.160:
* FIXED plugin (2.0.3): staged pickle.loads + os.system, benign edit
pulls the file into review_set; Stop LLM review found 2 critical
vulns; CC delivered a clean rewake ("Background security review
found issues" + both findings). All hooks (UPS, PostToolUse[Edit]
pattern, PostToolUse[Bash] commit-review, Stop) fired clean; zero
schema rejections / errors / http_err in the debug log.
* OLD plugin (2.0.2) on the SAME 2.1.160: also delivered Stop feedback
(confirming the no-repro-on-latest finding above) — which proves the
fix carries NO regression risk on current CC while making the output
robust for the stricter versions where it actually breaks.
Version bumped 2.0.2 -> 2.0.3 per the per-PR-bump policy (#2114: a bump is
the only way the fix reaches the existing fleet — relevant for users still
on the CC versions where this breaks).
Closes#2159.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sagemaker-ai was dropped from the marketplace in #1762 (validate-plugins
adoption) due to a manifest/YAML error. AWS has since fixed it; the plugin
validates clean at awslabs/agent-plugins@187edde (claude plugin validate passes).
Re-listed as a git-subdir source SHA-pinned to current monorepo HEAD,
matching its sibling AWS entries (deploy-on-aws, databases-on-aws).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both plugins in awslabs/agent-plugins had their subpaths edited in commit
187edde (after the morning bump cron pinned them to f16aaf2a), so they fell
behind again on merge. Manual catch-up bump to current monorepo HEAD.
- databases-on-aws: 4 files changed under plugins/databases-on-aws/ (v1.1.0)
- deploy-on-aws: 7 files changed under plugins/deploy-on-aws/ (v1.2.0)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #2112's telemetry visibility surfaced an immediate finding from
the first 3h of v2.0.1 data: **2,406 phase=2 / err=99 sessions** —
"venv stage / uncategorized" — dominating BUILD_FAILED. The original
err_kind detection patterns were all pip-flavored (pip_no_match,
dns_fail, ssl_verify, etc.) and didn't catch venv-creation failure
modes, so they all collapsed to the catch-all _uncategorized (99)
bucket.
This PR fills the gap on two axes.
## 1. Five new venv-specific err_kind categories (codes 11-15)
Each gated on `err_phase == "venv"` so the same substring doesn't
mis-fire in pip-phase failures:
- 11 `venv_ensurepip_fail` — Debian/Ubuntu without python3-venv
installed; stderr matches "ensurepip is not available" or
"ensurepip ... returned non-zero". Predicted to be the biggest
chunk based on Linux distro market share.
- 12 `venv_path_too_long` — Windows MAX_PATH (260) or POSIX
ENAMETOOLONG. Triggered when state_dir + venv layout exceeds
the path limit (deep Lib/site-packages/<pkg>/<...> paths).
- 13 `venv_no_module` — `python3 -m venv` itself missing
("No module named 'venv'"). Rare but distinctive.
- 14 `venv_already_exists` — Errno 17 / "file exists" — sentinel
race past O_EXCL or stale dir survived `--clear`.
- 15 `venv_setup_failed` — generic "virtual environment was not
created successfully" catch-all for venv setup failures that
don't match a more specific category.
All 5 occupy reserved slots in SDK_BOOTSTRAP_ERR_CODES per the
APPEND-ONLY contract from PR #2112.
## 2. `sdk_bootstrap_stderr_sig` integer hash
For "other:<tail>" err_kinds (which encode to _uncategorized = 99),
emit a bounded integer hash (0-999) of the first ~30 chars of the
stderr tail. This restores cardinality to the _uncategorized bucket
in BQ aggregation without unbounded keyspace — same stderr message
always maps to the same bucket, so a real failure mode replicating
across thousands of machines clusters cleanly. Bounded at 1000
buckets: well below any "high cardinality" alarm but wide enough to
distinguish ~30 distinct dominant patterns (birthday-paradox
collision probability ~50% at ~37 distinct inputs).
The field auto-omits (`if sig:` gate) when err_kind is categorized
— no key-budget cost on the common-case categorized failures.
## Version bump 2.0.1 → 2.0.2
PR #2114 confirmed the version-bump mechanism is the only way to
propagate code changes to the existing fleet — without a bump, CC's
plugin updater short-circuits on string-equality of installation
version vs marketplace version. Following the policy we established:
**bump patch on every functional PR**.
By 17:31:42Z on 2026-06-01 (1m22s after #2114 merged), v2.0.1 was
already appearing in BQ. v2.0.2 should follow the same propagation
curve — ~30% adoption within 3 hours, full convergence within a few
days.
## Verified locally
- py_compile clean.
- 15 new tests in test_venv_failure_deepdive.py (added to internal
test suite at sg-staging/tests/, not in this PR):
* 5 parametrized: each new err_kind maps to its expected code (11-15).
* 1 APPEND-ONLY regression: existing codes 1-10 + 99 unchanged.
* 6 stderr_sig: non-other inputs → 0; None/empty → 0; deterministic
same-input → same-output; bounded to 0-999; distinct inputs →
distinct hashes (5/5 with P(collision) ≈ 1%); leading-chars focus
(path-varying stderr with shared 30-char prefix collide as designed).
* 1 static-shape catcher: every new `err_kind = "venv_..."` branch
in main() is guarded by `err_phase == "venv"`. Catches the
regression where someone adds a venv pattern without the phase
gate and starts mis-categorizing pip-phase failures.
* 1 map-coverage: all err_kind strings assigned anywhere in
ensure_agent_sdk.main() are present in SDK_BOOTSTRAP_ERR_CODES
(catches new categories added in code but forgotten in the map).
* 1 emit-shape: the metric block uses `_encode_stderr_sig`, the
`sdk_bootstrap_stderr_sig` key is written conditionally on `if
sig:`. Catches the regression where someone removes the
helper or makes the emit unconditional (would pad every
categorized BUILD_FAILED row with a zero-valued field).
- Full suite: 452/452 pass + 2 skipped (live API tests, opt-in).
## What this unblocks in BQ
```sql
-- For the 2,406 sessions/3h that were phase=2/err=99 on v2.0.1,
-- v2.0.2+ will split them across the new categories. Query:
SELECT
CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_err") AS INT64) AS err,
CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_stderr_sig") AS INT64) AS sig,
COUNT(*) AS sessions
FROM `proj-product-data-nhme.raw_events.claude_code_internal_event`
WHERE _PARTITIONTIME >= ...
AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap") AS INT64) = 3
AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_phase") AS INT64) = 2 -- venv
GROUP BY err, sig
ORDER BY sessions DESC
```
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The nvidia-skills entry was added in PR #2088 with:
"source": {
"source": "git-subdir",
"url": "https://github.com/NVIDIA/skills.git",
"path": "plugins/nvidia-skills",
"ref": "main"
}
It's missing the required `sha` field. The marketplace validator
enforces invariant I5 ("source.sha is missing or not a 40-char hex
SHA") on every git-subdir source — without it, the action fails:
##[error]invariant I5: nvidia-skills: source.sha is missing or not
a 40-char hex SHA
This has been silently failing the "Validate Plugins" CI on every
PR that touches marketplace.json since #2088 merged on 2026-05-03.
Confirmed by checking the last 5 completed validate runs on main —
all 5 ❌, including PR #2114 (security-guidance bump that you merged
earlier today). The validator failure was getting swallowed because
all the other PR-level checks (Check MCP URLs, Scan Plugins, Validate
Plugin Licenses) were passing, and humans were `gh pr merge --admin`-ing
through it.
Fix: add the sha field pinned to the current upstream HEAD of
github.com/NVIDIA/skills.git on the `main` branch.
Resolved via: git ls-remote https://github.com/NVIDIA/skills.git refs/heads/main
SHA: 62b685a20ac45285cafd1e22782abbed33172c17
This mirrors the shape of other git-subdir entries with both `ref`
and `sha` (e.g. 42crunch-api-security-testing pins ref="v1.5.5",
sha="b404d99a...", adobe-for-creativity pins ref="main", sha="8d74ee6b...").
Unblocks every in-flight PR that touches marketplace.json — including
PR #2154 (security-guidance venv-deepdive) which is currently
red-blocked on this.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* bump: switch to per-entry PR mode (one PR per stale plugin)
Replaces the single batched bump PR with one PR per stale plugin so a
single failing plugin no longer blocks the rest. Pins to a feature
branch of the bump-plugin-shas action that adds 'pr-mode: per-entry';
re-pin to the merge commit on the action's main when that lands.
- pr-mode: per-entry → one PR per plugin on bump/<slug>
- max_bumps default lowered 130 → 30 (per-entry scans cost more)
- scan dispatch fanned out over pr-urls JSON (one per per-entry branch)
- header comments updated for per-entry semantics
* bump: re-pin to merged composite action SHA on -community main
The pr-mode: per-entry input now lives on main of the bump-plugin-shas
action (merged at e2019b2a). Update the pin and drop the now-stale
header comment that tracked the feature branch.
* bump: dispatch all three required checks per per-entry PR
Bump PRs are opened with GITHUB_TOKEN, which doesn't fire on:pull_request
(recursion guard). The per-entry cutover already dispatched scan-plugins.yml
per branch to satisfy the `scan` required check, but `check` (Check MCP URLs)
and `validate` (Validate Plugins) are also required on main and likewise
never fired — leaving every bump PR BLOCKED on missing checks (observed on
the batched #2079, which only cleared after a human-authored push re-fired
the pull_request workflows).
Fix: dispatch all three workflows per per-entry bump branch. Each runs its
job unconditionally on workflow_dispatch, so the check run lands on the
branch HEAD (= PR head) and satisfies the required check.
- validate-plugins.yml: add workflow_dispatch trigger (check-mcp-urls.yml
already had one). gh workflow run requires the trigger on the default
branch; this lands together with the per-entry bump so main stays
consistent.
- bump-plugin-shas.yml: loop the dispatch over
{scan-plugins,check-mcp-urls,validate-plugins}; tolerate a single
transient dispatch failure (warn, don't abort) so one hiccup can't
strand the rest of the batch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* bump: fail the per-entry check-dispatch step when a dispatch fails
The dispatch step logged each failed gh workflow run as a warning and exited 0, so a transient API error or rate limit could leave a per-entry bump PR missing a required check while the bump run still showed green. The composite action skips slugs with an open PR, so the stranded PR was never retried.
Attempt every dispatch (one failure must not strand the other branches), record failures via a temp file (the while loop runs in a pipe subshell), then emit an error and exit non-zero if any dispatch failed, so the bump run goes red and the affected PR can be re-dispatched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 8 PRs we shipped since 2026-05-26 (#2076, #2077, #2078, #2086,
#2091, #2100, #2101, #2105) all changed plugin code without bumping
the version. CC's plugin updater uses string equality for the
freshness check (pluginOperations.ts:1835):
const isUpToDate =
installation.version === newVersion ||
installation.installPath === versionedPath ||
installation.installPath === zipPath
if (isUpToDate) return { alreadyUpToDate: true }
Users who installed v2.0.0 anywhere between 2026-05-26 and 2026-05-31
have `installation.version === "2.0.0"` in their installed_plugins.json.
The marketplace also advertises "2.0.0" (until this commit), so
isUpToDate returns true and the plugin cache directory is never
refreshed — they keep running whatever 2.0.0 code was current on the
day they installed. The marketplace git pull happens; the per-user
cache install does NOT.
Empirical evidence: in BQ today (5/31) on Windows v2.0.0 fires,
**73% emit sdk_bootstrap outcome 4 (SKIP_WIN32)** — a code path
retired in PR #2055's Windows-enable fix. Those users are running a
plugin tree that pre-dates the fix, even though their telemetry
shows pv=20000.
The fix is a one-line version bump. Once the marketplace advertises
2.0.1, every CC autoupdate cycle sees installation.version (2.0.0)
!= newVersion (2.0.1), installs the new version, and the user's next
session loads the fixed code.
This PR:
1. plugins/security-guidance/.claude-plugin/plugin.json: 2.0.0 → 2.0.1
2. .claude-plugin/marketplace.json security-guidance entry: 2.0.0 → 2.0.1
What 2.0.1 carries (versus 2.0.0 as published 5/26):
- #2076 — Graphite gt commit/push detection
- #2077 — hookSpecificOutput.additionalContext on async-rewake exit-2
- #2078 — CLAUDE_CONFIG_DIR support
- #2086 — core.quotePath=false on diff feeders (Arabic/Hebrew/CJK paths)
- #2091 — fix Bash(...|...) if-clause regression from #2076
- #2100 — drop text=True from subprocess.run, bake PYTHONUTF8=1 (Windows non-cp1252 path crash)
- #2101 — core.quotePath=false on GIT_CMD globally
- #2105 — output_format → output_config.format API migration (#2098)
Verified locally:
- plugin.json + marketplace.json both valid JSON.
- _read_plugin_version_int() returns 20001 (was 20000).
- Existing test suite passes — 408 tests, no regressions caused by
the version bump itself. (29 unrelated failures are from
test_telemetry_failure_signals.py which expects PR #2112's
not-yet-merged code.)
Going forward: bumping `patch` on every functional PR closes this
gap entirely. Without that policy, every fix only reaches NEW
installs, never the existing fleet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fills two failure-visibility gaps in plugin telemetry.
## Gap 1: HTTP errors from _call_claude invisible
Before: a 4xx/5xx response from the LLM API caused `_call_claude` to
return None and produce ZERO fingerprint in tengu_hook_plugin_metrics.
A failed call looked identical to "no review needed". The recent
deprecation-400 outage (PR #2105, output_format → output_config.format,
#2098) was invisible in aggregate dashboards until a user manually
reported errors from their debug log. Cohort-specific or partial
outages would never show up in BQ.
Fix: add `http_err_last` (most recent status) and `http_err_count`
to the existing `_USAGE` accumulator in `_base.py`. `_usage_metrics()`
snapshots them whenever count > 0 (skip-path no-pollute contract
preserved when count == 0). All `_call_claude` error sites now call
the new `_record_http_error()` helper alongside the existing
`_last_call_claude_http_error` module-state assignment.
Now any future API failure category is queryable in BQ in real time:
SELECT
DATE(server_timestamp, "America/Los_Angeles") AS d,
CAST(JSON_VALUE(additional_metadata, "$.http_err_last") AS INT64) AS code,
COUNT(*) AS n
FROM ... WHERE event_name = "tengu_hook_plugin_metrics"
AND JSON_VALUE(additional_metadata, "$.pluginId") LIKE "%security-guidance%"
AND JSON_VALUE(additional_metadata, "$.http_err_count") IS NOT NULL
GROUP BY d, code ORDER BY d, n DESC
## Gap 2: sdk_bootstrap_phase / sdk_bootstrap_err always NULL in BQ
Before: ensure_agent_sdk.py emitted these as strings (e.g. "pip",
"dns_fail"). CC's plugin-metrics pipeline silently drops
plugin-emitted string values — only bool|finite-number plugin metrics
reach BigQuery. (CC-core fields like `subscription_type` are exempt
because they're injected downstream of plugin validation.)
Confirmed empirically: ~185K BUILD_FAILED rows in BQ over the past 2
days had `sdk_bootstrap_phase` = NULL and `sdk_bootstrap_err` = NULL
despite the Python code emitting them. ~28K BUILD_FAILED sessions/day
had no diagnostic split — flying blind on whether the failures are
pip-no-match vs dns-fail vs ssl-verify vs proxy-auth etc.
Fix: encode phase + err_kind as stable integers via
SDK_BOOTSTRAP_PHASE_CODES and SDK_BOOTSTRAP_ERR_CODES. Phase: 1=pre,
2=venv, 3=pip, 4=main. Err: 10 known categories (1-10), 11-98
reserved, 99 = uncategorized catch-all (covers "exc:<X>", "other:<X>",
and unmapped strings). APPEND-ONLY for telemetry stability.
Also corrects the misleading "CC accepts string metric values"
comment in ensure_agent_sdk.py that led to the bug originally.
Verified locally on macOS Python 3.13:
- py_compile clean.
- 32 new tests in test_telemetry_failure_signals.py (added to
internal test suite at sg-staging/tests/, not in this PR):
* 4 HTTP-error tracking unit tests: _record_http_error increments
count + tracks most-recent; handles None/invalid; -1 for
network/timeout.
* 4 _usage_metrics emission tests: empty when no activity;
successful call has no http_err fields; failure-only has http_err
and no api_calls; mixed has both.
* 1 contract test: every emitted value is bool|finite-number
(catches future regression of the string-dropping bug class).
* 13 sdk_bootstrap encoding tests (parametrized over the 10 known
err_kind categories + 5 catch-all shapes): each maps to the
right integer; unknown phase = 0; unknown err = 99.
* 1 static-shape regression catcher: every `err_kind = "..."`
string in ensure_agent_sdk.main() must be in
SDK_BOOTSTRAP_ERR_CODES (otherwise new err_kinds silently
collapse to 99).
* 2 emit-shape regression catchers: the assignments in main() go
through _encode_phase / _encode_err_kind helpers (no raw
strings); no literal string values for sdk_bootstrap_phase/err.
* 1 comment-accuracy: the misleading "CC accepts string metric
values" comment is gone.
- Full suite: 437/437 pass + 2 skipped (live API tests, opt-in).
NOT verified end-to-end against BQ — would require shipping +
observing in production for 24h to confirm the http_err and
sdk_bootstrap_phase/err fields actually appear in
tengu_hook_plugin_metrics rows. The unit tests pin the contract; if
the wire shape is broken, BQ will show NULL for the new fields and we
revisit (with the same diagnostic the BUILD_FAILED bug gave us).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes#2098. The Anthropic Messages API moved structured-output
schema specification from a top-level `output_format` field to a
nested `output_config.format` field, per
https://platform.claude.com/docs/en/build-with-claude/structured-outputs.
Per docs the old form "will continue working for a transition period"
— and indeed for api-key + non-streaming auth it still returns HTTP
200 (verified via live API). But OAuth Bearer users with CLI 2.1.158
hit `invalid_request_error: output_format: This field is deprecated.
Use 'output_config.format' instead.` consistently — reporter saw 462
errors in one day. The trigger appears to be auth mode + possibly
stream:true (their controlled curl bypass used Bearer + stream=true);
api-key + non-streaming was my initial repro attempt and didn't fire.
The bug only affected `_call_claude` (the legacy direct-urllib path).
The agentic `_agentic_review` path goes through claude_agent_sdk →
subprocesses to the `claude` CLI binary, which already uses the new
`output_config.format` shape correctly (per src/utils/sideQuery.ts:263
in claude-cli-internal). So this PR only needs to fix the plugin's
direct HTTP path.
This commit:
1. llm.py: rewrite the payload literal in `_call_claude` to use
`output_config: { format: { type: 'json_schema', schema: ... } }`
instead of top-level `output_format`.
2. llm.py: in the adaptive-thinking branch, MERGE `effort: "high"`
into the existing `output_config` dict instead of reassigning.
Reassignment would silently clobber the format schema set in (1).
The pre-existing code did `payload["output_config"] = {"effort":
"high"}` which was correct WHEN output_format was top-level (and
output_config wasn't otherwise used). With the migration the
existing dict carries the schema, so we extend it not replace it.
Verified locally on macOS Python 3.13:
- py_compile clean.
- Existing 401 tests still pass — 0 regression.
- 6 new tests in test_2098_output_config_format.py (added to
internal test suite at sg-staging/tests/, not in this PR):
* 2 static-shape: the `_call_claude` source no longer contains
top-level `"output_format":` AND uses `output_config`. The
adaptive-thinking branch does NOT reassign output_config (and
DOES set output_config['effort']). Catches the regression
class where a future refactor reintroduces either bug.
* 2 payload-shape unit (mocked urllib): both thinking_budget=0
and thinking_budget>0+adaptive code paths produce a payload
with the correct `output_config.format` shape AND no
`output_format` top-level. The adaptive path verifies both
`format` and `effort` coexist in output_config (i.e., the
merge fix works).
* 2 live-API gating (skip-on-no-key): the new shape returns
HTTP 200 against api.anthropic.com; the old shape's current
status is recorded for canary purposes (still 200 for
api-key today, but reporter shows it's 400 for OAuth).
- Full suite: 405/405 pass + 2 skipped (live API tests, opt-in).
- The reporter's exact deprecation 400 message reproduces if you
swap auth to OAuth Bearer + stream:true (could not test locally
without extracting the keychain OAuth token, which was out of
scope). The fix shape is API-contract-level so it doesn't depend
on which auth mode triggers the 400.
NOT verified end-to-end via OAuth-authenticated plugin invocation on
my machine (auto-mode classifier correctly declined to extract the
keychain token). Reporter's 462 production errors + the docs
migration notice + the live-API HTTP 200 on the new form are
sufficient evidence to ship.
Closes#2098.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Followup to PR #2086 (which added the flag to 4 specific git call
sites) and PR #2100 (text=True purge for #2099).
The Windows reporter for #2099 noticed more git invocations still
lacked the flag — rev-parse path queries (--show-toplevel, --git-dir,
--git-common-dir), reflog %gs subjects, and `git show <sha>:<path>`
all output paths but the per-site PR #2086 approach missed them. The
result: an Arabic-named directory shows up via _git_diff_range but
rev-parse-emitted paths get C-quoted, breaking downstream
os.path.isabs() checks.
Fix: add `-c core.quotePath=false` to GIT_CMD itself as the 4th
config-set. Every subprocess.run using the *GIT_CMD splat picks it
up automatically — diff feeders, rev-parse path queries, reflog log,
ls-files, status, git show. No more per-site flag duplication.
This commit:
1. gitutil.py: add -c core.quotePath=false to GIT_CMD.
2. Remove the now-redundant per-site flags at the 7 call sites that
previously had inline -c core.quotePath=false (cleanup, since the
global setting subsumes them):
gitutil.py: _git_diff_range, _git_name_only, _git_status_porcelain,
get_git_diff (4 sites)
diffstate.py: _list_untracked git ls-files (1 site)
security_reminder_hook.py: commit-review git diff + git show (2 sites)
Verified locally on latest main (post PR #2100 merge) with macOS
Python 3.13:
- py_compile clean on all 3 modified files.
- Bare main BEFORE my fix: 400/401 pass — 1 failure proves the gap
(test_git_cmd_contains_quotepath_false catches the missing flag).
- Main + my fix: 401/401 pass.
- 23 new tests in test_quotepath_global.py (added to internal test
suite at sg-staging/tests/, not in this PR):
* 1 GIT_CMD-level: GIT_CMD list contains core.quotePath=false as
a (-c, value) pair. Single source of truth — single place a
future PR will be caught if the flag gets dropped.
* 10 static-shape (one per hooks/*.py): every subprocess.run
uses the *GIT_CMD splat (no bare git invocation that would
bypass the global flag).
* 12 end-to-end (parametrized over Arabic, Hebrew, CJK directory
names): real git repo, _git_diff_range emits unquoted diff,
extract_file_paths_from_diff and parse_diff_into_files keep
the non-ASCII path in their output, _git_toplevel returns the
non-ASCII path intact.
- 1 staleness fix in test_diff_parser_non_ascii.py
(test_no_bare_git_diff_or_show_without_flag): updated to accept
EITHER inline core.quotePath=false OR *GIT_CMD splat (which
globally provides it).
NOT verified end-to-end on Windows with a non-ASCII repo root path.
The new global-flag test pins the contract permanently, and the
parametrized macOS tests confirm parser behavior on ASCII-control
paths in non-ASCII directories. The Windows-specific rev-parse
quoting behavior follows from the same git contract our macOS test
environment exercises (POSIX git always emits raw UTF-8 regardless
of quotePath; on Windows the flag is what makes output raw).
Closes the #2099 followup specifically about _git_diff_range /
rev-parse --show-toplevel / git log %gs paths slipping past.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
URGENT WINDOWS FIX. Sibling of #2056 / PR #2075 but covering 14 more
sites that PR #2075 missed.
The bug class: on Windows with cp1252 default encoding (typical
en-US locale), `subprocess.run(..., text=True)` decodes child stdout
AND stderr via `locale.getpreferredencoding()`. When git emits a
UTF-8 byte that's undefined in cp1252 (e.g. `0x81` from ف, present
in any path/filename/branch ref/commit message containing
Arabic/Hebrew/CJK), Python's internal `_readerthread` raises
UnicodeDecodeError. The thread crash is silent in Python 3.13+ (only
printed to stderr), but `subprocess.run` returns `stdout=None` and
the caller AttributeErrors on `.strip()`. The user sees a misleading
"WinError 267" or similar catch-all message instead of the real
decode failure.
PR #2075 fixed 6 specific helpers in `diffstate.py` / `gitutil.py`.
This commit covers the 14 survivors. Plus a defense-in-depth belt:
`PYTHONUTF8=1` exported by sg-python.sh.
This commit:
1. sg-python.sh: `export PYTHONUTF8=1` (PEP 540). No-op on
macOS/Linux (already UTF-8). On Windows, makes Python's
`locale.getpreferredencoding()` return UTF-8 instead of cp1252 —
so even if a future regression slips in text=True, the decode
succeeds. Must be set BEFORE Python starts; changing it from
inside the interpreter has no effect.
2. gitutil.py: convert 8 subprocess.run sites from
`capture_output=True, text=True` to `capture_output=True` +
manual `r.stdout.decode("utf-8", errors="replace")`:
- _git_rev_parse_head (stdout = SHA, stderr risk)
- _find_git_index (stdout = PATH, primary bug site)
- _temp_index git add (returncode only, stderr risk)
- _git_toplevel (stdout = PATH, primary bug site)
- _git_dir (stdout = PATH, primary bug site)
- _git_rev_list_range (stdout = SHAs, stderr risk)
- _detect_main_branch (stdout = ref, stderr risk)
- merge-base --is-ancestor (returncode only, stderr risk)
3. security_reminder_hook.py: convert 6 subprocess.run sites
(rev-parse @{u}/@{u}@{1}/local_ref, merge-base, HEAD lookup,
reflog SHA resolution) — same pattern.
4. security_reminder_hook.py: fix the misleading log line in
handle_user_prompt_submit. Was:
debug_log("Failed to capture git baseline (not a git repo?)")
Now includes the cwd in the message so the next reporter doesn't
waste an hour grepping for the real WinError, per reporter's
secondary finding.
Verified locally on macOS Python 3.13:
- py_compile clean on all modified files.
- bash -n sg-python.sh clean.
- sg-python.sh actually propagates PYTHONUTF8=1 to child Python
(verified via probe — sys.flags.utf8_mode=1).
- Existing 353 tests still pass — 0 regression.
- 25 new tests in test_2099_subprocess_text_true.py:
* 10 static-shape catchers (one per hooks/*.py file). Any
future PR that reintroduces text=True OR encoding= in
subprocess.run fails this check at PR time. Single source
of truth for the regression class.
* 2 sg-python.sh verifiers (literal export + actual
propagation to child Python).
* 5 macOS end-to-end against a real git repo containing
non-cp1252 content (`ف.py` filename): _git_toplevel,
_git_dir, _find_git_index, _git_rev_parse_head,
_git_rev_list_range all return clean values without
AttributeError / UnicodeDecodeError.
* 7 round-trip bytes-decode pattern verifiers (parametrized
over Arabic ف, Hebrew א, Japanese 案, raw 0x81, multiple
cp1252-undefined bytes, real-world git diff headers).
* 1 sanity check that cp1252 strict DOES raise on 0x81
(proves the test environment can catch the bug class).
- Full suite: 378/378 pass in 5.56s.
- End-to-end tmux smoke test driving real claude 2.1.145 CLI:
Made a git commit via Bash tool call. All 4 hooks fired through
the fixed plugin path:
11:28:16.730 Hook called with args: …/plugin/hooks/security_reminder_hook.py
11:28:16.734 Processing: hook_event=UserPromptSubmit
11:28:16.825 Captured git baseline: 445f7f213256
11:28:19.923 Hook called with args: …
11:28:19.923 Processing: hook_event=PostToolUse, tool=Bash
11:28:19.971 Commit review: detected git commit in command
11:28:20.020 Commit review: 1/1 sha(s) resolved, 1 files
11:28:26.415 Hook called with args: …
11:28:26.416 Processing: hook_event=Stop
11:28:26.550 Stop hook: empty review set
Confirms: PYTHONUTF8=1 export doesn't break anything; converted
helpers (_git_rev_parse_head, _git_toplevel, _git_dir,
_find_git_index) run end-to-end without issue on the happy path.
NOT verified end-to-end on Windows with actual non-cp1252 content
in path/filename/stderr. The static-shape catcher pins the
regression class permanently. Reporter's PYTHONUTF8=1 workaround
empirically proves the encoding-mode fix works for the affected
scenario; this commit just bakes it in.
Closes#2099.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
URGENT REGRESSION FIX. PR #2076 (Graphite gt workflow) gated the
PostToolUse commit/push hooks with:
"if": "Bash(git commit:*)|Bash(gt create:*)|Bash(gt modify:*)"
"if": "Bash(git push:*)|Bash(gt submit:*)"
mirroring the regex-OR idiom that `matcher` uses
("Edit|Write|MultiEdit|NotebookEdit"). But `if` is NOT a regex —
it's a SINGLE permission-rule string. The CC harness's dispatch
filter parses the entire `if` value as one rule of shape
`ToolName(rule_content)` via:
let firstParen = H.indexOf("(");
let lastParen = H.lastIndexOf(")"); // searches from END
if (lastParen !== H.length - 1) return { toolName: H };
let toolName = H.slice(0, firstParen);
let ruleContent = H.slice(firstParen + 1, lastParen);
Applied to the broken commit clause:
toolName = "Bash"
ruleContent = "git commit:*)|Bash(gt create:*)|Bash(gt modify:*"
The garbled `ruleContent` never matches any real command, so the
hook never fires — for ANY workflow, not just gt. The plugin's
deepest review layer was dead in production for all users on builds
shipping PR #2076.
Fix shape: split into separate hook entries, each with its own
well-formed single-rule `if` clause. The Python hook self-routes
commit vs push via the bash-command regexes and dedups concurrent
spawns via `_claim_bash_hook_once`, so multiple entries firing the
same script is safe.
This commit:
1. hooks.json: 5 precise entries (one per command shape) instead of
the broken |-joined 2-entry form. Restores the original commit/
push behavior bit-for-bit (`Bash(git commit:*)` + `Bash(git push:*)`
are unchanged from pre-#2076), and adds 3 separate entries for
the Graphite commands (`Bash(gt create:*)`, `Bash(gt modify:*)`,
`Bash(gt submit:*)`). No git behavior change.
The earlier draft used the broader `Bash(git *)` + `Bash(gt *)`
per the reporter's suggestion, but that has a real cost: every
`git status` / `git log` / `git diff` would spawn the Python
hook only to early-exit via the regex matcher. Precise per-command
entries avoid the spawn overhead and match the pre-#2076 cost
profile exactly.
2. security_reminder_hook.py: widen `_GIT_COMMIT_RE` to tolerate
`git -C <path>` and `git -c k=v` global options between `git`
and `commit` (mirrors `_GIT_PUSH_RE`'s long-standing tolerance).
Without this, `git -C /repo commit` is silently dropped by the
handler — reporter flagged this as the secondary finding.
Verified locally on macOS Python 3.13:
- hooks.json valid JSON, 5 `if` clauses each parses to a single
`{toolName: "Bash", ruleContent: "<command>:*"}` pair.
- py_compile security_reminder_hook.py clean.
- 9-case regex sanity: all 4 commit forms match (bare, -C path,
-c k=v, gt create/modify); 3 non-commit forms reject (status,
gt submit, gt log). Pre-fix would reject -C path form.
- 7 new tests in test_2089_if_clause_validity.py + 2 updated tests
in test_gt_graphite_workflow.py:
* 12 sanity tests for a Python parser mirroring harness's BA(H)
— pinned so a future refactor can't silently start accepting
the broken form.
* 2 hooks.json validity: every `if` clause parses as a single
valid rule; at least one if-gated hook exists.
* 1 post-fix structure: separate entries cover git AND gt.
* 2 updated gt-coverage: SOME clause covers git, SOME clause
covers gt (no longer requires both in the same |-joined
clause, which was the broken shape).
TDD-verified the test catches the bug: temporarily restored
main's broken |-joined hooks.json, ran the new test, saw
`test_every_if_clause_is_single_valid_rule` fail with a clear
error explaining #2089's cause. Restored fix, test passes.
- Full suite: 336/353 pass (17 unrelated failures from open PRs
#2078 / #2086 not in this branch).
NOT verified end-to-end with a real CC instance triggering the hooks
on a git or gt commit. The static-shape tests catch the regression
class and the regex sanity tests pin the `git -C` tolerance, but
the asyncRewake feedback loop needs runtime verification.
Closes#2089. Restores the closes for #2048 that PR #2076 attempted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes anthropics/claude-plugins-official#2082 — diff feeders use git's
default quotePath setting, which C-quotes any path with a non-ASCII
byte. The downstream parsers in gitutil.parse_diff_into_files /
gitutil.extract_file_paths_from_diff match the diff header with
`re.match(r'^a/(.+?) b/(.+)$', ...)`, which only sees the raw
`a/path b/path` form. The C-quoted `"a/\303\201vila/..."` form
slips past the regex, the `continue` fires, and the file is silently
dropped from review.
Effect: a vulnerable file like `Ávila/payment.py` with
`os.system('curl ' + user_input)` never reaches the LLM reviewer.
False negative in exactly the direction the plugin exists to catch.
Sibling of #2056 / #2075: those fixed the UTF-8 decode of the
subprocess output (text=True crashed the reader thread on Windows
cp1252). This one fixes the diff-feeder commands themselves — the
name-only helpers (_git_name_only, _git_status_porcelain) already
pass core.quotePath=false for this exact reason; the diff-text
feeders were the holdouts.
Fix: add `-c core.quotePath=false` to 4 git invocations:
- gitutil._git_diff_range (push-sweep feed)
- gitutil.get_git_diff (Stop-hook feed)
- security_reminder_hook commit-review `git diff` (amend delta)
- security_reminder_hook commit-review `git show` (post-amend)
With the flag, git emits raw UTF-8 in the diff header
(`a/Ávila/payment.py`), the regex matches, and both files (the
non-ASCII vulnerable one + any ASCII control file) flow through to
review correctly.
Verified locally on macOS Python 3.13:
- py_compile clean on both files.
- Existing 45 smoke + extensibility tests still pass.
- 8 new tests in test_diff_parser_non_ascii.py (added to internal
test suite at sg-staging/tests/, not in this PR):
* 2 static-shape: gitutil._git_diff_range and get_git_diff both
contain `core.quotePath=false` in their source.
* 2 commit-review static: every subprocess.run in
handle_commit_review_posttooluse that mentions `"diff"` or
`"show"` also passes the flag. Catches the regression
class where a new diff/show call site is added without
plumbing the flag through.
* 4 end-to-end with a real git repo containing a
`Ávila/payment.py` baseline-and-edit:
- WITHOUT flag: header is C-quoted, both parsers drop the
non-ASCII file (demonstrates the bug).
- WITH flag: header is raw UTF-8, both parsers see the file.
- parse_diff_into_files (the other parse path) also keeps
the file with the flag.
- get_git_diff end-to-end produces unquoted output whose
file list includes the non-ASCII path.
- 53/53 pass total (45 existing + 8 new) in 3.41s.
NOT verified end-to-end with a real CC commit-review fire on a
non-ASCII path. The static-shape tests catch the regression and the
end-to-end git-repo tests pin parser behavior, but the actual
LLM-review-with-vuln-found path requires runtime verification against
an Anthropic-API-credentialed CC session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes#1868 — when CLAUDE_CONFIG_DIR is set to a non-default location
(e.g. ~/.config/claude for XDG compliance, or a multi-tenant install
path), the plugin still wrote state files to the hardcoded ~/.claude/
path, leaving stale state and breaking CLAUDE_CONFIG_DIR's purpose.
Resolution precedence (highest first):
1. SECURITY_WARNINGS_STATE_DIR — plugin-specific override (existing)
2. CLAUDE_CONFIG_DIR/security — CC's config-dir env (new — #1868)
3. ~/.claude/security — default fallback (unchanged)
Empty-string env vars (e.g. CLAUDE_CONFIG_DIR= in a misconfigured
shell) are treated as not-set so the empty path doesn't collide with
os.path.join and silently write to /security at the filesystem root.
Implementation: a single state_dir() helper in _base.py is the source
of truth for resolution. All five modules that previously had inline
SECURITY_WARNINGS_STATE_DIR / ~/.claude/security resolutions
(_base.py, session_state.py, ensure_agent_sdk.py, llm.py, and one
site in security_reminder_hook.py) now call state_dir() instead.
Re-implementing the precedence inline risks drift — one module gets
a future fix, others don't.
The helper is called per-invocation rather than cached at import time
so test monkeypatches of the env vars take effect, and so a long-
running test or future shared-process scenario can change the env
between calls and have the next call observe the new value. The
per-call cost is negligible compared to the subprocess-spawn cost
the hooks pay every fire in production.
Three hardcoded ~/.claude/security strings remain but are NOT
functional resolutions:
- _base.py:39: the fallback BRANCH inside state_dir() itself
- ensure_agent_sdk.py:6, :11: docstring text describing default
location for users
Verified locally on macOS Python 3.13:
- py_compile clean on all 5 modified files.
- Existing 45 smoke + extensibility tests still pass.
- 14 new tests in test_claude_config_dir.py (added to internal test
suite at sg-staging/tests/, not in this PR):
* 7 resolution-semantics: default fallback, CLAUDE_CONFIG_DIR
override, SECURITY_WARNINGS_STATE_DIR beats both, tilde
expansion, empty-string handling (CLAUDE_CONFIG_DIR= must
fall back, NOT join to /security).
* 4 static-shape: each of session_state / ensure_agent_sdk /
llm / security_reminder_hook either imports state_dir from
_base OR has zero resolution patterns. Catches the
regression where someone adds a new state-file writer and
re-implements resolution inline, missing the
CLAUDE_CONFIG_DIR branch.
* 3 end-to-end: with CLAUDE_CONFIG_DIR set, get_state_file /
get_lock_file return paths under <CLAUDE_CONFIG_DIR>/security/;
save_state round-trip writes a file to the redirected path
and re-reads the same contents.
- 59/59 pass total (45 existing + 14 new) in 2.54s.
NOT verified end-to-end with a real CC instance setting
CLAUDE_CONFIG_DIR. The shape tests catch the regression class
(hardcoded ~/.claude/), and the end-to-end test pins the behavior
that user state files actually land at the redirected path.
Closes#1868.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes#1358, #1375, and #1783 — three related complaints about the
hook output protocol used at the three asyncRewake exit-2 sites
(handle_commit_review_posttooluse, handle_push_sweep_posttooluse,
handle_stop_hook).
The old shape at each site was:
emit_metrics({...}) # JSON to stdout (metrics)
sys.stderr.write(banner + guidance + suffix) # plain text to stderr
sys.exit(2) # asyncRewake trigger
That triggered three reported problems:
#1375: CC's hook system parsing stdout for a SyncHookJSONOutput sees
only the bare metrics dict — no findings reason — and on older
CC versions surfaces a 'json output validation failed' error
because stderr's plain text isn't valid JSON.
#1783: CC's UI shows 'Permission to use Edit has been denied' with no
permissionDecisionReason — the stderr text is invisible to that
UI surface; CC only renders fields it can find in the JSON.
#1358: Reporters experienced the exit(2) as 'gating' behavior rather
than 'warning' behavior. The pattern-warning path in main()
was migrated to exit(0) + hookSpecificOutput.additionalContext
long ago; these three asyncRewake sites were never updated.
Fix: extend emit_metrics() to accept additional_context, system_message,
and hook_event_name kwargs, and emit them in the same SyncHookJSONOutput
line as the metrics. CC's parser stops scanning stdout after the first
{-prefixed line, so the findings must ride in that same line — calling
emit_metrics twice or adding a second print(json.dumps(...)) would
silently drop the second emission.
At each of the three call sites: route the guidance text that used to
go to stderr through additional_context instead. The stderr.write is
dropped — additionalContext carries the same text to the model via the
JSON channel, and the legacy stderr surface is what triggered #1375's
JSON validation error on older CC clients.
exit(2) is preserved at all three sites. That's the documented mechanism
for triggering the asyncRewake 'force fix' feedback loop (per the
inline comment at the stop-hook site); switching to exit(0) without
verifying CC's protocol-version support risks dropping the rewake
entirely and silently losing all the findings the hook just computed.
For push-sweep specifically: emit_metrics had to move from an
unconditional pre-emission (line ~1680) to two conditional sites (one
in the no-vulns branch with exit(0), one in the with-vulns branch with
exit(2)) because the with-vulns branch needs to attach additional_context
and CC reads only the first JSON line — a second emit would be ignored.
Behavior is preserved: every push-sweep fire emits exactly one metrics
line, just at a slightly later point in the function body.
Verified locally on macOS Python 3.13:
- py_compile clean.
- Existing 45 smoke + extensibility tests still pass.
- 21 new tests in test_hook_output_protocol.py (added to internal
test suite at sg-staging/tests/, not in this PR):
* 6 backward-compat: emit_metrics with metrics only, with
rewake_summary, etc. — verifies the legacy callers still
produce the same output shape.
* 5 additional_context shape: lands in hookSpecificOutput,
round-trips the value, default hook_event_name is sensible,
empty/None doesn't pollute the JSON with an empty hSO block.
* 3 system_message shape: lands in systemMessage, empty/None
suppressed, round-trips.
* 1 combined: metrics + rewake_summary + additional_context +
system_message + hook_event_name all merge into one JSON line.
* 6 round-trip safety: emoji, quotes, backslashes, newlines,
Unicode (山田太郎 + 🎉), tabs, null bytes — all survive the
json.dumps cycle.
* 6 static-shape: each of the three asyncRewake handlers
(commit_review, push_sweep, stop_hook) is checked to confirm
it passes additional_context to emit_metrics and no longer
writes the PROVENANCE_BANNER guidance to stderr. Catches the
regression class where a new exit(2) site forgets to plumb
guidance through the JSON channel.
- 66/66 pass total (45 existing + 21 new) in 2.57s.
NOT verified end-to-end with a real CC instance triggering all three
hooks. The static-shape tests + the JSON round-trip tests should catch
any regression in the emit_metrics output, but the actual interaction
with CC's asyncRewake / rewakeMessage flow (especially: does
hookSpecificOutput.additionalContext successfully appear in the
rewakeMessage that CC sends to the model?) needs runtime verification
against a CC version that supports the modern protocol.
The reporter for #1375 specifically called out that CC's older
versions surfaced 'json output validation failed' on the old stderr-
only output; this fix changes the stdout shape to valid JSON with the
findings included, which should resolve that error class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixesanthropics/claude-plugins-official#2048 — teams using Graphite
for stacked PRs (`gt create` / `gt modify` / `gt submit`) never get
the commit/push agentic review because the hook matcher only catches
literal `git commit` / `git push` Bash calls. gt shells out to git
as a subprocess, but the hook fires on Claude's top-level tool call,
which is `gt create` — not the `git commit` invocation inside the
gt subprocess that Claude Code never observes.
Per-edit pattern checks and end-of-turn Stop review still fire (those
don't depend on detecting the commit command), so the silent-coverage-
gap is bounded to the deepest review layer for Graphite users. Still:
that's exactly the layer designed to catch IDOR / auth-bypass /
cross-file SSRF, so the gap matters.
Semantic mapping (per the reporter):
gt create -> commit (like `git commit`)
gt modify -> commit + amend (like `git commit --amend`)
gt submit -> push (like `git push`)
Changes:
1. hooks/hooks.json: extend two PostToolUse `if` matchers.
"Bash(git commit:*)"
-> "Bash(git commit:*)|Bash(gt create:*)|Bash(gt modify:*)"
"Bash(git push:*)"
-> "Bash(git push:*)|Bash(gt submit:*)"
Without this, the hook subprocess never spawns for gt invocations
and the Python regex changes below are dead code.
2. hooks/security_reminder_hook.py: extend three regexes that classify
the bash command line.
_GIT_COMMIT_RE: now also matches `gt create` and `gt modify`.
Used at 4 sites (handler gate, multi-commit count, prompt
detection, event classification). Compound commands like
`gt create -am a && gt submit` now correctly trigger both the
commit and push paths.
_GIT_AMEND_RE: now also matches `gt modify` (semantically an
amend). The amend code path uses reflog to find the pre-amend
SHA and diff against THAT instead of HEAD~1 — same code path
now applies to `gt modify`.
_GIT_PUSH_RE: now also matches `gt submit`. Tolerates the same
`git -C path` / `git -c k=v` global options as before for the
git form; gt has its own flag layer that doesn't conflict.
Verified locally on macOS Python 3.13:
- JSON valid (hooks.json roundtrips).
- Existing 45 smoke + extensibility tests still pass.
- 76 new tests in test_gt_graphite_workflow.py (added to internal
test suite this PR doesn't ship — kept in sg-staging tests/ until
we have a story for shipping plugin tests publicly):
* 16 parametrized commit-match: native git commit variants +
all gt create / gt modify variants from the reporter's repro.
* 11 parametrized commit-reject: gt submit, gt log, gtoolkit
(word-boundary), agt create, etc.
* 9 parametrized amend-match: git commit --amend variants +
gt modify variants + chained git+gt.
* 7 parametrized amend-reject: regular git commit, gt create,
gt submit, echo'd substring noise.
* 11 parametrized push-match: git push variants + gt submit
variants + chained.
* 12 parametrized push-reject: git commit, gt log, gt fetch,
gt down, gt restack, gh pr create, agt submit.
* 3 compound-command class tests: git+gt mixtures trigger both
paths; gt modify chained with gt submit triggers
amend + push.
* 3 commit-invocation-count tests: gt commands contribute to
the multi-commit-detection findall count.
* 2 hooks.json static config tests: read the JSON, verify the
commit and push `if` clauses include the gt cases. Catches
the easy regression where someone updates the Python regex
but forgets to widen the matcher.
- 121/121 pass total (45 existing + 76 new) in 2.50s.
NOT verified end-to-end with a real `gt` install. Reporter has the
deterministic Graphite workflow and offered to retest. The regex +
matcher widening is a clean superset — current git-only matching still
works (verified by the 45-test smoke suite that uses `git commit` /
`git push` exclusively), and the new gt cases are pure additions.
Not in this PR:
- `gt prev` / `gt next` / `gt up` / `gt down` etc. — pure
navigation, no commit / push side effect.
- `gt restack` — could in principle rewrite commits (so the
plugin's reviewed-shas cache becomes stale), but it doesn't
create reviewable new content. Out of scope.
- `gh pr create` — already explicitly NOT a separate matcher per
the existing comment in _GIT_PUSH_RE (gh invokes git push as a
child process; the bash hook only sees the top-level
`gh pr create`). Same architectural issue as gt but with a
different cost/benefit per the existing comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixesanthropics/claude-plugins-official#2056 — on Windows, when the
worktree contains an untracked file whose name has a character undefined
in cp1252 (accented capitals like Á Í Ï Ð Ý, most CJK, emoji), the
UserPromptSubmit hook crashes:
Exception in thread Thread-5 (_readerthread):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81
Traceback (most recent call last):
File diffstate.py, line 338, in _list_untracked
for p in r.stdout.split('\\0'):
AttributeError: 'NoneType' object has no attribute 'split'
Non-blocking (UPS failures still let the prompt through) but the
baseline-untracked snapshot is silently lost, so the Stop-hook review
mis-handles pre-existing untracked files.
Root cause (reporter's diagnosis, verified):
1. core.quotePath=false makes git emit raw UTF-8 for non-ASCII filenames.
2. subprocess.run(..., text=True) decodes via
locale.getpreferredencoding(False) in strict mode — on Windows that
is cp1252, in which 0x81 / 0x8D / 0x8F / 0x90 / 0x9D are undefined.
Those bytes appear in the UTF-8 encodings of Á (C3 81), Í (C3 8D),
Ï (C3 8F), Ð (C3 90), Ý (C3 9D), and a large fraction of CJK / emoji
codepoints.
3. The decode runs in the subprocess reader thread. The thread raises
UnicodeDecodeError, threading prints 'Exception in thread Thread-N',
subprocess.run returns with stdout=None. The handler then does
None.split('\\0') -> AttributeError, which is NOT in the narrow
except (TimeoutExpired, FileNotFoundError, OSError) tuple, so it
escapes the helper, propagates out of UserPromptSubmit's
ThreadPoolExecutor.result(), and exits the hook non-zero.
This is internally inconsistent: gitutil._git_diff_range,
security_reminder_hook._reflog_amend_lookup (line ~540), and the commit
diff loop (line ~1115) already do bytes + decode utf-8/replace, with
comments explicitly noting that text=True would crash. The fix below
extends that established pattern to the helpers that were holdouts.
Affected helpers (6 total):
- diffstate._list_untracked <- reporter, hot path, CRITICAL
- diffstate.capture_git_baseline <- reporter, latent
- diffstate.get_baseline_file_content <- audit, file content read, HIGH
- gitutil._git_name_only <- reporter, latent
- gitutil._git_status_porcelain <- reporter, latent
- gitutil._git_reflog_recent_commits <- audit, embeds %gs commit msg, HIGH
For each one:
- Drop text=True from subprocess.run.
- Decode r.stdout / r.stderr as .decode('utf-8', errors='replace').
- Add ValueError to the except tuple as defense against any future
strict-decode regression (UnicodeDecodeError is a ValueError
subclass; including it explicitly degrades the helper to its
empty/None return instead of escaping out of the hook).
Verified locally on macOS Python 3.13:
- py_compile clean on both files.
- 45 existing smoke + extensibility tests still pass.
- 21 new internal tests (not in this PR — added to the team's local
test suite at staging/tests/test_unicode_decode.py):
* 18 static-shape parametrized: each of the 6 fixed helpers has
no text=True in its subprocess calls, contains errors='replace',
and lists ValueError in its except.
* Deterministic end-to-end: create real git repo + Ávila_report.txt
untracked, call _list_untracked, verify it returns
{'Ávila_report.txt': <mtime>} without crashing.
* Deterministic end-to-end: same for capture_git_baseline (verifies
the latent stderr-warning case stays valid).
* Deterministic end-to-end: get_baseline_file_content on a file
whose content has 山田太郎 + 🎉; verify the bytes round-trip
through the decode.
- 66/66 tests pass total (45 existing + 21 new).
NOT verified end-to-end on Windows — would need actual cp1252 strict
decode to fire. Reporter has the deterministic repro and will
re-verify on their Win11 / Python 3.14.x setup before merge.
Not in this PR (defense-in-depth, lower risk):
- 3 git rev-parse calls returning path output (gitutil._find_git_index,
_git_toplevel, _git_dir) could fail on Windows if cwd is in a
non-ASCII install directory. Same fix shape but unreported and
much lower probability — worth a separate follow-up if anyone
actually hits it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#410, #2037, #2045, #1640, #1280, #1329, #1341, #255,
anthropics/claude-code#46720 (partial closes on overlap with other rules).
The plugin's substring-only XSS / browser-DOM rules
(new_function_injection, react_dangerously_set_html, document_write_xss,
innerHTML_xss, outerHTML_xss, insertAdjacentHTML_xss) fired on any file
containing the trigger substring — including:
* Markdown documentation explaining XSS sinks
* Blog posts / READMEs that name browser APIs
* Python tutorials referencing dangerouslySetInnerHTML
* Plugin skill files with example HTML strings
* .yaml / .json configs that happen to contain the literal string
* .gitignore / Dockerfile / Makefile
These constructs have no meaning outside JS/TS source. Add a
path_filter: lambda p: p.endswith(_JS_EXTS) to each so they fire only
on .js, .jsx, .ts, .tsx, .mjs, .cjs, .mts, .cts, .vue, .svelte.
Cross-checked against the existing _JS_EXTS-gated rules
(regex_exec_substring, child_process_exec, exec_substring) — same
pattern, same constant, same intent. Uses the module-level _JS_EXTS
tuple so future extension changes propagate to all 6 rules atomically.
Verified locally on macOS Python 3.13:
- py_compile clean.
- 45-test existing smoke + extensibility suite still passes.
- 151 new parametrized tests in test_xss_gate.py (added to internal
test suite this PR doesn't ship): each gated rule x every
JS-family extension accepts, x every non-JS path (.md / .py /
.yaml / .json / .txt / .html / Dockerfile / Makefile / .gitignore
/ .sh / .go / .rs / .rb) rejects. 196 tests pass total.
Doesn't address everything in the false-positive cluster — issues that
require Python-rule gating (#1114 .env.schema exec), tighter substring
scoping (#660 pickle in usernames), or hook-protocol changes (#1358
exit-2 vs warning, #1375 plain-text-vs-JSON output) need separate PRs.
This PR covers the JS-substring subset cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run:gh workflow run scan-plugins.yml --ref bump/plugin-shas
PR_URLS:${{ steps.bump.outputs.pr-urls }}
run:|
set -euo pipefail
dispatch_failures="$(mktemp)"
jq -c '.[]' <<<"$PR_URLS" | while read -r entry; do
branch=$(jq -r '.branch' <<<"$entry")
name=$(jq -r '.name' <<<"$entry")
for wf in scan-plugins check-mcp-urls validate-plugins; do
echo "Dispatching ${wf}.yml against $branch ($name)"
if ! gh workflow run "${wf}.yml" --ref "$branch"; then
echo "::error::Failed to dispatch ${wf}.yml against $branch ($name) — required check will be missing; re-dispatch with: gh workflow run ${wf}.yml --ref $branch"
echo "${wf} ${branch}" >> "$dispatch_failures"
fi
done
done
if [ -s "$dispatch_failures" ]; then
echo "::error::$(wc -l < "$dispatch_failures" | tr -d ' ') required-check dispatch(es) failed; the affected bump PR(s) are blocked until re-dispatched (see annotations above)."
@@ -7,7 +7,7 @@ A structured workflow and set of specialist agents for modernizing legacy codeba
Legacy modernization fails most often not because the target technology is wrong, but because teams skip steps: they transform code before understanding it, reimagine architecture before extracting business rules, or ship without a harness that would catch behavior drift. This plugin enforces a sequence:
The discovery commands (`assess`, `map`, `extract-rules`) build artifacts under `analysis/<system>/`. The `brief` command synthesizes them into an approval gate. The build commands (`reimagine`, `transform`) write new code under `modernized/`. The `harden` command audits the legacy system and produces a reviewable remediation patch. Each step has a dedicated slash command, and specialist agents (legacy analyst, business rules extractor, architecture critic, security auditor, test engineer) are invoked from within those commands — or directly — to keep the work honest.
@@ -20,25 +20,36 @@ Commands take a `<system-dir>` argument and assume the system being modernized l
`/modernize-assess` works best with [`scc`](https://github.com/boyter/scc) (LOC + complexity + COCOMO) or [`cloc`](https://github.com/AlDanial/cloc), and falls back to `find`/`wc` if neither is installed. Portfolio mode also benefits from [`lizard`](https://github.com/terryyin/lizard) (cyclomatic complexity). The commands degrade gracefully without them, but the metrics will be coarser.
The commands degrade gracefully, but each of these makes the output meaningfully better — run `/modernize-preflight <system-dir>` to check all of them at once and get a readiness report:
- **Analysis tools**: [`scc`](https://github.com/boyter/scc) (LOC + complexity + COCOMO) or [`cloc`](https://github.com/AlDanial/cloc); [`lizard`](https://github.com/terryyin/lizard) for portfolio mode. Without them, metrics fall back to `find`/`wc` and get coarser.
- **A working build toolchain** for the legacy stack (e.g. GnuCOBOL for COBOL) — required before `/modernize-transform` can prove behavioral equivalence, and verified by preflight with a real smoke compile against your code.
- **The whole system in the tree**: deployment descriptors (JCL, CICS definitions, route configs), copybooks/includes, and DDL/schemas. Entry-point detection and data lineage in `/modernize-map` are guesswork without them.
- **Production telemetry** (optional): an observability MCP server or batch job logs enable the runtime overlay in `/modernize-assess` and timing annotations on critical paths.
## Commands
The commands are designed to be run in order, but each produces a standalone artifact so you can stop, review, and resume.
Environment readiness check, meant to run first: detects the legacy stack, checks analysis tooling, **smoke-compiles a real source file** with the legacy toolchain (the errors this surfaces — missing copybooks, wrong dialect flags — are the ones that otherwise appear mid-transform), inventories missing includes / deployment descriptors / binary-only artifacts, and probes for telemetry. Produces `analysis/<system>/PREFLIGHT.md` with a per-command Ready / Ready-with-gaps / Not-ready verdict.
### `/modernize-assess <system-dir>` — or — `/modernize-assess --portfolio <parent-dir>`
Inventory the legacy codebase: languages, line counts, complexity, build system, integrations, technical debt, security posture, documentation gaps, and a COCOMO-derived effort estimate. Produces `analysis/<system>/ASSESSMENT.md` and `analysis/<system>/ARCHITECTURE.mmd`. Spawns `legacy-analyst` (×2) and `security-auditor` in parallel for deep reads. With `--portfolio`, sweeps every subdirectory of a parent directory and writes a sequencing heat-map to `analysis/portfolio.html`.
### `/modernize-map <system-dir>`
Build a dependency and topology map of the **legacy** system: program/module call graph, data lineage (programs ↔ data stores), entry points, dead-end candidates, and one traced critical-path business flow. Writes a re-runnable extraction script and produces `analysis/<system>/topology.json` (machine-readable), `analysis/<system>/TOPOLOGY.html` (rendered Mermaid + architect observations), and standalone `call-graph.mmd`, `data-lineage.mmd`, and `critical-path.mmd`.

Build a dependency and topology map of the **legacy** system: program/module call graph, data lineage (programs ↔ data stores), entry points, dead-end candidates, and 2–4 traced business flows each anchored to a persona (the claimant, the operator, the auditor — not the maintainer). Writes a re-runnable extraction script and produces `analysis/<system>/topology.json` plus `analysis/<system>/TOPOLOGY.html` — an **interactive zoomable map** (circle-pack of domains/modules sized by LOC, dependency edges with per-kind toggles, search, click-for-details sidebar, and a walkthrough mode that plays each persona flow as a numbered path with a plain-language narrative). Built from a template shipped with the plugin, so it works on systems far too dense for a static diagram. Small domain-level `call-graph.mmd`, `data-lineage.mmd`, and `critical-path.mmd` are still exported for docs and PRs.
Mine the business rules embedded in the legacy code — calculations, validations, eligibility, state transitions, policies — into Given/When/Then "Rule Cards" with `file:line` citations and confidence ratings. Spawns three `business-rules-extractor` agents in parallel (calculations, validations, lifecycle). Produces `analysis/<system>/BUSINESS_RULES.md` and `analysis/<system>/DATA_OBJECTS.md`.
Synthesize the discovery artifacts into a phased **Modernization Brief** — the single document a steering committee approves and engineering executes: target architecture, strangler-fig phase plan with entry/exit criteria, behavior contract, validation strategy, open questions, and an approval block. Reads `ASSESSMENT.md`, `TOPOLOGY.html`, and `BUSINESS_RULES.md` and **stops if any are missing** — run the discovery commands first. Produces `analysis/<system>/MODERNIZATION_BRIEF.md` and enters plan mode as a human-in-the-loop gate.
Synthesize the discovery artifacts into a phased **Modernization Brief** — the single document a steering committee approves and engineering executes: target architecture, strangler-fig phase plan with entry/exit criteria, persona-based business walkthroughs (the section non-technical approvers actually read), behavior contract, validation strategy, open questions, and an approval block. Reads `ASSESSMENT.md`, `TOPOLOGY.html`, and `BUSINESS_RULES.md` and **stops if any are missing** — run the discovery commands first. Produces `analysis/<system>/MODERNIZATION_BRIEF.md` and enters plan mode as a human-in-the-loop gate.
Greenfield rebuild from extracted intent rather than a structural port. Mines a spec (`analysis/<system>/AI_NATIVE_SPEC.md`), designs a target architecture and has it adversarially reviewed (`analysis/<system>/REIMAGINED_ARCHITECTURE.md`), then **scaffolds services with executable acceptance tests** under `modernized/<system>-reimagined/` and writes a `CLAUDE.md` knowledge handoff for the new system. Two human-in-the-loop checkpoints. Spawns `business-rules-extractor`, `legacy-analyst` (×2), `architecture-critic`, and general-purpose scaffolding agents.
@@ -46,6 +57,9 @@ Greenfield rebuild from extracted intent rather than a structural port. Mines a
Surgical, single-module strangler-fig rewrite. Plans first (HITL gate), then writes characterization tests via `test-engineer`, then an idiomatic target implementation under `modernized/<system>/<module>/`, proves equivalence by running the tests, and produces `TRANSFORMATION_NOTES.md` mapping legacy → modern with deliberate deviations called out. Reviewed by `architecture-critic`.
### `/modernize-status <system-dir>`
Read-only progress report: artifact inventory with timestamps per workflow stage, staleness flags (e.g. a brief older than the assessment it was built from), secrets-hygiene checks (quarantine file gitignored and never committed), and the single most useful next command. Run it anytime you come back to a modernization after a break.
### `/modernize-harden <system-dir>`
Security hardening pass on the **legacy** system: OWASP/CWE scan, dependency CVEs, secrets, injection. Spawns `security-auditor`. Produces `analysis/<system>/SECURITY_FINDINGS.md` ranked Critical / High / Medium / Low and a reviewed `analysis/<system>/security_remediation.patch` with minimal fixes for the Critical/High findings. The patch is reviewed by a second `security-auditor` pass before you see it. **Never edits `legacy/`** — you review and apply the patch yourself when ready, then re-run to verify. Useful as a pre-modernization step when the legacy system will keep running in production during the migration.
@@ -81,17 +95,21 @@ This plugin ships commands and agents, but modernization projects benefit from a
"Edit(modernized/**)"
],
"deny":[
"Edit(legacy/**)"
"Edit(legacy/**)",
"Write(legacy/**)"
]
}
}
```
Adjust `legacy/` and `modernized/` to match your actual layout. The key invariants: `Edit` under `legacy/`is denied, and writes are scoped to `analysis/` (for documents) and `modernized/` (for the new code). Every command in this plugin respects this — `/modernize-harden` writes a patch to `analysis/` rather than editing `legacy/` in place.
Adjust `legacy/` and `modernized/` to match your actual layout. The key invariants: `Edit`/`Write` under `legacy/`are denied, and writes are scoped to `analysis/` (for documents) and `modernized/` (for the new code). Note this guards the file tools — shell commands that mutate files (`sed -i`, `git apply`) still go through the normal Bash permission prompt, so review those prompts with the same invariant in mind. Every command in this plugin respects this — `/modernize-harden` writes a patch to `analysis/` rather than editing `legacy/` in place.
## Typical Workflow
```bash
# 0. Check the environment is ready (tools, toolchain, source completeness)
/modernize-preflight billing
# 1. Inventory the legacy system (or sweep a portfolio of them)
/modernize-assess billing
@@ -112,6 +130,9 @@ Adjust `legacy/` and `modernized/` to match your actual layout. The key invarian
# 6. Security-harden the legacy system that's still in production
"description":"Security review for Claude-generated code. Pattern-based warnings on edits, LLM-powered diff review on Stop, and an agentic commit reviewer that catches injection, XSS, SSRF, hardcoded secrets, and 25+ other vulnerability classes.",
"rewakeMessage":"Background security review of pushed commits not yet reviewed — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
"rewakeSummary":"Push security review found issues"
"rewakeMessage":"Background security review of commit — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
"rewakeSummary":"Commit security review found issues"
"rewakeMessage":"Background security review of commit — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
"rewakeSummary":"Commit security review found issues"
"rewakeMessage":"Background security review of pushed commits not yet reviewed — address or acknowledge the findings below, then continue with the user's original request or continue waiting for their reply:",
"rewakeSummary":"Push security review found issues"
@@ -94,6 +94,9 @@ Only use exec() if you absolutely need shell features and the input is guarantee
},
{
"ruleName":"new_function_injection",
# JS-only construct: gate to JS/TS files so docs/.md and other prose
# mentioning "new Function" don't trip the warning.
"path_filter":lambdap:p.endswith(_JS_EXTS),
"substrings":["new Function"],
"reminder":"\u26a0\ufe0f Security Warning: Using new Function() with string interpolation is a CODE INJECTION vulnerability. If any variable is concatenated or interpolated into the function body string, an attacker controlling that variable can execute arbitrary code. Use safe alternatives: for property access use obj[key] or array.reduce((o, k) => o[k], root); for computation use a safe expression parser. NEVER interpolate untrusted strings into new Function() bodies.",
},
@@ -107,16 +110,24 @@ Only use exec() if you absolutely need shell features and the input is guarantee
"reminder":"⚠️ Security Warning: dangerouslySetInnerHTML can lead to XSS vulnerabilities if used with untrusted content. Ensure all content is properly sanitized using an HTML sanitizer library like DOMPurify, or use safe alternatives.",
},
{
"ruleName":"document_write_xss",
# Browser DOM API: only meaningful in JS/TS source.
"path_filter":lambdap:p.endswith(_JS_EXTS),
"substrings":["document.write"],
"reminder":"⚠️ Security Warning: document.write() can be exploited for XSS attacks and has performance issues. Use DOM manipulation methods like createElement() and appendChild() instead.",
},
{
"ruleName":"innerHTML_xss",
# Browser DOM API: only meaningful in JS/TS source. Closes FPs like
# docs/example HTML, playground/self-contained skills that hardcode
# innerHTML strings with zero user input (#410).
"path_filter":lambdap:p.endswith(_JS_EXTS),
"substrings":[".innerHTML =",".innerHTML="],
"reminder":"⚠️ Security Warning: Setting innerHTML with untrusted content can lead to XSS vulnerabilities. Use textContent for plain text or safe DOM methods for HTML content. If you need HTML support, consider using an HTML sanitizer library such as DOMPurify.",
},
@@ -217,11 +228,15 @@ Additionally, validate user inputs:
},
{
"ruleName":"outerHTML_xss",
# Browser DOM API: only meaningful in JS/TS source.
"path_filter":lambdap:p.endswith(_JS_EXTS),
"substrings":[".outerHTML =",".outerHTML="],
"reminder":"⚠️ Security Warning: Use textContent or sanitize with DOMPurify. outerHTML assignment is an XSS sink equivalent to innerHTML.",
},
{
"ruleName":"insertAdjacentHTML_xss",
# Browser DOM API: only meaningful in JS/TS source.
"path_filter":lambdap:p.endswith(_JS_EXTS),
"substrings":[".insertAdjacentHTML("],
"reminder":"⚠️ Security Warning: Use insertAdjacentText() or sanitize with DOMPurify. insertAdjacentHTML is an XSS sink.",
# Force UTF-8 for ALL Python filesystem + IO operations (PEP 540).
# Without this, Windows Python defaults `locale.getpreferredencoding()` to
# cp1252 — which makes `text=True` in subprocess.run / open() / json.load
# crash the internal reader thread on any byte that's undefined in cp1252
# (e.g. the 0x81 byte from ف, present in any path/filename with
# Arabic/Hebrew/CJK characters). See #2056, #2099.
#
# No-op on macOS/Linux (already UTF-8). Must be set BEFORE Python starts —
# changing it from inside the interpreter has no effect.
exportPYTHONUTF8=1
# Git Bash / MSYS on Windows hands script paths to this shim in POSIX form
# (`/c/Users/...`). When we exec a Windows `python.exe` (which we do on
# Windows since `python3` is the Microsoft Store stub), python interprets the
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.