Harden code-modernization plugin from a real CardDemo dry run

Fixes found by running the discovery workflow against the AWS CardDemo
mainframe sample (~50 KLOC of COBOL/CICS/JCL/BMS/VSAM):

- modernize-assess: add scc -> cloc -> find/wc fallback chain with the
  COCOMO-II formula so Step 1 works when scc isn't installed; same for
  portfolio-mode cloc/lizard. Drop the reference to a specific
  agent-spawning tool name (just "in parallel"). Sharpen the structural-
  map subagent prompt: 5-12 domains, subgraph clustering, ~40-edge cap,
  repo-relative paths, dangling-reference check.
- modernize-map: expand the parse-target list with the things a
  literal-minded reader would miss on a real mainframe codebase — CICS
  CSD DEFINE TRANSACTION/FILE for entry points and online file I/O,
  EXEC CICS file ops, SELECT...ASSIGN TO joined with JCL DD,
  EXEC SQL table refs (not JCL DD), SEND/RECEIVE MAP, dynamic
  data-name XCTL resolution, COBOL fixed-format column slicing. Without
  these the dead-code list is wrong (most CICS programs look unreachable).
  Also write a machine-readable topology.json alongside the summary.
- modernize-extract-rules: add a Priority (P0/P1/P2) field with a
  heuristic, and an optional Suspected-defect field. modernize-brief
  reads P0 rules to build the behavior contract, but the Rule Card had
  no priority slot — the chain was broken.
- modernize-brief: read the new P0 tags; flag low-confidence P0 rules as
  SME blockers.
- modernize-reimagine: drop "for the demo" wording.
- security-auditor agent: add mainframe/COBOL coverage items (RACF,
  JCL/PROC creds, BMS field validation, DB2 dynamic SQL, copybook PII)
  and mark web-only items as such so it adapts to the target stack.
- README: add Optional Tooling section and a symlink example for the
  expected layout.
This commit is contained in:
Morgan Lunt
2026-05-11 16:28:27 -07:00
parent 718818146e
commit 22a1b25977
7 changed files with 102 additions and 31 deletions

View File

@@ -11,19 +11,44 @@ connect? This is the map an engineer needs before touching anything.
## What to produce
Write a one-off analysis script (Python or shell — your choice) that parses
the source under `legacy/$1` and extracts:
the source under `legacy/$1` and extracts the four datasets below. Cover
the parse targets that are real for the stack you're looking at — these are
the ones LLMs reliably miss:
- **Program/module call graph** — who calls whom (for COBOL: `CALL` statements
and CICS `LINK`/`XCTL`; for Java: class-level imports/invocations; for Node:
`require`/`import`)
- **Data dependency graph** — which programs read/write which data stores
(COBOL: copybooks + VSAM/DB2 in JCL DD statements; Java: JPA entities/tables;
Node: model files)
- **Entry points** — batch jobs, transaction IDs, HTTP routes, CLI commands
- **Dead-end candidates** — modules with no inbound edges (potential dead code)
- **Program/module call graph** — who calls whom.
- COBOL/CICS: `CALL '...'` and `EXEC CICS LINK/XCTL PROGRAM(...)`. Most
`PROGRAM(...)` targets are **data-names, not literals** — resolve them
against working-storage `VALUE` clauses and any menu/route copybooks
before declaring an edge unresolvable.
- Java: class-level imports/invocations. Node: `require`/`import`.
- **Data dependency graph** — which programs read/write which data stores.
- COBOL batch: `SELECT ... ASSIGN TO <ddname>` joined with JCL `DD`
statements (this is the *only* way to attribute file I/O to a program).
- COBOL/CICS online: `EXEC CICS READ/WRITE/REWRITE/DELETE/STARTBR/READNEXT/
READPREV ... FILE(...)` joined with `DEFINE FILE` in the CSD.
- DB2: `EXEC SQL ... END-EXEC` table references — *not* JCL DD; DB2 access
is via plan/package binds.
- BMS: `SEND MAP`/`RECEIVE MAP` ↔ map source under `bms/` and copybooks
under `cpy-bms/` (or wherever the maps live).
- Java: JPA/MyBatis entities & tables. Node: model files.
- **Entry points** — whatever the stack's outermost invokers are. Mainframe:
JCL `EXEC PGM=` steps **and** CICS `DEFINE TRANSACTION ... PROGRAM(...)`
from the CSD — without the CSD, every online program looks unreachable.
Web: HTTP routes. CLI: argv parsing.
- **Dead-end candidates** — modules with no inbound edges. **Only trust this
once the entry-point and call-edge types above are all in the graph**, and
suppress the dead claim for any module that could be the target of an
unresolved dynamic call. A naive grep-only graph will mark most CICS
programs dead.
For COBOL fixed-format, slice columns 8-72 and skip `*` indicator lines
(column 7) before regex matching, or you'll match sequence numbers and
commented-out code.
Save the script as `analysis/$1/extract_topology.py` (or `.sh`) so it can be
re-run and audited. Run it. Show the raw output.
re-run and audited. Have it write a machine-readable
`analysis/$1/topology.json` and print a human summary. Run it; show the
summary (cap at ~200 lines for very large estates).
## Render