2026-03-19 23:17:36 +00:00
# math-olympiad
Competition math solver with adversarial verification.
## The problem
Self-verification gets fooled. A verifier that sees the reasoning is biased
toward agreement. arXiv:2503.21934 ("Proof or Bluff") showed that an 85.7%
self-verified IMO success rate drops to <5% under human grading.
## The approach
- **Context-isolated verification**: verifier sees only the clean proof, never
  the reasoning trace
- **Pattern-armed adversarial checks**: not "is this correct?" but "does this
  accidentally prove RH?" / "extract the general lemma, find a 2×2
  counterexample"
- **Calibrated abstention**: says "no confident solution" rather than bluff
- **Presentation pass**: produces clean LaTeX/PDF after verification passes
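
The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `Attempt`, `ADVERSARIAL_CHECKS`, `verify_isolated`, `solve`, and the `ask_verifier` callback are all hypothetical names, and the check prompts only paraphrase the examples in the list.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical adversarial prompts, paraphrasing the checks listed above.
ADVERSARIAL_CHECKS = [
    "Does this argument accidentally prove a famous open problem (e.g. RH)?",
    "Extract the most general lemma used and look for a 2x2 counterexample.",
]


@dataclass
class Attempt:
    reasoning_trace: str  # scratch work; never passed to the verifier
    clean_proof: str      # the only text the verifier receives


def verify_isolated(proof: str, ask_verifier: Callable[[str, str], bool]) -> bool:
    """Run every adversarial check against the clean proof only.

    ask_verifier(check, proof) is an assumed callback that returns True
    when the check finds a flaw in the proof.
    """
    return not any(ask_verifier(check, proof) for check in ADVERSARIAL_CHECKS)


def solve(attempts: list[Attempt],
          ask_verifier: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first proof that survives all checks, else abstain."""
    for attempt in attempts:
        # Context isolation: only clean_proof crosses this boundary.
        if verify_isolated(attempt.clean_proof, ask_verifier):
            return attempt.clean_proof
    return None  # calibrated abstention: "no confident solution"
```

In this sketch a `None` result corresponds to reporting "no confident solution", and the presentation pass would run only on a non-`None` return value.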
## Validation
17/18 IMO+Putnam 2025 problems solved, 0 false positives, 2 novel proofs found.
See the skill's eval data in the
[anthropic monorepo](https://github.com/anthropics/anthropic/tree/staging/sandbox/sandbox/ralph/math_skills/eval_harness).
## Install
```
/plugin install math-olympiad@claude-plugins-official
```
## Use
```
> Solve this IMO problem: [statement]
```
The skill auto-triggers on "IMO", "Putnam", "olympiad", "verify this proof",
etc.