mirror of
https://github.com/anthropics/claude-plugins-official.git
synced 2026-05-21 22:42:40 +00:00
math-olympiad: add LICENSE, marketplace entry, and prettier formatting
- Add Apache 2.0 LICENSE file
- Register plugin in marketplace.json
- Run prettier (prose-wrap=always, 80 cols) over all plugin markdown
- Simplify model tier naming in reference docs
🏠 Remote-Dev: homespace
@@ -4,18 +4,25 @@ Competition math solver with adversarial verification.
 
 ## The problem
 
-Self-verification gets fooled. A verifier that sees the reasoning is biased toward agreement. arXiv:2503.21934 ("Proof or Bluff") showed 85.7% self-verified IMO success drops to <5% under human grading.
+Self-verification gets fooled. A verifier that sees the reasoning is biased
+toward agreement. arXiv:2503.21934 ("Proof or Bluff") showed 85.7% self-verified
+IMO success drops to <5% under human grading.
 
 ## The approach
 
-- **Context-isolated verification**: verifier sees only the clean proof, never the reasoning trace
-- **Pattern-armed adversarial checks**: not "is this correct?" but "does this accidentally prove RH?" / "extract the general lemma, find a 2×2 counterexample"
+- **Context-isolated verification**: verifier sees only the clean proof, never
+  the reasoning trace
+- **Pattern-armed adversarial checks**: not "is this correct?" but "does this
+  accidentally prove RH?" / "extract the general lemma, find a 2×2
+  counterexample"
 - **Calibrated abstention**: says "no confident solution" rather than bluff
 - **Presentation pass**: produces clean LaTeX/PDF after verification passes
 
 ## Validation
 
-17/18 IMO+Putnam 2025 problems solved, 0 false positives, 2 novel proofs found. See the skill's eval data in the [anthropic monorepo](https://github.com/anthropics/anthropic/tree/staging/sandbox/sandbox/ralph/math_skills/eval_harness).
+17/18 IMO+Putnam 2025 problems solved, 0 false positives, 2 novel proofs found.
+See the skill's eval data in the
+[anthropic monorepo](https://github.com/anthropics/anthropic/tree/staging/sandbox/sandbox/ralph/math_skills/eval_harness).
 
 ## Install
 
@@ -29,4 +36,5 @@ Self-verification gets fooled. A verifier that sees the reasoning is biased towa
 > Solve this IMO problem: [statement]
 ```
 
-The skill auto-triggers on "IMO", "Putnam", "olympiad", "verify this proof", etc.
+The skill auto-triggers on "IMO", "Putnam", "olympiad", "verify this proof",
+etc.
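
The "approach" bullets in the README text above describe a control flow that is easier to see in code. The following is a minimal sketch, not the plugin's implementation: `Attempt`, `Verifier`, `survives_all_checks`, `solve_or_abstain`, and `toy_verifier` are hypothetical names, and the check prompts only paraphrase the README's two examples. It assumes a verifier can be modeled as a function of the clean proof and one check prompt.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Attempt:
    reasoning_trace: str  # scratch work; deliberately never passed to the verifier
    clean_proof: str      # the only text the verifier receives


# Hypothetical check prompts, paraphrasing the README's two examples.
ADVERSARIAL_CHECKS = (
    "Does this argument accidentally prove RH?",
    "Extract the general lemma and find a 2x2 counterexample.",
)

# Assumed interface: (clean_proof, check_prompt) -> True if the proof survives.
Verifier = Callable[[str, str], bool]


def survives_all_checks(attempt: Attempt, verifier: Verifier) -> bool:
    """Context isolation: only clean_proof crosses into the verifier."""
    return all(verifier(attempt.clean_proof, check) for check in ADVERSARIAL_CHECKS)


def solve_or_abstain(attempts: Iterable[Attempt], verifier: Verifier) -> Optional[str]:
    """Return the first proof that survives every check, else None (abstain)."""
    for attempt in attempts:
        if survives_all_checks(attempt, verifier):
            return attempt.clean_proof
    return None  # corresponds to reporting "no confident solution"


def toy_verifier(clean_proof: str, check_prompt: str) -> bool:
    # Stand-in for a model call: reject any proof that hand-waves with "clearly".
    return "clearly" not in clean_proof.lower()
```

In this sketch the abstention path is just the `None` return. A caller would map it to the "no confident solution" message and run the presentation pass only on a non-`None` result.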