Compare commits

..

11 Commits

Author SHA1 Message Date
Jesse Vincent
b9e16498b9 Release v4.0.3: Strengthen using-superpowers for explicit skill requests 2025-12-26 22:55:32 -06:00
Jesse Vincent
f6d50c74b2 Bump version to 4.0.3 2025-12-26 22:53:58 -06:00
Jesse Vincent
3dac35e0b3 Strengthen using-superpowers for explicit skill requests
Addresses failure mode where Claude skips skill invocation even when
user explicitly requests it by name (e.g., "subagent-driven-development,
please").

Skill changes:
- "Check for skills" → "Invoke relevant or requested skills"
- "BEFORE ANY RESPONSE" → "BEFORE any response or action"
- Added reassurance that wrong skill invocations are okay
- New red flag: "I know what that means"

Tests:
- New test suite for explicit skill requests
- Single-turn and multi-turn scenarios with --continue
- Tests with haiku model and user CLAUDE.md
2025-12-26 22:41:22 -06:00
Jesse Vincent
131c1f189f Release v4.0.2: Make slash commands user-only
- Added disable-model-invocation to /brainstorm, /execute-plan, /write-plan
- Commands now restricted to manual user invocation only
- Underlying skills remain available for autonomous invocation
2025-12-23 23:03:31 -08:00
Jesse Vincent
9baedaa117 Make slash commands user-only with disable-model-invocation
Added disable-model-invocation: true to /brainstorm, /execute-plan, and
/write-plan commands. Claude can still invoke the underlying skills
directly, but the slash commands are now restricted to manual user
invocation only.
2025-12-23 23:03:19 -08:00
Jesse Vincent
66a2dbd80a Add automation-over-documentation guidance to writing-skills
Mechanical constraints should be automated, not documented—save skills
for judgment calls.

Based on insight from @EthanJStark in PR #146.
2025-12-23 23:03:19 -08:00
Jesse Vincent
1455ac0631 Add GitHub thread reply guidance to receiving-code-review
When replying to inline review comments, use the thread API rather than
posting top-level PR comments.

Based on feedback from @ralphbean in PR #79.
2025-12-23 23:03:19 -08:00
egornomic
e64ad670df fix: inherit agent model (#144) 2025-12-23 21:46:15 -08:00
Mike Harrison
c037dcbf4b fix: use git check-ignore for worktree gitignore verification (#160)
* fix: use git check-ignore for worktree gitignore verification

The using-git-worktrees skill previously used grep to check only the
local .gitignore file, missing patterns in global gitignore configurations
(core.excludesfile). This caused unnecessary modifications to local
.gitignore when the directory was already globally ignored.

Changed verification from grep to git check-ignore, which respects Git's
full ignore hierarchy (local, global, and system gitignore files).

Fixes obra/superpowers#101

Tested with: Subagent pressure scenarios verifying correct behavior with
global gitignore configuration. Baseline test confirmed the bug, post-fix
test confirmed correct behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: convert bold emphasis to headings in Common Mistakes section

Convert **Title** patterns to ### Title headings for markdown lint
compliance (MD036 - no emphasis as heading).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 11:26:33 -08:00
Jesse Vincent
a7a8c08c02 Clarify that Skill tool loads skill content directly
Fixes pattern where Claude would invoke a skill then try to Read the
file separately. The Skill tool already provides the content.

- Add "How to Access Skills" section to using-superpowers
- Change "read the skill" → "invoke the skill"
- Commands now use fully qualified names (superpowers:brainstorming etc)
2025-12-22 14:27:35 -08:00
Jesse Vincent
80643c2604 Update README for v4.0.0 skill consolidations and two-stage review 2025-12-17 18:07:40 -08:00
27 changed files with 850 additions and 31 deletions

View File

@@ -9,7 +9,7 @@
{
"name": "superpowers",
"description": "Core skills library for Claude Code: TDD, debugging, collaboration patterns, and proven techniques",
"version": "4.0.0",
"version": "4.0.3",
"source": "./",
"author": {
"name": "Jesse Vincent",

View File

@@ -1,7 +1,7 @@
{
"name": "superpowers",
"description": "Core skills library for Claude Code: TDD, debugging, collaboration patterns, and proven techniques",
"version": "4.0.0",
"version": "4.0.3",
"author": {
"name": "Jesse Vincent",
"email": "jesse@fsck.com"

View File

@@ -85,7 +85,7 @@ Fetch and follow instructions from https://raw.githubusercontent.com/obra/superp
3. **writing-plans** - Activates with approved design. Breaks work into bite-sized tasks (2-5 minutes each). Every task has exact file paths, complete code, verification steps.
4. **subagent-driven-development** or **executing-plans** - Activates with plan. Dispatches fresh subagent per task (same session, fast iteration) or executes in batches (parallel session, human checkpoints).
4. **subagent-driven-development** or **executing-plans** - Activates with plan. Dispatches fresh subagent per task with two-stage review (spec compliance, then code quality), or executes in batches with human checkpoints.
5. **test-driven-development** - Activates during implementation. Enforces RED-GREEN-REFACTOR: write failing test, watch it fail, write minimal code, watch it pass, commit. Deletes code written before tests.
@@ -100,14 +100,11 @@ Fetch and follow instructions from https://raw.githubusercontent.com/obra/superp
### Skills Library
**Testing**
- **test-driven-development** - RED-GREEN-REFACTOR cycle (includes anti-patterns reference)
- **condition-based-waiting** - Async test patterns
- **test-driven-development** - RED-GREEN-REFACTOR cycle (includes testing anti-patterns reference)
**Debugging**
- **systematic-debugging** - 4-phase root cause process
- **root-cause-tracing** - Find the real problem
**Debugging**
- **systematic-debugging** - 4-phase root cause process (includes root-cause-tracing, defense-in-depth, condition-based-waiting techniques)
- **verification-before-completion** - Ensure it's actually fixed
- **defense-in-depth** - Multiple validation layers
**Collaboration**
- **brainstorming** - Socratic design refinement
@@ -118,7 +115,7 @@ Fetch and follow instructions from https://raw.githubusercontent.com/obra/superp
- **receiving-code-review** - Responding to feedback
- **using-git-worktrees** - Parallel development branches
- **finishing-a-development-branch** - Merge/PR decision workflow
- **subagent-driven-development** - Fast iteration with quality gates
- **subagent-driven-development** - Fast iteration with two-stage review (spec compliance, then code quality)
**Meta**
- **writing-skills** - Create new skills following best practices (includes testing methodology)

View File

@@ -1,5 +1,53 @@
# Superpowers Release Notes
## v4.0.3 (2025-12-26)
### Improvements
**Strengthened using-superpowers skill for explicit skill requests**
Addressed a failure mode where Claude would skip invoking a skill even when the user explicitly requested it by name (e.g., "subagent-driven-development, please"). Claude would think "I know what that means" and start working directly instead of loading the skill.
Changes:
- Updated "The Rule" to say "Invoke relevant or requested skills" instead of "Check for skills" - emphasizing active invocation over passive checking
- Added "BEFORE any response or action" - the original wording only mentioned "response" but Claude would sometimes take action without responding first
- Added reassurance that invoking a wrong skill is okay - reduces hesitation
- Added new red flag: "I know what that means" → Knowing the concept ≠ using the skill
**Added explicit skill request tests**
New test suite in `tests/explicit-skill-requests/` that verifies Claude correctly invokes skills when users request them by name. Includes single-turn and multi-turn test scenarios.
## v4.0.2 (2025-12-23)
### Fixes
**Slash commands now user-only**
Added `disable-model-invocation: true` to all three slash commands (`/brainstorm`, `/execute-plan`, `/write-plan`). Claude can no longer invoke these commands via the Skill tool—they're restricted to manual user invocation only.
The underlying skills (`superpowers:brainstorming`, `superpowers:executing-plans`, `superpowers:writing-plans`) remain available for Claude to invoke autonomously. This change prevents confusion when Claude would invoke a command that just redirects to a skill anyway.
## v4.0.1 (2025-12-23)
### Fixes
**Clarified how to access skills in Claude Code**
Fixed a confusing pattern where Claude would invoke a skill via the Skill tool, then try to Read the skill file separately. The `using-superpowers` skill now explicitly states that the Skill tool loads skill content directly—no need to read files.
- Added "How to Access Skills" section to `using-superpowers`
- Changed "read the skill" → "invoke the skill" in instructions
- Updated slash commands to use fully qualified skill names (e.g., `superpowers:brainstorming`)
**Added GitHub thread reply guidance to receiving-code-review** (h/t @ralphbean)
Added a note about replying to inline review comments in the original thread rather than as top-level PR comments.
**Added automation-over-documentation guidance to writing-skills** (h/t @EthanJStark)
Added guidance that mechanical constraints should be automated, not documented—save skills for judgment calls.
## v4.0.0 (2025-12-17)
### New Features

View File

@@ -2,6 +2,7 @@
name: code-reviewer
description: |
Use this agent when a major project step has been completed and needs to be reviewed against the original plan and coding standards. Examples: <example>Context: The user is creating a code-review agent that should be called after a logical chunk of code is written. user: "I've finished implementing the user authentication system as outlined in step 3 of our plan" assistant: "Great work! Now let me use the code-reviewer agent to review the implementation against our plan and coding standards" <commentary>Since a major project step has been completed, use the code-reviewer agent to validate the work against the plan and identify any issues.</commentary></example> <example>Context: User has completed a significant feature implementation. user: "The API endpoints for the task management system are now complete - that covers step 2 from our architecture document" assistant: "Excellent! Let me have the code-reviewer agent examine this implementation to ensure it aligns with our plan and follows best practices" <commentary>A numbered step from the planning document has been completed, so the code-reviewer agent should review the work.</commentary></example>
model: inherit
---
You are a Senior Code Reviewer with expertise in software architecture, design patterns, and best practices. Your role is to review completed project steps against original plans and ensure code quality standards are met.

View File

@@ -1,5 +1,6 @@
---
description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores requirements and design before implementation."
disable-model-invocation: true
---
Use and follow the brainstorming skill exactly as written
Invoke the superpowers:brainstorming skill and follow it exactly as presented to you

View File

@@ -1,5 +1,6 @@
---
description: Execute plan in batches with review checkpoints
disable-model-invocation: true
---
Use the executing-plans skill exactly as written
Invoke the superpowers:executing-plans skill and follow it exactly as presented to you

View File

@@ -1,5 +1,6 @@
---
description: Create detailed implementation plan with bite-sized tasks
disable-model-invocation: true
---
Use the writing-plans skill exactly as written
Invoke the superpowers:writing-plans skill and follow it exactly as presented to you

View File

@@ -200,6 +200,10 @@ You understand 1,2,3,6. Unclear on 4,5.
✅ "Understand 1,2,3,6. Need clarification on 4 and 5 before implementing."
```
## GitHub Thread Replies
When replying to inline review comments on GitHub, reply in the comment thread (`gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies`), not as a top-level PR comment.
## The Bottom Line
**External feedback = suggestions to evaluate, not orders to follow.**

View File

@@ -52,14 +52,14 @@ Which would you prefer?
### For Project-Local Directories (.worktrees or worktrees)
**MUST verify .gitignore before creating worktree:**
**MUST verify directory is ignored before creating worktree:**
```bash
# Check if directory pattern in .gitignore
grep -q "^\.worktrees/$" .gitignore || grep -q "^worktrees/$" .gitignore
# Check if directory is ignored (respects local, global, and system gitignore)
git check-ignore -q .worktrees 2>/dev/null || git check-ignore -q worktrees 2>/dev/null
```
**If NOT in .gitignore:**
**If NOT ignored:**
Per Jesse's rule "Fix broken things immediately":
1. Add appropriate line to .gitignore
@@ -145,29 +145,33 @@ Ready to implement <feature-name>
| Situation | Action |
|-----------|--------|
| `.worktrees/` exists | Use it (verify .gitignore) |
| `worktrees/` exists | Use it (verify .gitignore) |
| `.worktrees/` exists | Use it (verify ignored) |
| `worktrees/` exists | Use it (verify ignored) |
| Both exist | Use `.worktrees/` |
| Neither exists | Check CLAUDE.md → Ask user |
| Directory not in .gitignore | Add it immediately + commit |
| Directory not ignored | Add to .gitignore + commit |
| Tests fail during baseline | Report failures + ask |
| No package.json/Cargo.toml | Skip dependency install |
## Common Mistakes
**Skipping .gitignore verification**
- **Problem:** Worktree contents get tracked, pollute git status
- **Fix:** Always grep .gitignore before creating project-local worktree
### Skipping ignore verification
- **Problem:** Worktree contents get tracked, pollute git status
- **Fix:** Always use `git check-ignore` before creating project-local worktree
### Assuming directory location
**Assuming directory location**
- **Problem:** Creates inconsistency, violates project conventions
- **Fix:** Follow priority: existing > CLAUDE.md > ask
**Proceeding with failing tests**
### Proceeding with failing tests
- **Problem:** Can't distinguish new bugs from pre-existing issues
- **Fix:** Report failures, get explicit permission to proceed
**Hardcoding setup commands**
### Hardcoding setup commands
- **Problem:** Breaks on projects using different tools
- **Fix:** Auto-detect from project files (package.json, etc.)
@@ -177,7 +181,7 @@ Ready to implement <feature-name>
You: I'm using the using-git-worktrees skill to set up an isolated workspace.
[Check .worktrees/ - exists]
[Verify .gitignore - contains .worktrees/]
[Verify ignored - git check-ignore confirms .worktrees/ is ignored]
[Create worktree: git worktree add .worktrees/auth -b feature/auth]
[Run npm install]
[Run npm test - 47 passing]
@@ -190,7 +194,7 @@ Ready to implement auth feature
## Red Flags
**Never:**
- Create worktree without .gitignore verification (project-local)
- Create worktree without verifying it's ignored (project-local)
- Skip baseline test verification
- Proceed with failing tests without asking
- Assume directory location when ambiguous
@@ -198,7 +202,7 @@ Ready to implement auth feature
**Always:**
- Follow directory priority: existing > CLAUDE.md > ask
- Verify .gitignore for project-local
- Verify directory is ignored for project-local
- Auto-detect and run project setup
- Verify clean test baseline

View File

@@ -4,18 +4,24 @@ description: Use when starting any conversation - establishes how to find and us
---
<EXTREMELY-IMPORTANT>
If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST read the skill.
If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill.
IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.
This is not negotiable. This is not optional. You cannot rationalize your way out of this.
</EXTREMELY-IMPORTANT>
## How to Access Skills
**In Claude Code:** Use the `Skill` tool. When you invoke a skill, its content is loaded and presented to you—follow it directly. Never use the Read tool on skill files.
**In other environments:** Check your platform's documentation for how skills are loaded.
# Using Skills
## The Rule
**Check for skills BEFORE ANY RESPONSE.** This includes clarifying questions. Even 1% chance means invoke the Skill tool first.
**Invoke relevant or requested skills BEFORE any response or action.** Even a 1% chance a skill might apply means that you should invoke the skill to check. If an invoked skill turns out to be wrong for the situation, you don't need to use it.
```dot
digraph skill_flow {
@@ -56,6 +62,7 @@ These thoughts mean STOP—you're rationalizing:
| "The skill is overkill" | Simple things become complex. Use it. |
| "I'll just do this one thing first" | Check BEFORE doing anything. |
| "This feels productive" | Undisciplined action wastes time. Skills prevent this. |
| "I know what that means" | Knowing the concept ≠ using the skill. Invoke it. |
## Skill Priority

View File

@@ -56,6 +56,7 @@ The entire skill creation process follows RED-GREEN-REFACTOR.
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md)
- Mechanical constraints (if it's enforceable with regex/validation, automate it—save documentation for judgment calls)
## Skill Types

View File

@@ -0,0 +1,3 @@
The plan is done. docs/plans/auth-system.md has everything.
Do subagent-driven development on this - start with Task 1, dispatch a subagent, then we'll review.

View File

@@ -0,0 +1,17 @@
Great, the plan is complete. I've saved it to docs/plans/auth-system.md.
Here's a summary of what we designed:
- Task 1: Add User Model with email/password fields
- Task 2: Create auth routes for login/register
- Task 3: Add JWT middleware for protected routes
- Task 4: Write tests for all auth functionality
Two execution options:
1. Subagent-Driven (this session) - dispatch a fresh subagent per task
2. Parallel Session (separate) - open new Claude Code session
Which approach do you want?
---
subagent-driven-development, please

View File

@@ -0,0 +1,11 @@
[Previous assistant message]:
Plan complete and saved to docs/plans/auth-system.md.
Two execution options:
1. Subagent-Driven (this session) - I dispatch a fresh subagent per task, review between tasks, fast iteration within this conversation
2. Parallel Session (separate) - Open a new Claude Code session with the execute-plan skill, batch execution with review checkpoints
Which approach do you want to use for implementation?
[Your response]:
subagent-driven-development, please

View File

@@ -0,0 +1,8 @@
I have my implementation plan ready at docs/plans/auth-system.md.
I want to use subagent-driven-development to execute it. That means:
- Dispatch a fresh subagent for each task in the plan
- Review the output between tasks
- Keep iteration fast within this conversation
Let's start - please read the plan and begin dispatching subagents for each task.

View File

@@ -0,0 +1,3 @@
I have a plan at docs/plans/auth-system.md that's ready to implement.
subagent-driven-development, please

View File

@@ -0,0 +1 @@
please use the brainstorming skill to help me think through this feature

View File

@@ -0,0 +1,3 @@
Plan is at docs/plans/auth-system.md.
subagent-driven-development, please. Don't waste time - just read the plan and start dispatching subagents immediately.

View File

@@ -0,0 +1 @@
subagent-driven-development, please

View File

@@ -0,0 +1 @@
use systematic-debugging to figure out what's wrong

View File

@@ -0,0 +1,70 @@
#!/bin/bash
# Run all explicit skill request tests
# Usage: ./run-all.sh
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROMPTS_DIR="$SCRIPT_DIR/prompts"
echo "=== Running All Explicit Skill Request Tests ==="
echo ""
PASSED=0
FAILED=0
RESULTS=""
# Test: subagent-driven-development, please
echo ">>> Test 1: subagent-driven-development-please"
if "$SCRIPT_DIR/run-test.sh" "subagent-driven-development" "$PROMPTS_DIR/subagent-driven-development-please.txt"; then
PASSED=$((PASSED + 1))
RESULTS="$RESULTS\nPASS: subagent-driven-development-please"
else
FAILED=$((FAILED + 1))
RESULTS="$RESULTS\nFAIL: subagent-driven-development-please"
fi
echo ""
# Test: use systematic-debugging
echo ">>> Test 2: use-systematic-debugging"
if "$SCRIPT_DIR/run-test.sh" "systematic-debugging" "$PROMPTS_DIR/use-systematic-debugging.txt"; then
PASSED=$((PASSED + 1))
RESULTS="$RESULTS\nPASS: use-systematic-debugging"
else
FAILED=$((FAILED + 1))
RESULTS="$RESULTS\nFAIL: use-systematic-debugging"
fi
echo ""
# Test: please use brainstorming
echo ">>> Test 3: please-use-brainstorming"
if "$SCRIPT_DIR/run-test.sh" "brainstorming" "$PROMPTS_DIR/please-use-brainstorming.txt"; then
PASSED=$((PASSED + 1))
RESULTS="$RESULTS\nPASS: please-use-brainstorming"
else
FAILED=$((FAILED + 1))
RESULTS="$RESULTS\nFAIL: please-use-brainstorming"
fi
echo ""
# Test: mid-conversation execute plan
echo ">>> Test 4: mid-conversation-execute-plan"
if "$SCRIPT_DIR/run-test.sh" "subagent-driven-development" "$PROMPTS_DIR/mid-conversation-execute-plan.txt"; then
PASSED=$((PASSED + 1))
RESULTS="$RESULTS\nPASS: mid-conversation-execute-plan"
else
FAILED=$((FAILED + 1))
RESULTS="$RESULTS\nFAIL: mid-conversation-execute-plan"
fi
echo ""
echo "=== Summary ==="
echo -e "$RESULTS"
echo ""
echo "Passed: $PASSED"
echo "Failed: $FAILED"
echo "Total: $((PASSED + FAILED))"
if [ "$FAILED" -gt 0 ]; then
exit 1
fi

View File

@@ -0,0 +1,100 @@
#!/bin/bash
# Test where Claude explicitly describes subagent-driven-development before user requests it
# This mimics the original failure scenario
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
TIMESTAMP=$(date +%s)
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/claude-describes"
mkdir -p "$OUTPUT_DIR"
PROJECT_DIR="$OUTPUT_DIR/project"
mkdir -p "$PROJECT_DIR/docs/plans"
echo "=== Test: Claude Describes SDD First ==="
echo "Output dir: $OUTPUT_DIR"
echo ""
cd "$PROJECT_DIR"
# Create a plan
cat > "$PROJECT_DIR/docs/plans/auth-system.md" << 'EOF'
# Auth System Implementation Plan
## Task 1: Add User Model
Create user model with email and password fields.
## Task 2: Add Auth Routes
Create login and register endpoints.
## Task 3: Add JWT Middleware
Protect routes with JWT validation.
EOF
# Turn 1: Have Claude describe execution options including SDD
echo ">>> Turn 1: Ask Claude to describe execution options..."
claude -p "I have a plan at docs/plans/auth-system.md. Tell me about my options for executing it, including what subagent-driven-development means and how it works." \
--model haiku \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 3 \
--output-format stream-json \
> "$OUTPUT_DIR/turn1.json" 2>&1 || true
echo "Done."
# Turn 2: THE CRITICAL TEST - now that Claude has explained it
echo ">>> Turn 2: Request subagent-driven-development..."
FINAL_LOG="$OUTPUT_DIR/turn2.json"
claude -p "subagent-driven-development, please" \
--continue \
--model haiku \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 2 \
--output-format stream-json \
> "$FINAL_LOG" 2>&1 || true
echo "Done."
echo ""
echo "=== Results ==="
# Check Turn 1 to see if Claude described SDD
echo "Turn 1 - Claude's description of options (excerpt):"
grep '"type":"assistant"' "$OUTPUT_DIR/turn1.json" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 800 || echo " (could not extract)"
echo ""
echo "---"
echo ""
# Check final turn
SKILL_PATTERN='"skill":"([^"]*:)?subagent-driven-development"'
if grep -q '"name":"Skill"' "$FINAL_LOG" && grep -qE "$SKILL_PATTERN" "$FINAL_LOG"; then
echo "PASS: Skill was triggered after Claude described it"
TRIGGERED=true
else
echo "FAIL: Skill was NOT triggered (Claude may have thought it already knew)"
TRIGGERED=false
echo ""
echo "Tools invoked in final turn:"
grep '"type":"tool_use"' "$FINAL_LOG" | grep -o '"name":"[^"]*"' | sort -u | head -10 || echo " (none)"
echo ""
echo "Final turn response:"
grep '"type":"assistant"' "$FINAL_LOG" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 800 || echo " (could not extract)"
fi
echo ""
echo "Skills triggered in final turn:"
grep -o '"skill":"[^"]*"' "$FINAL_LOG" 2>/dev/null | sort -u || echo " (none)"
echo ""
echo "Logs in: $OUTPUT_DIR"
if [ "$TRIGGERED" = "true" ]; then
exit 0
else
exit 1
fi

View File

@@ -0,0 +1,113 @@
#!/bin/bash
# Extended multi-turn test with more conversation history
# This tries to reproduce the failure by building more context
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
TIMESTAMP=$(date +%s)
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/extended-multiturn"
mkdir -p "$OUTPUT_DIR"
PROJECT_DIR="$OUTPUT_DIR/project"
mkdir -p "$PROJECT_DIR/docs/plans"
echo "=== Extended Multi-Turn Test ==="
echo "Output dir: $OUTPUT_DIR"
echo "Plugin dir: $PLUGIN_DIR"
echo ""
cd "$PROJECT_DIR"
# Turn 1: Start brainstorming
echo ">>> Turn 1: Brainstorming request..."
claude -p "I want to add user authentication to my app. Help me think through this." \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 3 \
--output-format stream-json \
> "$OUTPUT_DIR/turn1.json" 2>&1 || true
echo "Done."
# Turn 2: Answer a brainstorming question
echo ">>> Turn 2: Answering questions..."
claude -p "Let's use JWT tokens with 24-hour expiry. Email/password registration." \
--continue \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 3 \
--output-format stream-json \
> "$OUTPUT_DIR/turn2.json" 2>&1 || true
echo "Done."
# Turn 3: Ask to write a plan
echo ">>> Turn 3: Requesting plan..."
claude -p "Great, write this up as an implementation plan." \
--continue \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 3 \
--output-format stream-json \
> "$OUTPUT_DIR/turn3.json" 2>&1 || true
echo "Done."
# Turn 4: Confirm plan looks good
echo ">>> Turn 4: Confirming plan..."
claude -p "The plan looks good. What are my options for executing it?" \
--continue \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 2 \
--output-format stream-json \
> "$OUTPUT_DIR/turn4.json" 2>&1 || true
echo "Done."
# Turn 5: THE CRITICAL TEST
echo ">>> Turn 5: Requesting subagent-driven-development..."
FINAL_LOG="$OUTPUT_DIR/turn5.json"
claude -p "subagent-driven-development, please" \
--continue \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 2 \
--output-format stream-json \
> "$FINAL_LOG" 2>&1 || true
echo "Done."
echo ""
echo "=== Results ==="
# Check final turn
SKILL_PATTERN='"skill":"([^"]*:)?subagent-driven-development"'
if grep -q '"name":"Skill"' "$FINAL_LOG" && grep -qE "$SKILL_PATTERN" "$FINAL_LOG"; then
echo "PASS: Skill was triggered"
TRIGGERED=true
else
echo "FAIL: Skill was NOT triggered"
TRIGGERED=false
# Show what was invoked instead
echo ""
echo "Tools invoked in final turn:"
grep '"type":"tool_use"' "$FINAL_LOG" | jq -r '.content[] | select(.type=="tool_use") | .name' 2>/dev/null | head -10 || \
grep -o '"name":"[^"]*"' "$FINAL_LOG" | head -10 || echo " (none found)"
fi
echo ""
echo "Skills triggered:"
grep -o '"skill":"[^"]*"' "$FINAL_LOG" 2>/dev/null | sort -u || echo " (none)"
echo ""
echo "Final turn response (first 500 chars):"
grep '"type":"assistant"' "$FINAL_LOG" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 500 || echo " (could not extract)"
echo ""
echo "Logs in: $OUTPUT_DIR"
if [ "$TRIGGERED" = "true" ]; then
exit 0
else
exit 1
fi

View File

@@ -0,0 +1,144 @@
#!/bin/bash
# Test with haiku model and user's CLAUDE.md
# This tests whether a cheaper/faster model fails more easily
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
TIMESTAMP=$(date +%s)
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/haiku"
mkdir -p "$OUTPUT_DIR"
PROJECT_DIR="$OUTPUT_DIR/project"
mkdir -p "$PROJECT_DIR/docs/plans"
mkdir -p "$PROJECT_DIR/.claude"
echo "=== Haiku Model Test with User CLAUDE.md ==="
echo "Output dir: $OUTPUT_DIR"
echo "Plugin dir: $PLUGIN_DIR"
echo ""
cd "$PROJECT_DIR"
# Copy user's CLAUDE.md to simulate real environment
if [ -f "$HOME/.claude/CLAUDE.md" ]; then
cp "$HOME/.claude/CLAUDE.md" "$PROJECT_DIR/.claude/CLAUDE.md"
echo "Copied user CLAUDE.md"
else
echo "No user CLAUDE.md found, proceeding without"
fi
# Create a dummy plan file
cat > "$PROJECT_DIR/docs/plans/auth-system.md" << 'EOF'
# Auth System Implementation Plan
## Task 1: Add User Model
Create user model with email and password fields.
## Task 2: Add Auth Routes
Create login and register endpoints.
## Task 3: Add JWT Middleware
Protect routes with JWT validation.
## Task 4: Write Tests
Add comprehensive test coverage.
EOF
echo ""
# Turn 1: Start brainstorming
echo ">>> Turn 1: Brainstorming request..."
claude -p "I want to add user authentication to my app. Help me think through this." \
--model haiku \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 3 \
--output-format stream-json \
> "$OUTPUT_DIR/turn1.json" 2>&1 || true
echo "Done."
# Turn 2: Answer questions
echo ">>> Turn 2: Answering questions..."
claude -p "Let's use JWT tokens with 24-hour expiry. Email/password registration." \
--continue \
--model haiku \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 3 \
--output-format stream-json \
> "$OUTPUT_DIR/turn2.json" 2>&1 || true
echo "Done."
# Turn 3: Ask to write a plan
echo ">>> Turn 3: Requesting plan..."
claude -p "Great, write this up as an implementation plan." \
--continue \
--model haiku \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 3 \
--output-format stream-json \
> "$OUTPUT_DIR/turn3.json" 2>&1 || true
echo "Done."
# Turn 4: Confirm plan looks good
echo ">>> Turn 4: Confirming plan..."
claude -p "The plan looks good. What are my options for executing it?" \
--continue \
--model haiku \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 2 \
--output-format stream-json \
> "$OUTPUT_DIR/turn4.json" 2>&1 || true
echo "Done."
# Turn 5: THE CRITICAL TEST
echo ">>> Turn 5: Requesting subagent-driven-development..."
FINAL_LOG="$OUTPUT_DIR/turn5.json"
claude -p "subagent-driven-development, please" \
--continue \
--model haiku \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 2 \
--output-format stream-json \
> "$FINAL_LOG" 2>&1 || true
echo "Done."
echo ""
echo "=== Results (Haiku) ==="
# Check final turn
SKILL_PATTERN='"skill":"([^"]*:)?subagent-driven-development"'
if grep -q '"name":"Skill"' "$FINAL_LOG" && grep -qE "$SKILL_PATTERN" "$FINAL_LOG"; then
echo "PASS: Skill was triggered"
TRIGGERED=true
else
echo "FAIL: Skill was NOT triggered"
TRIGGERED=false
echo ""
echo "Tools invoked in final turn:"
grep '"type":"tool_use"' "$FINAL_LOG" | grep -o '"name":"[^"]*"' | head -10 || echo " (none)"
fi
echo ""
echo "Skills triggered:"
grep -o '"skill":"[^"]*"' "$FINAL_LOG" 2>/dev/null | sort -u || echo " (none)"
echo ""
echo "Final turn response (first 500 chars):"
grep '"type":"assistant"' "$FINAL_LOG" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 500 || echo " (could not extract)"
echo ""
echo "Logs in: $OUTPUT_DIR"
if [ "$TRIGGERED" = "true" ]; then
exit 0
else
exit 1
fi

View File

@@ -0,0 +1,143 @@
#!/bin/bash
# Test explicit skill requests in multi-turn conversations
# Usage: ./run-multiturn-test.sh
#
# This test builds actual conversation history to reproduce the failure mode
# where Claude skips skill invocation after extended conversation
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
TIMESTAMP=$(date +%s)
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/multiturn"
mkdir -p "$OUTPUT_DIR"
# Create project directory (conversation is cwd-based)
PROJECT_DIR="$OUTPUT_DIR/project"
mkdir -p "$PROJECT_DIR/docs/plans"
echo "=== Multi-Turn Explicit Skill Request Test ==="
echo "Output dir: $OUTPUT_DIR"
echo "Project dir: $PROJECT_DIR"
echo "Plugin dir: $PLUGIN_DIR"
echo ""
cd "$PROJECT_DIR"
# Create a dummy plan file
cat > "$PROJECT_DIR/docs/plans/auth-system.md" << 'EOF'
# Auth System Implementation Plan
## Task 1: Add User Model
Create user model with email and password fields.
## Task 2: Add Auth Routes
Create login and register endpoints.
## Task 3: Add JWT Middleware
Protect routes with JWT validation.
## Task 4: Write Tests
Add comprehensive test coverage.
EOF
# Turn 1: Start a planning conversation
echo ">>> Turn 1: Starting planning conversation..."
TURN1_LOG="$OUTPUT_DIR/turn1.json"
claude -p "I need to implement an authentication system. Let's plan this out. The requirements are: user registration with email/password, JWT tokens, and protected routes." \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 2 \
--output-format stream-json \
> "$TURN1_LOG" 2>&1 || true
echo "Turn 1 complete."
echo ""
# Turn 2: Continue with more planning detail
echo ">>> Turn 2: Continuing planning..."
TURN2_LOG="$OUTPUT_DIR/turn2.json"
claude -p "Good analysis. I've already written the plan to docs/plans/auth-system.md. Now I'm ready to implement. What are my options for execution?" \
--continue \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 2 \
--output-format stream-json \
> "$TURN2_LOG" 2>&1 || true
echo "Turn 2 complete."
echo ""
# Turn 3: The critical test - ask for subagent-driven-development
echo ">>> Turn 3: Requesting subagent-driven-development..."
TURN3_LOG="$OUTPUT_DIR/turn3.json"
claude -p "subagent-driven-development, please" \
--continue \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns 2 \
--output-format stream-json \
> "$TURN3_LOG" 2>&1 || true
echo "Turn 3 complete."
echo ""
echo "=== Results ==="
# Check if skill was triggered in Turn 3
SKILL_PATTERN='"skill":"([^"]*:)?subagent-driven-development"'
if grep -q '"name":"Skill"' "$TURN3_LOG" && grep -qE "$SKILL_PATTERN" "$TURN3_LOG"; then
echo "PASS: Skill 'subagent-driven-development' was triggered in Turn 3"
TRIGGERED=true
else
echo "FAIL: Skill 'subagent-driven-development' was NOT triggered in Turn 3"
TRIGGERED=false
fi
# Show what skills were triggered
echo ""
echo "Skills triggered in Turn 3:"
grep -o '"skill":"[^"]*"' "$TURN3_LOG" 2>/dev/null | sort -u || echo " (none)"
# Check for premature action in Turn 3
echo ""
echo "Checking for premature action in Turn 3..."
FIRST_SKILL_LINE=$(grep -n '"name":"Skill"' "$TURN3_LOG" | head -1 | cut -d: -f1)
if [ -n "$FIRST_SKILL_LINE" ]; then
PREMATURE_TOOLS=$(head -n "$FIRST_SKILL_LINE" "$TURN3_LOG" | \
grep '"type":"tool_use"' | \
grep -v '"name":"Skill"' | \
grep -v '"name":"TodoWrite"' || true)
if [ -n "$PREMATURE_TOOLS" ]; then
echo "WARNING: Tools invoked BEFORE Skill tool in Turn 3:"
echo "$PREMATURE_TOOLS" | head -5
else
echo "OK: No premature tool invocations detected"
fi
else
echo "WARNING: No Skill invocation found in Turn 3"
# Show what WAS invoked
echo ""
echo "Tools invoked in Turn 3:"
grep '"type":"tool_use"' "$TURN3_LOG" | grep -o '"name":"[^"]*"' | head -10 || echo " (none)"
fi
# Show Turn 3 assistant response
echo ""
echo "Turn 3 first assistant response (truncated):"
grep '"type":"assistant"' "$TURN3_LOG" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 500 || echo " (could not extract)"
echo ""
echo "Logs:"
echo " Turn 1: $TURN1_LOG"
echo " Turn 2: $TURN2_LOG"
echo " Turn 3: $TURN3_LOG"
echo "Timestamp: $TIMESTAMP"
if [ "$TRIGGERED" = "true" ]; then
exit 0
else
exit 1
fi

View File

@@ -0,0 +1,136 @@
#!/bin/bash
# Test explicit skill requests (user names a skill directly)
# Usage: ./run-test.sh <skill-name> <prompt-file>
#
# Tests whether Claude invokes a skill when the user explicitly requests it by name
# (without using the plugin namespace prefix)
#
# Uses isolated HOME to avoid user context interference
set -e
SKILL_NAME="$1"
PROMPT_FILE="$2"
MAX_TURNS="${3:-3}"
if [ -z "$SKILL_NAME" ] || [ -z "$PROMPT_FILE" ]; then
echo "Usage: $0 <skill-name> <prompt-file> [max-turns]"
echo "Example: $0 subagent-driven-development ./prompts/subagent-driven-development-please.txt"
exit 1
fi
# Get the directory where this script lives
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# Get the superpowers plugin root (two levels up)
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
TIMESTAMP=$(date +%s)
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/${SKILL_NAME}"
mkdir -p "$OUTPUT_DIR"
# Read prompt from file
PROMPT=$(cat "$PROMPT_FILE")
echo "=== Explicit Skill Request Test ==="
echo "Skill: $SKILL_NAME"
echo "Prompt file: $PROMPT_FILE"
echo "Max turns: $MAX_TURNS"
echo "Output dir: $OUTPUT_DIR"
echo ""
# Copy prompt for reference
cp "$PROMPT_FILE" "$OUTPUT_DIR/prompt.txt"
# Create a minimal project directory for the test
PROJECT_DIR="$OUTPUT_DIR/project"
mkdir -p "$PROJECT_DIR/docs/plans"
# Create a dummy plan file for mid-conversation tests
cat > "$PROJECT_DIR/docs/plans/auth-system.md" << 'EOF'
# Auth System Implementation Plan
## Task 1: Add User Model
Create user model with email and password fields.
## Task 2: Add Auth Routes
Create login and register endpoints.
## Task 3: Add JWT Middleware
Protect routes with JWT validation.
EOF
# Run Claude with isolated environment
LOG_FILE="$OUTPUT_DIR/claude-output.json"
cd "$PROJECT_DIR"
echo "Plugin dir: $PLUGIN_DIR"
echo "Running claude -p with explicit skill request..."
echo "Prompt: $PROMPT"
echo ""
timeout 300 claude -p "$PROMPT" \
--plugin-dir "$PLUGIN_DIR" \
--dangerously-skip-permissions \
--max-turns "$MAX_TURNS" \
--output-format stream-json \
> "$LOG_FILE" 2>&1 || true
echo ""
echo "=== Results ==="
# Check if skill was triggered (look for Skill tool invocation)
# Match either "skill":"skillname" or "skill":"namespace:skillname"
SKILL_PATTERN='"skill":"([^"]*:)?'"${SKILL_NAME}"'"'
if grep -q '"name":"Skill"' "$LOG_FILE" && grep -qE "$SKILL_PATTERN" "$LOG_FILE"; then
echo "PASS: Skill '$SKILL_NAME' was triggered"
TRIGGERED=true
else
echo "FAIL: Skill '$SKILL_NAME' was NOT triggered"
TRIGGERED=false
fi
# Show what skills WERE triggered
echo ""
echo "Skills triggered in this run:"
grep -o '"skill":"[^"]*"' "$LOG_FILE" 2>/dev/null | sort -u || echo " (none)"
# Check if Claude took action BEFORE invoking the skill (the failure mode)
echo ""
echo "Checking for premature action..."
# Look for tool invocations before the Skill invocation
# This detects the failure mode where Claude starts doing work without loading the skill
FIRST_SKILL_LINE=$(grep -n '"name":"Skill"' "$LOG_FILE" | head -1 | cut -d: -f1)
if [ -n "$FIRST_SKILL_LINE" ]; then
# Check if any non-Skill, non-system tools were invoked before the first Skill invocation
# Filter out system messages, TodoWrite (planning is ok), and other non-action tools
PREMATURE_TOOLS=$(head -n "$FIRST_SKILL_LINE" "$LOG_FILE" | \
grep '"type":"tool_use"' | \
grep -v '"name":"Skill"' | \
grep -v '"name":"TodoWrite"' || true)
if [ -n "$PREMATURE_TOOLS" ]; then
echo "WARNING: Tools invoked BEFORE Skill tool:"
echo "$PREMATURE_TOOLS" | head -5
echo ""
echo "This indicates Claude started working before loading the requested skill."
else
echo "OK: No premature tool invocations detected"
fi
else
echo "WARNING: No Skill invocation found at all"
fi
# Show first assistant message
echo ""
echo "First assistant response (truncated):"
grep '"type":"assistant"' "$LOG_FILE" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 500 || echo " (could not extract)"
echo ""
echo "Full log: $LOG_FILE"
echo "Timestamp: $TIMESTAMP"
if [ "$TRIGGERED" = "true" ]; then
exit 0
else
exit 1
fi