● Workflow & Ops
qa-plan
Generate a QA test plan from code changes. Analyzes git diffs, traces blast radius, risk-scores each area, and outputs prioritized test cases with smoke checklist and exit criteria.
QA Plan Generator
Systematically generate QA test plans from code changes by tracing blast radius, scoring risk, and producing prioritized test cases.
When to Use
- After fixing a bug — generate targeted QA for what changed
- After implementing a feature — generate tests + regression checks
- Before a release — comprehensive QA covering all changes since last release
- On-demand — "QA the health module" or "test plan for this PR"
- Anytime you're unsure what to test
Input Modes
The skill accepts flexible input. Detection priority (first match wins):
- PR number: starts with `#` or is purely numeric → `/qa-plan #419`
- Branch name: `git rev-parse --verify <arg>` succeeds → `/qa-plan fix/custom-drugs`
- Module/directory: directory exists in the project → `/qa-plan health`
- Freeform: anything else — matched against files/modules → `/qa-plan "the offline sync changes"`
- No args (default): reads unstaged + staged git diff → `/qa-plan`
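A minimal shell sketch of this detection order, assuming the argument arrives in `$1` and the command runs from the repository root (the variable names and control flow are illustrative, not part of the skill itself):

```bash
ARG="${1:-}"                                  # raw argument passed to /qa-plan
if [[ $ARG =~ ^#?[0-9]+$ ]]; then             # "#419" or "419" → PR number
  MODE=pr
elif git rev-parse --verify "$ARG" >/dev/null 2>&1; then  # resolvable ref → branch
  MODE=branch
elif [[ -d "$ARG" ]]; then                    # existing directory → module
  MODE=module
elif [[ -n "$ARG" ]]; then                    # anything else → freeform match
  MODE=freeform
else                                          # no argument → working-tree diff
  MODE=diff
fi
```

First match wins, mirroring the priority order above.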
Workflow
Execute these 6 steps in order. Steps 2 and 3 use parallel subagents for speed.
Step 1: Gather Changes
Based on the input mode, collect the raw change data.
First, detect the default branch:
DEFAULT_BRANCH=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@') || DEFAULT_BRANCH="main"
For git diff (default — no args):
# Working tree changes (staged + unstaged) vs HEAD
git diff HEAD
# Also capture untracked files (new files not yet git-added)
git ls-files --others --exclude-standard
# If empty (clean tree), fall back to branch commits vs default branch
git diff ${DEFAULT_BRANCH}...HEAD
Early exit: If all diffs are empty and no untracked files exist, report "No changes detected. Specify a branch, module, or PR number." and STOP.
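Put together, the default mode looks roughly like this sketch (simplified; the real workflow inspects the diff content, not just whether it is empty):

```bash
CHANGES="$(git diff HEAD)"
UNTRACKED="$(git ls-files --others --exclude-standard)"
if [[ -z "$CHANGES" && -z "$UNTRACKED" ]]; then
  # Clean working tree: fall back to branch commits vs the default branch
  CHANGES="$(git diff "${DEFAULT_BRANCH}...HEAD")"
fi
if [[ -z "$CHANGES" && -z "$UNTRACKED" ]]; then
  echo "No changes detected. Specify a branch, module, or PR number."
  exit 0
fi
```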
For branch comparison:
git diff ${DEFAULT_BRANCH}...<branch-name>
For PR number:
gh pr diff <number>
If `gh` is not installed, fall back to `git log --oneline ${DEFAULT_BRANCH}...HEAD` and read the changed files manually.
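A sketch of that fallback, assuming the parsed number sits in `$PR_NUMBER` and that listing changed file names with `--name-only` is an acceptable stand-in for reading them manually:

```bash
if command -v gh >/dev/null 2>&1; then
  gh pr diff "$PR_NUMBER"
else
  # No gh CLI: approximate the PR contents from local history
  git log --oneline "${DEFAULT_BRANCH}...HEAD"
  git diff --name-only "${DEFAULT_BRANCH}...HEAD"
fi
```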
For module name:
git diff ${DEFAULT_BRANCH}...HEAD -- '<module>/**'
For freeform input:
- Search for files/directories matching the description
- Fall back to full project diff if no match
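One possible keyword match over tracked paths (the keywords and patterns here are purely illustrative):

```bash
# e.g. for "the offline sync changes": look for paths mentioning the key terms
git ls-files | grep -iE 'offline|sync'
# No hits → fall back to the full project diff
git diff "${DEFAULT_BRANCH}...HEAD"
```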
From the diff, extract and record:
CHANGE INVENTORY
================
Files changed: [list of file paths]
Functions modified: [function/method names + file locations]
Model fields changed: [new, removed, or type-changed fields]
Imports added/removed: [new cross-module coupling]
API shape changes: [new/removed/renamed response fields]
Validation changes: [new error codes, changed validators]
Query filter changes: [modified .filter() / .exclude() / WHERE clauses]
Signal changes: [new/modified signal handlers]
Also extract the root cause / motivation from:
- Commit messages in the diff
- PR description (if PR input)
- Ask the user if unclear
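When no PR description is available, the commit messages on the branch are usually the best source. A minimal way to pull them (a sketch, not prescriptive):

```bash
# Subjects and bodies of commits unique to this branch
git log "${DEFAULT_BRANCH}..HEAD" --pretty=format:'%h %s%n%b'
```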
Step 2: Discover Project Context
Launch parallel Explore subagents to discover the project landscape. This step makes the skill project-agnostic — it learns whatever codebase it's in.
Agent A — Documentation Discovery:
Find and read:
- Root CLAUDE.md (project-level docs)
- Module-level CLAUDE.md or README files for changed modules
- Any architecture docs, flow diagrams, or dependency maps
- CHANGELOG.md (recent bug history — informs risk scoring)
Extract:
- Framework identity (Django, FastAPI, Express, Rails, etc.)
- Mobile stack (React Native, Flutter, Swift, etc.)
- Database (PostgreSQL, SQLite, etc.)
- Key architectural patterns (multi-tenancy, FIFO, event sourcing, etc.)
- Known hot spots / historically buggy areas
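A rough starting point for Agent A's file hunt (the depth and exclusions are assumptions; adjust to the repository layout):

```bash
find . -maxdepth 3 \
  \( -name 'CLAUDE.md' -o -name 'README*' -o -name 'CHANGELOG.md' \) \
  -not -path '*/node_modules/*' -not -path '*/.git/*'
```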
Agent B — Code Structure Discovery:
Find and catalog:
- Test file locations and naming patterns
- Signal/event handler registrations for changed models
- Middleware stack (if web framework)
- URL/route configuration for changed endpoints
- Service layer files that wrap the changed models
- Cross-module import graph for changed files
Degraded mode (no CLAUDE.md or docs found):
- Infer framework from `package.json` / `requirements.txt` / `Gemfile` / `go.mod`
- Infer architecture from directory structure
- Note "Project Context: INFERRED (no documentation found)" in the output header
- Blast radius analysis in Step 3 will have reduced confidence — flag this in caveats
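A minimal framework-inference probe for degraded mode (the manifest list comes from the bullet above; extend it for other stacks):

```bash
for manifest in package.json requirements.txt Gemfile go.mod; do
  [[ -f "$manifest" ]] && echo "Found manifest: $manifest"
done
```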
Step 3: Blast Radius Analysis (2-Hop)
Launch parallel Explore subagents to trace dependencies outward. Cap at 10 subagents max. If >20 changed files, group by module and launch one agent per module. If subagents are unavailable, execute sequentially.
Hop 1 — Direct Dependencies:
For each changed file, find:
- Callers: What imports or calls the changed functions? (grep for the function name across the project)
- Callees: What does the changed code call? (read the changed functions, note outbound calls)
- Signal consumers: If a model was changed, what signal handlers fire on its save/delete?
- Serializers/Views: Which serializers expose the changed model fields? Which views use them?
- Mobile consumers: Which mobile service files call the changed API endpoints?
- Test files: Which existing tests cover the changed code?
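Hop 1 caller discovery is mostly project-wide text search. A sketch using `git grep` (the symbol names and pathspec exclusions are illustrative):

```bash
# Who calls the changed function?
git grep -n 'bulk_create(' -- ':!node_modules' ':!*.min.js'
# Which existing test files mention the changed symbol?
git grep -l 'TreatmentEventSerializer' -- '*test*'
```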
Hop 2 — Indirect Dependencies:
For each Hop 1 result, find:
- What calls the Hop 1 callers? (e.g., if a service changed, what views call that service? what tasks?)
- What downstream calculations consume the Hop 1 outputs? (e.g., reports that read from the changed model)
- What mobile screens call the Hop 1 mobile services?
Output format:
BLAST RADIUS
============
[Changed] health/serializers.py:TreatmentEventSerializer.validate()
├─ [Hop 1] health/views.py:TreatmentEventViewSet.bulk_create()
│ ├─ [Hop 2] mobile/src/services/healthService.js:createBulkTreatment()
│ └─ [Hop 2] mobile/app/(health)/add-treatment.js (UI)
├─ [Hop 1] health/views.py:TreatmentEventViewSet.create()
│ └─ [Hop 2] mobile/src/services/healthService.js:createTreatmentEvent()
├─ [Hop 1] inventory/services.py:PharmaceuticalInventoryDeductionService (signal chain)
│ └─ [Hop 2] reports/views.py (treatment cost in closeout)
└─ [Hop 1] invoices/services.py:calculate_treatment_items()
└─ [Hop 2] invoices/pdf_generator.py (invoice PDF)
Step 4: Risk Scoring
For each area in the blast radius, compute Risk Score = Likelihood (1-5) x Impact (1-5).
Likelihood factors:
| Factor | 1 (Low) | 3 (Medium) | 5 (High) |
|---|---|---|---|
| Lines changed | < 10 | 10-50 | 50+ |
| Complexity of change | Rename / typo fix | Logic change | New algorithm or data flow |
| Cross-module coupling | Single file | 2-3 modules | 4+ modules |
| Bug history in area | No recent bugs | 1-2 bugs in recent history | 3+ bugs (hot spot) |
Impact factors:
| Factor | 1 (Low) | 3 (Medium) | 5 (High) |
|---|---|---|---|
| Financial data affected | Display-only | Report calculations | Invoice / billing amounts |
| Data integrity | Read-only path | Update with validation | Write path (counts, inventory, balances) |
| User-facing severity | Cosmetic | Feature degraded | Feature broken / data loss |
| Multi-tenant exposure | Single-tenant data | Cross-tenant query | Auth / permission bypass |
| Reversibility | Easy rollback | Manual correction needed | Irreversible (inventory deducted, notifications sent) |
Coverage depth by risk score:
| Score | Label | Test Depth |
|---|---|---|
| 20-25 | Exhaustive | Every combination, boundary values, negative tests, offline, permissions |
| 12-19 | Heavy | Happy path + error path + one edge case + permission check |
| 6-11 | Standard | Happy path + one negative test |
| 1-5 | Smoke/Skip | Smoke test only or skip entirely |
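The score-to-depth mapping is simple enough to express as a small helper (a sketch; the skill applies this judgment inline rather than via a script):

```bash
# Map Likelihood (1-5) x Impact (1-5) to a coverage label
coverage_for() {
  local likelihood=$1 impact=$2
  local score=$(( likelihood * impact ))
  if   (( score >= 20 )); then echo "$score Exhaustive"
  elif (( score >= 12 )); then echo "$score Heavy"
  elif (( score >= 6  )); then echo "$score Standard"
  else                         echo "$score Smoke/Skip"
  fi
}
coverage_for 5 5   # → "25 Exhaustive"
coverage_for 2 3   # → "6 Standard"
```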
Record the risk table:
RISK ASSESSMENT
===============
| Area | Likelihood | Impact | Score | Coverage |
|--------------------------------|-----------|--------|-------|------------|
| Custom drug selection (changed)| 5 | 5 | 25 | Exhaustive |
| Bulk treatment API (Hop 1) | 4 | 5 | 20 | Exhaustive |
| Treatment cost in closeout | 3 | 4 | 12 | Heavy |
| Scheduled treatments | 2 | 3 | 6 | Standard |
Step 5: Generate QA Plan
Produce a markdown document with ALL of these sections. Every section is mandatory unless marked (conditional).
Section 1: Header / Metadata
# QA Test Plan — {YYYY-MM-DD}
> {One-line description of what triggered this QA plan}
| Field | Value |
| -------------------- | ----------------------------------------- |
| **Plan ID** | QA-{YYYY-MM-DD}-{seq} |
| **Date** | {date} |
| **Trigger** | {Bug fix / Feature / Refactor / Release} |
| **Branch/PR** | {branch name or PR #} |
| **Risk Level** | {LOW / MEDIUM / HIGH / CRITICAL} |
| **Platform** | {backend-only / mobile-only / full-stack} |
| **Changed Files** | {N} files across {M} modules |
| **Modules Affected** | {list} |
Overall risk level = highest individual risk score:
- Any score 20+ → CRITICAL
- Any score 12-19 → HIGH
- All scores 6-11 → MEDIUM
- All scores 1-5 → LOW
Section 2: Root Cause / Change Summary
What changed, why, and which files were modified. For bug fixes: describe the root cause and the fix. For features: describe the user-facing behavior.
Include the change inventory from Step 1.
Section 3: Scope & Out-of-Scope
Explicit boundaries:
- In scope: List all areas from the blast radius that will be tested
- Out of scope: List major modules NOT affected (so the reader knows they were considered and excluded, not forgotten)
Section 4: Test Cases by Feature Area
This is the core output. Group by feature area. Each group gets its own markdown section with a table.
Table format (mandatory columns):
| # | Test Case | Priority | Type | Steps | Expected | Caveats |
|---|---|---|---|---|---|---|
| {section}.{seq} | Bold descriptive name | P0/P1/P2/P3 | smoke/functional/regression/edge/security/offline | Numbered action steps | Specific assertions (HTTP codes, field values, state changes) | Code file:line refs, architecture constraints, setup prerequisites, known limitations |
Priority (per test case, NOT per section): P0 = the fix itself or data-loss/security paths. P1 = direct integrators. P2 = adjacent, shared models. P3 = theoretical coupling only.
Type: smoke | functional | regression | edge | security | offline
MANDATORY: Every test case must have a non-empty Caveats column. Read the actual source code to populate caveats with real file references, line numbers, architecture constraints, setup requirements, or known limitations. Never leave caveats empty. Never fabricate code references — verify them by reading the file.
Generate test cases using the coverage depth from Step 4:
- Exhaustive areas (20-25): 5-8 test cases covering happy path, error paths, boundary values, permissions, offline
- Heavy areas (12-19): 3-5 test cases covering happy path, error path, one edge case
- Standard areas (6-11): 1-2 test cases covering happy path and one negative case
- Smoke areas (1-5): 0-1 test cases (covered by smoke checklist instead)
Section 5: Regression Test Cases
Tests for areas found via Hop 2 only (indirect blast radius — not directly changed, not a direct caller). Same table format, Type = regression. Always include regressions for historically buggy areas in the blast radius.
Disambiguation: Section 4 covers Hop 0 (changed code) and Hop 1 (direct callers/callees). Section 5 covers Hop 2 only. Section 6 covers cross-cutting edge cases not specific to a single feature area.
Section 6: Edge Cases & Negative Tests
Cross-cutting boundary conditions that span multiple feature areas: zero/null/empty, duplicate submissions, race conditions, max-length inputs, invalid FK references. Same table format, Type = edge.
Section 7: Security / Multi-Tenant Checks (CONDITIONAL)
Only include if the blast radius touches: authentication, authorization, permission checks, database query filters, middleware, or cross-tenant data access patterns.
Tests for:
- Data isolation between tenants/orgs/operations
- Permission enforcement (RBAC, role checks)
- Timing oracles (403 vs 404 revealing resource existence)
- CORS configuration
- Credential exposure in logs/responses
- Webhook signature verification
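For the timing-oracle and data-isolation checks, a hedged curl probe illustrates the intent (the host, IDs, and token are placeholders, not real endpoints):

```bash
# A resource owned by another tenant should look like it does not exist (404),
# not like it exists but is forbidden (403)
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer $TENANT_A_TOKEN" \
  "https://api.example.test/treatments/$TENANT_B_TREATMENT_ID/"
```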
Section 8: Offline / Sync Tests (CONDITIONAL)
Only include if mobile service layer code is in the diff or blast radius.
Tests for:
- Offline queue behavior (enqueue, deduplication, overflow)
- Sync ordering (module priority, FIFO within priority)
- Conflict resolution (keep local, use server, decide later)
- Temp ID resolution chain across dependent entities
- Exponential backoff on failure
- Concurrent sync prevention
Section 9: Smoke Test Checklist
A quick-pass checkbox list drawn from P0 and P1 test cases. 5-15 items (scale to change size). Format:
## Smoke Test Checklist
- [ ] {Action verb} — verify {specific assertion}
- [ ] {Action verb} — verify {specific assertion}
...
Each item should be completable in under 2 minutes. This is the "run this if you only have 15 minutes" list.
Section 10: Exit Criteria
What must pass before the change can ship:
## Exit Criteria
- [ ] All P0 test cases pass
- [ ] All P1 test cases pass
- [ ] No P2 regressions introduced
- [ ] Smoke test checklist is green
- [ ] {Any project-specific criteria discovered in Step 2}
Step 6: Save & Report
- Create the `docs/` directory if it doesn't exist: `mkdir -p docs/`
- Save to `docs/QA-PLAN-{YYYY-MM-DD}-{slug}.md`. `{slug}` generation (see the sketch below): PR input → PR title kebab-cased (max 40 chars). Branch input → branch name with `/` → `-`. Module input → module name. Default diff → most-changed module name.
- Print summary: total test cases, P0/P1/P2/P3 breakdown, modules covered.
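A possible kebab-case helper for the slug (the 40-character cap comes from the rule above; `$RAW_NAME` stands for whatever the input mode supplies):

```bash
slug="$(echo "$RAW_NAME" \
  | tr '[:upper:]' '[:lower:]' \
  | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
  | cut -c1-40)"
echo "docs/QA-PLAN-$(date +%F)-${slug}.md"
```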
Token budget: If blast radius exceeds 30 areas, consolidate to the top 15 by risk score. Hard cap: 8 test cases per section even for Exhaustive areas. If total exceeds 60 test cases, drop P3 and condense P2 to smoke-only.
Key Principles
- Caveats are mandatory — every test case must reference real code (file:line), architecture constraints, or setup prerequisites. Read the source to populate these; never fabricate references.
- Risk drives depth — exhaustive testing for score 20+, smoke-only for score 1-5. Don't waste time on low-risk areas.
- Blast radius over gut feel — trace dependencies systematically (2 hops). The regressions you miss are always in the second hop.
- Priority per test case, not per section — a "P1 section" may contain individual P2 or P3 tests. Assign granularly.
- Conditional sections — only include security/offline sections when the blast radius warrants them. Shorter plans get read; long ones get skipped.
- Verify, don't assume — if a git command fails or a doc doesn't exist, adapt. Detect the default branch; fall back gracefully on a missing `gh` CLI.
Self-Review Checklist
Before finalizing the QA plan, verify:
- Every test case has a non-empty Caveats column with verified code references
- Priority is assigned per test case, not per section
- P0 tests cover the specific change that triggered the plan
- Blast radius Hop 2 areas have regression test cases
- Smoke checklist has 5-15 items from P0/P1 tests
- Exit criteria are present and actionable
- Conditional sections only included when relevant
- No placeholder text ("TBD", "TODO")
- Scope section explicitly lists excluded modules