ISSUE #37 Feb 13, 2026 11 MIN READ

How to Make Claude Code Test and Fix Its Own Work (The Ralph Loop Method)

Last week, I showed you my Claude Code implementation workflow.

52 minutes. 13 tasks. 38 test cases’ worth of functionality. All built by sub-agents running in parallel.

Here’s what I didn’t tell you.

Half of it didn’t work.

(I know. I KNOW.)

.

.

.

The Part Where I Discover My “Complete” Implementation Is… Not

Let me show you what happened when I actually tested the WooCommerce integration Claude built for me.

Quick context: I have a WordPress theme for coworking spaces. Originally, it used direct Stripe integration for payments. But here’s the thing—not everyone wants Stripe. Some coworking spaces prefer PayPal. Others need local payment gateways. (And some, bless their hearts, are still figuring out what a payment gateway even is.)

The solution? Let WooCommerce handle payments. Hundreds of gateway integrations, tax calculations, order management—all built-in.

Claude followed my implementation workflow perfectly.

PERFECTLY.

The settings page looked gorgeous:

WordPress admin settings page showing Payment Gateway Configuration with two card options: Direct Stripe Integration (marked Recommended) on the left and WooCommerce on the right, each with icons, descriptions, and feature bullet points

There’s even a Product Sync panel showing 3 published plans synced to WooCommerce at 100% progress. One hundred percent!

Product Sync panel displaying 3 Published Plans, 3 Synced to WooCommerce, 100% Sync Progress, with explanation of how sync works and two buttons: Sync Plans to WooCommerce and View Products

My plans:

  • Hot Desk ($199/month),
  • Dedicated Desk ($399/month),
  • Private Office ($799/month)

—all published and ready to go:

Plans list showing three rows: Hot Desk at $199/month, Dedicated Desk at $399/month, and Private Office at $799/month, all with Published status and 0 subscribers

And look!

They synced perfectly to WooCommerce products:

WooCommerce Products page showing Hot Desk, Dedicated Desk, and Private Office as variable products with price ranges and In Stock status

Everything looked GREAT.

So I clicked “Get Started” on the Hot Desk plan to test the checkout flow. You know, like a responsible developer would do. (Stop laughing.)

And here’s what I saw:

Checkout page showing the old direct Stripe integration with Card Number field, "Your card will be charged securely via Stripe" message, and Order Summary showing Hot Desk at $199/month—despite WooCommerce mode being enabled

The old Stripe checkout.

The direct integration I was trying to REPLACE.

I switched the payment mode to WooCommerce. I synced the products. Everything in the admin looked correct.

But the frontend? Still using the old Stripe integration.

Ferpetesake.

.

.

.

Why Claude Thinks “Done” When It’s Really “Done-ish”

Here’s where I went full detective mode.

I checked the codebase. The WooCommerce checkout code exists. Functions written. Hooks registered. File paths correct. All present and accounted for.

So why wasn’t it working?

The code was never connected to the rest of the system.

(Stay with me here.)

Claude wrote the WooCommerce checkout handler. Beautiful code. But the pricing page? Still calling the old Stripe checkout function. The new code sat there—perfectly written, completely unused—like a fancy espresso machine you forgot to plug in.
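To make the gap concrete, here’s a rough sketch of what that looks like in a WordPress theme. The cwp_* function names, meta key, and CSS class are hypothetical, not the actual theme code:

```php
<?php
// New code Claude wrote: build a WooCommerce checkout URL for a plan.
// (cwp_* names and the meta key are illustrative; the WordPress and
// WooCommerce functions are real.)
function cwp_wc_checkout_url( int $plan_id ): string {
    $product_id = (int) get_post_meta( $plan_id, '_cwp_wc_product_id', true );

    // Add the synced product to the cart and send the visitor to checkout.
    return add_query_arg( 'add-to-cart', $product_id, wc_get_checkout_url() );
}

// Meanwhile, the pricing template's "Get Started" button was never updated.
// It still points at the old direct-Stripe helper, so the function above
// is never called from anywhere in the theme:
?>
<a class="plan-get-started"
   href="<?php echo esc_url( cwp_stripe_checkout_url( $plan_id ) ); ?>">
    Get Started
</a>
```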

And here’s the thing: this happens ALL THE TIME with AI-generated code.

Claude writes features.

It creates files. It generates functions. And in its summary, it reports “Task complete.”

But “code exists” and “code works”?

Two very different things.

You’ve probably experienced this.

Claude builds a feature. You test it. Something’s broken. You point out the bug. Claude apologizes (so polite!), fixes that specific issue, and introduces two new ones.

The Reddit crowd calls this “nerfed” or “lazy.”

They’re wrong.

👉 Claude lacks visibility into whether its code actually runs correctly in your system.

It can’t see the browser. It can’t watch a user click through your checkout flow. It can’t verify that function A actually calls function B in production.

The fix? Give Claude the ability to test its own work.

(This is where it gets good.)

.

.

.

The Most Important Testing? Not What You Think

You might be thinking: “Just write unit tests. Problem solved.”

And look—unit tests help. Integration tests help more.

But here’s what nobody talks about:

Perfect code doesn’t mean a working product.

The WooCommerce checkout code passed every logical check. Functions syntactically correct. Hooks properly registered. A unit test would have given it a gold star and a pat on the head.

But the pricing page template still imported the old Stripe checkout URL.

That’s a wiring problem. Not a code problem.

The test that catches this? User acceptance testing.

Actual users (or something simulating actual users) verifying the end product meets their needs. Clicking buttons. Filling forms. Going through the whole dang flow.

This is exactly why my implementation workflow generates a test plan BEFORE the implementation plan. The test plan represents success criteria from the user’s perspective:

  • Can a user switch payment modes?
  • Does the checkout redirect to WooCommerce?
  • Does the order confirmation show correct details?

These questions can’t be answered by reading code. They require clicking through the actual interface.

Which brings us to Ralph Loop.

.

.

.

Meet Ralph Loop: Your Autonomous Claude Code Testing Loop

Here’s the workflow I use to make Claude test its own work:

Flowchart showing a software testing cycle: Test Plan leads to Ralph Loop, then to Test Case. From Test Case, a “Pass?” decision sends successful tests to Next Test, while a “Fail?” decision sends failures to Fix Code, which loops back to the Test Plan.

This is an autonomous loop that picks up a test case, executes it in an actual browser, checks against acceptance criteria, logs results, and repeats. If a test fails? Claude fixes the code and retests.

(Yes, really. It fixes its own bugs. I’ll show you.)

The idea comes from Ryan Carson’s video: “Ralph Wiggum” AI Agent will 10x Claude Code/Amp.

The core insight: you can’t throw a vague prompt at an autonomous loop and expect magic. The loop needs structure.

Specifically, it needs:

  • A test plan defining every test case upfront
  • A status.json tracking pass/fail for each case
  • A results.md where Claude logs learnings after each iteration

Let me show you exactly how I set this up for Claude Code testing.

.

.

.

1. Create the Ralph Test Folder

First, create a folder to store all your Ralph loop files:

VS Code file explorer showing ralph_test folder containing four files: prepare.md, prompt.md, results.md, and status.json

Four files. That’s it.

  • prepare.md — Instructions for generating the status.json from your test plan
  • prompt.md — The loop instructions Claude follows each iteration
  • status.json — Tracks the state of all test cases (starts empty)
  • results.md — Human-readable log of each iteration (starts empty)

2. The Prepare Prompt

The prepare prompt tells Claude how to read your test plan and initialize the status file:

prepare.md file showing instructions to read test plan, extract all TC-XXX test cases with ID, name, and priority, then generate a JSON file with metadata including testPlanSource, totalIterations, maxIterations, and summary counts
PROMPT: Ralph Loop Testing Agent (Prepare Prompt)
Read the test plan file and generate a `status.json` file with all test cases initialized.

## Input

- **Test Plan:** `notes/test_plan.md`

## Output

- **Status File:** `notes/ralph_test/status.json`

## Instructions

1. Read the test plan markdown file
2. Extract ALL test cases (format: TC-XXX)
3. For each test case, extract:
    - TC ID (e.g., "TC-501")
    - Name (the test case title after the TC ID)
    - Priority (Critical/High/Medium/Low)
4. Generate a JSON file with this exact structure:

```json
{
  "metadata": {
    "testPlanSource": "notes/test_plan.md",
    "totalIterations": 0,
    "maxIterations": 50,
    "startedAt": null,
    "lastUpdatedAt": null,
    "summary": {
      "total": <count>,
      "pending": <count>,
      "pass": 0,
      "fail": 0,
      "knownIssue": 0
    }
  },
  "testCases": {
    "TC-XXX": {
      "name": "Test case name from plan",
      "priority": "Critical|High|Medium|Low",
      "status": "pending",
      "fixAttempts": 0,
      "notes": "",
      "lastTestedAt": null
    }
  },
  "knownIssues": []
}
```


5. Save the file to the output path

## Extraction Rules

- Test case IDs follow pattern: `TC-NNN` (e.g., TC-501, TC-522)
- Test case names are in headers like: `#### TC-501: Checkout Header Display`
- Priority is usually listed in the test case details or status tracker table
- If priority not found, default to "Medium"

## Example

Input (from test plan):

```markdown
#### TC-501: Checkout Header Display

**Priority:** High
...

#### TC-502: Checkout Elements - Step Progress Display

**Priority:** High
...
```

Output (status.json):

```json
{
    "metadata": {
        "testPlanSource": "./docs/test-plan.md",
        "totalIterations": 0,
        "maxIterations": 50,
        "startedAt": null,
        "lastUpdatedAt": null,
        "summary": {
            "total": 2,
            "pending": 2,
            "pass": 0,
            "fail": 0,
            "knownIssue": 0
        }
    },
    "testCases": {
        "TC-501": {
            "name": "Checkout Header Display",
            "priority": "High",
            "status": "pending",
            "fixAttempts": 0,
            "notes": "",
            "lastTestedAt": null
        },
        "TC-502": {
            "name": "Checkout Elements - Step Progress Display",
            "priority": "High",
            "status": "pending",
            "fixAttempts": 0,
            "notes": "",
            "lastTestedAt": null
        }
    },
    "knownIssues": []
}
```

## Validation

After generating, verify:

- [ ] All TC-XXX IDs from the test plan are included
- [ ] No duplicate TC IDs
- [ ] Summary.total matches count of testCases
- [ ] JSON is valid (no syntax errors)
- [ ] File saved to correct path

Key elements:

  • Points to your test plan location (notes/test_plan.md)
  • Specifies the output file (notes/ralph_test/status.json)
  • Defines the JSON structure with metadata and test case tracking

Nothing fancy. Just clear instructions.

3. The Loop Prompt

The prompt.md file contains the instructions Claude follows every single iteration:

prompt.md showing the testing agent loop: 1. Read state from files, 2. Pick ONE test case, 3. Execute or fix it, 4. Update state files, 5. Check if done. Includes warning that Claude has NO memory of previous iterations—files are the memory
PROMPT: Ralph Loop Testing Agent (Execution Prompt)
You are a testing agent in an iterative loop. Each iteration:

1. Read state from files
2. Pick ONE test case
3. Execute or fix it
4. Update state files
5. Check if done → output completion promise OR continue

**You have NO memory of previous iterations.** Files are your memory.

---

## Files to Read FIRST

| File                           | Purpose                                         |
| ------------------------------ | ----------------------------------------------- |
| `notes/ralph_test/status.json` | Current state of all test cases (JSON)          |
| `notes/test_plan.md`           | Full test plan with steps and expected outcomes |
| `notes/ralph_test/results.md`  | Human-readable log (append results here)        |

**Optional context:**

- `notes/impl_plan.md` — Implementation details
- `notes/specs.md` — Specifications details

---

## Environment

wp-env is running:

- Dev site: http://localhost:8101
- Test site: http://localhost:8102
- Admin: http://localhost:8101/wp-admin

Commands:

- Run inside sandbox: Standard commands
- Run outside sandbox: npm, docker, wp-env commands

## Test Credentials

### Admin

- URL: http://localhost:8101/wp-admin
- Username: admin
- Email: wordpress@example.com
- Password: password

### Reset Password (if needed)

```bash
wp user update admin --user_pass=password
```

---

## This Iteration

### Step 1: Read State

Read the test status JSON file. Understand:

- Which test cases exist
- Status of each: `pending`, `testing`, `pass`, `fail`, `known_issue`
- Fix attempts for failing tests

### Step 2: Check Completion

**If ALL test cases are `pass` or `known_issue`:**

Output completion promise and final summary:

<promise>ALL_TESTS_RESOLVED</promise>

Summary:

- Total passed: X
- Known issues: Y
- Recommendations: ...

**Otherwise, continue to Step 3.**

### Step 3: Pick ONE Test Case

Priority order:

1. `testing` — Continue mid-test
2. `fail` with `fixAttempts < 3` — Needs fix
3. `pending` — Fresh test

### Step 4: Execute Test

Using Chrome browser automation (natural language):

- Navigate to URLs
- Click buttons/links
- Fill form inputs
- Take screenshots
- Read console logs
- Verify DOM state

**Follow the test plan click-path EXACTLY.**

### Step 5: Record Result

Update test status JSON:

**PASS:**

```json
{ "status": "pass", "notes": "What was verified", "lastTestedAt": "ISO timestamp" }
```

**FAIL:**

```json
{ "status": "fail", "fixAttempts": <increment>, "notes": "What failed", "lastTestedAt": "ISO timestamp" }
```

Update `metadata.totalIterations` and `metadata.lastUpdatedAt`.

### Step 6: Handle Failures

**If FAIL and fixAttempts < 3:**

- Analyze root cause
- Implement fix in codebase
- Next iteration will re-test

**If FAIL and fixAttempts >= 3:**

- Set status to `known_issue`
- Add to `knownIssues` array with: id, description, steps, severity

### Step 7: Update Human Log

Append to test results markdown:

```markdown
## Iteration [N] — [TIMESTAMP]

**TC:** TC-XXX — [Name]
**Status:** ✅/❌/⚠️
**Notes:** [What happened]

---
```

### Step 8: Continue or Complete

- If all TCs resolved → Output `<promise>ALL_TESTS_RESOLVED</promise>`
- Otherwise → Continue working (loop will restart)

---

## Rules

1. ONE test case per iteration
2. Update files BEFORE finishing
3. Follow test steps EXACTLY
4. Screenshot key verification points
5. Max 3 fix attempts → then known_issue
6. Output promise ONLY when truly complete

This file-based structure is what makes autonomous Claude Code testing work: each iteration starts with a fresh context, so the loop’s only memory is what it reads from and writes back to these files.

Each iteration, Claude:

  1. Reads status.json to understand current state
  2. Picks the next pending test case
  3. Executes the test in an actual browser
  4. Updates status.json and results.md
  5. Ends the iteration (which triggers the next loop)

Rinse. Repeat. Until done.

4. Initialize the Status File

Run the prepare prompt to generate your starting state:

Claude Code terminal showing the command to read and implement the prepare.md file

Claude reads your test plan and creates status.json with all 38 test cases initialized:

Claude Code output showing it created status.json with 38 test cases extracted, priority breakdown of 6 Critical, 18 High, 14 Medium, all cases initialized with status pending and fixAttempts 0

The generated status file looks like this:

status.json file showing metadata section with testPlanSource, totalIterations at 0, maxIterations at 50, summary counts, and testCases section with TC-001 Display Payment Mode Settings, TC-002 Prerequisites Check WooCommerce Not Active, and TC-003 Prerequisites Check No Payment Gateway, all pending

Every test case has:

  • status: “pending”, “pass”, “fail”, or “knownIssue”
  • fixAttempts: How many times Claude tried to fix this case
  • notes: What Claude observed during testing
  • lastTestedAt: Timestamp of the last test

All 38 tests. Ready to go. Pending status across the board.

.

.

.

5. Trigger the Ralph Loop

Now the magic happens.

Trigger the Ralph loop with this command:

/ralph-loop:ralph-loop "perform this: @notes/ralph_test/prompt.md" --completion-promise "ALL_TESTS_RESOLVED" --max-iterations 100
Claude Code terminal showing the ralph-loop command with prompt.md path, completion-promise set to ALL_TESTS_RESOLVED, and max-iterations 100
  • The --completion-promise tells Ralph to keep looping until Claude outputs “ALL_TESTS_RESOLVED.”
  • The --max-iterations prevents infinite loops. (Because nobody wants that.)

6. Watch Claude Test Its Own Work

Claude starts by reading the state files to understand the current status:

Claude Code showing iteration 1 starting, reading state files showing 38 total test cases all pending, then selecting TC-001 Display Payment Mode Settings as the first test, launching browser to navigate to admin settings page

It picks TC-001: Display Payment Mode Settings.

Then it launches a browser—an actual browser!—navigates to the settings page, and verifies each requirement:

Claude Code showing TC-001 verification results with green checkmarks for: Settings page loads, Payment Mode section visible, Radio button options displayed showing Direct Stripe Integration and WooCommerce, and Description of each mode shown in info box. Concludes with TC-001 PASS

All checks pass. TC-001: PASS ✅

(Look at all those green checkmarks. Gorgeous.)

Claude updates the status file:

Claude Code diff showing status.json updates: totalIterations changed from 0 to 1, startedAt and lastUpdatedAt timestamps added, pending count decreased from 38 to 37, pass count increased from 0 to 1, TC-001 status changed from pending to pass with detailed notes

Then updates results.md with a human-readable log:

Claude Code showing results.md being written with Iteration 1 header, TC-001 test case name, PASS status, and detailed notes about what was verified. Shows iteration complete with 1 pass 37 pending, then stop hook triggering Ralph iteration 2

Notice the stop hook at the bottom: “Ralph iteration 2.”

The loop automatically triggers the next iteration.

No manual intervention.

No babysitting.

Just… Claude Code testing itself.

.

.

.

7. The Loop Continues (Without You)

Iteration 2 starts.

Claude reads the state (1 pass, 37 pending), picks TC-002:

Claude Code showing Iteration 2 starting with current state 1 pass 37 pending, selecting TC-002 Prerequisites Check WooCommerce Not Active, then running bash command to deactivate WooCommerce plugin to test the prerequisite behavior

TC-002 requires WooCommerce to be deactivated.

So what does Claude do? Runs wp plugin deactivate woocommerce, then tests the settings page behavior.

The test passes—the WooCommerce option correctly shows “Setup Required” when the plugin is inactive:

Claude Code showing TC-002 verification with green checkmarks for: Settings page loads, WooCommerce option shows Setup Required badge and is disabled, Error message displayed in Prerequisites section with Install WooCommerce link, Payment Gateway prerequisite shows warning. TC-002 PASS

Claude reactivates WooCommerce and updates the status:

Claude Code diff showing status.json metadata updates: totalIterations increased to 2, lastUpdatedAt timestamp updated
Claude Code diff showing TC-002 status changed from pending to pass with notes describing the WooCommerce deactivation test results

And appends to results.md:

Claude Code showing results.md update with Iteration 2 section added, TC-002 test details, PASS status, and notes about WooCommerce prerequisite behavior verification

Iteration 2 complete.

Stop hook triggers iteration 3:

Claude Code showing Iteration 2 Complete with TC-002 PASS, total 2 pass 36 pending, stop hook message triggering Ralph iteration 3, then reading state for TC-003 Prerequisites Check No Payment Gateway

This continues automatically. Test after test after test.

You could go make coffee. Take a walk. Do your taxes.

(Okay, maybe not taxes.)

.

.

.

8. When Tests Fail, Claude Fixes Them

HERE’S where Ralph Loop really shines.

During testing, Claude encounters a failing test. The pricing page isn’t displaying plan prices correctly.

Does it give up? Does it log “FAIL” and move on?

Nope.

Claude investigates, finds the issue—the template is using old meta keys instead of the Plan model—and fixes it:

Claude Code showing investigation of pricing section template, identifying issue with meta key mismatch between template expectations and Plan model, then updating section-plans.php to replace manual WP_Query with Plan model method call
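In code terms, the change was roughly this. Plan::findPublished() and section-plans.php come from the screenshot above; everything else (post type, meta key, property names) is an assumed sketch of the pattern, not the actual theme code:

```php
<?php
// Before: section-plans.php built its own query and read prices from a
// meta key the new implementation no longer writes.
$plans_query = new WP_Query( [
    'post_type'      => 'cwp_plan',   // hypothetical post type
    'post_status'    => 'publish',
    'posts_per_page' => -1,
] );
// ...and then called get_post_meta( get_the_ID(), 'plan_price', true ),
// a stale key, which is why prices came back empty.

// After: the template asks the Plan model, so it reads the same data the
// sync and checkout code use.
$plans = Plan::findPublished();
foreach ( $plans as $plan ) {
    printf(
        '<span class="plan-price">$%s/month</span>',
        esc_html( $plan->price ) // hypothetical property name
    );
}
```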

Then Claude retests to verify the fix worked:

Claude Code showing template update to use Plan::findPublished(), refreshing pricing page, verifying correct prices displayed (Hot Desk $199/month, Dedicated Desk $399/month with Most Popular badge, Private Office $799/month), then clicking Get Started to test checkout flow

The pricing page now shows correct prices. Claude clicks “Get Started” to continue testing the checkout flow.

Test. Find bug. Fix bug. Retest. Confirm fix.

All automatic.

.

.

.

9. All Tests Pass

After 3 hours and 32 minutes, all 38 test cases resolve:

Claude Code showing final test results summary table with all categories passing: Payment Mode Configuration (5 tests), Product & Plan Sync (2 tests), Checkout Flow (6 tests), Subscription Lifecycle (6 tests), Renewal Processing (2 tests), Plan Changes (6 tests), Cancellation (2 tests), Member Portal (4 tests), Admin Features (2 tests), Emails (2 tests), Security (1 test). Key P0/P1 Critical Tests listed below, ending with ALL_TESTS_RESOLVED and total time of 3h 32m 34s

Summary of Test Results:

  • Payment Mode Configuration: 5 tests ✅
  • Product & Plan Sync: 2 tests ✅
  • Checkout Flow: 6 tests ✅
  • Subscription Lifecycle: 6 tests ✅
  • Renewal Processing: 2 tests ✅
  • Plan Changes: 6 tests ✅
  • Cancellation: 2 tests ✅
  • Member Portal: 4 tests ✅
  • Admin Features: 2 tests ✅
  • Emails: 2 tests ✅
  • Security: 1 test ✅

Total: 38 tests. All passing.

The critical P0/P1 tests that Claude fixed during the loop:

  • TC-004: Mode Switch Blocking ✅
  • TC-009: Guest Checkout Prevention ✅ (with fix)
  • TC-016: Race Condition Prevention ✅
  • TC-018: Pre-Renewal Token Validation ✅
  • TC-020: 3D Secure Handling ✅
  • TC-038: Token Ownership Validation ✅

HECK YES.

.

.

.

The Proof: It Actually Works Now

Remember the checkout problem from the beginning? The one that made me question my life choices?

Let’s see what happens now.

The pricing page displays correctly:

CoWorkPress pricing page showing three plan cards: Hot Desk at $199/month, Dedicated Desk at $399/month with Most Popular badge, and Private Office at $799/month, each with feature lists and Get Started buttons. Arrow pointing to Hot Desk Get Started button

Click “Get Started” on Hot Desk, and you’re redirected to the WooCommerce checkout:

WooCommerce Checkout page with Account Required for Subscription notice, Contact information section with email field, Billing address fields, and Order summary showing Hot Desk at $199.00 with Monthly billing cycle

See the difference?

This is the WooCommerce checkout page.

The order summary shows “Hot Desk” with “Billing Cycle: Monthly.” The account creation notice appears because subscriptions require accounts.

(This is the moment I did a small victory dance. Don’t judge.)

Scroll down to payment options—Stripe through WooCommerce:

Payment options section showing Stripe card payment form with test mode notice, card number field filled with test card 4242, expiration and security code fields, and optional fields for saving payment information

The Stripe integration now runs through WooCommerce. Same payment processor, but managed by WooCommerce’s subscription system. I can swap in PayPal, Square, or any other gateway without touching theme code.
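Conceptually, the wiring that makes gateways swappable is one decision point in the theme. The option name and helper functions below are assumptions (matching the earlier sketch), but the shape is what matters:

```php
<?php
// The "Get Started" button now routes based on the payment mode setting.
// Gateways, taxes, and orders become WooCommerce's job, not the theme's.
$mode = get_option( 'cwp_payment_mode', 'stripe' ); // hypothetical option name

$checkout_url = ( 'woocommerce' === $mode )
    ? cwp_wc_checkout_url( $plan_id )      // WooCommerce checkout (any gateway)
    : cwp_stripe_checkout_url( $plan_id ); // legacy direct-Stripe flow
```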

Complete the purchase, and you land on the welcome page:

Welcome to CoWorkPress confirmation page with checkmark icon, Your membership is now active message, Order Confirmation card showing Hot Desk plan, Monthly subscription, $199.00 charged, next billing date March 10 2026, confirmation number, and View Receipt on Stripe link

Everything works.

The flow connects end-to-end.

The WooCommerce integration that Claude “completed” previously?

Now it’s actually complete.

.

.

.

The Complete Journey: From Idea to Working Product

Let me zoom out and show you how all three parts of this series connect:

Infographic titled “FROM IDEA TO WORKING PRODUCT” showing the four-phase process: Phase 1 Specs (describe the task and trigger, ask user questions until 95% confident), Phase 2 Test Plan (what does success look like? 38 test cases with criteria), Phase 3 Implementation (what tasks map to which test cases? 13 tasks in 4 phases), and Phase 4 Testing via the Ralph Loop (pick test case, execute in browser, pass → next, fail → fix → retest; 3h 32m, 38/38 passing), ending in a Working Product: all features implemented, all edge cases handled, all tests verified in an actual browser, bugs found and fixed during testing.

Phase 1: Bulletproof Specs

We started by brainstorming comprehensive specifications.

Using the AskUserQuestion tool, Claude asked 12 clarifying questions covering everything from subscription handling to checkout experience to refund policies. Then Claude critiqued its own specs, finding 14 potential issues before we wrote any code.

Phase 2: Test Plan

Before implementation, we generated a test plan.

38 test cases defining exactly what success looks like—from a user’s perspective. These became our acceptance criteria.

Phase 3: Implementation Plan + Sub-Agents

We created an implementation plan mapping tasks to test cases. Then executed with sub-agents running in parallel waves, keeping context usage low while building everything in 52 minutes.

Phase 4: Claude Code Testing + Fixing with Ralph Loop

Finally, we let Ralph loose. The autonomous loop tested each case in an actual browser, found the bugs Claude missed during implementation, fixed them, and verified the fixes.

3 hours 32 minutes later: 38/38 tests passing.

.

.

.

What I’ve Learned About Building With AI

Here’s what this whole journey taught me.

We all want AI to one-shot solutions on the first try. To type a prompt, hit enter, and watch magic happen. And when it doesn’t work perfectly? We blame the AI. Call it nerfed. Call it lazy. Move on to the next shiny tool.

But here’s the thing I keep coming back to:

Even the most experienced developer can’t one-shot a complex feature.

We write code. Test it. Find bugs. Fix them. Test again. That’s just how building software works. Always has been. Probably always will be.

AI is no different.

The breakthrough—the real breakthrough—comes from giving AI the ability to verify its own work. The same way any developer does. Write the code. Test it against real user scenarios. See what breaks. Fix it. Test again.

Ralph Loop makes this autonomous.

You don’t have to manually test 38 scenarios. You don’t have to spot the bugs yourself. You don’t have to describe each fix.

You define success criteria upfront (test plan), give Claude the ability to test against those criteria (browser automation), and let it iterate until everything passes.

👉 That’s the entire secret: structured iteration with clear success criteria.

Not smarter prompts. Not better models. Not more tokens.

Just… iteration.

The same boring, unsexy process that’s always made software work.

Except now, you don’t have to do it yourself.

Nathan Onn

Freelance web developer. Since 2012 he’s built WordPress plugins, internal tools, and AI-powered apps. He writes The Art of Vibe Coding, a practical newsletter that helps indie builders ship faster with AI—calmly.
