ISSUE #53 • Jun 5, 2026 • 14 MIN READ

I Gave Codex a Requirements Doc and Got a CodeCanyon-Grade Plugin Back

The first time I used Codex /goal, I sat at my desk for twenty-eight minutes pretending to do other work while an autologin plugin built itself from a one-paragraph spec.

That was “How to Use Codex /goal to Build WordPress Plugins (My Spec-to-Ship Workflow)”. One feature. One goal. The kind of experiment where you peek at the terminal every 90 seconds and try to look casual about it.

This time, my input was a full requirements document and this single line:

./run-goals.sh

Then I walked away. For nearly five hours.

When I came back, a complete WooCommerce plugin was sitting in the repo — an admin grid for bulk-editing stock quantities across products, including per-variation stock for variable products. That’s the exact kind of WooCommerce complexity that breaks naive implementations. The genre of plugin that sells on CodeCanyon for $30–60.

All built while I made dinner, watched half a movie, and checked the terminal exactly once. (More on that later.)

VS Code file explorer showing the starting state with only wp-requirements-to-goals skill, playwright-cli skill, and requirements.md — the entire human input is one requirements file plus two skills

Everything I’ve built with the codex goal command up to this point has fit inside a demo. The autologin plugin took twenty-eight minutes. “How I Chained Two Codex /goal Runs to Build a Complete CLI Tool” scaled the pattern to two linked goals. “How I Used 8 Codex /goal Runs to Build a Browser Game From Scratch” pushed it to eight.

The question I’ve been carrying — and maybe you have too — is whether /goal survives contact with real software. Multi-feature. Edge cases. Settings pages. The kind of product someone would actually pay for.

This post is where I find out.

The honest caveat lands early, same as always: /goal produced the code, but the requirements produced the outcome. And this time the spec was a full requirements document, decomposed by a skill into a layered tree of goals — each with its own contract, its own verification, its own proof.

(If you’re new to the series, the autologin post covers what /goal is and how the goal trio works. Everything here builds on that foundation.)

The Requirements Are the Real Work

A paragraph was enough for an autologin plugin.

A full product needs a full brief.

I learned this the hard way on a previous build. The requirements were loose enough that the agent met every acceptance criterion — and still missed what I actually wanted. (If you’ve ever written a Jira ticket and gotten back something that was technically correct and completely wrong, you know the feeling.)

That gap is where I started treating the requirements doc as the real product.

(Full requirements: https://github.com/nathanonn/wc-bulk-edit-stock/blob/main/requirements.md)

The requirements for this build carried tagged user stories with explicit acceptance criteria, edge cases around out-of-stock states and variable-product handling, and cross-cutting concerns like validation and save resilience:

US-01: Quickly update a single product’s stock from a filterable admin grid
US-02: Set a group of products to out-of-stock at once (bulk action)
US-03: Edit per-variation stock for variable products inline
Edge cases: WooCommerce inactive, concurrent edits, deleted staged products, 100+ variations
Cross-cutting: Save/validation resilience, filtering/search, batch selection

That doc is the product brief, the architecture, and the test plan — all in one file. The better it is, the less you touch the build.

I wrote about the upstream discipline in “How to Write Better Requirements with Claude (Stop Letting AI Assume)”. That post produces the input this post consumes. If you’re going to try this workflow, start there.

Here’s the thing: the codex goal command runs on evidence, and the requirements doc is where that evidence gets defined. Every acceptance criterion becomes a checkbox the machine has to satisfy before declaring a goal complete. Write the criteria well, and you’ve written the test plan. Write them vaguely, and the build reflects that vagueness right back at you.

The leverage point from the autologin post still holds — the autonomy /goal provides downstream is paid for upfront, in the spec. Here the spec is bigger, so the downstream autonomy stretches wider too.

Meet wp-requirements-to-goals — The Skill That Decomposes

The autologin post introduced a skill that turns a vague paragraph into one goal trio. One input, one output.

This post’s counterpart is wp-requirements-to-goals.

Same family, different scale. It takes a structured requirements doc and produces an entire project — a goals plan, a root scaffold, and a layered tree of goals ready to execute. When I first ran it against the bulk stock manager requirements, the decomposition it produced was almost exactly what I would have designed myself — except it took minutes instead of an afternoon of whiteboarding.

The layering follows a consistent pattern:

| Layer | What it builds |
| --- | --- |
| `00-foundation` | Walking skeleton: plugin activates, settings register, one artifact renders |
| Per-US goals | One goal per user story, acceptance criteria copied verbatim from requirements |
| Non-US feature goals | Cross-cutting concerns that don’t map to a single story |
| Integration goal | Re-verifies every prior goal + cross-cutting edge cases |

Each goal carries its own GOAL.md, VERIFY.md, and PROGRESS.md — the same trio from the autologin post, repeated across the full tree. Acceptance criteria are copied verbatim from the requirements document. Never paraphrased. That’s what keeps the machine’s definition of “done” identical to yours.
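Before kicking off a long run, it can be worth confirming that every goal folder really has its trio. Here is a minimal sketch of that check. It assumes each goal lives in its own subfolder of a goals directory, which matches the scaffold described in this post, but the function name and the exact layout are my assumptions, not something the skill ships.

```shell
# Sketch: list every trio file that is missing from a goals directory.
# Assumed layout (hypothetical): <goals_dir>/<goal-name>/{GOAL.md,VERIFY.md,PROGRESS.md}

missing_trio_files() {
  goals_dir=$1
  for goal in "$goals_dir"/*/; do
    [ -d "$goal" ] || continue            # the glob matched nothing
    for file in GOAL.md VERIFY.md PROGRESS.md; do
      # "$goal" already ends with a slash because of the */ glob
      [ -f "$goal$file" ] || printf '%s\n' "$goal$file"
    done
  done
}
```

Running `missing_trio_files goals` prints one path per missing file and prints nothing when every contract is in place.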

The integration goal at the end re-runs every previous verification. It is the same QC checkpoint idea that readers of “Your Codex Skills Should Evolve With Your Project (Ion Viper Part 2)” will recognize, now baked into the WordPress skill instead of manually authored.

And before asking any questions, the skill probes the repo. It checks for existing config files, reads the slug, namespace, WordPress version, and PHP target from whatever’s already on disk. The clarification rounds stay short because the filesystem already answered most of the questions.

(Smart enough to look before it asks — which, let’s be honest, puts it ahead of a lot of people I’ve worked with.)

Codex terminal showing the wp-requirements-to-goals skill invoked against the requirements file

One-Shot or Phased — and the Q&A That Sets the Plan

The skill’s first question is a mode decision: generate goals phased or one-shot?

Phased writes the plan first, pauses so you can review and edit, then generates the goal files and scaffold. Safer for a first run — because the plan decomposition is the highest-risk decision. If the skill slices the requirements poorly, every downstream goal inherits the mistake.

One-shot generates the plan, scaffold, and all goal folders in a single pass. Faster, and what I chose here. The requirements doc was clean enough that I trusted the decomposition, and I wanted to see how far the unattended pipeline could stretch.

Codex asking whether to generate goals phased or one-shot, with three options: Phased recommended, One-shot, and None of the above

Selecting One-shot option to generate all goals and scaffold in one pass

After the mode decision, the skill ran through a handful of clarification rounds. I went with the recommended option on every one — the repo probe had already answered the identity questions, so these were mostly confirming sensible defaults.

(The whole exchange felt like confirming a restaurant reservation. “Table for one? Near the window? 7 PM?” Yes, yes, yes.)

First Q&A round with scaffold questions answered using recommended defaults — project identity, WordPress baseline, goal slicing, edge-case ownership

Second Q&A round covering test seeding method, derived acceptance criteria, and integration verification policy — all answered with recommended options

Then Codex laid out its five-step generation plan and started working.

Codex updated plan showing five generation steps: Phase 1 config, scaffold, foundation goal, per-US and non-US goals, integration goal

About 19 minutes later, the scaffold was done. Ten goal folders sitting in the goals directory. A root config, a plugin bootstrap folder, a verification protocol, and the bash script to run them all. Every contract written. Nothing implemented yet.

The project was runnable.

VS Code showing the finished scaffold — 10 goal folders from 00-foundation through 09-integration in the goals directory, plus root config files, ready to run

The Part That’s New: One Bash Command Runs Every Goal

Here’s what changed between this post and every previous one in the series.

In every prior build, I pasted each /goal command by hand. Copy the command, swap the folder name, press enter, wait, repeat. The build was autonomous within each goal, but the handoff between goals was manual. ME, copying and pasting. Every. Single. Time.

run-goals.sh removes that last handoff.

It chains every goal in order — starts the WordPress environment, runs the first goal, and when that one completes it auto-proceeds to the next, all the way through the integration goal at the end. One trigger, then leave.
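The chaining idea itself is small. The sketch below is not the real run-goals.sh. It only shows the loop: visit goal folders in name order, hand each one to a runner, and stop at the first failure. The runner is passed in as an argument and is a hypothetical stand-in for whatever command launches one Codex /goal run against a folder. Environment startup and the per-goal commit are left out.

```shell
# Sketch of a goal chain. Usage: run_chain GOALS_DIR RUNNER [RUNNER_ARGS...]
# RUNNER is a placeholder (assumption) for the command that runs one goal folder.

run_chain() {
  goals_dir=$1
  shift                                   # "$@" now holds the runner command
  for goal in "$goals_dir"/*/; do         # glob results are sorted, so 00- runs before 01-
    [ -d "$goal" ] || continue
    name=$(basename "$goal")
    printf 'Starting %s\n' "$name"
    if ! "$@" "$goal"; then
      printf 'Goal %s failed; stopping the chain\n' "$name" >&2
      return 1
    fi
    printf 'Completed %s\n' "$name"
  done
}
```

The zero-padded prefixes (00, 01, and so on) are what make plain glob order match build order, so no explicit sort is needed.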

Two pre-flight steps first. Install the local WordPress tooling:

Terminal showing npm install output — 404 packages installed for wp-env

Start the local WordPress environment:

wp-env start output with WordPress dev site at localhost:8888 and test site at localhost:8889

Then the trigger:

./run-goals.sh

Running ./run-goals.sh — the script starts wp-env, then launches Goal 00-foundation with danger-full-access sandbox and never approval

A practical note on plan tiers: on a ChatGPT Pro (x5) plan, the full unattended run fits inside usage limits. On a lower plan like Plus, you’d run goals in chunks to stay within limits — and the script supports exactly that:

./run-goals.sh --from 00 --to 02   # run goals 00, 01, 02
./run-goals.sh --only 03           # run a single goal
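If you are curious how range flags like these can work, here is one way to filter goal names by their numeric prefix. This is a sketch under assumptions, not the script's actual code: it assumes goal folders are named with a numeric prefix followed by a hyphen, and the function name is made up for illustration.

```shell
# Sketch: keep only goal names whose numeric prefix falls within FROM..TO (inclusive).
# Reads one goal name per line on stdin. Usage: select_goals FROM TO

select_goals() {
  awk -F- -v from="$1" -v to="$2" '
    { n = $1 + 0 }                                  # "08" becomes 8
    $1 ~ /^[0-9]+$/ && n >= from + 0 && n <= to + 0 # default action prints the line
  '
}
```

With that in place, `ls goals | select_goals 00 02` would list the first three goals, and a single-goal run is just the same call with both bounds equal, for example `select_goals 03 03`.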

The foundation goal finished in about 14 minutes. The script committed the result and moved straight to the next goal without pausing.

Goal 00-foundation completed in 14 minutes 17 seconds, auto-proceeding to Goal 01-access-control with no human input

That auto-proceed is the whole point. The autologin post removed the per-step approvals. This one removes the per-goal handoffs. You are now outside the loop for the entire multi-goal build.

The Nearly-Five-Hour Black Box

The first time I left a single /goal run alone, the gap was 28 minutes. That felt long.

Nearly five hours is a different animal entirely. Ten goals. The entire implementation of a multi-feature WooCommerce plugin, start to finish, with nobody at the keyboard.

I won’t pretend the first time you let a run that long go feels comfortable. The trust window is ten times wider than it was for the autologin post, and the stakes are proportionally bigger: more goals means more surface area for things to go wrong.

About two hours in, I opened the terminal tab. Just a glance — the kind where you tell yourself you’re checking “out of curiosity,” not because you’re nervous. Goal 05 was running. I closed the tab and made dinner.

Here’s what made the absence workable:

Each goal’s VERIFY.md defines what counts as proof. The continuation prompt refuses to declare a goal complete without mapping every acceptance criterion to evidence. Scope boundaries in each GOAL.md keep Codex from wandering into unrelated files. And the integration goal at the end — which alone took 91 minutes, about a third of the total runtime — ran a full regression sweep three times, re-verifying every prior goal’s work against the live WordPress environment.

Let me say that again. A third of the total build time was pure verification.

That regression discipline carries through the whole chain. Each goal re-checks the ones that came before it. A late goal that breaks an early one would surface in the late goal’s own verification pass, long before the integration sweep catches it again. The tests compound across the chain, and what you’re left with is a result you can audit from the artifacts alone.
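You can audit a finished goal yourself with a very small gate. The sketch below assumes the acceptance criteria are tracked as Markdown task-list items ("- [ ]" and "- [x]") in a file such as PROGRESS.md. That file format is my assumption for illustration; check what your generated goals actually contain before relying on it.

```shell
# Sketch: treat a goal as done only when its checklist has no unchecked boxes.
# Assumes (hypothetically) Markdown task-list syntax in the file passed in.

count_unchecked() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  grep -c -e '^[[:space:]]*[-*] \[ ]' "$1" || true
}

goal_is_done() {
  [ -f "$1" ] || return 1                 # a missing file never counts as done
  [ "$(count_unchecked "$1")" -eq 0 ]
}
```

A call like `goal_is_done goals/00-foundation/PROGRESS.md || echo "still open"` gives a quick yes or no per goal without reading the whole log.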

283 minutes, 9 seconds. Ten goals completed, zero skipped.

Terminal showing 10 goals completed in 283 minutes 9 seconds with 0 skipped, followed by wp-env shutdown

What It Cost

I’ve been writing this series for months without ever putting a dollar figure on the autonomy.

This one does.

Before this experiment, I’d browsed CodeCanyon for bulk stock managers. The $40–60 listings had mixed reviews and half of them hadn’t been updated in a year. I wanted to know whether a clean spec and under five hours of machine time could land in the same category — so I built a bash script that totals input and output tokens across the full run and applies current GPT-5.5 API pricing.

Here’s what the 10-goal build cost:

Cost calculation output showing GPT-5.5 pricing: 10 completed goals, 4.71 hours, 208M input tokens with 206M cached, 0.43M output, Short Cost $131.40, Long Cost $254.46

How to think about that number:

Hiring a freelance WordPress developer to build a multi-feature WooCommerce admin plugin from a requirements doc would cost anywhere from $500 to several thousand dollars, depending on the complexity and the developer’s rate. Buying an existing CodeCanyon plugin and customizing it runs $30–60 for the license, plus hours of adaptation time to make it fit your exact spec.

$131 for a working, tested, multi-feature plugin built from your exact requirements — with zero hands-on coding time — lands in a genuinely interesting spot.
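The arithmetic behind a figure like that is simple enough to sketch. This is not the script I used, and it deliberately contains no real prices: the three rates are parameters you would fill in from the current pricing page. Token counts are in millions and rates are in dollars per million tokens. Treating cached input as its own bucket with its own rate is an assumption about how the pricing is structured.

```shell
# Sketch: total cost of a run from token counts and per-million-token rates.
# Usage: run_cost UNCACHED_IN_M CACHED_IN_M OUTPUT_M RATE_IN RATE_CACHED RATE_OUT
# All rates are placeholders supplied by the caller, not actual pricing.

run_cost() {
  awk -v u="$1" -v c="$2" -v o="$3" \
      -v p_in="$4" -v p_cached="$5" -v p_out="$6" \
      'BEGIN { printf "%.2f\n", u * p_in + c * p_cached + o * p_out }'
}
```

For this build the inputs would be roughly 2 (208M input minus the 206M that were cached), 206, and 0.43, followed by the three rates. With almost all of the input cached, the cached-input rate is the one that moves the total most.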

Does It Actually Work? (And the UI Taste Caveat)

Closed the terminal. Opened the browser. Tested the plugin like a regular human would.

The honest caveat first: the generated admin UI is functional but plain. GPT-5.5 builds things that work, but its visual design sense is weaker than that of Claude models. The admin page has the right columns, the right filters, the right controls: everything the requirements specified. The layout and styling are just… adequate. Functional without any flair.

The generated Bulk Edit Stock admin page showing a product table with search, category filter, stock status filter, and columns for product name, type, stock managed, stock quantity, and stock status — functional but visually plain

A day of CSS polish from a human — or a Claude session focused on UI — would bring it up to marketplace standard. The functionality, though, is the part the requirements controlled. And the functionality held up.

Here’s the test that matters most.

I edited stock for a simple product (set quantity to 20) and for a variable product’s “Small” variation (set quantity to 19), then hit Save Changes.

Bulk editing stock quantities — WC BES G09 Seasonal Two changed to 20, Small variation changed to 19, with Save Changes button and 2 products modified indicator

Then I opened the actual WooCommerce product edit screens to check whether the values persisted. The simple product showed 20.

WooCommerce product edit page for WC BES G09 Seasonal Two showing stock quantity of 20 persisted correctly after bulk edit, with red arrow pointing to the quantity field

The variation showed 19.

WooCommerce variation edit page for Small variation showing stock quantity of 19 persisted correctly after bulk edit, with red arrow pointing to the stock quantity field

Per-variation stock on variable products is exactly where a lazy plugin implementation falls apart — WooCommerce stores variation stock separately from the parent product, and the save path requires hitting variation-specific meta fields.

That complexity is the reason I chose this plugin as the test case. And it held up.
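If you would rather confirm persistence from the terminal than from the edit screens, WP-CLI can read the stored value directly. A minimal sketch, with assumptions: it reads the `_stock` post meta, which is where WooCommerce keeps the quantity for both simple products and individual variations, and it reaches WP-CLI through wp-env's cli container by default. The helper name and the WP variable are mine, and the ID you pass is whatever your own product or variation ID happens to be.

```shell
# Sketch: print the stored stock quantity for a product or variation ID.
# WP is the command prefix used to reach WP-CLI. The default assumes a local
# wp-env setup; set WP="wp" on a machine with WP-CLI installed directly.
WP=${WP:-npx wp-env run cli wp}

stock_of() {
  # $WP is intentionally unquoted so the prefix splits into separate words
  $WP post meta get "$1" _stock
}
```

Calling `stock_of` once with the parent ID and once with a variation ID makes the separation visible: each variation is its own post with its own `_stock` value.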

👉 What this series keeps landing on: /goal offloads the implementation so you can spend your time being a good tester. Hours of machine work freed me to focus entirely on verification. Opening the browser, clicking through the plugin, checking that values persisted — that’s where my time belongs now.

Grab the Plugin

The full project is on GitHub: wc-bulk-edit-stock. Every goal folder, the bash script, the complete Codex run history — all of it. You can walk through the entire build, goal by goal, in the commit log. (It’s one of those repos where the journey is the documentation.)

If you just want the finished plugin, the releases page has a downloadable zip. Drop it into any WooCommerce site and you’ve got yourself a working bulk stock manager.

Use the Skill for Your Own Plugin

Install the skill:

npx skills add nathanonn/agent-skills --skill wp-requirements-to-goals --agent codex

The repo is at github.com/nathanonn/agent-skills.

One prerequisite to know about: the verification step in each goal uses playwright-cli for browser-based tests against the running WordPress environment. If you want the full workflow — including automated verification — you’ll need it installed. The playwright-cli README covers the setup.
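A quick pre-flight check saves you from discovering a missing tool two hours into an unattended run. This sketch just reports which command names are not on your PATH. The tool names in the usage line are assumptions about what your setup calls them, so adjust to match your install.

```shell
# Sketch: print every command name from the argument list that is not on PATH.

missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools node npx codex playwright-cli` prints nothing when everything is installed and one name per line otherwise.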

The skill handles decomposition, scaffolding, and goal generation. The bash script handles execution. But both are only as good as the requirements doc you feed in. Vague requirements produce vague goals, and the build reflects that.

The real prerequisite — ferpetesake — is learning to write requirements well. Start with How to Write Better Requirements with Claude if you haven’t already.

The Bigger Picture

Five entries in this series. One pattern. An input that keeps shrinking.

The autologin post started with a paragraph and a pasted command — one feature. This one started with a requirements doc and one bash command — a complete, multi-feature product.

The skill carries the domain knowledge. /goal runs the execution loop. PROGRESS.md proves the work. What changed is the ceiling — the scope of what you can build without writing code or babysitting the build.

The human’s job has compressed to two things: writing the requirements well and verifying the result. Everything between those two — decomposition, scaffolding, implementation, testing, regression — is now machine work you can trigger and walk away from. Like leaving a slow cooker on and coming back to a finished meal. (Except the meal is a WooCommerce plugin, and the slow cooker cost $131.)

The codex goal command reaches marketplace-grade complexity here, and that’s the claim this post earns. A bulk stock manager with per-variation editing, cross-cutting validation, and a full integration sweep is the kind of plugin people actually sell. The build handled it.

The honest forward edge: the UI taste gap is real, the $131 cost is real, and “marketplace-grade functionality” still needs a human’s polish and judgment before it’s ready for paying customers. Functional code and a shippable product are different things — the gap between them is taste, branding, documentation, and support. All human work.

But the part AI is getting genuinely good at — executing a well-specified plan, unattended, across an entire multi-feature build — just took another visible step.

Your job is to get good at writing the plan.

More workflows like this — AI-assisted development with Claude Code, Codex, and the tools between them — land in The Art of Vibe Coding newsletter every week. If this one was useful, the next one probably will be too.

Nathan Onn

Freelance web developer. Since 2012 he’s built WordPress plugins, internal tools, and AI-powered apps. He writes The Art of Vibe Coding, a practical newsletter that helps indie builders ship faster with AI—calmly.
