Claude Opus 4.1 vs GPT-5 vs GPT-5 Codex: The Bash Bug That Revealed Their Differences
A real-world debugging session that shows how Claude Opus 4.1, GPT-5, and GPT-5 Codex handle the same problem differently – with surprising results.
Sometimes the most frustrating bugs have the simplest solutions.
And sometimes, the AI model you trust most isn’t the right one for the job.
Let me tell you about the time I spent hours (yes, hours) watching Claude Opus 4.1 chase its tail trying to fix a bash script, only to have GPT-5 Codex solve it in minutes.
. . .
The Setup: A/B Testing with Git Worktrees
I needed a script to automate A/B testing environments using Git worktrees.
The concept was straightforward: create separate workspaces for comparing different AI agents’ outputs side by side, each with its own database instance.
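For context, the core Git plumbing behind that idea is just a couple of commands (a minimal sketch; the branch and directory names are illustrative, not taken from my script):

# create a fresh branch per variant, each checked out in its own directory
git worktree add -b agent-a worktrees/a main
git worktree add -b agent-b worktrees/b main

Each worktree is a full checkout that shares the same .git history, so the two variants can diverge freely without touching the main working copy.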
Using my standard workflow, I had GPT-5 plan the architecture and Claude Sonnet 4 implement it.
The script (ab-worktrees.sh) was created successfully.
… or so I thought.
. . .
The Problem: Silent Failure
When I ran the script, nothing happened. No error. No output. Just… nothing.

Time to debug.
. . .
Round 1: Claude Opus 4.1 – The Endless Loop
I turned to Claude Opus 4.1, my go-to for complex debugging.
It immediately started adding debug statements, analyzing the script structure, and running tests.
It found issues and made fixes!

It added more debug output!


After multiple iterations, Claude confidently declared the script fixed.

But when I tested it…

Still nothing.
Claude kept insisting it was fixed, identifying “echo -e incompatibility” and “working tree check failures” – but the fundamental issue remained.
. . .
Claude Opus 4.1 vs GPT-5 vs GPT-5 Codex: Testing Them All
Frustrated, I switched to GPT-5-high in Codex CLI.


GPT-5-high couldn’t crack it either.
But then I tried one more variant: GPT-5-codex-high.
. . .
The Breakthrough: A Devastatingly Simple Fix
GPT-5 Codex found the issue immediately:

The problem? Bash conditional syntax.
The Bug
# Original (broken) pattern
[[ -z "$A_START" ]] && A_START="$BASE_BRANCH"
The Fix
# Fixed pattern
if [[ -z "$A_START" ]]; then
  A_START="$BASE_BRANCH"
fi
That’s it.
That’s the entire fix.
. . .
Why This Simple Bug Was So Hard to Find
GPT-5 Codex explained it perfectly:
The && shortcut pattern fails silently in certain contexts:
- When set -e is enabled (which my script had)
- In functions or subshells
- When the first condition returns non-zero
The shortcut relies on specific Bash behavior that breaks under these conditions: when the variable is already set, the [[ -z ... ]] test fails, so the whole && expression returns a non-zero status, and under set -e that status (for example, as the last command of a function) aborts the script without printing anything. The explicit if statement always works.
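To make that concrete, here’s a minimal sketch of the failure mode (a deliberately simplified reproduction; the function and variable names are illustrative, not from the real script):

#!/usr/bin/env bash
set -e                      # abort on any unhandled non-zero status

pick_start() {
  local A_START="main"      # already set, so the [[ -z ]] test below fails
  # set -e tolerates the failing test inside the && list itself,
  # but the list's status (1) becomes the function's return status...
  [[ -z "$A_START" ]] && A_START="main"
}

pick_start                  # ...and set -e aborts the script right here
echo "never printed"        # no error, no output - just silence

Rewriting the assignment as an explicit if returns 0 whether or not the branch runs, so the function exits cleanly and execution continues.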
. . .
The Working Script
After applying the fix:




Success!
The script now:
- Creates two Git worktrees (worktrees/a and worktrees/b)
- Clones the database for each environment
- Sets up separate ports (3001 and 3002)
- Provides clear instructions for running both variants
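Putting those pieces together, a simplified outline of what the working script does looks roughly like this (the paths and ports match the list above; the branch names, database file, and run commands are illustrative, not the full ab-worktrees.sh):

#!/usr/bin/env bash
set -euo pipefail

BASE_BRANCH="main"

# explicit if-statements instead of the && shortcut
if [[ -z "${A_START:-}" ]]; then
  A_START="$BASE_BRANCH"
fi
if [[ -z "${B_START:-}" ]]; then
  B_START="$BASE_BRANCH"
fi

# one worktree per variant, each on its own branch
git worktree add -b ab/variant-a worktrees/a "$A_START"
git worktree add -b ab/variant-b worktrees/b "$B_START"

# separate database copy per environment
cp dev.db worktrees/a/dev.db
cp dev.db worktrees/b/dev.db

echo "Variant A: cd worktrees/a && PORT=3001 npm run dev"
echo "Variant B: cd worktrees/b && PORT=3002 npm run dev"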


. . .
The Lessons
1. Different Models, Different Strengths
Claude Opus 4.1 is brilliant at many things, but it got stuck in a loop adding complex debugging rather than reconsidering the fundamental syntax.
GPT-5 Codex, specialized for code, immediately spotted the syntax pattern issue.
2. Simple Bugs Can Be the Hardest
The simpler the bug, the easier it is to overlook.
We expect complex problems to have complex solutions, but sometimes it’s just a matter of using if instead of &&.
3. Shell Scripting Is Deceptively Complex
What looks like identical syntax can behave completely differently depending on shell options, context, and environment.
The “clever” shortcut often isn’t worth the fragility.
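Incidentally, if you still want a one-liner, Bash’s default-value expansion sidesteps the exit-status trap entirely, because the assignment always returns zero (a side note of mine, not something either model proposed):

# assigns BASE_BRANCH only when A_START is unset or empty; always exits 0
A_START="${A_START:-$BASE_BRANCH}"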
4. Know When to Switch Tools
After watching Claude go in circles for hours, switching to a different model wasn’t giving up – it was being strategic.
Different AI models truly have different strengths.
. . .
The Irony
The most sophisticated AI model I had access to couldn’t fix a bug that boiled down to “use an if statement instead of &&”.
Meanwhile, a more specialized model solved it immediately.
Sometimes the best debugger isn’t the most powerful AI – it’s the one that’s been trained specifically for the task at hand.
. . .
Important Disclaimer
This is just one data point, not a comprehensive benchmark.
This single debugging session shouldn’t be used to conclude that GPT-5 Codex is definitively superior to Claude Opus 4.1, or that any model is inherently better than another.
Each AI model has its strengths and weaknesses, and they excel in different contexts.
What this experience taught me is that when you’re stuck on a problem with one model, switching to another might provide the fresh perspective needed for a breakthrough.
It’s not about finding the “best” model – it’s about using the right tool for the specific job at hand.
Claude Opus 4.1 remains exceptional at complex reasoning, creative problem-solving, and many coding tasks. GPT-5 and GPT-5 Codex have their own unique strengths.
The key is knowing when to leverage each one.
. . .
What’s Next: The A/B Testing Script
Speaking of using the right tool for the job – next week, I’ll be sharing the complete ab-worktrees.sh script that started this whole adventure.
This bash script lets you test different AI models’ outputs side by side using Git worktrees, making it easy to compare implementations from Claude Opus 4.1 vs GPT-5 vs GPT-5 Codex (or any other models) in real-time.
You’ll learn how to:
- Set up isolated testing environments for each model’s output
- Compare implementations without contaminating your main branch
- Run parallel tests with separate databases
- Make data-driven decisions about which model’s solution to use
Subscribe to stay tuned for next week’s deep dive into automated A/B testing for AI-generated code!
Have you experienced similar situations where switching AI models made all the difference? Where a “simple” fix eluded even the most advanced tools? I’d love to hear your debugging war stories in the comments.