Issue #19 · Oct 3, 2025 · 5 min read

Claude Opus 4.1 vs GPT-5 vs GPT-5 Codex: The Bash Bug That Revealed Their Differences

A real-world debugging session that shows how Claude Opus 4.1, GPT-5, and GPT-5 Codex handle the same problem differently – with surprising results.

Sometimes the most frustrating bugs have the simplest solutions.

And sometimes, the AI model you trust most isn’t the right one for the job.

Let me tell you about the time I spent hours (yes, hours) watching Claude Opus 4.1 chase its tail trying to fix a bash script, only to have GPT-5 Codex solve it in minutes.

.

.

.

The Setup: A/B Testing with Git Worktrees

I needed a script to automate A/B testing environments using Git worktrees.

The concept was straightforward: create separate workspaces for comparing different AI agents’ outputs side by side, each with its own database instance.
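
If you haven't used Git worktrees before: they let one repository have several working directories checked out at once. The gist, in generic commands (the branch names and base branch here are just examples, not the actual script):

# Two extra working directories off the same repo, each on its own branch
git worktree add -b variant-a worktrees/a main
git worktree add -b variant-b worktrees/b main

# List every worktree attached to the repository
git worktree list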

Using my standard workflow, I had GPT-5 plan the architecture and Claude Sonnet 4 implement it.

The script (ab-worktrees.sh) was created successfully.

… or so I thought.

.

.

.

The Problem: Silent Failure

When I ran the script, nothing happened. No error. No output. Just… nothing.

Time to debug.

.

.

.

Round 1: Claude Opus 4.1 – The Endless Loop

I turned to Claude Opus 4.1, my go-to for complex debugging.

It immediately started adding debug statements, analyzing the script structure, and running tests.

It found issues and made fixes!

It added more debug output!

[Screenshot: More debugging attempts by Claude]

After multiple iterations, Claude confidently declared:

[Screenshot: Claude claiming "The script is working but the output isn't showing due to how the shell environment handles output"]

But when I tested it…

[Screenshot: The script still returning nothing]

Still nothing.

Claude kept insisting it was fixed, identifying “echo -e incompatibility” and “working tree check failures” – but the fundamental issue remained.

.

.

.

Claude Opus 4.1 vs GPT-5 vs GPT-5 Codex: Testing Them All

Frustrated, I switched to GPT-5-high in Codex CLI.

GPT-5-high couldn’t crack it either.

But then I tried one more variant: GPT-5-codex-high.

.

.

.

The Breakthrough: A Devastatingly Simple Fix

GPT-5 Codex found the issue immediately:

[Screenshot: GPT-5 Codex's fix overview showing the actual problem]

The problem? Bash conditional syntax.

The Bug

# Original (broken) pattern
[[ -z "$A_START" ]] && A_START="$BASE_BRANCH"

The Fix

# Fixed pattern
if [[ -z "$A_START" ]]; then
    A_START="$BASE_BRANCH"
fi

That’s it.

That’s the entire fix.

.

.

.

Why This Simple Bug Was So Hard to Find

GPT-5 Codex explained it perfectly:

The && shortcut can kill a script silently when a few conditions line up:

  • set -e is enabled (which my script had)
  • the test on the left is false, so the assignment never runs and the whole line exits non-zero
  • that line is the last command in a function or subshell, so the non-zero status propagates to the caller

When all three hold, the script aborts with exit status 1 and no error message. The explicit if statement exits with status 0 whether or not the branch runs, so it never trips errexit.
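
If you want to see the failure mode for yourself, here's a minimal repro – a stripped-down sketch of the pattern, not the original script:

#!/usr/bin/env bash
set -e

setup() {
    local A_START="feature-x"              # already set, so the test below is false
    [[ -z "$A_START" ]] && A_START="main"  # last command in the function returns 1
}

setup                  # under set -e, the function's non-zero status aborts the script here
echo "never reached"

Swap that last line of setup for the if version and the script runs to completion, because an if whose condition is false still exits with status 0.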

.

.

.

The Working Script

After applying the fix:

[Screenshot: The setup completing successfully]

Success!

The script now:

  • Creates two Git worktrees (worktrees/a and worktrees/b)
  • Clones the database for each environment (see the sketch below)
  • Sets up separate ports (3001 and 3002)
  • Provides clear instructions for running both variants

[Screenshot: The created worktree structure in the file system]
[Screenshot: The PostgreSQL databases created for A/B testing]
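
For the curious, the database and port steps boil down to something like this. It's a simplified sketch, not the actual script – the database names are placeholders, and cloning via createdb -T is just one common way to copy a PostgreSQL database:

# Clone the dev database once per environment (names are placeholders)
createdb -T myapp_dev myapp_ab_a
createdb -T myapp_dev myapp_ab_b

# Give each worktree its own port so both variants can run side by side
echo "PORT=3001" >> worktrees/a/.env
echo "PORT=3002" >> worktrees/b/.env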

.

.

.

The Lessons

1. Different Models, Different Strengths

Claude Opus 4.1 is brilliant at many things, but it got stuck in a loop adding complex debugging rather than reconsidering the fundamental syntax.

GPT-5 Codex, specialized for code, immediately spotted the syntax pattern issue.

2. Simple Bugs Can Be the Hardest

The simpler the bug, the easier it is to overlook.

We expect complex problems to have complex solutions, but sometimes it’s just a matter of using if instead of &&.

3. Shell Scripting Is Deceptively Complex

What looks like identical syntax can behave completely differently depending on shell options, context, and environment.

The “clever” shortcut often isn’t worth the fragility.

4. Know When to Switch Tools

After watching Claude go in circles for hours, switching to a different model wasn’t giving up – it was being strategic.

Different AI models truly have different strengths.

.

.

.

The Irony

The most sophisticated AI model I had access to couldn’t fix a bug that boiled down to “use an if statement instead of &&”.

Meanwhile, a more specialized model solved it immediately.

Sometimes the best debugger isn’t the most powerful AI – it’s the one that’s been trained specifically for the task at hand.

.

.

.

Important Disclaimer

This is just one data point, not a comprehensive benchmark.

This single debugging session shouldn’t be used to conclude that GPT-5 Codex is definitively superior to Claude Opus 4.1, or that any model is inherently better than another.

Each AI model has its strengths and weaknesses, and they excel in different contexts.

What this experience taught me is that when you’re stuck on a problem with one model, switching to another might provide the fresh perspective needed for a breakthrough.

It’s not about finding the “best” model – it’s about using the right tool for the specific job at hand.

Claude Opus 4.1 remains exceptional at complex reasoning, creative problem-solving, and many coding tasks. GPT-5 and GPT-5 Codex have their own unique strengths.

The key is knowing when to leverage each one.

.

.

.

What’s Next: The A/B Testing Script

Speaking of using the right tool for the job – next week, I’ll be sharing the complete ab-worktrees.sh script that started this whole adventure.

This bash script lets you test different AI models’ outputs side by side using Git worktrees, making it easy to compare implementations from Claude Opus 4.1 vs GPT-5 vs GPT-5 Codex (or any other models) in real time.

You’ll learn how to:

  • Set up isolated testing environments for each model’s output
  • Compare implementations without contaminating your main branch
  • Run parallel tests with separate databases
  • Make data-driven decisions about which model’s solution to use

Subscribe to stay tuned for next week’s deep dive into automated A/B testing for AI-generated code!

Want techniques like these weekly?

Join The Art of Vibe Coding—short, practical emails on shipping with AI (without the chaos).

No spam. Unsubscribe anytime.


Have you experienced similar situations where switching AI models made all the difference? Where a “simple” fix eluded even the most advanced tools? I’d love to hear your debugging war stories in the comments.

Nathan Onn

Freelance web developer. Since 2012 he’s built WordPress plugins, internal tools, and AI-powered apps. He writes The Art of Vibe Coding, a practical newsletter that helps indie builders ship faster with AI—calmly.
