Code vs Codex
48 hours ago, OpenAI released Codex, their lightweight coding agent that looks like it does a lot of the same stuff that Claude Code does.
Because Task Demon uses coding tools like Claude Code to create the detailed plan documents it writes, we were excited to immediately add support for Codex in Task Demon. With v0.7.0 of the Task Demon CLI agent you can now just set the tool option to codex in your .taskdemon/config.yml file and the agent will start using Codex. Simple, huh?
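A minimal sketch of what that change looks like, assuming the tool option sits at the top level of the file (your config may nest it differently or hold other settings alongside it):

```yaml
# .taskdemon/config.yml
# Assumption: `tool` is a top-level key; adjust if your config nests it elsewhere.
tool: codex
```

Removing the line or changing the value back should return the agent to whatever tool it was using before.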
OpenAI Codex vs Claude Code
Ok so let's see how these two tools compare.
Is it silly to compare the two? Codex is brand new, and Claude Code is relatively mature at all of 2 months old. Codex will evolve rapidly, and we've only had a few hours to play with it, so these initial observations are just that - initial observations.
Most of what Task Demon does is to very efficiently create plan documents for all your coding tasks, so that's what we focused our testing on. We gave each tool a set of 5 planning tasks to perform, and measured how long it took as well as the quality of the output.
The results are of course subjective, and the landscape will change rapidly, but here's a 6 minute video of what we found:
For those who want to read rather than watch, keep going:
What we tested
Task Demon primarily uses coding tools to generate plan documents for some task, so we focused our testing on that. We gave each tool 5 planning tasks to perform, and measured how long it took as well as the quality of the output.
In the results repo there are 4 folders:
- tasks - the 2-sentence ticket descriptions to turn into plans
- agent1 - the plans generated by Claude Code
- agent2 - the plans generated by Codex
- verdicts - Claude and Codex's own verdicts on which plans were better
We named them agent1 and agent2 because in a moment we're going to have Claude Code and OpenAI Codex themselves assess which made the better plans, and we don't want to give anything away!
Example Task
Here's an example of one of the tasks we gave each tool - in this case the changelog task:
This task description, like the 4 others, was turned into 2 plan documents, one per tool. Along with the task description we passed each tool a detailed set of instructions on how to generate the plan. The plans are a little long to show in their entirety here, but let's take a look at some of what the tools came up with, starting with Claude's plan:
The plan document that Claude spit out includes a lovely diagram, and numbered tasks, which is cool. It largely went along with the instructions we gave it, and it took about 2 minutes to complete the document.
Now let's take a look at Codex's plan - again too long to reproduce in its entirety here but follow the link if you want to see the whole thing:
To me, Codex did not follow the instructions we gave it quite as well. It did attempt to make a diagram, but it doesn't render properly in GitHub, and the tasks aren't numbered. Looking at the other plans output by Codex, only about half of them have numbered tasks, whereas all 5 of the Claude plans use numbered tasks as requested.
However, the agents themselves have a different perspective, as we will see:
Who Claude and Codex think won
Our final step was to ask Claude and Codex which of them had done a better job with the generated plans.
We prepared a detailed PROCESS.md prompt and asked each tool to follow its instructions.
Here's what Claude Code said:
Note that it actually says "agent2" instead of "Codex" because we never told it which one was which - here's the full raw output.
Codex's verdict was a little more nuanced:
You can see Codex's full verdict here.
We ran these several times and generally got consistent outcomes - Claude Code said Codex's plans were better most of the time, while Codex said it was about tied. And to the only human in the loop, Claude's plans look better.
Anecdotes, Vibes & Recommendations
My anecdotal experience using the 2 agents over the last day or so is that Codex is plenty capable, but it also takes a lot longer to do things than Claude Code does, and it tends to be a lot more variable in how long it takes. Sometimes it will generate a plan in 30 seconds, which doesn't seem long enough to generate a good plan, other times it will take 2-3 minutes for the same task. There are a million reasons why this could be the case today and be solved tomorrow.
Beyond that, Codex feels a little more timid than Claude Code. After the plan documents were generated, we had both Codex and Claude go and implement some of the plans, reverting the changes between each try. Codex generally took longer to get the task done, but what was notable was that it would often spend the first several minutes just reading code - even when presented with a detailed plan document. It is fascinating to watch these machines make hundreds of tool calls, but watching Codex it was sometimes a little puzzling why it was reading certain files that didn't seem to have much to do with the task.
Are we going to keep Codex around? Absolutely. It's already in a great place for a tool that was released 48 hours ago, and the fact that it's open source is a big plus for a lot of folks who want more control over where their code gets LLM'd. Task Demon has had first-class support for OpenAI Codex since yesterday, and we're really excited that Claude Code has more competition to contend with.
So install codex now, and give it a spin. Don't expect it to beat Claude Code just yet, but do expect it to keep getting better, fast.