I Recreated the Claude Code Creator’s Workflow — Here’s What Actually Works (and What Doesn’t)
Last Tuesday, around 11 PM, I was sitting at my desk with three different coding AI tools open — Cursor on the left tab, a terminal running Claude Code in the middle, and GitHub Copilot doing its thing in VS Code on the right. My fan was screaming. My coffee was cold. And I was doing the digital equivalent of juggling chainsaws.
Then I stumbled on a post from one of Claude Code’s creators detailing his personal agentic workflow. Not the sanitized version Anthropic puts on their blog — the real, messy, actually-works-in-production version. Developers went wild in the replies. Some called it genius. Others said it was overhyped.
I spent the next four days rebuilding it from scratch on my own machine. No shortcuts. No cherry-picking the easy parts. And honestly? The results surprised me — but not in the way I expected.
Let me walk you through exactly what I did, what worked, what flopped, and whether this workflow is worth the $20/month Claude Pro subscription (or if you’re better off with free alternatives).
What Makes This Workflow Different
Here’s the thing most people miss: the Claude Code creator’s approach isn’t about typing clever prompts. It’s about getting out of the way.
The traditional AI coding workflow goes like this: you write a prompt, the AI generates code, you review it, you edit it, you run it, it breaks, you write another prompt. You’re basically a micromanager with a really fast intern.
The agentic workflow flips this. You set up the task, give Claude Code access to your codebase, and let it iterate on its own. It reads files, runs commands, checks its own output, fixes errors, and only stops when the task is done or it needs your input. You’re not the micromanager anymore — you’re the code reviewer.
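Stripped to its control flow, that loop is: change something, run the tests, read the failure, try again, and hand back to the human after a few misses. Here's a minimal Python sketch of that shape. It assumes a hypothetical `propose_change` callback standing in for the model and a `run_tests` callback standing in for pytest. It's an illustration of the idea, not how Claude Code is implemented internally.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    """Outcome of one test run (a stand-in for a pytest invocation)."""
    passed: bool
    output: str


def agentic_loop(
    propose_change: Callable[[str], None],
    run_tests: Callable[[], CheckResult],
    max_iterations: int = 5,
) -> tuple[bool, int]:
    """Edit, test, and feed failures back until the tests pass.

    Returns (succeeded, iterations_used). Gives up after max_iterations,
    which is the point where a human needs to step in.
    """
    feedback = ""
    for iteration in range(1, max_iterations + 1):
        propose_change(feedback)   # the agent edits files, guided by feedback
        result = run_tests()       # e.g. run the project's test suite
        if result.passed:
            return True, iteration
        feedback = result.output   # the failure log becomes the next prompt
    return False, max_iterations


# Hypothetical stand-ins: this "agent" fixes its bug once it sees the error.
state = {"fixed": False}


def fake_agent(feedback: str) -> None:
    if "NameError" in feedback:
        state["fixed"] = True


def fake_tests() -> CheckResult:
    if state["fixed"]:
        return CheckResult(True, "23 passed")
    return CheckResult(False, "NameError: name 'Limiter' is not defined")


if __name__ == "__main__":
    print(agentic_loop(fake_agent, fake_tests))  # (True, 2)
```

The first pass fails, the error text flows back in as feedback, and the second pass goes green. The `max_iterations` cap is the "stops when it needs your input" part.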
I’ll be honest: my first attempt at this was a disaster. I gave Claude Code a 200-line prompt asking it to refactor my entire Python project. It deleted three files, created four new ones with circular imports, and then confidently told me everything was fine. I had to restore from a git commit I made exactly 47 minutes before I started (yes, I checked the timestamp — it was 2:14 AM on April 12th, and I still haven’t lived it down).
But after four days of tweaking, I landed on a version that actually works. Here’s the step-by-step.
Step 1: The Setup (15 Minutes, No Joking)
You need three things:
- Claude Code: installed via npm (`npm install -g @anthropic-ai/claude-code`). The Pro plan runs $20/month as of April 2026, which gets you roughly 500 messages per day. That’s enough for serious work unless you’re literally prompting all day.
- A project with tests: this is non-negotiable. Without automated tests, you’re flying blind. I tested this on a Flask API project with 23 pytest test cases.
- A CLAUDE.md file in your project root. This is the secret sauce.
CLAUDE.md is where you put your project’s rules, conventions, and preferences. Think of it as onboarding documentation for an AI teammate that reads at superhuman speed. Here’s mine:
# Project Rules for Claude Code
## Architecture
- Flask REST API with SQLAlchemy ORM
- All endpoints in /app/routes/
- Models in /app/models/
- Tests use pytest, located in /tests/
## Code Style
- Type hints on all function signatures
- Max function length: 40 lines
- No bare except: clauses — always catch specific exceptions
- Use f-strings, not .format()
## Git Workflow
- Never commit directly to main
- Create feature branches: feature/description
- Always run `pytest` before committing
That’s it. About 20 lines. But it changes everything. Without this file, Claude Code guesses your conventions, and it guessed wrong about 30% of the time (I counted across 15 interactions). With it, it got them right roughly 85% of the time.
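CLAUDE.md is guidance rather than enforcement, so it can help to back a few of those rules with a check the agent (and you) can run before committing. Here's a rough sketch using Python's standard `ast` module. It flags bare `except:` clauses, functions longer than 40 lines, missing type hints, and `'{}'.format(...)` calls made directly on string literals. The script itself and its default `app`/`tests` paths are illustrative assumptions, not part of the workflow described here.

```python
import ast
import sys
from pathlib import Path

MAX_FUNCTION_LINES = 40  # mirrors "Max function length: 40 lines"


def check_source(source: str, filename: str = "<string>") -> list[str]:
    """Return CLAUDE.md-style rule violations found in one Python source."""
    problems: list[str] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        where = f"{filename}:{getattr(node, 'lineno', '?')}"
        # A bare "except:" is an ExceptHandler with no exception type.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            problems.append(f"{where}: bare except clause")
        # '{}'.format(...) called directly on a string literal.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "format"
            and isinstance(node.func.value, ast.Constant)
            and isinstance(node.func.value.value, str)
        ):
            problems.append(f"{where}: use an f-string instead of str.format()")
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                problems.append(f"{where}: {node.name} is {length} lines long")
            params = [*node.args.posonlyargs, *node.args.args, *node.args.kwonlyargs]
            params += [p for p in (node.args.vararg, node.args.kwarg) if p]
            untyped = [
                p.arg for p in params
                if p.annotation is None and p.arg not in ("self", "cls")
            ]
            if untyped or node.returns is None:
                problems.append(f"{where}: {node.name} is missing type hints")
    return problems


def main(roots: list[str]) -> int:
    problems: list[str] = []
    for root in roots:
        for file in sorted(Path(root).rglob("*.py")):
            problems += check_source(file.read_text(), str(file))
    print("\n".join(problems) or "all checks passed")
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:] or ["app", "tests"]))
```

Adding a line like "Run the rule checker before committing" to CLAUDE.md would turn the style section from a suggestion into something the agent can verify on its own.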
Step 2: The Task Pattern That Actually Works
Here’s where most people — including me, initially — mess up. You can’t just say “refactor the auth module” and walk away. The creator’s workflow uses a specific pattern I now call “spec, sandbox, submit.”
The Spec (3-4 sentences, max)
Be surgical. I tried writing long, detailed prompts. They perform worse. Here’s the format that consistently works:
“Add rate limiting to the /api/login endpoint. Use a sliding window of 10 requests per minute per IP. Return 429 with a Retry-After header when exceeded. Update the test file.”
That’s 30 words. It took me about 15 seconds to type. Claude Code handled the rest: reading the existing auth code, installing the flask-limiter package (it asked permission first), writing the implementation, updating the tests, and running them.
It failed the first run because it imported the flask_limiter module where it needed the flask_limiter.Limiter class. It caught the error itself when pytest failed, fixed the import, re-ran the tests, and all 23 passed. Total time: about 90 seconds. I didn’t touch the keyboard once.
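If you're curious what "a sliding window of 10 requests per minute per IP" means mechanically, here's a framework-free sketch in plain Python. In the run above, Claude Code used the flask-limiter package; the `SlidingWindowLimiter` class and the `login_guard` helper below are hypothetical illustrations of the same rule, not the code it generated. Each IP gets a deque of recent request timestamps, so expired hits simply fall off the left end.

```python
import math
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """At most `limit` requests per `window` seconds for each key (e.g. an IP)."""

    def __init__(self, limit: int = 10, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: defaultdict[str, deque] = defaultdict(deque)

    def check(self, key: str, now: Optional[float] = None) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds); record the request if allowed."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # The oldest request still in the window decides when a slot frees up.
        return False, max(1, math.ceil(self.window - (now - hits[0])))


def login_guard(
    limiter: SlidingWindowLimiter, ip: str, now: Optional[float] = None
) -> tuple[int, dict[str, str]]:
    """Status and headers a hypothetical /api/login handler would send back."""
    allowed, retry_after = limiter.check(ip, now)
    if allowed:
        return 200, {}
    return 429, {"Retry-After": str(retry_after)}


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=10, window=60)
    for second in range(11):                    # 11 requests, one per second
        print(second, login_guard(limiter, "203.0.113.7", now=float(second)))
    # The 11th request (second 10) gets 429 with Retry-After: 50.
```

This version keeps state in process memory, so it only works for a single worker. That's one reason to reach for a library like flask-limiter, which can store its counters in a shared backend, in a real deployment.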
The Sandbox Rule
This is critical: always work on a branch. I learned this the hard way (see the 2:14 AM incident above). My workflow now:
- `git checkout -b task/rate-limiting`
- Give Claude Code the task
- Let it iterate
- Review the diff (`git diff`, takes about 30 seconds)
- If it looks good, commit. If not, tell it what’s wrong and let it fix it.
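If you'd rather not retype the bookends of that routine, a tiny wrapper can handle them. This is a sketch that assumes `git` and `pytest` are on your PATH. The script, its `start`/`review` phases, and the default branch name are hypothetical conveniences. The middle steps (giving Claude Code the task and letting it iterate) stay manual.

```python
import subprocess
import sys


def commands_for(phase: str, branch: str = "task/rate-limiting") -> list[list[str]]:
    """Expand one phase of the routine into concrete commands."""
    if phase == "start":
        # Before handing over the task: isolate the agent's work on a branch.
        return [["git", "checkout", "-b", branch]]
    if phase == "review":
        # After the agent stops iterating: confirm the suite, then read the diff.
        return [["pytest"], ["git", "diff"]]
    raise ValueError(f"unknown phase: {phase!r}")


def run(commands: list[list[str]], dry_run: bool = False) -> bool:
    """Run commands in order; stop and return False at the first failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run and subprocess.run(cmd).returncode != 0:
            return False
    return True


if __name__ == "__main__":
    # Usage: python sandbox.py start task/rate-limiting
    #        (give Claude Code the task and let it iterate)
    #        python sandbox.py review
    phase = sys.argv[1] if len(sys.argv) > 1 else "review"
    args = sys.argv[2:3]  # optional branch name for the "start" phase
    sys.exit(0 if run(commands_for(phase, *args)) else 1)
```

Stopping at the first failing command means a red test run never leads you into reviewing a diff that isn't ready.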
The Submit
Once the tests pass and the diff looks clean, commit with a descriptive message. Claude Code can even write the commit message for you — just say “write a conventional commit message for this change.” It’s scarily good at this.
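Because a conventional commit header follows a fixed shape (`type(scope)!: description`), it's easy to sanity-check whatever message the agent writes. Here's a minimal sketch with Python's `re` module. The list of allowed types comes from the widely used Angular/commitlint convention and is an assumption you'd adjust; the Conventional Commits spec itself only defines `feat` and `fix`.

```python
import re
import sys
from pathlib import Path

# Common types from the Angular/commitlint convention. The Conventional
# Commits spec itself only defines "feat" and "fix"; adjust to taste.
ALLOWED_TYPES = {"feat", "fix", "docs", "style", "refactor", "perf",
                 "test", "build", "ci", "chore", "revert"}

HEADER = re.compile(
    r"^(?P<type>[a-z]+)"             # type, e.g. "feat"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope, e.g. "(auth)"
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<description>\S.*)$"      # ": " then a non-empty description
)


def check_header(message: str) -> list[str]:
    """Return problems with a commit message's first line (empty list = OK)."""
    header = message.splitlines()[0] if message.strip() else ""
    match = HEADER.match(header)
    if match is None:
        return [f"not a conventional commit header: {header!r}"]
    if match["type"] not in ALLOWED_TYPES:
        return [f"unknown type: {match['type']}"]
    return []


if __name__ == "__main__":
    # As a commit-msg hook, Git passes the path of the message file as argv[1].
    problems = check_header(Path(sys.argv[1]).read_text())
    for problem in problems:
        print(problem, file=sys.stderr)
    sys.exit(1 if problems else 0)
```

For example, `feat(auth): add rate limiting to /api/login` passes, while `Added rate limiting` is rejected.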
Real Numbers: What I Actually Accomplished
Over four days (April 11-14, 2026), I tracked every interaction. Here’s the honest breakdown:
| Task | Time with Claude Code | Estimated Manual Time | Did It Work First Try? |
|---|---|---|---|
| Add rate limiting to auth endpoint | 2 min | 25 min | No (fixed import on 2nd try) |
| Refactor 3 route handlers into class-based views | 8 min | 90 min | Yes |
| Write unit tests for email validation utility | 3 min | 40 min | Yes |
| Migrate SQLite database schema to add user roles | 12 min | 60 min | No (needed 3 iterations) |
| Add OpenAPI/Swagger documentation | 5 min | 45 min | Yes |
Total: about 30 minutes of active time versus roughly 260 minutes manually. That’s roughly an 8.7x speedup. But, and this is a big but, about 60% of that 30 minutes was me reviewing diffs and thinking, not typing. The actual “working with the AI” time was closer to 12 minutes.
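Here's the arithmetic behind those totals, using the five rows of the table and the rough 60% review share quoted above.

```python
# (task, minutes with Claude Code, estimated manual minutes), from the table
tasks = [
    ("Rate limiting", 2, 25),
    ("Class-based views", 8, 90),
    ("Email validation tests", 3, 40),
    ("User-roles migration", 12, 60),
    ("OpenAPI docs", 5, 45),
]

agent_total = sum(agent for _, agent, _ in tasks)     # 30
manual_total = sum(manual for _, _, manual in tasks)  # 260
speedup = manual_total / agent_total                  # 8.666...
review_share = 0.60  # rough share of active time spent reviewing diffs
hands_on = agent_total * (1 - review_share)           # about 12

print(f"{agent_total} vs {manual_total} minutes: {speedup:.1f}x faster")
print(f"hands-on time with the AI: about {hands_on:.0f} minutes")
```

Keep in mind the manual column is an estimate, so the ratio is only as good as those guesses.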
Where It Fails (Because It Does)
I’m not going to pretend this is perfect. Here’s where Claude Code genuinely struggled during my testing:
Cross-file refactoring. When I asked it to rename a database column that’s referenced across 8 files, it got 6 of them right on the first pass. The other two — a migration file and a seed script — it missed entirely. I had to manually find and fix them. This isn’t a Claude Code problem per se; it’s a limitation of how LLMs track state across large codebases.
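One cheap safety net after any rename is to search the whole tree for the old name before calling the job done, so stragglers like a migration or a seed script surface immediately. Here's a rough sketch in Python. The `user_email` column name, the file suffixes, and the skipped directories are hypothetical placeholders; `grep -rw` would do much the same job.

```python
import re
import sys
from pathlib import Path

SUFFIXES = {".py", ".sql", ".json", ".yaml", ".yml", ".cfg"}  # adjust per project
SKIP_DIRS = {".git", ".venv", "venv", "__pycache__", "node_modules"}


def find_references(root: Path, identifier: str) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for each whole-word match."""
    pattern = re.compile(rf"\b{re.escape(identifier)}\b")
    hits: list[tuple[Path, int, str]] = []
    for path in sorted(root.rglob("*")):
        if SKIP_DIRS & set(path.relative_to(root).parts):
            continue
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage: python find_refs.py . user_email   (hypothetical old column name)
    root, old_name = Path(sys.argv[1]), sys.argv[2]
    found = find_references(root, old_name)
    for path, lineno, line in found:
        print(f"{path}:{lineno}: {line}")
    sys.exit(1 if found else 0)
```

The whole-word match (`\b`) keeps a longer name such as `user_email_verified` from showing up as a false positive. A non-zero exit code means something still references the old name, which makes this easy to hand back to the agent as a test.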
Complex async code. I have an async WebSocket handler that uses Python’s asyncio with custom connection pooling. Claude Code kept introducing race conditions. Three attempts, three different bugs. I ended up writing that one myself. It took 20 minutes.
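The handler itself isn't reproduced here, so this isn't one of those three bugs. It's a generic illustration of the race that most often bites asyncio code: a check-then-act split by an `await`, shown on a hypothetical connection pool. `RacyPool`, `SafePool`, and `simulate` are made-up names, and swapping in an `asyncio.Semaphore` is one common remedy among several.

```python
import asyncio


class RacyPool:
    """Connection-count guard with a check-then-act split by an await."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.in_use = 0
        self.peak = 0

    async def acquire(self) -> None:
        while self.in_use >= self.max_size:   # check...
            await asyncio.sleep(0)
        await asyncio.sleep(0)                # e.g. opening a socket: others run here
        self.in_use += 1                      # ...act, after the check went stale
        self.peak = max(self.peak, self.in_use)

    def release(self) -> None:
        self.in_use -= 1


class SafePool(RacyPool):
    """Same guard, but a Semaphore reserves the slot before any await."""

    def __init__(self, max_size: int) -> None:
        super().__init__(max_size)
        self._slots = asyncio.Semaphore(max_size)

    async def acquire(self) -> None:
        await self._slots.acquire()           # check and claim in one step
        await asyncio.sleep(0)                # same simulated setup work
        self.in_use += 1
        self.peak = max(self.peak, self.in_use)

    def release(self) -> None:
        self.in_use -= 1
        self._slots.release()


async def simulate(pool_cls: type, max_size: int = 5, clients: int = 20) -> int:
    """Run many clients at once and report the peak number of open slots."""
    pool = pool_cls(max_size)  # built inside the running event loop

    async def client() -> None:
        await pool.acquire()
        await asyncio.sleep(0.001)            # hold the connection briefly
        pool.release()

    await asyncio.gather(*(client() for _ in range(clients)))
    return pool.peak


if __name__ == "__main__":
    print("racy peak:", asyncio.run(simulate(RacyPool)))  # well above 5
    print("safe peak:", asyncio.run(simulate(SafePool)))  # never above 5
```

In the racy version every client passes the capacity check before any of them records its claim, so the "limit" of 5 is never enforced. Bugs like this only show up under concurrency, which is exactly where a test suite with sequential tests tends to stay green.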
Understanding domain-specific logic. My app has a billing module with prorated charges based on a specific formula our business team came up with (yes, it’s weird — it involves calendar quarters and a 15-day grace period). Claude Code has no way to know this. It guessed the formula, got it wrong, and I had to paste in the actual business rules as context. After that, it worked fine.
The Free Alternative: Goose by Block
Here’s something the Claude Code creator’s post didn’t mention — and it’s probably the most useful thing I discovered during this experiment. Goose, an open-source AI coding agent from Block (formerly Square), does almost the same thing for free.
I spent a day running the same five tasks through Goose. Results:
- Rate limiting: worked on the first try (Claude Code needed a second run to fix its import)
- Route refactoring: worked, but took 12 minutes instead of 8
- Unit tests: worked, same quality
- Database migration: needed 4 iterations instead of 3
- API documentation: worked, but the OpenAPI spec was less detailed
Goose is slower — maybe 30-40% slower — and slightly less accurate on complex tasks. But it’s completely free, runs on multiple LLM backends (including local models if you want zero data leaving your machine), and has a surprisingly active community. For personal projects or small teams watching their budget, it’s genuinely hard to recommend paying $20/month for Claude Code when Goose gets you 80% of the way there at $0.
That said, if you’re doing this professionally and your time is worth more than $20/month (which, let’s be honest, it probably is if you’re reading this article), Claude Code’s speed advantage adds up fast.
My Actual Daily Workflow Now
After four days of testing, here’s what stuck. What didn’t, I dropped. No guilt.
Morning (first 30 min of coding): I open Claude Code, give it a list of 2-3 small tasks for the day. Things like “fix the validation bug on the registration form” or “add logging to the payment processor.” These are the tasks where Claude Code shines — focused, well-defined, testable.
Mid-session: When I hit something complex — architecture decisions, new features that need design thinking, anything involving async code — I switch to manual coding. Claude Code is great at execution, mediocre at design. I don’t trust it to make structural decisions about my codebase. Yet.
End of day: I run the full test suite, review all the diffs from Claude Code’s changes, and commit. This takes about 10 minutes. Before this workflow, code review at end-of-day took me 25-30 minutes because I’d been switching context all day. Now everything Claude Code touches is on its own branch, and the diffs are clean.
Is This the Future?
I’ve been writing code for about eight years now. I’ve seen IDEs go from glorified text editors to things that autocomplete entire functions. I remember when people said IntelliSense would make us lazy. Turns out, it just made us faster at the boring stuff so we could focus on the interesting stuff.
Claude Code’s agentic workflow feels like the next step on that same ladder. It doesn’t replace you. It replaces the parts of your job you didn’t enjoy anyway — the boilerplate, the test writing, the tedious refactoring that takes 90 minutes but requires zero creativity.
Will every developer use this in two years? Probably not. Some people genuinely enjoy writing tests. Good for them. But for the rest of us — the ones who’d rather think about architecture than type out the same try-except pattern for the thousandth time — this workflow is worth trying.
Start small. Pick one boring task. Give it to Claude Code (or Goose, if you’re budget-conscious). Watch it work. Then decide for yourself whether the hype is real.
My verdict after four days: it is. Mostly. Just don’t let it touch your async code.
Have you tried agentic coding workflows? I’d genuinely love to hear what worked for you — or what catastrophically didn’t. Drop your experience in the comments.

