About

Arbitrage is a Drug Wars clone. Buy low, sell high. Instead of drugs we've got a broad selection of mostly legal goods.

This Game is the result of Claude Code experimentation. With the exception of this page, the entirety of this codebase has been written by an LLM.

Github: Github
Claude Code: Claude.ai

Overall: if you don't care about the details, you can generate a lot of code very, very fast. If you care about things being correct, you're limited by your ability to review code and retain the ever-changing mental model, and your ability to talk to the machine in a way that gets your requirements met.

I'm using Claude like an ephemeral team of devs and I'm the architect/pm/qa just chatting with the team over Slack and validating changes with a browser. I took some notes on my general experience, December 2025.

The future is code review. So much code review. Claude can yack out a lot of decent Ruby code really fast, but I still need to read it, ingest it, and add it to my mental model so that I can instruct the agent appropriately during next steps. Unfortunately, Claude is slow enough that it's very easy to become disengaged from the work. A lot of larger tasks are right in that sweet spot of taking 5 to 10 minutes to chew through, especially when Claude knows it needs to write tests after and it goes into a debug & fix loop, and short tasks can become tedious to watch when the Agent is doing everything right. This is long enough that my eyes and thoughts start to wander, and too short to switch over to another task. Plus, Claude is dirtying up my git stage and I haven't investigated parallel environments so I can run the Agent as a side task.
I don't see how someone can effectively guide an agent unless that person has a deep knowledge of the system, its data model, and a firm opinion on what will be built next. Non-specific and non-technical prompts that leave space for Claude to be "creative" are very hit or miss. Sometimes you'll love the solution Claude came up with, but mostly you'll hate it. Claude sometimes asks clarifying questions, especially in Plan mode, but it makes bold assumptions once it's in execution mode.
I asked Claude to implement the News pages' Timeline view before I had it implement some of the modeling that would log the data needed to build that view. It went off and tried its best. And I mean it, it really tried. It picked an arbitrary created_at timestamp that, with some creative thinking, might represent the timeline that was needed. It spent 10 minutes building an a bunch of functions, unit tests, and request tests that fundamentally were just torturing said timestamp to force a "timeline" that wasn't related to anything in the game. All the tests passed!

Ultimately, as a developer, we are responsible for the code we ship and review. For me the act of writing a hash, accessing a hash, writing a spec that validates the usage of said hash, all of these things cement a data structure into my memory. Using Claude I find myself reaching for things like route comments and schema definitions a lot more when I haven't personally written the code I'm working on. How does that table relate to another table? Where did we put that functionality?
I've started to treat model context the same way we ideally want branches & PRs to be treated. Focused on one contextual change. As small as possible. Disposable. Ultimately, a liability. I want to start a new context as fast as I can, the longer my conversation goes with Claude and the more topics in the conversation, the more it starts to make dumb mistakes. Long Claude contexts give me the same itch between my shoulder blades that I get when I know my local branch is dirty, or getting too big. That said, as the project has filled out with features I've noticed less of this wonky behavior.
Keep your git history clean. As soon as you have a logical change set that you're happy with, get it committed! You NEED the ability to reset a dirty branch to clean and re-try a with the prompt phrased differently.
Avoid the temptation to have the agent "do one more thing" on a dirty branch you're happy with ESPECIALLY if that one more thing is not immediately related to what you just implemented. It feels kinda nostalgic, like using Photoshop on an crappy computer before auto-save was a thing. Ooooo it crashed again, I wonder when I last hit the hotkey.
"Make a Plan and write it down" is a magic phrase, and goes hand-in-hand with keeping the context small. When I start a new feature I ask Claude to "make a plan for implementing X". I give Claude focused direction for a specific context, a couple of examples, and I explicitly tell it to make a plan for me to review. I now have a `docs/` folder in the project full of markdown files it has generated from each plan. I'll go back and forth with Claude refining the markdown file, and once I think it looks good I commit it and start a new context, tell Claude to read the new plan and start implementing it. I have not measured the effectiveness, but the vibe I have is Claude's chance of one-shotting a concept is greatly increased when it runs off of any level of formalized plan. Formalizing the plan also gives *you* the chance to fill in any assumptions you made. The more specific the plan gets the better.
Claude's knowledge of Ruby, Rails, and its ecosystem is complete, but Claude doesn't necessarily know what's current. Claude kept writing specs using shoulda matchers despite this project not using that gem. It tried to install the gem in the middle of other tasks. Shoulda was a huge thing early in Rails' life and seems to appear in 50% of the Rails Stack Overflow posts that I run across, so I guess this makes sense. Updating the CLAUDE file explicitly saying that we don't use shoulda mostly fixed the issue, now when it writes a shoulda matcher it catches the issue on a test run and refactors the usage out.
Claude is happy to test every damn thing, it really wants to reach 100% testing coverage and it will do it in the most brute-force manner if left to its own devices. I've never seen so much `.private_send` usage in my life. Updating the CLAUDE file with instructions on how you want specs written is important. Tell it what not to do.
Claude uses very interesting debugging techniques. I've seen it curl the local server and grep through the DOM to validate an issue, a lot. Actually, it told itself how to do this and logged it to the style guide. I've seen it fire up ruby prompt with a 5 line command injecting the running of the base action controller renderer, and `private_send`'ing its way to rendering a specific template, then use that DOM to diagnose an issue. Not sure I'd do it that way but it's something to file away for later.
It can make changes where you wouldn't expect that are tricky to catch. When I told Claude it was time to refactor the SCSS into partials, it also changed a bunch of colors, and probably more things that I haven't really noticed. This is actually a Big Deal, there is no circumstance where I would expect a simple "refactor this big file into logical small files" pull request to also include a bunch of unit manipulation.
This kind of goofup is also tricky to catch in Code Review. The reviewer gets to look at the big file being deleted and XX number of small new files. Neither of these things will show a proper diff. The reviewer would have to go line-by-line and compare sections of the deleted file to the new partials. I think in most cases I'd be checking the naming and inclusion of partials, and probably run off of trust that the Jr. Engineer didn't decide that ORANGE would be the new color.

Side note, it actually took Claude more time to do that refactor than it would have taken me to copy-paste into partials. It writes really, really verbose CSS, and it kinda choked on application.scss's size.
I'm 100% sure there are a ton of bugs in here. The amount of times I caught a dumb bug was pretty high, which means any code I didn't review closely is highly suspect. The code that does all the economy moving math, for instance, I just YOLO'd whatever Claude gave me and approved it when the charts and colors looked right. I'm sure it's riddled with bugs, but I'm limited by my willingness to double-check the math.