“Sorry Jim, we just don’t have it in the budget to give you a raise.”
Oh no! Anyway, what did you guys have for breakfast today?
Isn’t there a saying for this? If you owe a bank $10,000, it’s your problem. If you owe a bank $100 million, it’s their problem.
That’s why I refuse to use most cloud API keys. The API keys always default to take the money first. Anthropic already has their money. They’re not getting it back.
I’m just a home user and worry that if my security slipped for one moment over my entire life, my keys would be leaked and I’d see a surprise $1k Google Cloud bill.
I use OpenRouter for most of my paid online calls. When setting up API keys you get to set a budget. Easy. It’s also pre-paid, so unless I turn on auto-charge for my credit card (which is off by default) it will never blow more than I’ve already loaded. Two layers of safety are better than one.
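For anyone wondering how those two layers stack, here's a rough toy model in Python. To be clear, this is not OpenRouter's actual API; `PrepaidAccount`, `ApiKey`, `charge`, and `BudgetExceeded` are names I made up for illustration, and it assumes auto-charge stays off.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a charge would break either safety layer."""


@dataclass
class PrepaidAccount:
    balance: float  # credits already loaded, in dollars (hypothetical model)


@dataclass
class ApiKey:
    account: PrepaidAccount
    budget: float       # per-key spending cap, in dollars
    spent: float = 0.0  # running total charged to this key

    def charge(self, cost: float) -> None:
        # Layer 1: the per-key budget set when the key was created.
        if self.spent + cost > self.budget:
            raise BudgetExceeded("key budget reached")
        # Layer 2: the prepaid balance (auto-charge is assumed to be off).
        if cost > self.account.balance:
            raise BudgetExceeded("prepaid balance exhausted")
        self.spent += cost
        self.account.balance -= cost
```

A leaked key can burn at most its own budget, and even a key with a sloppy budget can't go past what's already loaded on the account.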
A single engineer experimenting with agentic coding workflows can rack up hundreds or thousands of dollars in usage costs in a month. Multiply that across an enterprise with unrestricted access, and the numbers become difficult to contain.
I’m not doubting it, but I don’t think I could personally spend more than about $1.5-3k per month. And that’s with like 12-16 hour sessions doing troubleshooting and RCA while it builds tools to dig deeper.
Aside: RCA typically takes me 15-30 minutes by hand and, subsequent to a recent major deployment, we’re having thousands of incidents per day from like ten per week, so I’m faced with building scripts to read from a half dozen systems and collect maybe 100k Kibana logs out of tens of millions and then categorize them by fix so that we can feed those entries into scripts to repair data integrity.
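A minimal sketch of the "categorize them by fix" step, assuming the log lines have already been pulled out of Kibana as plain strings. The patterns and fix names in `FIX_PATTERNS` are hypothetical placeholders, not the real incident signatures.

```python
import re
from collections import defaultdict

# Hypothetical patterns and fix names; the real ones depend on the incidents.
FIX_PATTERNS = {
    "reindex_record": re.compile(r"missing index entry"),
    "repair_foreign_key": re.compile(r"orphaned (row|record)"),
}


def categorize(log_lines):
    """Group log lines by the first fix whose pattern matches."""
    buckets = defaultdict(list)
    for line in log_lines:
        for fix, pattern in FIX_PATTERNS.items():
            if pattern.search(line):
                buckets[fix].append(line)
                break
        else:
            # No pattern matched: keep the line for manual review.
            buckets["uncategorized"].append(line)
    return dict(buckets)
```

Each bucket can then be handed to whichever repair script handles that fix, and the "uncategorized" bucket shows which patterns are still missing.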
I max out at about $100/day. That’s about all the output I can review. Most days I don’t use it at all but this month has been wild. I can’t imagine what the fuck someone is doing to spend even $30k in a month. There’s no way there’s any human code review going on at all. I’m amazed by Claude, but it’s not that good.
I know people at work that spend much, much more than this. They’re what I’d describe as “fully AI native”. Honestly I don’t know how they handle it since it seems like a lot of work.
They have over a dozen agents, all using Claude Opus in fast mode. The agents have roles - for example, one for technical architecture, one for UI design, one for building overall plans, one for coding, one for security review, one for code review, etc. They run codemods (automated code cleanup and migration to newer APIs) using AI. Their backlog/wishlist tasks are completed using AI. They have several OpenClaw-style bots that respond to Google Chat messages, run periodic jobs, summarize emails, etc.
If you want an extreme example… The developer of OpenClaw is “spending” $1.3 million per month on its development: https://www.businessinsider.com/openclaw-peter-steinberger-ai-token-bill-2026-5. He works at OpenAI so of course he doesn’t have to pay for it.
You could build a significantly better, higher quality product if you spent that much on actual humans…
Use the most expensive model available and run multiple tasks in parallel.
But… but code review. It does not get everything right. And an error early on could cascade. Idk… not my circus, not my monkeys, but that story is insane.
The AI agents do a lot more than write code though. They can summarize meetings and emails, prepare project plans, create interactive design mockups, keep track of what you work on and write weekly/monthly summaries, create reports based on A/B test data, etc. If someone is heavily using AI, coding is just one part of it.
I use it quite a bit for planning and partially implementing side projects at work. Stuff that isn’t my normal day-to-day project. They’re usually APIs or internal webapps that I’d find useful but don’t have time to do all the work myself.
For example, we use Google Chat at work, but its management of custom Emoji isn’t great. I created an internal tool that shows all custom Emoji sorted by how frequently they’re used, and allows people to vote on deletion (since we have a bunch of duplicates). I used AI to plan it, build the entities, write the code to hit Google’s API, etc. I had it running in the background while working on other more important projects.
I treat it like an intern or a new grad. Assume the code won’t be great, but I can guide it to do the right things.
Generally what I do. I use ChatGPT for anything I can copy and paste in. I use Opus for "look at the service <<here>> and write a script that I can use as part of an investigation script. Follow development standards described in CLAUDE.md."
Or "look at the log files in this folder and describe failure modes. Use these services to pull data to support or refute your findings. Write everything to an xlsx spreadsheet."
I feel like I’m not an amateur with AI. I’ve been here from the beginning. Since the inception of AI Dungeon. I’ve written projects that leverage AI. I would consider myself a power user, except I can’t hold a candle against some of these folks. Blows my mind. But I humbly tip my hat. Thanks for the reply.
You have more AI agents doing code review!
I’m sure if I tried I could expend a wild amount with AI. Not that it would be particularly useful, but I’m sure I could turn some money into pollution.
I do wish we knew more about the company, 0.5 billion dollars is…. A lot.
You’re working too much
I enjoy it, honestly. It’s insane to me that someone wants to pay me as much as they do to play with computers and mentor and learn from others.
These models run on normal computers, and they are giving them away.
Does your company not have computers?
Not really. The state of the art models are huge, even the open-weight ones. You really don’t want to quantize below 4-bit, and even that’s a bit of a stretch… Ideally you’d use at least 8-bit to get good results with these models when used for coding.
GLM-5.1 needs around 400GB VRAM at 4-bit quantization. Apple aren’t making the Mac Studio with 512GB unified RAM any more, so you’d need something like 5 x Nvidia A100 80GB to run a model like this.
Kimi K2.6 is around the same size.
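If you want to sanity-check numbers like these yourself, the back-of-the-envelope math is roughly parameters x bits / 8 for the weights alone. A quick sketch in Python; `weights_gb` and `gpus_needed` are just helper names I picked, the 100B model is a made-up example, and this ignores KV cache and activation memory, so real requirements are higher.

```python
import math


def weights_gb(params_billions: float, bits: int) -> float:
    """Approximate weight size in GB: one parameter takes bits/8 bytes."""
    return params_billions * bits / 8


def gpus_needed(total_gb: float, gpu_gb: float = 80.0) -> int:
    """Number of cards required to hold total_gb, rounding up."""
    return math.ceil(total_gb / gpu_gb)


# Hypothetical 100B-parameter model, weights only.
print(weights_gb(100, 4))  # 50.0
print(weights_gb(100, 8))  # 100.0

# The ~400GB figure above, on 80GB cards.
print(gpus_needed(400))    # 5
```

Going from 4-bit to 8-bit doubles the weight footprint, which is why the jump in hardware is so steep.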
Distillation works better than quantization, to the point Qwen recently out-benchmarked its 397B model with a 27B model, two months apart. Arguably the only reason to train comically large models is that this is a decent strategy for finding very small models.




