I used to think I had a Claude spending problem.
Then I looked at where the tokens were actually going and realized I had a thinking problem.
Most of my usage was iteration. Not complex work. Just me sending vague prompts, getting imperfect outputs, and going back and forth until I got something usable. "Fix this bug." "Make it better." "Actually, consider this constraint I forgot to mention." Four prompts to get what one complete prompt would have produced.
The issue isn't that Claude fails at hard problems. It's that incomplete prompts produce incomplete answers, and then the fix feels like the model's fault.
When I started giving it everything upfront, the difference was immediate. Code context, constraints, edge cases, what I'd already tried. One prompt. Useful output. Done.
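One way to make "everything upfront" concrete is a checklist-style prompt builder. This is just a sketch of the habit, not anything Claude requires; the helper and its section headings are my own convention.

```python
def build_prompt(task, code_context, constraints, edge_cases, attempts):
    """Assemble one complete prompt instead of drip-feeding details.

    Hypothetical helper: the headings are a convention for covering
    everything in a single message, not a required format.
    """
    sections = [
        ("Task", task),
        ("Code context", code_context),
        ("Constraints", constraints),
        ("Edge cases", edge_cases),
        ("What I already tried", attempts),
    ]
    # Skip sections you genuinely have nothing for, rather than padding.
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)

prompt = build_prompt(
    task="Fix the off-by-one in the pagination helper.",
    code_context="def page(items, n, size): return items[n*size:(n+1)*size]",
    constraints="Python 3.10, no new dependencies.",
    edge_cases="Empty list; n past the last page.",
    attempts="Changing n*size to (n-1)*size broke page 0.",
)
```

The point isn't the helper; it's that forcing yourself through the checklist surfaces the constraint you would otherwise mention in prompt four.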
The other thing I noticed: I was using it for things that didn't need it. Boilerplate. Simple CRUD. Formatting. Work I could do in five minutes with a bit of focus. Claude genuinely shines when thinking is expensive — debugging something non-obvious, understanding an unfamiliar codebase, making architectural calls that have downstream consequences. That's where the leverage is.
There's also a pattern that adds up more than anything else: repeated thinking. Opening new chats, asking the same kind of question slightly reworded, slightly reframed. The model recomputes the same reasoning each time. When I started batching work and keeping context around, that stopped.

Counterintuitively, longer prompts saved money. Most people write short prompts to "save tokens" but then do five iterations to converge on the result. A longer, well-structured prompt once is cheaper than a conversation that slowly builds toward the same answer.
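The arithmetic behind this is simple: in a chat, each turn reprocesses the whole conversation so far, so iteration costs compound. A rough model (real tokenizers and prompt caching change the exact numbers):

```python
def conversation_input_tokens(turn_sizes):
    """Approximate total input tokens when each turn resends the history.

    turn_sizes: tokens added per turn (your message plus the reply).
    This is a back-of-the-envelope model, not an exact billing formula.
    """
    total, history = 0, 0
    for added in turn_sizes:
        history += added
        total += history  # the full history is processed again each turn
    return total

# One complete 900-token prompt vs five short 300-token iterations:
one_shot = conversation_input_tokens([900])      # 900
iterated = conversation_input_tokens([300] * 5)  # 300+600+900+1200+1500 = 4500
```

Even with generous assumptions, the five-turn conversation processes several times more input than the single long prompt that would have produced the same answer.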
The people who use Claude efficiently aren't necessarily using it less. They're just more deliberate. They batch work, define constraints early, reuse outputs, and use it as a reviewer as much as a generator.
If your Claude usage feels expensive, it's usually not the pricing. It's that the thinking is fragmented across too many prompts.
One way I dealt with this was building a proper agent setup. Persistent memory, defined roles, structured workflows. Instead of starting fresh every session, the agent already has context. Instead of me composing each prompt manually, the workflow handles it. Usage dropped significantly with no loss in output quality.
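The persistent-memory piece can be as simple as a file that survives between sessions. A minimal sketch, assuming a local JSON file (the filename and helpers are hypothetical, not part of any Claude tooling):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical location

def load_memory():
    """Load persistent context so a session doesn't start from zero."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"facts": [], "decisions": []}

def remember(memory, kind, note):
    """Record a fact or decision for future sessions."""
    memory[kind].append(note)
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def session_preamble(memory):
    """Prepend remembered context to the first prompt of a session."""
    lines = ["Known project facts:"]
    lines += [f"- {f}" for f in memory["facts"]]
    lines += ["Past decisions:"]
    lines += [f"- {d}" for d in memory["decisions"]]
    return "\n".join(lines)
```

Prepending that preamble to the first message of each session is the cheap version of what a full agent framework does: the model stops re-deriving project context it has already been given.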
If you want to get your own AI setup working that way, I can help. The agent configurations and skills I use are available and ready to run.