May 5, 2026 | 2 min read | AI, Engineering, Economics

The AI Stack Is Maturing — And So Should How We Use It

Cheap, unlimited AI is winding down, and the craft is maturing from prompt to context to harness engineering. The value now is in better scaffolding, not more tokens.

A few signals worth surfacing for the team.

On the economics side

Anthropic recently tested removing Claude Code from the $20 plan, nudging users toward the $100 tier.
GitHub Copilot has shifted from flat "actions" billing to token-based usage, so heavier models burn through quota faster.
Uber reportedly consumed its entire annual AI budget in just 4 months after encouraging maximum usage.
Google sits in a stronger position — their core business funds AI investment, so they're under far less pressure to monetise aggressively than Anthropic or OpenAI.

The takeaway

The era of cheap, unlimited AI access is winding down. Expect tighter limits, more nuanced billing, and continued price changes from the providers we rely on.
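To make the token-billing point concrete, here is a rough sketch of how the same request drains a budget at different speeds depending on the model. The model names and per-million-token prices are made up for illustration and are not any provider's real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical prices in USD per million tokens (not real provider pricing).

    One USD per million tokens equals one micro-USD per token, so costs
    stay exact integers when measured in micro-USD.
    """
    input_per_mtok: int
    output_per_mtok: int


# Illustrative numbers only.
RATES = {
    "light": ModelRate(input_per_mtok=1, output_per_mtok=5),
    "heavy": ModelRate(input_per_mtok=5, output_per_mtok=25),
}


def request_cost_micro_usd(model: str, input_tokens: int, output_tokens: int) -> int:
    """Cost of one request in micro-USD under the hypothetical rates above."""
    rate = RATES[model]
    return input_tokens * rate.input_per_mtok + output_tokens * rate.output_per_mtok


def requests_within_budget(model: str, budget_usd: int,
                           input_tokens: int, output_tokens: int) -> int:
    """How many identical requests fit in a budget before it runs out."""
    budget_micro_usd = budget_usd * 1_000_000
    return budget_micro_usd // request_cost_micro_usd(model, input_tokens, output_tokens)


# Same request shape, same $30 budget, two different models.
for name in RATES:
    print(name, requests_within_budget(name, 30, input_tokens=20_000, output_tokens=2_000))
# light 1000
# heavy 200
```

With these assumed rates, the heavier model gets through the same budget five times faster, which is the behaviour to plan around when billing is per token.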

On the engineering side

The discipline is evolving too. We've moved through three distinct phases in how we work with AI:

Prompt engineering: crafting the right single-turn input to get a good output.
Context engineering: managing what the model sees, such as which docs to retrieve, what fits in the window, and how to compress history.
Harness engineering: designing the environment around the agent, including which tools it can call and the guardrails, validation loops, sandboxes, and feedback mechanisms that keep it reliable across hundreds of autonomous decisions. The term was coined by Mitchell Hashimoto earlier this year and is now being adopted at OpenAI, Anthropic, and elsewhere.

Three phases, each wrapping the last: prompts sit inside context, and context sits inside the harness. Harness engineering is the current frontier because it covers the whole environment around the agent, not just the input.
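As a small illustration of the context engineering phase, the sketch below keeps a system prompt plus the newest messages that fit in a token budget. The function names and the four-characters-per-token estimate are assumptions for the example; a real setup would use the provider's tokenizer and would probably summarise old turns rather than drop them.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def fit_history(system_prompt: str, history: list[str], budget: int) -> list[str]:
    """Return the system prompt plus the most recent messages that fit in `budget` tokens."""
    remaining = budget - estimate_tokens(system_prompt)
    kept: list[str] = []
    # Walk backwards so the newest messages are kept first.
    for message in reversed(history):
        cost = estimate_tokens(message)
        if cost > remaining:
            break  # Stop at the first message that no longer fits.
        kept.append(message)
        remaining -= cost
    kept.reverse()  # Restore chronological order.
    return [system_prompt, *kept]
```

Dropping the oldest turns is the crudest possible policy, but it shows the shape of the problem: every message competes for the same fixed window.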

Why both shifts matter for us

As token costs rise and we move toward more agentic workflows, the value isn't in burning more tokens — it's in building better harnesses around the AI we use. Smarter scaffolding, tighter guardrails, and well-designed feedback loops mean fewer failed runs, less wasted context, and far better economics per outcome.
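One way to reason about "economics per outcome" is the expected spend per successful run. The sketch below assumes failed runs are simply retried and that each attempt succeeds independently with the same probability, so the expected number of attempts is 1 / success rate. The dollar figures and success rates are invented for illustration.

```python
def cost_per_success(cost_per_run: float, success_rate: float) -> float:
    """Expected spend per successful outcome, assuming independent retries until success."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_run / success_rate


# Hypothetical numbers: the harness adds overhead per run but fails less often.
bare = cost_per_success(cost_per_run=0.50, success_rate=0.40)
harnessed = cost_per_success(cost_per_run=0.60, success_rate=0.80)
print(f"bare agent: ${bare:.2f} per success")
print(f"with harness: ${harnessed:.2f} per success")
```

Under these assumptions a run that costs more can still be cheaper per outcome, which is the argument for spending effort on the harness rather than on extra tokens.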

A harness is everything around the agent: tools, guardrails, validation loops, a sandbox, and feedback. That scaffolding is what keeps the agent reliable across hundreds of decisions.
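A toy version of that loop might look like the sketch below. Everything in it is hypothetical: the tool registry, the `run_harness` function, and the stub agent that stands in for a real model call. It shows three of the pieces named above (a tool allowlist as a guardrail, a validation check, and feedback passed back to the agent) and leaves sandboxing out.

```python
from typing import Callable

# Hypothetical tool registry: the agent may only call what is listed here.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "reverse": lambda s: s[::-1],
}

# An agent takes (task, feedback) and returns (tool_name, argument).
Agent = Callable[[str, str], tuple[str, str]]


def run_harness(agent: Agent, task: str,
                validate: Callable[[str], bool], max_steps: int = 5) -> str:
    """Run the agent until a tool result passes validation or steps run out."""
    feedback = ""
    for _ in range(max_steps):
        tool_name, argument = agent(task, feedback)
        if tool_name not in TOOLS:
            # Guardrail: refuse unknown tools and tell the agent why.
            feedback = f"tool {tool_name!r} is not allowed; choose from {sorted(TOOLS)}"
            continue
        result = TOOLS[tool_name](argument)
        if validate(result):
            return result
        # Validation loop: feed the failure back for the next attempt.
        feedback = f"result {result!r} failed validation"
    raise RuntimeError(f"no valid result after {max_steps} steps")


def stub_agent(task: str, feedback: str) -> tuple[str, str]:
    """Stand-in for a model: asks for a disallowed tool first, then corrects itself."""
    if not feedback:
        return ("shell", task)
    return ("upper", task)


print(run_harness(stub_agent, "ship it", validate=lambda r: r == "SHIP IT"))
```

The step cap matters as much as the validation: without it, an agent that never produces a valid result keeps spending tokens indefinitely.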

Worth thinking about where in our workflows we're still living in "prompt engineering" mode — and where we should be investing in proper harness design.

Keen to hear what others are seeing — anyone already experimenting with harness patterns in their projects?