Most businesses are walking into AI with one assumption: “this will save us a ton of time and money.” We picture chatbots handling customer service 24/7, dynamic product descriptions across thousands of e-commerce SKUs, and real-time hyper-personalization that lifts conversion overnight.
It sounds like a perfect plan. But before you wire AI into your own site, you need to know two stories that shook the market in May 2026.
First, Microsoft — the company that more or less defined this wave of AI — quietly cut its engineers off from Claude Code, Anthropic’s flagship coding agent.1, 2 A week later, Uber publicly admitted it had burned its entire annual AI budget in four months, with its CTO saying it bluntly: “I’m back to the drawing board.”3, 4
The cause in both cases is the same — and it’s exactly the mechanism that can wipe out the budget for your store or portal if you bolt on AI without a plan.
What exactly did Microsoft do?
On May 25, 2026, The Next Web reported that Microsoft was cancelling most internal Claude Code licenses across its Experiences & Devices division — the teams behind Windows, Microsoft 365, Outlook, Teams, and Surface.1 Engineers in that division have until June 30, 2026 (the last day of Microsoft’s fiscal year) to migrate to GitHub Copilot CLI, Microsoft’s own command-line coding tool.1, 2
What matters here:
- Claude Code was rolled out internally in December 2025 — the experiment lasted less than six months.1, 2
- Microsoft is not breaking with Anthropic. Claude models remain available through Microsoft Foundry and Microsoft 365 Copilot.2
- The official justification is “toolchain unification.” EVP Rajesh Jha argued that Copilot CLI has one critical advantage: Microsoft can “directly shape the product through GitHub.”2
- In practice, industry reporting suggests Claude Code had become “perhaps a little too popular” inside Microsoft — engineers reached for it over Copilot, and that drove token costs sharply upward.1, 2
This is a rare picture: a company with over $200 billion in annual revenue rolling back a tool that, by these accounts, worked, and apparently doing so because it worked too well.
Uber: an entire year of AI budget burned in four months
On May 26, Fortune published an interview with Praveen Neppalli Naga, Uber’s CTO, and Andrew Macdonald, the company’s COO.3 The numbers are staggering:
- In December 2025, Uber rolled out Claude Code (followed shortly by Cursor) to a team of 5,000 engineers.1, 5, 6
- Adoption exploded: from 32% of engineers in February to 84% by March 2026.1
- Per-engineer token costs reached $500–$2,000 per month.1, 3
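- Uber’s annual AI tooling budget, reported by industry sources at ~$3.4B, was burned through in four months.5, 6 Treat the dollar figure with caution: at the top rate above, 5,000 engineers × $2,000 per month is about $120M per year.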
- Uber further accelerated usage with an internal leaderboard ranking teams by AI tool calls — a now-textbook lesson in how not to design incentives.6
The quote that became the headline: “I’m back to the drawing board, because the budget I thought I would need is blown away already.” — Praveen Neppalli Naga, CTO, Uber.1, 3
In the same interview, COO Andrew Macdonald added that the real problem isn’t productivity itself but the missing direct link between AI spend and shipped user features: “If you’re not actually able to draw a direct line to how [many] useful features and functionality you’re shipping to your users, that trade becomes harder to justify.”3
For the record — Uber is not giving up on AI. CEO Dara Khosrowshahi says 10% of code committed to Uber’s repositories is now produced by autonomous agents.3 The company simply has to redesign the economics.
The token paradox: the better the tool, the higher the bill
Both stories illustrate the same uncomfortable mechanic of the enterprise AI market. Models are billed by tokens — chunks of text sent to the AI (the prompt) and generated by the AI (the response). Agentic tools like Claude Code and Cursor decide for themselves how many requests to fire off to satisfy the user’s goal.1
The consequence is brutal:
The more useful the tool, the more often engineers reach for it. The more often they reach for it, the more tokens it consumes. The more tokens — the higher the bill.1
Microsoft and Uber proved that without active cost-control mechanisms, this curve doesn’t flatten on its own. A successful deployment produces a budget failure.
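To see why the curve does not flatten on its own, here is a minimal sketch of how an agent's bill can grow when each step resends the accumulated conversation. The function name `agent_run_cost`, the step sizes, and the prices passed in are illustrative assumptions, not figures from any provider's price list or from the reporting above.

```python
def agent_run_cost(steps, base_prompt_tokens, tokens_added_per_step,
                   output_tokens_per_step, usd_per_m_input, usd_per_m_output):
    """Estimate one agent run in which every step resends the whole history.

    Resending the full context on each call is assumed here for illustration;
    real tools may trim or cache context.
    """
    input_total = 0
    output_total = 0
    context = base_prompt_tokens
    for _ in range(steps):
        input_total += context                  # full context billed as input again
        output_total += output_tokens_per_step  # the model's reply for this step
        # The reply and any tool results are appended to the history.
        context += output_tokens_per_step + tokens_added_per_step
    cost = (input_total * usd_per_m_input
            + output_total * usd_per_m_output) / 1_000_000
    return input_total, output_total, cost


if __name__ == "__main__":
    # Hypothetical prices in USD per million tokens.
    for n in (1, 5, 20):
        print(n, agent_run_cost(n, 2_000, 1_500, 300, 3.0, 15.0))
```

Under these assumptions the input total grows much faster than the number of steps, because each step pays again for everything that came before it.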
And this is not a two-company problem — it’s an industry-wide shift. According to industry reports, AI tool prices in the US climbed 20–37% in 2026, and starting June 1, 2026, GitHub changes its Copilot billing model.5 The whole market is moving the cost onto the end customer.
If you want to understand what actually happens under the hood of these agents and why they consume so many tokens, I broke it down in the analysis of the 512,000-line Claude Code leak — the internal architecture is far more elaborate than what the user sees.
What does this have to do with your portal or e-commerce store?
“Hold on — I’m not Microsoft, I’m not Uber. My site is a shoe store or a services portal. This doesn’t apply to me.”
It does. The scale is different; the mechanism is identical.
The major hosted language-model APIs (OpenAI, Anthropic, Google, Mistral) bill per token. Imagine you wire up an advanced chatbot or product-description generator on top of an expensive model without guardrails. Any one of these is enough:
- Your store catches a sudden traffic spike from a viral social campaign,
- A competitor’s bot starts looping queries against your assistant,
- Users start treating your e-commerce chatbot like a free ChatGPT for writing essays.
The result? You wake up on Monday with a weekend API bill in the thousands of dollars. It’s the same usage-driven mechanic that, according to the reporting above, pushed Microsoft to back out of Claude Code.
How to deploy AI on a website without going bankrupt
The Silicon Valley lesson is clear: an amateur API hookup is a ticking time bomb. Successful AI deployment in modern web applications isn’t about blindly integrating “the smartest” model on the market. The key is an optimized, secure architecture.
What actually keeps costs down:
- Hybrid model routing. Simple queries (“what are your shipping costs?”) go to cheap, fast models (Haiku/Mini-class). The expensive models are reserved for complex reasoning.
- Semantic caching. If ten customers ask the chatbot the same thing, the system recognizes intent similarity and serves the cached answer, so a repeat costs no generation tokens, only a cheap similarity lookup.
- RAG architecture (Retrieval-Augmented Generation). Instead of letting the model “hallucinate,” we lock it inside your company’s knowledge base and pass in only the passages relevant to the question. Shorter context = fewer tokens = lower bill.
- Hard rate limits and budget alerts. Server-side guardrails that cut off traffic outright during bot attacks or daily-threshold breaches. You decide the maximum you’re willing to spend.
- Routing by user value. A logged-in customer with an $800 cart is handled differently from an anonymous user who just sent a fourth message to your chatbot about their favorite cats. The Microsoft and Uber stories show what happens when every call carries the same cost priority.
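The routing ideas in the list above can be sketched in a few lines. This is a minimal illustration, not a production router: the tier names, the keyword heuristic in `looks_complex`, the $200 cart threshold, and the three-message cap for anonymous users are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier names; map them to whatever models your provider offers.
CHEAP_MODEL = "cheap-fast-model"
EXPENSIVE_MODEL = "expensive-reasoning-model"

# Assumed keyword heuristic; a real router would more likely use a small classifier.
COMPLEX_HINTS = ("compare", "recommend", "why", "difference", "which")


@dataclass
class Request:
    text: str
    logged_in: bool = False
    cart_value_usd: float = 0.0
    messages_in_session: int = 0


def looks_complex(text: str) -> bool:
    """Treat long questions or ones containing a hint word as complex."""
    lowered = text.lower()
    return len(lowered.split()) > 25 or any(h in lowered for h in COMPLEX_HINTS)


def choose_model(req: Request, high_value_cart_usd: float = 200.0,
                 anonymous_message_cap: int = 3) -> Optional[str]:
    """Return a model tier, or None when a canned reply should be served."""
    if not req.logged_in and req.messages_in_session > anonymous_message_cap:
        return None  # anonymous chatter past the cap: no paid call at all
    high_value = req.logged_in and req.cart_value_usd >= high_value_cart_usd
    if high_value and looks_complex(req.text):
        return EXPENSIVE_MODEL
    return CHEAP_MODEL
```

The design choice here is that the expensive tier needs two signals at once, a complex question and a high-value customer, so neither a long rambling message nor a large cart alone triggers it.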
Quick reference: cheap vs expensive AI stack
| Element | Naive deployment | Cost-resilient architecture |
|---|---|---|
| Model | One expensive model for everything | Routing: cheap for simple, expensive for complex |
| Cache | None — every question billed separately | Semantic cache for repeated intents |
| Knowledge | “Open” prompts, hallucinations | RAG locked to your data |
| Limits | None — sky’s the limit | Rate limiting + hard daily budget + alerts |
| Risk | Thousands of dollars per weekend | Predictable, controlled spend |
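The Cache row in the table can be illustrated with a small sketch. One loud assumption: `embed` below is a bag-of-words stand-in so the example runs without any external service, whereas a real semantic cache would call an embedding model and usually a vector store. The class name `SemanticCache`, the 0.8 threshold, and the `call_model` callback are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (vector, answer) pairs

    def get(self, question):
        """Return the best cached answer above the threshold, else None."""
        vec = embed(question)
        best_score, best_answer = 0.0, None
        for cached_vec, answer in self.entries:
            score = cosine(vec, cached_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, question, answer):
        self.entries.append((embed(question), answer))


def answer_with_cache(cache, question, call_model):
    """Serve from the cache when possible; otherwise pay for one model call."""
    cached = cache.get(question)
    if cached is not None:
        return cached, True
    answer = call_model(question)
    cache.put(question, answer)
    return answer, False
```

The threshold is the knob to watch: set it too low and customers get answers to questions they did not ask, set it too high and the cache rarely hits.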
Summary
Microsoft and Uber are today’s best case study of what not to do when deploying AI into a product. Microsoft is rolling back Claude Code in Experiences & Devices by June 30, 2026 — even though the tool objectively worked well.1, 2 Uber publicly admitted its CTO is “back to the drawing board” after burning a full year of AI budget in four months.3
The takeaway for anyone planning AI on their own site or store: cost-management architecture has to be built in from day one — model routing, semantic cache, RAG, rate limiting, alerts. You can’t bolt it on after the invoice arrives.
Want to add AI to your store or portal with full control over cost and ROI? I build next-generation websites and web applications — smartly, securely, and at scale.
Frequently Asked Questions
Is Microsoft fully breaking with Anthropic?
No. Microsoft is cancelling internal Claude Code licenses in the Experiences & Devices division by June 30, 2026, but Claude models remain available through Microsoft Foundry and Microsoft 365 Copilot. This is a pullback from one specific agentic tool — not a breakup with Anthropic.
What is Microsoft moving its engineers to?
To GitHub Copilot CLI — Microsoft's own command-line coding tool. The official rationale is "toolchain unification." EVP Rajesh Jha pointed to the key advantage: Microsoft can directly shape the product through GitHub.
How much does agentic AI cost at enterprise scale?
According to Uber's CTO Praveen Neppalli Naga, $500–$2,000 per engineer per month. Across 5,000 engineers that is roughly $2.5M–$10M per month, and Uber's annual AI budget (estimated by industry sources at ~$3.4B, a figure these per-engineer costs alone would not reach) was reported spent in four months. That's developer tooling; a chatbot on a website is cheaper, but the "more usage = bigger bill" mechanic is identical.
What does it mean that AI is "billed per token"?
A token is a chunk of text — roughly four characters or a fragment of a word. You pay separately for the prompt sent to the AI and the response it generates. The longer the context (full chat history + attached documents) and the longer the response, the more tokens — and the higher the bill. Agentic AI additionally decides on its own how many calls to make, which can multiply the bill compared to a regular chat.
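As a rough illustration of that answer, the sketch below estimates the cost of a single request from the "about four characters per token" rule of thumb. The function names and the prices you pass in are assumptions for the example; real tokenizers and price lists vary by provider and by language.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token count using the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def request_cost_usd(prompt, response, usd_per_m_input, usd_per_m_output):
    """Prompt and response are billed separately, each at its own rate."""
    input_tokens = estimate_tokens(prompt)
    output_tokens = estimate_tokens(response)
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000
```

Multiplying the result by expected requests per day gives a first-order daily budget, which is a useful sanity check before any traffic arrives.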
How do I protect the AI budget on my own site or store?
Four layers: model routing (cheap models for simple queries, expensive only for complex), semantic caching (cache and serve recurring intents), RAG architecture (lock context to your own data — shorter prompts, fewer tokens), and hard limits (server-side rate limiting + daily budget + alerts). Without these, a single bot attack or viral spike can produce a multi-thousand-dollar weekend bill.
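The "hard limits" layer can be sketched as a small in-memory guard. This is a minimal single-process illustration under stated assumptions: the class name `BudgetGuard`, the 80% alert fraction, and the UTC day rollover are choices made for the example, and a real deployment would keep the counters in shared storage such as a database or cache server.

```python
import time
from collections import defaultdict, deque


class BudgetGuard:
    """Per-client sliding-window rate limit plus a hard daily spend cap."""

    def __init__(self, max_requests_per_minute, daily_budget_usd,
                 alert_fraction=0.8, clock=time.time):
        self.max_rpm = max_requests_per_minute
        self.daily_budget = daily_budget_usd
        self.alert_fraction = alert_fraction
        self.clock = clock                 # injectable so it can be tested
        self.hits = defaultdict(deque)     # client id -> recent timestamps
        self.spent_today = 0.0
        self.day = None
        self.alert_sent = False

    def _roll_day(self, now):
        day = int(now // 86400)            # UTC day index
        if day != self.day:
            self.day, self.spent_today, self.alert_sent = day, 0.0, False

    def allow(self, client_id):
        """Return False when the daily cap or the client's rate limit is hit."""
        now = self.clock()
        self._roll_day(now)
        if self.spent_today >= self.daily_budget:
            return False                   # hard stop for everyone
        window = self.hits[client_id]
        while window and now - window[0] >= 60:
            window.popleft()               # drop requests older than a minute
        if len(window) >= self.max_rpm:
            return False
        window.append(now)
        return True

    def record_spend(self, usd):
        """Add spend; return True once, when the alert threshold is crossed."""
        self._roll_day(self.clock())
        self.spent_today += usd
        if (not self.alert_sent
                and self.spent_today >= self.alert_fraction * self.daily_budget):
            self.alert_sent = True
            return True                    # caller should send the alert
        return False
```

Call `allow` before every model request and `record_spend` after it; when `allow` returns False, serve a static fallback instead of calling the API.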
Is this problem unique to Claude Code?
No — it's a general problem of agentic AI billed per token. The same mechanism applies to Cursor, GitHub Copilot, OpenAI tools, and Google models. Uber, in fact, used Claude Code and Cursor in parallel. AI tool prices in the US climbed 20–37% in 2026, and from June 1, 2026, GitHub changes Copilot's billing model — the whole market is pushing cost onto the end customer.
Sources
1. Microsoft’s quiet Claude Code retreat and the real cost of enterprise AI. The Next Web (May 25, 2026). https://thenextweb.com/news/microsoft-claude-code-retreat-ai-cost
2. Microsoft cancels Claude Code licences after engineers use it too much. People Matters. https://www.peoplematters.in/news/ai-and-emerging-tech/microsoft-cancels-claude-code-licences-after-engineers-use-it-too-much-49918
3. Uber burned through its entire 2026 AI budget in four months. Fortune (May 26, 2026). https://fortune.com/2026/05/26/uber-coo-ai-spending-tokens-claude-code/
4. Uber’s Anthropic AI Push Hits A Wall — CTO Says Budget Struggles Despite $3.4B Spend. Yahoo Finance. https://finance.yahoo.com/sectors/technology/articles/ubers-anthropic-ai-push-hits-223109852.html
5. Microsoft cancels Claude Code licenses as AI costs surge across the industry. Crypto Briefing. https://cryptobriefing.com/microsoft-cancels-claude-code-ai-costs/
6. Uber Spends Full 2026 AI Budget in 4 Months. Briefs.co. https://www.briefs.co/news/uber-torches-entire-2026-ai-budget-on-claude-code-in-four-months/



