How to Manage AI Token Costs at Scale (LLM Cost Management)
Why do AI token costs spike as teams scale?
Costs climb fastest right after model upgrades.
On the Business AI Explained podcast, Lanch's Leandro noted that overruns "started to be felt more when the new Anthropic models came out, just because you hit your limits much quicker." He added that "everyone in the market had that same feeling."
The spike isn't only about model pricing. It also depends on who is spending the tokens: "people in the tech team and people who understand the infrastructure behind it more are more responsible with the costs." When access is broad and understanding is uneven, usage outpaces the budget: some companies realized they had hit their token budget five months into the year.
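To make the "budget gone five months into the year" scenario concrete, here is a minimal sketch of a burn-rate projection. The function name, the figures, and the assumption of a constant monthly burn are all hypothetical illustrations, not something described on the podcast.

```python
def months_until_exhausted(annual_budget: float, spent_so_far: float, months_elapsed: int) -> float:
    """Project how many months into the year the budget lasts.

    Assumption: spend continues at the average monthly rate seen so far.
    Real usage is rarely this steady, so treat the result as a rough signal.
    """
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    monthly_burn = spent_so_far / months_elapsed
    if monthly_burn == 0:
        return float("inf")  # nothing spent yet, so no projected exhaustion
    return annual_budget / monthly_burn


# Hypothetical figures: a 120,000 annual budget with 72,000 spent after 3 months.
projected = months_until_exhausted(120_000, 72_000, 3)
print(f"Budget exhausted after about {projected:.1f} months")
```

Running a check like this monthly would flag an overrun well before the budget is actually gone, which is the point at which limits are cheapest to introduce.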
How do you set AI usage limits across a company?
Lanch's answer was to put limits in early rather than after the budget broke: "We actually found that we had to set up limits quite quickly, especially within our internal tools."
Use a platform with flexible controls. Lanch uses LangDock, which has a front end similar to Anthropic and ChatGPT, but lets you "set usage limits really flexibly."
Remove the models driving overruns. The team "took away those models that were making each user hit their limits." This keeps a familiar interface for users while capping the spend that comes from the most expensive models.
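The two controls described above, a per-user limit and a restricted model list, can be sketched roughly as follows. This is not LangDock's API or Lanch's implementation; the `UsagePolicy` class, its fields, and the model names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class UsagePolicy:
    """Hypothetical policy: a per-user monthly token cap plus a model allowlist."""
    monthly_token_limit: int
    allowed_models: set[str]
    used_tokens: dict[str, int] = field(default_factory=dict)

    def check(self, user: str, model: str, requested_tokens: int) -> tuple[bool, str]:
        """Return (allowed, reason) without recording any usage."""
        if model not in self.allowed_models:
            return False, f"model '{model}' is not enabled"
        used = self.used_tokens.get(user, 0)
        if used + requested_tokens > self.monthly_token_limit:
            return False, "monthly token limit reached"
        return True, "ok"

    def record(self, user: str, tokens: int) -> None:
        """Add consumed tokens to the user's running monthly total."""
        self.used_tokens[user] = self.used_tokens.get(user, 0) + tokens


# Hypothetical setup: the costliest model is simply left off the allowlist.
policy = UsagePolicy(monthly_token_limit=1_000_000,
                     allowed_models={"small-model", "mid-model"})
print(policy.check("ana", "frontier-model", 5_000))
policy.record("ana", 990_000)
print(policy.check("ana", "small-model", 20_000))
print(policy.check("ana", "small-model", 5_000))
```

Keeping the model allowlist separate from the token cap mirrors the two levers in the text: removing a model changes what users can pick, while the cap bounds how much any one user can spend.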
How do you budget for AI costs in a project?
Lanch builds the cost question into its product requirements process.
As Leandro put it, "one thing we also included in this PRD process was an evaluation of how much it should and will cost going forward to set this up."
Budgeting also ties to the build-vs-buy decision. Source-of-truth systems (CRMs, ERPs, the data warehouse) "are something you shouldn't build with AI, because so much business logic goes into them." The internal tools worth building were the ones "where there just wasn't any solution in the market," like Lanch's route optimization tool.
Scoping cost up front keeps AI spend away from problems an off-the-shelf tool already solves.
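A cost evaluation of the kind a PRD might include can be as simple as a back-of-the-envelope token estimate. The sketch below is one way to do it; the function, the request volumes, and the per-million-token prices are placeholder assumptions, not Lanch's figures or any vendor's actual pricing.

```python
def estimate_monthly_cost(
    requests_per_month: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate monthly model spend from expected volume and token prices.

    Input and output tokens are priced separately because providers
    commonly charge different rates for each.
    """
    input_cost = requests_per_month * input_tokens_per_request / 1_000_000 * input_price_per_million
    output_cost = requests_per_month * output_tokens_per_request / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder numbers for a hypothetical internal tool.
monthly = estimate_monthly_cost(
    requests_per_month=20_000,
    input_tokens_per_request=2_000,
    output_tokens_per_request=500,
    input_price_per_million=3.0,    # assumed price, not a real quote
    output_price_per_million=15.0,  # assumed price, not a real quote
)
print(f"Estimated monthly cost: {monthly:.2f}")
print(f"Estimated annual cost: {monthly * 12:.2f}")
```

An estimate like this gives the build-vs-buy comparison a number to set against the price of an off-the-shelf tool.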
What causes runaway LLM costs?
Two patterns drive runaway spend.
First, uncontrolled access: when teams are granted full access and unlimited tokens, some hit their token budget five months into the year. The podcast's host raised the risk of "people going completely nuts and building a bunch of useless tools, where it feels like productivity but it's just procrastination."
Second, unmaintainable automation: Leandro described how AI workflows "are quite hard to maintain, and quite complex to scale, since they create a lot of maintenance if they're not built in the proper way." Workflows that aren't built properly keep consuming tokens and engineering time without a clear business outcome.
How do you control AI costs without blocking adoption?
The goal is responsible usage, not a freeze.
Lanch kept a familiar, ChatGPT-like front end through LangDock while setting flexible limits and removing the costliest models — so people could keep working without each user hitting their limit.
Cost discipline also tracks understanding: those who "understand the infrastructure behind it more are more responsible with the costs," which makes upskilling part of cost control.
Frequently asked questions
- Why do AI token costs spike when teams scale?
- According to Lanch, overruns intensified when new Anthropic models launched because teams hit their usage limits much quicker, and everyone in the market felt the same. Cost discipline also varies by user: people who understand the underlying infrastructure are more responsible with spend. Combined with broad, unlimited access, this is how some companies hit their token budget five months into the year.
- How did Lanch set AI usage limits?
- Lanch set up limits quickly, especially within its internal tools. The team uses LangDock — a platform with a front end similar to Anthropic and ChatGPT that lets you set usage limits really flexibly — and removed the models that were making each user hit their limits. This caps the costliest spend while keeping a familiar interface for users.
- How should you budget for AI costs in a project?
- Lanch built cost into its PRD process, adding an evaluation of how much a tool should and will cost going forward to set it up. Budgeting also connects to build-vs-buy: structured source-of-truth systems like CRMs, ERPs and data warehouses shouldn't be built with AI, while the highest-impact internal tools were those where no market solution existed.
- What causes runaway LLM costs?
- First, unlimited access can lead people to build a bunch of useless tools that feel like productivity but are really procrastination, and to hit their token budget five months into the year. Second, AI workflows that aren't built properly are hard to maintain and complex to scale, creating ongoing maintenance and cost.
- Can you control AI costs without blocking adoption?
- Lanch kept a ChatGPT-like front end through LangDock while setting flexible limits and removing the costliest models, so users could keep working without each hitting their limit. Because people who understand the infrastructure are more responsible with costs, upskilling teams is itself part of cost control rather than a barrier to adoption.