Every SaaS company building with AI hits the same wall eventually. Your AI bills grow. Someone asks which customers, features, or teams are driving that spend, and no one has a clean answer.
That’s not a tooling gap. It’s a data gap. And the companies feeling it most are the ones where AI isn’t a side feature anymore. It’s the product.
AI Introduced a Consumption-Native Cost That Should Be Easy to Track
Traditional cloud cost attribution was already hard. EC2 instances, RDS clusters, Kafka brokers shared by dozens of workloads, billed by the resource rather than by who consumed it. Attribution required tagging, rule-writing, and a lot of manual upkeep just to get a rough answer.
AI flipped the cost model in one important way: every inference call generates a cost signal that’s directly proportional to usage. Tokens in, tokens out, price per token. Unlike EC2, which runs whether you use it or not, AI cost is consumption-native. It’s inherently measurable at the per-request level.
That should make per-customer AI unit economics easier to track than anything else in the stack. In practice, most teams find it harder.
Billing Data Shows the Total but Not the Story Behind It
Here’s what your billing data shows: OpenAI = $43,000 this month. Bedrock = $12,000.
Here’s what it doesn’t show: which customers drove those numbers. Which feature triggered the most expensive calls. Whether your enterprise tier’s AI usage is actually covered by what you charge, or whether it’s slowly compressing margins.
The reason is structural. Billing exports capture what was billed. They don’t capture why. They record the provider invoice. They don’t see the workload that triggered each call, the customer session behind it, or the product feature that requested the inference. You get the total. You don’t get the story behind it.
Routing through an LLM gateway compounds the problem. Every provider call made on behalf of every upstream service collapses into one line item. It’s an abstraction that obscures cost origin entirely.
Accurate AI Unit Economics Requires Connecting Three Layers Most Tools Never Join
Getting AI unit and token economics right requires understanding where inference cost is actually eroding margin. Products, agents, and customers all generate inference spend, and that spend sprawls across providers, features, and tiers in ways that billing exports never surface. Most teams see the total. They don’t see which part of the business is absorbing the cost.
First, the inference event itself: which model, which endpoint, how many tokens, at what cost. Most billing tools handle this, at least in aggregate.
Second, the workload context: which microservice or container made the call. This is the bridge between the AI provider’s billing and your own infrastructure. Some APM tools can see this if you’ve instrumented your code. Most haven’t.
Third, the business context: which customer, team, or product feature that workload belongs to. This is where almost everyone hits a dead end.
Connecting all three typically requires either extensive manual instrumentation or a way to observe the traffic directly without relying on code-level changes. The more durable approach is to read this data at the network level, at runtime, so it captures everything regardless of whether someone remembered to add a header or a trace ID.
Without Token Economics, Margin Decisions Are Built on Assumptions
Once you can map token consumption to customers, products, and agents, you can ask the questions pricing and finance teams actually need answered. What does it cost to serve this customer’s AI usage, and is the price we’re charging them covering it? Which product features are burning inference budget faster than they generate value, and is the margin on that feature sustainable at scale?
This matters more for AI-heavy products than anywhere else because AI cost is variable and correlated directly with usage intensity. A customer running 10,000 inference calls per day looks nothing like one running 100, even if they’re on the same tier at the same price.
Getting AI unit economics right means being able to see that Customer A cost $620 in AI inference this month, on a contract that generates $2,500. It means knowing whether your enterprise customers are disproportionately expensive to serve compared to your SMB accounts. And it means having data that pricing teams can actually act on, rather than spreadsheet estimates built from beta cohorts.
The companies building defensible AI pricing models are the ones treating per-customer token consumption as a first-class cost signal. Not an afterthought billed monthly and reviewed once a quarter. A runtime signal.
Estimation Breaks Down in Predictable Ways
The alternative is estimation. Teams split AI costs by seat count, request volume, or some engineering heuristic. These feel reasonable until someone looks at the numbers closely.
A few things that break down predictably:
Underpriced enterprise accounts. High-usage enterprise customers generate disproportionate AI costs. Without attribution, those costs get absorbed into your margin silently. The account looks fine on the revenue line. It’s bleeding below it.
Pricing decisions built on simulated usage. Pricing teams model expected AI consumption per tier using load tests or beta user data. Real production usage looks nothing like simulations. You end up with a pricing model that works on paper and erodes at scale.
No root cause when spend spikes. When AI spend jumps 40% month over month, you can see the total increase but not what caused it. One large customer? A new feature rollout? A retry loop that got out of control? Without per-customer attribution, you’re working backward from the bill.
Token Type Breakdown: Where Attribution Gets Precise
Not all token costs are the same. Input tokens and output tokens carry different prices, and the split between them tells you whether a workload’s cost is driven by the prompts going in or the responses coming back. A feature that generates long, detailed outputs costs more per call than one that runs short lookups against the same model.
This is where token-level attribution gets meaningful. Total token spend per customer is useful. But knowing that Customer A’s cost this month came from 80% output tokens on a generative feature while Customer B’s came almost entirely from input-heavy classification calls changes how you think about pricing both accounts.
Attribute gives pricing teams visibility by surfacing token type within AI workloads. Combined with per-customer attribution, pricing teams can see not just what a customer costs but what’s driving that cost and whether it scales the way the pricing model assumes.
AI Unit Economics Belong in the Same View as Cloud Cost
The cleanest version of this is solved when AI cost attribution is part of the same picture as your full infrastructure costs. Not two dashboards with a manual join. One view where Customer A’s true cost to serve, or cost per inference is visible end to end, from compute to data transfer, MongoDB and AI inference, all in one. That’s the visibility that changes how pricing decisions get made. And it requires token-level consumption data tied to customers, not just provider invoices.

The data exists. Every inference call leaves a signal at the network level. The question is whether your cost infrastructure is capturing it, and whether your pricing model is built on what customers actually cost to serve rather than what you assumed they would.
FAQs
AI token cost attribution is the process of mapping token consumption from LLM inference calls to the customers, product features, or teams that generated them. It connects the usage event (which model, how many tokens, at what cost) to the workload that triggered it and the business context behind that workload. Without this, AI billing shows provider totals with no breakdown by who or what drove them.
Billing exports record what your AI provider charged you. They don’t record which microservice made the call, which customer session triggered it, or which product feature initiated the inference. That context exists at the network layer, not in the invoice. Connecting the two requires either instrumentation in your code or a runtime data source that observes traffic directly.
LLM observability tools track prompt quality, latency, and model performance. AI cost attribution maps token consumption to business units: customers, teams, features. The two answer different questions. Observability tells you how your models are performing. Attribution tells you who’s paying for them and whether the pricing covers it.
An LLM gateway routes inference requests from your services to AI providers. From a billing perspective, the gateway appears as the consumer. Your bill shows the gateway calling OpenAI or Bedrock, not the upstream workload that needed the inference. Every service using the gateway collapses into a single line item. Resolving that requires a data source that can match outbound provider calls to the inbound requests that triggered them.
Per-customer token consumption data lets pricing teams set AI tiers based on what customers actually cost to serve, not simulated usage estimates. It surfaces whether enterprise accounts are subsidizing heavy AI usage, which features are burning inference budget faster than they generate margin, and whether current tier pricing holds at scale. Without it, AI pricing models are built on assumptions that production traffic will eventually break.