
Azure OpenAI Service Commitment & Pricing

Azure OpenAI is bought two ways: consumption-based pay-as-you-go tokens, or committed Provisioned Throughput Units bought hourly, monthly, or annually. This buyer-side guide explains how each pricing model works, when a commitment saves money, and the levers that keep generative-AI spend predictable.

Updated June 2026 · 1,300-word guide · Microsoft

Azure OpenAI Service is priced on two fundamentally different models, and choosing the wrong one is the single most common cause of runaway generative-AI cost. The first model is pay-as-you-go, billed per token consumed with no commitment. The second is Provisioned Throughput, where you reserve dedicated model-processing capacity measured in Provisioned Throughput Units (PTUs) and pay for that capacity whether or not you use it. Understanding when each model wins is the core buyer-side decision, and it sits inside the wider set of choices covered in our complete Microsoft licensing guide.

Unlike a per-user SKU such as Microsoft 365 E5, Azure OpenAI has no seat count. Cost scales with usage volume and the throughput profile of your workload, which means the procurement question is not "how many licences" but "which consumption model, at what committed level, with what reservation term." This guide walks the buyer through that decision.

The two pricing models

Pay-as-you-go (also called standard or consumption pricing) bills you for the tokens your application sends to and receives from the model. Input tokens (your prompt) and output tokens (the model's response) are priced separately, and the rate differs by model family and by model version. There is no floor and no commitment: a workload that runs for an hour and then stops also stops generating cost. This elasticity is the strength of the consumption model and the reason almost every deployment begins there.
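As a minimal sketch of that billing arithmetic, the function below prices input and output tokens separately. The per-million-token rates and the `TokenRates` and `payg_cost` names are hypothetical placeholders for illustration, not published Azure OpenAI prices; substitute the rates for your model and region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical prices per one million tokens (not real Azure rates)."""
    input_per_million: float
    output_per_million: float


def payg_cost(input_tokens: int, output_tokens: int, rates: TokenRates) -> float:
    """Consumption cost: input and output tokens are priced separately."""
    return (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
    ) / 1_000_000


# Placeholder rates and volumes, chosen only to show the calculation.
rates = TokenRates(input_per_million=2.00, output_per_million=8.00)
monthly = payg_cost(input_tokens=500_000_000, output_tokens=100_000_000, rates=rates)
idle = payg_cost(input_tokens=0, output_tokens=0, rates=rates)  # no usage, no charge
```

Because output tokens typically carry a different rate from input tokens, the same function also shows how capping response length or trimming prompts changes the bill.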

Provisioned Throughput reserves a fixed amount of model-processing capacity that is dedicated to your deployment. You buy PTUs, and that capacity delivers a predictable throughput ceiling (tokens per minute) with more consistent latency than the shared consumption pool. Because the capacity is reserved, you pay for it continuously for the term you commit to, regardless of how much of it you actually use.

How Provisioned Throughput Units work

A PTU is a unit of dedicated capacity. The number of PTUs a deployment needs depends on the model, the prompt size, the generation length, and the calls-per-minute you must sustain at peak. Microsoft publishes minimum PTU deployment sizes per model and a capacity calculator that estimates the PTUs required for a given throughput target. The buyer-side discipline is to size PTUs against sustained peak throughput, not against theoretical maximum demand, because over-provisioning capacity that sits idle is pure waste.

Provisioned Throughput can be purchased on three commitment horizons. Hourly (sometimes described as on-demand provisioned) lets you spin reserved capacity up and down with no longer-term lock-in, useful for predictable daily peaks. Monthly and annual reservations trade flexibility for a lower effective hourly rate: you commit to a number of PTUs for the full term and receive a discount against the hourly price in exchange.
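The trade-off between hourly and reserved purchase can be sketched as follows. This assumes the reservation discount is expressed as a fraction off the hourly PTU price and that a month has roughly 730 hours; the rate, discount, and function names are illustrative assumptions, not Microsoft's figures.

```python
HOURS_PER_MONTH = 730  # approximate hours in a month; an assumption of this sketch


def hourly_ptu_cost(ptus: int, hourly_rate: float, hours_deployed: float) -> float:
    """Hourly purchase: pay only for the hours the capacity is deployed."""
    return ptus * hourly_rate * hours_deployed


def reserved_ptu_cost(ptus: int, hourly_rate: float, discount: float) -> float:
    """Monthly reservation: discounted rate, billed for every hour of the month."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be a fraction in [0, 1)")
    return ptus * hourly_rate * (1 - discount) * HOURS_PER_MONTH


def break_even_hours(discount: float) -> float:
    """Deployed hours per month above which the reservation is cheaper."""
    return (1 - discount) * HOURS_PER_MONTH


# Hypothetical inputs: 100 PTUs at 1.00 per PTU-hour, 30% reservation discount.
on_demand = hourly_ptu_cost(ptus=100, hourly_rate=1.00, hours_deployed=300)
reserved = reserved_ptu_cost(ptus=100, hourly_rate=1.00, discount=0.30)
threshold = break_even_hours(discount=0.30)
```

With these placeholder numbers, capacity needed for only 300 hours a month is cheaper bought hourly, and the reservation only pays off once the capacity would be deployed for more than about 511 hours.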

The crossover principle: consumption pricing wins when usage is spiky, low-volume, or still being validated. Provisioned Throughput wins when a workload runs a high, steady volume of tokens around the clock. At that point the dedicated capacity costs less per token than the same volume bought on demand, and delivers steadier latency as a bonus. The break-even is a function of utilisation, because the reserved cost is fixed while the number of tokens it is spread across is not. The question to model is therefore: what fraction of the reserved capacity will actually be used?
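One way to put a number on that question is to compare the fixed hourly cost of the reservation with what the same capacity would cost on consumption if it were fully used. The sketch below assumes both figures are known for your workload; the values and function names are hypothetical.

```python
def break_even_utilisation(
    reserved_cost_per_hour: float,
    payg_cost_per_hour_at_full_capacity: float,
) -> float:
    """Fraction of reserved capacity that must be used for costs to match."""
    if payg_cost_per_hour_at_full_capacity <= 0:
        raise ValueError("pay-as-you-go cost at full capacity must be positive")
    return reserved_cost_per_hour / payg_cost_per_hour_at_full_capacity


def cheaper_model(expected_utilisation: float, break_even: float) -> str:
    """Pick the cheaper model for an expected utilisation (0.0 to 1.0)."""
    return "provisioned" if expected_utilisation > break_even else "pay-as-you-go"


# Hypothetical: the reservation costs 60 per hour, and pushing the same
# throughput through consumption pricing would cost 100 per hour.
break_even = break_even_utilisation(60.0, 100.0)
steady_choice = cheaper_model(expected_utilisation=0.85, break_even=break_even)
spiky_choice = cheaper_model(expected_utilisation=0.30, break_even=break_even)
```

In this example the reservation needs to run at more than 60% utilisation to beat consumption pricing, so a steady workload at 85% favours Provisioned Throughput and a spiky one at 30% does not.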

Consumption vs Provisioned at a glance

Dimension | Pay-as-you-go (consumption) | Provisioned Throughput (PTU)
Billing basis | Per input and output token used | Per reserved PTU for the term
Commitment | None | Hourly, monthly, or annual
Cost when idle | Zero | Full reserved capacity still billed
Latency profile | Shared pool, more variable | Dedicated, more consistent
Best fit | Pilots, spiky or low volume | High, steady production throughput
Discount lever | None (rate is fixed by model) | Monthly / annual reservation discount

The buyer-side cost levers

Several levers control Azure OpenAI spend independent of the pricing model you select. Model choice is the largest: smaller and more efficient models cost materially less per token than the largest frontier models, and many workloads run perfectly well on a smaller model once prompts are tuned. Matching the model to the task, rather than defaulting to the most capable model everywhere, is the highest-return optimisation.

Prompt and output engineering is the second lever. Because you pay per token, trimming system prompts, capping output length, and avoiding unnecessary context all reduce cost directly. Caching of repeated input context, where supported, reduces the tokens re-billed on every call. Batch processing, where latency is not critical, is typically priced below real-time calls and suits offline workloads such as document classification.

Commitment structure is the third lever, and it is the one procurement controls. Once a workload's steady-state throughput is established through a consumption-based pilot, converting the steady baseline to a Provisioned reservation, while leaving spiky overflow on consumption, captures the reservation discount on the predictable portion without locking in capacity for peaks that rarely occur.
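The blended structure can be modelled roughly as below. This is a sketch under simplifying assumptions: demand is given as tokens per hour, the reservation absorbs up to a fixed hourly capacity, overflow is routed to a consumption deployment, and a single blended per-token rate stands in for separate input and output prices. All figures and names are hypothetical.

```python
from typing import Sequence


def blended_cost(
    hourly_tokens: Sequence[int],
    reserved_capacity_tokens_per_hour: int,
    reserved_cost_per_hour: float,
    payg_cost_per_token: float,
) -> float:
    """Reservation billed every hour, plus consumption pricing on overflow."""
    reserved = reserved_cost_per_hour * len(hourly_tokens)
    overflow_tokens = sum(
        max(0, tokens - reserved_capacity_tokens_per_hour)
        for tokens in hourly_tokens
    )
    return reserved + overflow_tokens * payg_cost_per_token


# Hypothetical four-hour trace: a steady baseline with one spike.
demand = [100, 100, 100, 400]

blend = blended_cost(demand, reserved_capacity_tokens_per_hour=100,
                     reserved_cost_per_hour=5.0, payg_cost_per_token=0.25)
all_consumption = sum(demand) * 0.25
# Reserving for the peak: four times the capacity at four times the hourly cost.
peak_reserved = blended_cost(demand, reserved_capacity_tokens_per_hour=400,
                             reserved_cost_per_hour=20.0, payg_cost_per_token=0.25)
```

With these placeholder numbers the blend costs less than pure consumption, while a reservation sized to the spike pays for capacity that sits idle three hours out of four.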

Where this sits in your enterprise agreement

Azure OpenAI consumption draws down against your Microsoft Azure Consumption Commitment (MACC), so generative-AI spend counts toward the cloud commitment you have already negotiated. That linkage matters at renewal: a fast-growing AI workload can consume a commitment faster than forecast, and Microsoft will often use AI growth as a reason to push for a larger forward commitment. Treat AI consumption as a negotiable line in the broader Azure commitment rather than a separate, unmanaged cost. Our Microsoft licensing experts model AI consumption against the wider Azure and Microsoft 365 estate so that AI growth strengthens, rather than weakens, your negotiating position.

It is also worth distinguishing Azure OpenAI Service (a consumption platform you build on) from packaged AI assistants licensed per user, such as Microsoft 365 Copilot or Security Copilot. The former is metered infrastructure; the latter are seat-based SKUs. Many enterprises run both, and the licensing logic for each is different, a point we return to in the complete Microsoft licensing guide.

The decision sequence

The practical buyer sequence is: (1) build and validate on pay-as-you-go so you learn real token volumes and latency requirements before committing a dollar; (2) instrument usage to separate steady baseline throughput from spiky overflow; (3) size PTUs against the steady baseline only; (4) model the reservation discount at realistic utilisation, not best-case; (5) commit the baseline to a monthly or annual reservation and leave overflow on consumption; (6) revisit the split each quarter as volume grows or model choices change.
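Steps (2) and (3) can be sketched with a simple percentile split over measured throughput. The choice of the median as the baseline, the nearest-rank percentile method, and the sample figures are all assumptions for illustration; a real analysis would use your own telemetry and a threshold chosen for your workload.

```python
import math
from typing import Sequence, Tuple


def baseline_level(samples: Sequence[int], percentile: float = 50.0) -> int:
    """Nearest-rank percentile of throughput samples, used as the baseline."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(percentile / 100 * len(ordered))
    return ordered[rank - 1]


def split_volume(samples: Sequence[int], baseline: int) -> Tuple[int, int]:
    """Return (tokens served within the baseline, tokens above it)."""
    within = sum(min(s, baseline) for s in samples)
    overflow = sum(max(0, s - baseline) for s in samples)
    return within, overflow


# Hypothetical tokens-per-minute samples with two spikes.
tpm = [120, 100, 110, 500, 105, 115, 900, 100]

baseline = baseline_level(tpm, percentile=50.0)
within, overflow = split_volume(tpm, baseline)
utilisation = within / (baseline * len(tpm))  # share of reserved capacity used
```

A low baseline keeps utilisation of the reserved capacity high at the price of sending more volume to consumption; raising the percentile reverses that trade, which is the split step (6) asks you to revisit each quarter.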

The mistake to avoid is committing to Provisioned Throughput before usage is understood. A reservation sized against optimistic projections that never materialise locks in idle capacity, and idle PTUs are billed in full. Equally, leaving a high-volume, around-the-clock production workload on pure consumption forever leaves the reservation discount on the table. The right answer is almost always a blend, recalibrated regularly.

For adjacent SKU decisions, compare the per-user licensing logic in Power BI Pro vs Premium, the desktop-virtualisation choice in Windows 365 vs Azure Virtual Desktop, and the AI-assistant pricing in Microsoft Security Copilot pricing. For a structured commitment review, the Microsoft EA guide sets out the full negotiation framework.
