Is Unity AI Gateway only available if I already use Databricks?

Effectively yes. Unity AI Gateway is part of Unity Catalog, which is Databricks's data-governance product. You can run Databricks on AWS, Azure, or GCP, but you do need a Databricks account and an active Unity Catalog deployment. There is no standalone version. If you are not on Databricks today, the announcement is most useful as a spec to measure other gateways (LiteLLM, Portkey, Vellum) against.

Does this replace my LLM-based eval suite?

No. Guardrails and evaluations solve different problems. Evaluations are build-time accuracy checks across a representative input distribution: they catch regressions before deploy. Guardrails are run-time policy classifiers on individual requests: they catch real-time violations of explicit policies (PII leaks, explicit content, prompt injection). Both are needed. Anyone marketing guardrails as a replacement for an eval suite is selling something.

What is the difference between LLM guardrails and traditional WAF policies?

WAF policies match on request strings and patterns. LLM guardrails run a classifier model on the full request or response payload, which lets them catch semantic violations a pattern-match misses (a generated response that contains a phone number in any format, a request that contains a prompt-injection pattern in natural language). The cost is latency: a classifier pass adds 50 to 200ms depending on the model you pick. Use guardrails for semantic policies and WAF rules for known patterns.
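
To make the pattern-versus-semantics gap concrete, here is a minimal sketch of a WAF-style rule written for one phone-number format. The rule and the sample strings are made up for illustration and are not taken from any real WAF product. The point is only that a fixed pattern catches the format it was written for and nothing else.

```python
import re

# Hypothetical WAF-style rule: one fixed pattern for one known phone format.
WAF_PHONE_RULE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def waf_blocks(text: str) -> bool:
    """Return True when the fixed pattern matches anywhere in the text."""
    return WAF_PHONE_RULE.search(text) is not None


# Illustrative samples: the same kind of data in three different surface forms.
samples = [
    "Call me on 555-123-4567",               # the format the rule was written for
    "Call me on (555) 123 4567",             # different punctuation
    "Call me on five five five, 123, 4567",  # partly spelled out
]

for sample in samples:
    print(waf_blocks(sample), "-", sample)
```

Only the first sample is caught. The other two are the kind of case a classifier pass is meant to cover, at the latency cost described above.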

How does Unity AI Gateway compare to vendor-neutral MCP gateways like LiteLLM or Portkey?

LiteLLM and Portkey are stronger on cross-vendor portability and weaker on enterprise governance. Unity AI Gateway is stronger on the audit and policy story (because it inherits Unity Catalog's permissions model) and weaker on portability (the policies you write are Databricks-specific). If you are a single-vendor enterprise prioritising audit, Unity wins. If you are multi-vendor or portability-first, LiteLLM or Portkey are still the better default for now.

Should we wait for GA before adopting this in production?

Depends on the feature. Payload logging is the lowest-risk feature to adopt in beta (worst case it logs less than expected and you tune the configuration). Service policies in shadow mode are safe to adopt now. Hard-cap cost controls and guardrails I would run in shadow or log-only mode through the beta period and switch to enforce mode once Databricks publishes the GA blog post. The beta-to-GA cycle on Databricks features is typically 60 to 90 days. Plan the adoption sequence so the shadow period lands you ready for enforce on day one of GA.

MCP governance just became a product: what Databricks Unity AI Gateway changes for enterprise agents

Three months ago I sat with the platform lead at an Ahmedabad-based IT services firm and walked them through what their MCP gateway needed before we could put production agents behind it. The list was familiar from every other enterprise engagement I had run that year. Tool-access policy keyed per team. Per-user cost limits with hard caps. Full payload audit. An LLM guardrail layer for the cases where evals are too slow. None of those existed as products. We rebuilt some version of each one from scratch.

Last week Databricks shipped Unity AI Gateway with four beta features that cover most of that list, and the official position is that this is now part of Unity Catalog. If you are running on Databricks, this changes the build-vs-buy calculation for the gateway layer. If you are not, the announcement still matters because it sets the bar for what enterprise MCP governance should look like in 2026.

Why enterprise MCP has been broken until now

MCP shipped its 1.0 spec in May 2026 and the protocol itself is settled. The remote-server registry crossed 500 servers earlier this month. The SDKs are stable in six languages. The protocol is done. The governance layer around the protocol is not.

Almost every MCP server in production today ships with no opinion on who is allowed to call which tool, how much that team can spend, what the request payload looks like in audit, or what to do when an LLM tries to call a tool in a way that violates a policy. In a single-team prototype, none of that matters. In a 5,000-employee enterprise it is the whole conversation.

Tool-access policy. Which agents in which teams are allowed to call which MCP tools. Today this is hand-rolled in gateway middleware or, worse, not enforced at all. (Covered in depth in tool registry design for agentic AI.)
Cost controls. Per-user and per-team caps that fire before the bill arrives, not after. The default in 2025 was a monthly invoice surprise.
Audit logging. The full request and response payload at the gateway boundary, retained for at least 90 days, queryable by user, agent, and tool. Required for any regulated industry. Almost nobody has it in one place.
Guardrail responses. When the LLM produces an output that violates a policy, the gateway intercepts. Custom logic per policy class. Most teams have a hard-coded denylist that nobody has updated since 2024.

Most of the engineering hours I have billed in 2026 enterprise engagements have gone into building that list. Not the agent logic. The gateway-and-governance layer around it.

What Unity AI Gateway actually shipped

Four beta features dropped on May 19, with the framing that they are now part of Unity Catalog (Databricks's existing data-governance product). The pitch is that the same governance model you already use for tables and ML models now extends to LLMs, agent calls, and MCP tools.

LLM-based guardrails. Safety policies that run as classifier passes on the payload at the gateway: on the response before it returns to the agent and, for checks such as prompt-injection detection, on the request. Custom policy classes, not just a hard-coded category list.
Cost controls. Per-user budget alerts and limits, configurable per route or per agent. Hard caps that block the call, not just notify after the fact.
Payload logging. Complete request and response logging for every agent interaction passing through the gateway. Queryable in Unity Catalog the same way you would query a table.
Service policies for MCPs. Tool-access policy: which agents can access which MCP servers and tools, enforced at the gateway boundary.

These are the four things I have been hand-rolling for enterprise clients. Databricks shipped them as configurable surfaces with audit trails. That alone is meaningful. The deeper move is that they are wired into Unity Catalog's existing permission model, which means the team that already owns data governance now owns AI governance too. For enterprises with a mature Unity Catalog deployment, that is the most natural home for these controls. For enterprises without one, it is also the strongest reason yet to start one.

Service policies: tool-access as MCP-native config

This is the feature that closes the biggest gap. Until now, restricting which agents can call which MCP tools happened in one of three places: in the agent code (insecure, because every agent has to opt in), in the MCP server itself (every server reinvents the policy model), or in custom middleware in front of the gateway (works but expensive to maintain).

Service policies move the policy decision to the gateway. The MCP server keeps doing what it does. The agent code is unchanged. The policy is declared centrally, applied to all traffic, and enforced at the boundary.

Concrete example. The IT services client I mentioned earlier has eight teams, four of them shipping agents. One of those teams is the security operations team, with access to high-sensitivity tools (Active Directory writes, firewall rule changes, employee account modifications). Another is the customer-support team, with access to ticket tools and a read-only product knowledge base. Before Unity AI Gateway, restricting the support team's agents from accidentally calling the AD-write tool was either done by trust (terrible) or by running two separate MCP gateways (expensive). Now it is a service policy: the AD-write tool is restricted to agents tagged with the security-ops role.

The policy declarations are SQL-shaped (GRANT EXECUTE ON TOOL <tool> TO <role>) which is jarring at first and then quickly becomes the obvious move. The same people who already write GRANTs for tables write them for agent tool access. The policy review process can sit inside the same change-management workflow that data governance already runs.

A pattern that has surfaced quickly: shadow-mode rollout. You declare the policy with an explicit "log violations, do not block" setting for the first two weeks. Every call that would have been blocked produces an audit entry instead. You read those entries, you find the cases where the policy is wrong (because nobody had documented the cross-team workflow that was actually fine), you adjust, you flip to enforce. Without shadow mode, every policy rollout is also a production outage waiting to happen. With it, the rollout is boring. Boring is the right shape for governance.
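
The shadow-then-enforce flow is easier to reason about with a toy model. What follows is a minimal sketch and not the Unity AI Gateway API: the `ToolPolicy` class, the `mode` values, and the tool and role names are all hypothetical. It only shows the decision logic: a disallowed call is always logged, and it is blocked only once the mode is flipped to enforce.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Hypothetical in-memory policy: tool name -> roles allowed to call it."""
    grants: dict
    mode: str = "shadow"  # "shadow" logs violations, "enforce" blocks them
    audit: list = field(default_factory=list)

    def check(self, agent_role: str, tool: str) -> bool:
        """Return True if the call may proceed under the current mode."""
        if agent_role in self.grants.get(tool, set()):
            return True
        # Record every would-be violation so it can be reviewed before enforcing.
        self.audit.append({"role": agent_role, "tool": tool, "mode": self.mode})
        return self.mode == "shadow"


policy = ToolPolicy(
    grants={
        "ad_write": {"security-ops"},
        "ticket_read": {"support", "security-ops"},
    }
)

print(policy.check("support", "ad_write"))  # shadow mode: let through, but logged
policy.mode = "enforce"
print(policy.check("support", "ad_write"))  # enforce mode: blocked and logged
print(policy.audit)
```

The design choice worth copying is that the audit entry is written in both modes, so the log you reviewed during the shadow period has the same shape as the one you get after enforcement.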

Cost limits: the per-user budget that should have existed two years ago

Agent cost overruns are the single most common surprise in enterprise engagements. A team builds a multi-step workflow, ships it, the workflow makes it past Day 1, and by Day 30 the bill is 4x what was budgeted because someone hooked the agent into a webhook that fires more often than expected.

Cost controls in Unity AI Gateway sit at the per-user and per-route level, with three primitives that matter.

Soft alerts at a configurable percentage of budget. Notifies via Unity Catalog's audit channels.
Hard caps that block the next call once the budget is hit. The gateway returns a structured error to the agent.
Per-route configuration. You can set different limits for prototype agents (low cap, alerts only) vs production agents (higher cap, hard block).

The hard-cap behaviour is the bit I would not have shipped a production agent without. Soft alerts are a workplace politeness feature. Hard caps are the thing that actually limits damage. The flip side: a hard cap that fires on a production-critical agent in the middle of the workday is its own incident. The right pattern is layered budgets (per-user soft alert, per-team soft alert, per-team hard cap) with the alerts firing well before the cap. Unity AI Gateway supports the layering. It does not write the policy for you.
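
A sketch of the layered-budget idea, under stated assumptions: the `Budget` class, the `BudgetExceeded` exception, the dollar figures, and the alert fractions are all illustrative and are not the product's configuration surface. The per-user layer only alerts, and the per-team layer alerts early and then blocks the next call once its limit is hit.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    limit_usd: float
    alert_fraction: float = 0.8  # soft-alert threshold (assumed value)
    hard_cap: bool = False       # True: block the next call once the limit is hit
    spent_usd: float = 0.0


class BudgetExceeded(Exception):
    """Stand-in for the structured error a gateway would return to the agent."""


def record_call(cost_usd: float, budgets: dict) -> list:
    """Block if any hard cap is already hit, otherwise record spend.

    Returns the names of the budgets whose soft alert fired on this call.
    """
    for name, budget in budgets.items():
        if budget.hard_cap and budget.spent_usd >= budget.limit_usd:
            raise BudgetExceeded(f"{name}: hard cap of ${budget.limit_usd:.2f} reached")
    alerts = []
    for name, budget in budgets.items():
        before = budget.spent_usd
        budget.spent_usd += cost_usd
        threshold = budget.alert_fraction * budget.limit_usd
        if before < threshold <= budget.spent_usd:
            alerts.append(name)  # fires once, when the threshold is crossed
    return alerts


budgets = {
    "user:alice": Budget(limit_usd=20.0),  # soft alert only
    "team:support": Budget(limit_usd=30.0, alert_fraction=0.5, hard_cap=True),
}

for call_number in range(1, 5):
    try:
        print(call_number, "ok, alerts:", record_call(10.0, budgets))
    except BudgetExceeded as err:
        print(call_number, "blocked:", err)
```

With these made-up numbers the alerts fire on the second call and the block lands on the fourth, which is the ordering you want: people hear about the budget well before the cap stops a production route.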

Payload logging: the audit story I keep getting asked for

Compliance and audit-team pressure on AI deployments increased noticeably through 2025 and into 2026. Healthcare, financial services, legal, anything regulated: the question is no longer "can we deploy an agent" but "can you show me every prompt and every response for the past 90 days, queryable by user". For most clients the answer was "we have logs but they are scattered across the LLM provider, the gateway, and the agent application, and we cannot reconstruct a full session reliably". That is not an answer that satisfies an auditor.

Payload logging in Unity AI Gateway is the first thing I have seen that gets this right in a single product surface. Full request and response payload retention, time-stamped, indexed by user and agent and tool, queryable from the same Unity Catalog interface that already serves the audit team. The retention configuration is yours to set. The default is 90 days.
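
To show what "queryable by user, agent, and tool" means in practice, here is a small stand-in using an in-memory SQLite table. Everything in it is an assumption for illustration: the `gateway_audit` table name, its columns, and the sample rows are invented, and the real audit table in Unity Catalog will have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical audit schema; the real table layout will differ.
conn.execute(
    "CREATE TABLE gateway_audit ("
    "ts TEXT, user_id TEXT, agent TEXT, tool TEXT, request TEXT, response TEXT)"
)
# Made-up sample rows, one per gateway call.
conn.executemany(
    "INSERT INTO gateway_audit VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2026-05-20T09:00:00", "alice", "support-bot", "ticket_read", '{"id": 42}', '{"status": "open"}'),
        ("2026-05-20T09:00:05", "alice", "support-bot", "kb_search", '{"q": "refund"}', '{"hits": 3}'),
        ("2026-05-20T09:01:00", "bob", "secops-bot", "ad_write", '{"user": "x"}', '{"ok": true}'),
    ],
)


def calls_for(user_id=None, agent=None, tool=None):
    """Return audit rows in time order, filtered by any mix of user, agent, tool."""
    filters = {"user_id": user_id, "agent": agent, "tool": tool}
    active = {col: val for col, val in filters.items() if val is not None}
    # Column names come from the fixed dict above; values are bound as parameters.
    where = " AND ".join(f"{col} = ?" for col in active) or "1 = 1"
    sql = f"SELECT ts, user_id, agent, tool FROM gateway_audit WHERE {where} ORDER BY ts"
    return conn.execute(sql, tuple(active.values())).fetchall()


for row in calls_for(user_id="alice"):
    print(row)
```

The useful property is that one table answers all three auditor questions (who, which agent, which tool) without stitching together provider, gateway, and application logs.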

Two caveats. First, PII redaction in the payload is not solved by the product. If your agent processes regulated personal data, you still need a redaction layer (either in the agent code or in a custom Unity Catalog filter on the audit table) before your audit log becomes more of a liability than an asset.

Second, the storage cost of full payload logging at scale is non-trivial. We measured roughly $40 per million tokens stored per month on the IT services workload. That is real money at high volume. Tier it by agent class (full payload for production, sampled payload for prototypes) and the cost is manageable. Default to full payload everywhere and the audit log itself becomes a line item that catches the CFO's eye.
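
A back-of-envelope version of the tiering argument, as a sketch. The $40 rate is the figure quoted above for one workload. The token volumes and the 10% prototype sample rate are invented purely to show the arithmetic, so substitute your own numbers.

```python
RATE_USD_PER_M_TOKENS = 40.0  # per million tokens stored per month, as quoted above


def monthly_storage_cost(stored_tokens: float, sample_rate: float = 1.0) -> float:
    """Monthly cost of storing a fraction (sample_rate) of the given token volume."""
    return stored_tokens / 1_000_000 * sample_rate * RATE_USD_PER_M_TOKENS


# Hypothetical volumes: (tokens stored, sample rate under the tiered scheme).
tiers = {
    "production": (200_000_000, 1.0),   # full payload
    "prototype": (100_000_000, 0.10),   # sampled payload
}

full = sum(monthly_storage_cost(tokens) for tokens, _ in tiers.values())
tiered = sum(monthly_storage_cost(tokens, rate) for tokens, rate in tiers.values())

print(f"full payload everywhere: ${full:,.0f} per month")
print(f"tiered by agent class:   ${tiered:,.0f} per month")
```

Under these assumed volumes the tiered scheme stores all production traffic and still trims the bill by close to a third, almost entirely from the prototype tier.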

LLM guardrails: useful but not a replacement for evals

Guardrails as a category have been over-sold by every vendor in the space since 2024. The line I keep having to repeat in engagements: guardrails are not evaluation. They are a runtime classifier pass that catches a narrow set of policy violations in real time. Evaluations are a build-time eval-suite pass that catches accuracy regressions across a representative input distribution. Different jobs. Both are needed.

Unity AI Gateway's LLM guardrails are the better-than-average kind. Custom policy classes, configurable model for the classifier pass, intercept-and-rewrite or intercept-and-block behaviour. The policies I would actually use them for: PII leak detection (catching the rare case where a model outputs an email address or phone number that was not redacted upstream), explicit content blocking in user-facing applications, prompt-injection signal detection on the input side.

What I would not use them for: replacing the eval suite. Replacing structured-output validation (use Pydantic or an equivalent for that). Replacing rate limits (use the cost-controls feature, not the guardrails). The guardrails are a safety net, not a primary defence. Anyone selling them as a primary defence is selling something.
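
The difference between intercept-and-rewrite and intercept-and-block can be sketched in a few lines. This is not the gateway's configuration format: the `Guardrail` class and `apply_guardrails` function are hypothetical, and a regular expression stands in for the classifier model so the example stays self-contained.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Guardrail:
    name: str
    violates: Callable[[str], bool]                 # stand-in for a classifier pass
    action: str                                     # "rewrite" or "block"
    rewrite: Optional[Callable[[str], str]] = None  # used only when action is "rewrite"


def apply_guardrails(response: str, guardrails: list) -> Optional[str]:
    """Return the (possibly rewritten) response, or None if a guardrail blocked it."""
    for guardrail in guardrails:
        if not guardrail.violates(response):
            continue
        if guardrail.action == "rewrite" and guardrail.rewrite is not None:
            response = guardrail.rewrite(response)
        else:
            return None  # block: nothing is returned to the agent
    return response


def has_email(text: str) -> bool:
    return EMAIL.search(text) is not None


pii_rewrite = Guardrail("pii-email", has_email, "rewrite",
                        lambda text: EMAIL.sub("[REDACTED_EMAIL]", text))
pii_block = Guardrail("pii-email", has_email, "block")

reply = "Contact jane.doe@example.com today"
print(apply_guardrails(reply, [pii_rewrite]))
print(apply_guardrails(reply, [pii_block]))
```

Rewrite keeps the user-facing flow alive at the cost of trusting the rewrite step. Block is the safer default while a policy is new.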

Who this is for (and who it is not)

If you are a Databricks shop with a Unity Catalog deployment already running, Unity AI Gateway is the closest thing to a drop-in MCP governance layer that has shipped to date. The integration with your existing permissions model alone is worth the migration. Take it.

If you are an enterprise running agents on a different platform (AWS Bedrock, Google Vertex, a self-managed stack calling the Anthropic API), the announcement still matters as a spec. Databricks just set the bar for what enterprise MCP governance looks like. You can use it as a checklist when you evaluate other gateways (LiteLLM, Portkey, Vellum, the various open-source options) or when you build your own. The four primitives (service policies, cost controls, payload logging, guardrails) are the right four. Whichever vendor solves all four well first wins the enterprise gateway market.

If you are a smaller team without a Unity Catalog deployment, this is not the thing to adopt today. Standing up Unity Catalog purely to get the AI Gateway is too much infrastructure for the return. Most SMB agent deployments are still better served by an open-source gateway or by hand-rolled middleware. The pattern to copy is the four primitives. The product is overkill at that scale.

What this means for non-Databricks gateways

For teams not on Databricks, the announcement sets a clear evaluation matrix. Any gateway you consider in 2026 should answer the four-primitive question. Here is how the current shortlist scores against it, based on what I have actually deployed in client engagements.

LiteLLM. Strong on cross-vendor routing and cost tracking. Per-user cost limits are configurable but coarse-grained. Service policies are not first-class (you build them with custom hooks). Payload logging is solid. Guardrails: integrate-your-own. Best for multi-vendor teams who care about portability.
Portkey. Cost tracking and rate limits are strong. Cache management is the best in this list. Service policies and audit are reasonable but not at Unity Catalog's depth. Guardrails are configurable. Best for cost-conscious teams running across vendors.
Vellum. Strong on prompt experimentation and eval orchestration. Weaker on the runtime-governance primitives. Probably not the right primary gateway, but useful adjacent to one for prompt-engineering workflows.
Roll-your-own on open source (LangGraph + custom middleware). Most flexibility, most engineering overhead. You will build all four primitives yourself, badly the first time, well the second. The hidden cost is the ongoing maintenance once it works.

Nothing here is bad. None of them are Unity AI Gateway, but most enterprise stacks do not need Unity AI Gateway. The question is which primitives you actually use under load and which ones you can live without. Service policies are the hardest to roll yourself well; that is the one to optimise for in the gateway you pick. Cost limits and payload logging are tractable to build on top of any gateway that exposes hooks. Guardrails you can mostly skip on the gateway and run as a separate classifier step (it is more configurable that way anyway).

The PII redaction gap and how to close it

I flagged this above and it deserves its own section because it is the most common follow-up question I get on enterprise MCP rollouts. Payload logging is a privacy footprint. Once you have it, you have created a new data-protection burden: an audit log full of prompts and responses, some of which contain personal data your agent received as input. If you do not redact it, your audit log becomes the largest unredacted PII store in the company.

Unity AI Gateway does not solve this directly. Two patterns close the gap, and they are not mutually exclusive.

Upstream redaction in the agent code. Run a Presidio-shaped redaction pass on user input before it hits the LLM. Tag the redacted input with a structured marker so downstream code knows the data was anonymised. This is the cleanest approach because the redaction happens once, at the boundary.
Audit-table view layer. Create a Unity Catalog view on top of the raw audit log that runs redaction at read time. The raw log stays secured to the data-protection team, the view is what the application team queries. This is the right approach when you need the raw log for the rare investigation but routine access goes through the view.

In practice most regulated clients end up running both. Upstream redaction handles the routine case; the view layer is the access-control story for the rare unredacted-access investigation. Configure the audit table's retention separately from your application logs (your application logs are probably already in a non-regulated S3 bucket, your audit log is now a HIPAA or GDPR liability waiting to be measured).
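
Both patterns can be sketched with one redaction function. This is a stand-in under loud assumptions: two deliberately naive regular expressions take the place of a real detector such as Presidio, and the `RedactedText` marker and the row layout are hypothetical. It shows the shape only: redact once upstream and tag the result, then reuse the same function for a read-time view over raw audit rows.

```python
import re
from dataclasses import dataclass

# Naive illustrative patterns; a production detector needs far better coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


@dataclass
class RedactedText:
    """Structured marker so downstream code knows the text was anonymised."""
    text: str
    redacted: bool
    entity_types: list


def redact(text: str) -> RedactedText:
    """Upstream pattern: replace each detected entity with a typed placeholder."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"<{label}>", text)
        if count:
            found.append(label)
    return RedactedText(text=text, redacted=bool(found), entity_types=found)


def redacted_view(raw_rows):
    """View-layer pattern: redact at read time and leave the raw rows untouched."""
    for row in raw_rows:
        yield {**row, "request": redact(row["request"]).text,
               "response": redact(row["response"]).text}


result = redact("Reach me at jane@example.com or +1 415 555 0100")
print(result)

raw_log = [{"user": "alice", "request": "My email is jane@example.com", "response": "Noted."}]
print(list(redacted_view(raw_log)))
```

Keeping a single redaction function behind both paths means the upstream pass and the view layer cannot drift apart in what they consider personal data.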

The migration I would run for a Databricks shop

01. Inventory the tool-access matrix. Map every MCP tool currently exposed to every agent in every team. Many enterprises do not have this written down, which is the first surprise. Spending two days to write it down has often paid back the rest of the migration on its own.
02. Roll service policies in shadow mode. Configure the policies but set them to log-only, not enforce. Watch the audit log for at least a week. Every violation that appears is either a real misconfiguration or a real undocumented dependency. Both are worth knowing before flipping to enforce mode.
03. Switch payload logging on day one. No reason to wait. The retention is yours to configure. Default to 90 days for production agents and 14 days for prototypes. Tier the storage cost upfront.
04. Cost controls per route, not per user. Per-user limits sound reasonable but turn into incident triggers. Per-route limits (the agent route is the budget unit) are the version that works in practice, with per-user soft alerts layered on top.
05. Guardrails after observability. Add guardrails last, not first. You need the audit log to see what would-be-blocked patterns are actually doing in production before you flip the block. Configure-and-shadow for at least two weeks before enforcing.
06. Re-evaluate the home-rolled gateway. The middleware that has been doing this work in your codebase can probably retire. Plan the deletion. Migrations that do not delete the old thing have a way of becoming permanent parallel systems that drift apart.
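
The inventory step lends itself to a small script. The sketch below assumes you can export observed (role, tool) pairs from existing logs. The pairs shown are invented, and the generated statements only follow the GRANT shape quoted earlier in this post, so check the exact syntax and identifier quoting against the Databricks documentation before running anything.

```python
from collections import defaultdict

# Hypothetical export of observed calls: (agent role, MCP tool).
observed_calls = [
    ("security-ops", "ad_write"),
    ("support", "ticket_read"),
    ("support", "kb_search"),
    ("security-ops", "ticket_read"),
    ("support", "ticket_read"),  # duplicates collapse in the matrix
]


def build_matrix(calls):
    """Tool-access matrix: tool name -> set of roles seen calling it."""
    matrix = defaultdict(set)
    for role, tool in calls:
        matrix[tool].add(role)
    return dict(matrix)


def to_grants(matrix):
    """Draft statements in the GRANT shape quoted in the post (syntax unverified)."""
    return [
        f"GRANT EXECUTE ON TOOL {tool} TO {role}"
        for tool in sorted(matrix)
        for role in sorted(matrix[tool])
    ]


for statement in to_grants(build_matrix(observed_calls)):
    print(statement)
```

Treat the output as a draft for review, not as policy: observed access includes the accidental access you are trying to remove.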

The migration is two to four engineer-weeks for a mid-size enterprise. The cost saving on dropped middleware is real but secondary. The headline win is the audit story. Walking into a compliance review with a Unity Catalog audit log queryable by user, agent, and tool is a different conversation than walking in with a spreadsheet of CloudWatch traces.

What is still missing

Three gaps remain after Unity AI Gateway. None are deal-breakers. All are worth tracking.

First, PII redaction is not solved at the gateway. Payload logging at full fidelity creates a privacy footprint that has to be managed somewhere. Databricks does not yet ship a redaction layer in front of the audit log. For regulated industries this means either an upstream redaction step in the agent code, or a custom view layer on the Unity Catalog audit table. Neither is hard, both are extra work.

Second, cross-vendor policy portability does not exist. The service policies you write for Unity AI Gateway are specific to Databricks. If you move to a different gateway in 2027, you rewrite the policies. The MCP protocol itself does not yet have a portable policy expression. The W3C-shaped standardisation conversation has not started in any meaningful way.

Third, the guardrail classifier is single-pass. For high-throughput production agents (above 50 requests per second on a single route), the classifier pass adds latency, on the order of 50 to 200ms depending on the model, that you may not be willing to spend. Tune the classifier model selection carefully. For most enterprise routes the latency is fine. For real-time UX it may not be.

A quick note on adoption sequencing for clients I am advising

For the IT services client I keep referring to, the adoption sequence we are running over the next four weeks is concrete enough to share.

Week one. Turn on payload logging in production and prototypes at the default retention. No policy changes, just visibility.
Week two. Inventory the tool-access matrix, write the service policies, deploy them in shadow mode.
Week three. Read the shadow-mode audit log, fix the policies that are wrong (and there will be several), enable per-route cost controls with soft alerts only.
Week four. Flip service policies to enforce mode, enable hard caps on the highest-spend routes, retire the home-rolled middleware that has been doing the gateway job.

The guardrail rollout is deferred to month two; we want the audit log mature before we add a classifier intercept.

For a smaller engagement (a fintech team of 14 with three production agents and no Unity Catalog deployment), the sequence is different. We are not adopting Unity AI Gateway. We are looking at LiteLLM with custom hooks for service policies. The four-primitive checklist still applies; the implementation does not. If you are at the build-vs-buy stage, agentic AI consulting engagements start with an audit of the existing gateway and registry layers before any new architecture lands.

The point of sharing both is to ground the announcement in two real shapes of adoption. The headline ("Databricks shipped a product") is true. The right response depends entirely on which side of the build-vs-buy line your stack already sits on.

MCP governance has been the conversation nobody had time to ship. Databricks shipped most of it. If you are on the platform, the build-vs-buy decision is closed. If you are not, the spec they just set is the spec your gateway should be measuring against.

Source: Databricks official announcement at https://www.databricks.com/blog/whats-new-unity-ai-gateway-service-policies-guardrails-observability-and-cost-controls-ai. Related: original Unity AI Gateway launch at https://www.databricks.com/blog/ai-gateway-governance-layer-agentic-ai.
