Agent Experience (AX) and the Agent Experience Interface (AXI): a working guide

Most software was built for humans clicking buttons or developers reading docs. Sometime in the last 18 months a different kind of traffic showed up: AI agents using your product on a user’s behalf. They sign up for accounts, call APIs, read documentation, run CLI commands. Most of them never see your homepage.

If your product is hard for those agents to use, the human paying the bill notices, because the agent they “hired” to do the job either fails or burns through tokens trying.

In January 2025, Netlify’s CEO Mathias Biilmann gave that gap a name: Agent Experience (AX). A more recent and much narrower idea, the Agent Experience Interface (AXI), focuses on one specific question inside AX: what does a CLI tool look like when you take the agent’s constraints seriously?

This post walks through what each concept actually means according to the official sources, how they relate, and what I’d ask clients to change today.

Where AX comes from

The original definition is short. Biilmann calls AX “the holistic experience AI agents will have as the user of a product or platform.” He places it alongside two earlier ideas:

  • User Experience (UX), coined at Apple in 1993, about humans interacting with software.
  • Developer Experience (DX), popularised around 2011, about humans building on top of software.
  • Agent Experience (AX), 2025, about agents using software on a human’s behalf.

AX sits next to UX and DX. The claim is that products which only behave well when a human is at the keyboard will quietly start feeling weaker than products which also behave well when an agent is. Biilmann put it plainly in his original essay: companies should “start consciously designing the AX of their products, or risk being replaced by tools that empower their customers to harness the exponential power of seamlessly collaborating with agents.”

In a recent example, I was trying to set up a project that used a particular tool from a company I won’t name here (a large, well-known IT organisation). The agent started its work, and after a while I looked at my terminal window and was greeted with the following:

REDACTED is installed globally. Now I need you to authenticate with REDACTED — this step is interactive (it opens a browser for OAuth), so you'll need to run it.

Please run this in your terminal (or type it here with the ! prefix so the output lands in our session):

! REDACTED login

It'll open a browser, ask you to authorize REDACTED against your REDACTED account, then store the token locally so all subsequent commands work.

Once that's done, tell me and I'll:
...list of things the agent would do once I'm done with the auth...

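This is the kind of scenario AX tries to eliminate: the agent could do everything except the one step that had been designed for a human sitting in front of a browser.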

The four areas of AX

Across “Introducing AX” and the “One Year of AX” follow-up, Biilmann settles on four areas worth designing for. I’ve kept the names from his own writing, because once you start renaming them you lose the connection to the wider conversation that’s now underway.

Access

Can an agent actually do the thing? The question goes beyond “do you have an API.” It’s whether an agent acting on a user’s behalf can authenticate, perform meaningful work, and hand the result back without manual login forms, captchas, email confirmations, or other handshakes designed exclusively for browsers.

Two examples from the source material make this concrete. Clerk redesigned its SDK so that “an agent could fully install, set up and preview Clerk without any human-in-the-loop interaction for sign-up, login or creating an auth instance.” Netlify allows anonymous deploys that can be claimed afterwards by the human via a signed link, so agents can ship work first and ownership gets assigned second.
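
The “deploy first, claim later” pattern can be sketched with a signed, expiring token embedded in the claim link. This is a minimal illustration under my own assumptions, not a description of how Netlify implements it: the names `make_claim_token`, `verify_claim_token`, the `SECRET` key, and the token layout are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side signing key


def make_claim_token(deploy_id: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Return 'deploy_id.expires_at.signature' for embedding in a claim URL."""
    payload = f"{deploy_id}.{expires_at}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_claim_token(token: str, now: Optional[int] = None,
                       secret: bytes = SECRET) -> Optional[str]:
    """Return the deploy id if the token is authentic and unexpired, else None."""
    try:
        # Split from the right so a deploy id containing dots still parses.
        deploy_id, expires_raw, sig = token.rsplit(".", 2)
        expires_at = int(expires_raw)
    except ValueError:
        return None
    payload = f"{deploy_id}.{expires_raw}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature prefixes via timing.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    current = int(time.time()) if now is None else now
    return deploy_id if current < expires_at else None
```

The agent never needs an account: it deploys anonymously, receives a link carrying the token, and passes that link to the human, who proves ownership simply by following it before it expires.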

Quick diagnostic: if your sign-up flow assumes a human is sitting in front of a browser, your product probably has no real Access for agents right now.

Context

Agents don’t know your product. They know what their training data contained, which may be a year out of date, plus whatever context you provide at runtime. Context engineering is the deliberate work of supplying that material in a useful shape: documentation written in formats agents can ingest, an llms.txt at the root of your domain, a markdown export button on your docs pages, and increasingly an MCP server exposing live information instead of static text.

The wider shift across the field is from prompt engineering (how do I phrase the request?) to context engineering (what does the model see at the moment it has to act?). For product teams, the practical consequence is that engineering now owns context as a product surface.

Tools

Once an agent has access and context, what can it actually do? This is the surface-area question. APIs are the obvious answer. MCP servers, CLIs, and structured webhooks all count too. The interesting design choices show up in the shape of those tools, which is exactly where AXI enters the picture.

Biilmann’s cautionary example is Salesforce’s “Agentforce” strategy, which positions Salesforce’s own agent as the way to interact with Salesforce-locked data. HubSpot took the opposite route: an MCP interface that lets any agent (Claude, ChatGPT, an internal one) operate against HubSpot data on the user’s behalf. Both approaches work for the next few quarters. I think only the second one ages well, because customers increasingly want to bring the agent they already trust along with them.

Orchestration

Finally, can your platform start an agent? Linear lets users @mention an external agent inside a ticket. The agent then runs work and reports back inside Linear itself. The handover happens in the user’s existing workflow, so the user doesn’t have to remember to open a new chat tab.

Orchestration is the AX equivalent of webhooks for a SaaS product. The product can trigger agents on events its users care about, instead of waiting for users to bring agents to it.

That’s the map. Any company that takes AX seriously should, over time, have something deliberate to say about each of the four areas. In my experience, most still don’t.

What AXI is, and why it’s a separate idea

If AX is the philosophy, the Agent Experience Interface (AXI) is a much narrower technical proposal. It defines ten design principles for building command-line tools that agents use efficiently, and it’s published at axi.md. The principles came out of empirical work measuring how agents actually use CLIs and MCP servers in real tasks.

The starting observation is uncomfortable. Both raw CLIs and MCP-style tools, in their current form, waste tokens when an agent is the caller:

  • A normal CLI separates action from observation. The agent runs a command, gets back “OK,” and then has to call another command to see what happened. Every meaningful step becomes two invocations.
  • An MCP server, in current implementations, ships its full tool schema into the model’s context on every call. Past a few dozen tools, the schema overhead alone can push input tokens from around 80K to 185K per task.
  • Both struggle with discoverability. An agent that doesn’t know a flag exists has no efficient way to find it.

AXI is an attempt to design CLIs that don’t have these problems. The ten principles cluster into four buckets (efficiency, robustness, discoverability, and support), but it’s easier to summarise them by what they change in the agent’s loop.

The first set concerns output. AXI tools default to a format called TOON (Token-Oriented Object Notation), which strips the braces, quotes, and commas of JSON while staying unambiguous to language models. A list of issues drops from roughly 35 tokens per item in JSON to about 21 in TOON, a saving of around 40%. Default schemas stay minimal at three or four fields per row, and the agent can ask for more if it needs them. Long text fields get truncated with explicit hints about how to fetch the full body.

The second set concerns what happens after a call. Mutations should be idempotent, so an agent retrying a failed step doesn’t double-write. Errors should come back structured, on stdout, with proper exit codes, which gives the agent something concrete to parse. Empty results should ship as explicit zeros with a clear “no rows” message, so the agent doesn’t have to interpret a blank line. Pre-computed aggregates such as total counts should be included where they’d otherwise cost an extra round-trip. The aim is to let the agent reason about what just happened without having to make another probing call.
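
A rough sketch of what those post-call behaviours might look like in a tool's command handlers. Everything here is illustrative: the in-memory `_labels` store stands in for a real backend, and the payload keys (`error`, `hint`, `changed`, `total`, `message`) are my own assumptions rather than names taken from the AXI principles.

```python
import json

# In-memory stand-in for a real backend (hypothetical data).
_labels = {42: {"bug"}, 43: set()}


def _not_found(issue: int):
    """Structured error: a machine-readable code plus a recovery hint."""
    return 1, {"error": "issue_not_found", "issue": issue,
               "hint": "list issues to see valid numbers"}


def add_label(issue: int, label: str):
    """Idempotent mutation: repeating the call leaves the same end state."""
    if issue not in _labels:
        return _not_found(issue)
    changed = label not in _labels[issue]
    _labels[issue].add(label)
    # Echo the resulting state so no follow-up "read" call is needed.
    return 0, {"issue": issue, "labels": sorted(_labels[issue]), "changed": changed}


def list_labels(issue: int):
    """Empty results are explicit: a zero total plus a plain message."""
    if issue not in _labels:
        return _not_found(issue)
    labels = sorted(_labels[issue])
    payload = {"issue": issue, "total": len(labels), "labels": labels}
    if not labels:
        payload["message"] = "no labels on this issue"
    return 0, payload


def emit(code: int, payload: dict) -> int:
    """Print the payload on stdout for success and failure alike."""
    print(json.dumps(payload))
    return code  # the caller passes this to sys.exit()
```

A retried `add_label(42, "p1")` returns the same labels with `changed` set to false, so the agent can tell a harmless repeat from a real write without issuing another command.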

The third set is about discovery. Running a command with no arguments should show live, actionable data; the help screen sits behind --help. After each output, the tool can append one or two relevant follow-on commands with placeholders the agent fills in. Every subcommand offers a consistent --help. Session-level hooks populate ambient context at the start of a conversation, so the agent doesn’t have to ask “what’s available here?” before doing anything useful.
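
As a small sketch of the discovery behaviour, the function below returns live data when called with no arguments, keeps usage text behind `--help`, and appends follow-on commands with placeholders. The `issues` command name, its subcommands, and the output layout are hypothetical, chosen only for illustration.

```python
def render(argv: list, open_issues: list) -> str:
    """Build the text a hypothetical `issues` CLI would print for argv."""
    if "--help" in argv:
        return "usage: issues [list|view <id>] [--help]"

    # No arguments: show live, actionable data rather than usage text.
    body = [f"open_issues: {len(open_issues)}"]
    body += [f"  {number} {title}" for number, title in open_issues[:3]]

    # Suggest one or two follow-on commands; the agent fills in <id>.
    hints = ["issues view <id>", "issues list --state closed"]
    return "\n".join(body + ["next:"] + [f"  {hint}" for hint in hints])


if __name__ == "__main__":
    print(render([], [(7, "Fix login"), (9, "Update docs")]))
```

The hints cost a handful of tokens per call, which is the trade being made: a little extra output in exchange for not having to probe for what the tool can do next.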

The empirical case is worth pausing on. The AXI authors ran 490 browser-automation tasks across three implementations of the same underlying tool: their AXI version, an MCP version, and a raw CLI. The AXI version completed 100% of tasks at an average of $0.074 each. The MCP equivalent hit 99% at $0.100. The raw CLI sat in between. A 425-run GitHub benchmark told a similar story: AXI scored 100% at $0.050 a task, while the MCP versions came in between 82% and 87% at two to three times the cost.

Benchmarks aren’t the final word on anything. They’re easy to gerrymander, and these came from the AXI authors themselves. But they do suggest something the marketing material around MCP currently glosses over. The shape of the interface is a first-order driver of agent cost and reliability.

How AX and AXI relate

It helps to think of them at different scopes. AX asks the broad question (what does my product look like to an agent?), which covers authentication, documentation, tools, orchestration, the lot. AXI answers a narrower version: if I’m building a CLI that agents will run, what does it need to look like to be cheap, reliable, and discoverable? Most companies will need to think about AX. A subset, namely the ones shipping CLIs, SDKs, or MCP servers, will benefit from also thinking about AXI specifically.

Quick test for which conversation you’re in. If your team is debating whether to ship an MCP server or a CLI for a particular agent integration, you’re in AXI territory, and the benchmark numbers above are directly relevant. If your team is debating whether agents can sign up to your product at all without a human re-typing a password, you’re in AX territory, and the fix is structural before it ever touches the interface shape.

What companies should actually do

I’ve had a version of this conversation with several teams over the past few months. The priority order shifts by company, but the work tends to cluster into the same handful of moves.

Start by measuring agent traffic separately from human traffic. Most analytics setups don’t, and you can’t manage what you can’t see. At minimum, instrument your API and login endpoints to identify common agent user-agents (Claude, ChatGPT, Cursor, and the rest) and track sign-ups, errors, and time-to-first-success for that cohort distinctly.
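
A minimal sketch of that instrumentation, assuming you already have request records with a `user_agent` string and an `outcome` field. The substrings in `AGENT_MARKERS` are illustrative guesses rather than an authoritative list; check each vendor's published user-agent strings before relying on them, and remember that user-agent headers can be spoofed or omitted.

```python
from collections import Counter

# Illustrative substrings only; verify against each vendor's documentation.
AGENT_MARKERS = {
    "claude": "Claude",
    "chatgpt": "ChatGPT",
    "gptbot": "ChatGPT",
    "cursor": "Cursor",
}


def classify(user_agent: str) -> str:
    """Map a raw User-Agent header to a cohort name."""
    ua = (user_agent or "").lower()
    for marker, cohort in AGENT_MARKERS.items():
        if marker in ua:
            return cohort
    return "human_or_unknown"


def cohort_counts(requests: list) -> Counter:
    """Count (cohort, outcome) pairs across request log records."""
    return Counter(
        (classify(r.get("user_agent", "")), r.get("outcome", "unknown"))
        for r in requests
    )


if __name__ == "__main__":
    log = [
        {"user_agent": "Mozilla/5.0 (compatible; ChatGPT-User/1.0)", "outcome": "signup_failed"},
        {"user_agent": "Mozilla/5.0 (Macintosh) Safari/605.1", "outcome": "signup_ok"},
    ]
    for key, count in cohort_counts(log).items():
        print(key, count)
```

Once the cohorts are split out, the same grouping can feed sign-up, error, and time-to-first-success dashboards without changing the rest of the pipeline.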

Then audit your sign-up and first-run experience as if you were an agent. Open a Claude or ChatGPT session, point it at your own product, and tell it to do a normal first-time task. Watch where it gets stuck. The pattern is almost always the same: captchas, OAuth flows that demand a human-driven browser, confirmation emails, drag-and-drop UI for things that have no API. Each is a candidate fix in the Access category.

Documentation is the next surface to revisit. If your docs are single-page apps with content rendered client-side in JavaScript, an agent won’t see them properly. Ship a markdown export, publish an llms.txt, and consider an MCP server for live, structured access.
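
For the llms.txt part, a small generator is usually enough. The sketch below follows the layout I understand the llms.txt proposal to use (an H1 title, a blockquote summary, then H2 sections containing markdown link lists); treat that layout as an assumption and check the current proposal before shipping. The function name, product name, and URLs are hypothetical.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body from {heading: [(name, url, note), ...]}."""
    parts = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        parts += ["", f"## {heading}", ""]
        parts += [f"- [{name}]({url}): {note}" for name, url, note in links]
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    text = render_llms_txt(
        "Example Product",
        "API and CLI for deploying static sites.",
        {
            "Docs": [
                ("Quickstart", "https://example.com/docs/quickstart.md",
                 "first deploy in five minutes"),
                ("API reference", "https://example.com/docs/api.md",
                 "endpoints and auth"),
            ],
        },
    )
    print(text)
```

Pointing the links at markdown exports rather than client-rendered pages covers both recommendations at once: the index tells the agent where to look, and what it finds there is readable without running JavaScript.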

There’s a strategic question worth deciding deliberately, before it gets decided by default for you: whether to lock agents inside your own product. The Salesforce/HubSpot contrast generalises. If you’re trying to be the only agent for your users’ interaction with your own data, you’re implicitly betting that nobody will want their data accessible to a horizontal agent like Claude, ChatGPT, or whatever internal one their company runs. In my experience, when companies make that bet, it’s usually because they don’t have an agent strategy at all. The handful doing it as a deliberate moat tend to handle it more carefully.

For technical tooling specifically, evaluate against AXI. If you ship a CLI or are weighing an MCP server, the AXI principles work well as a review checklist: idempotency, structured errors, sensible defaults, low-token outputs, ambient context. The benchmarks suggest the return on this work is high relative to its cost.

One last thing about what AX actually is. The most common confusion I run into is some version of “we added a copilot to our product, so we’re doing AX.” That conflates two separate things. Adding your own agent inside your product is a feature decision. The AX question is whether the agents your customers are already using elsewhere can reach your product and do useful work. The answer almost never involves your UI.

What’s still missing

It’s worth being honest about where the discipline sits in 2026. Biilmann’s own one-year review puts it plainly: AX has caught on among developer-tool founders (Clerk, Stytch, WorkOS, Resend, Netlify itself), while broader SaaS, e-commerce, and consumer products are still mostly untouched. Outside developer tooling, “only a few visionaries” (he names Dharmesh Shah and John Maeda) are actively designing for agents. His prediction is that 2026 is the year AX generalises beyond dev tools. The early evidence supports that, but it’s early.

A few questions are still open. How do you bill an agent that fans out into a hundred API calls when its human asked one question? What does identity look like when the entity acting on the platform is a delegated process? How do you do customer support for a customer who is an LLM? Honestly, I don’t think anyone has good answers yet. The teams that work them out first will have something useful to say.

For now, the operational answer is the one Biilmann gave at the start: design consciously for agents as a class of user, then measure what happens and iterate. The companies I’d bet on for the next three years are mostly the ones already doing some version of that.


If you’re trying to figure out which of AX, AXI, and “we should probably do something about agents” actually applies to your business, that’s the kind of conversation I would love to have.

Sources