# MCP Apps

Source: https://tpiros.dev/blog/build-an-mcp-app-interactive-ui-in-the-chat

Before we get to MCP Apps, it helps to remember what problem MCP set out to solve in the first place, because the new proposal is a direct answer to a limitation baked into that original design.

# Where MCP came from

Rewind to late 2024. If you were building an AI app that needed to read from GitHub, query a database and post to Slack, you wrote three integrations. If the company down the road was building a different app against the same three systems, they wrote their own three. Every model-facing app reimplemented the same plumbing, and every tool vendor had to be taught about each app one at a time. That's the M×N problem: M apps times N tools, all wired by hand.

On 25 November 2024, Anthropic open-sourced the Model Context Protocol to collapse that into M+N. Spec version `2024-11-05`, with Python and TypeScript SDKs on day one. The idea is the same one that made HTTP useful: agree on a wire format once, and anyone can talk to anyone. A **server** exposes its capabilities over JSON-RPC, a **client** (the AI app) connects to it, and neither needs to know anything special about the other.

A server exposes three kinds of thing. **Tools** are functions the model can call (query a table, send a message). **Resources** are data the client can read (a file, a record). **Prompts** are reusable templates. Write one MCP server for your product and every MCP-speaking client can use it, which is the whole point.
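To make the wire format concrete, here is roughly what a client sends when the model calls a tool. The envelope fields are standard JSON-RPC 2.0 and `tools/call` is the MCP method name. The `query_table` tool and its arguments are invented for illustration, so treat this as a sketch rather than output captured from a real server.

```typescript
// A JSON-RPC 2.0 request envelope, as used by MCP.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: P;
}

// Parameters of an MCP `tools/call` request.
interface ToolCallParams {
  name: string;
  arguments?: Record<string, unknown>;
}

// Build a `tools/call` request. The tool name and arguments are whatever
// the server advertised in its `tools/list` response.
function toolCallRequest(
  id: number,
  toolName: string,
  args: Record<string, unknown> = {},
): JsonRpcRequest<ToolCallParams> {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical tool, for illustration only.
const request = toolCallRequest(1, "query_table", { table: "orders", limit: 10 });
console.log(JSON.stringify(request));
```

Nothing in the envelope is specific to one client or one server, which is what lets M+N work.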

If you want to build either side from scratch, I've walked through both before: [creating an MCP server](/blog/mcp-server) and [creating an MCP client](/blog/mcp-client). This post assumes that base and goes one layer up, to the UI.

Adoption was fast, and the timeline is worth listing because it's the reason this matters rather than being a curiosity:

- **March 2025**: OpenAI adds MCP across the Agents SDK, the Responses API and the ChatGPT desktop app.
- **April 2025**: Google confirms MCP support in Gemini.
- **May 2025**: Microsoft and GitHub join the steering committee, and Windows ships support.
- **November 2025**: the spec grows up, with stateless operation, async tasks and an official registry for discovering servers.
- **December 2025**: Anthropic donates MCP to the Agentic AI Foundation under the Linux Foundation, co-founded with Block and OpenAI.

So by early 2026 MCP is the de facto way agents reach tools and data: thousands of servers, SDKs in every major language, and a neutral foundation steering it.

# The ceiling nobody designed around

Now look closely at the shape of that first era, because there's an assumption hiding in it.

A tool call returns text, an image, or structured data, and the host shows it in the conversation. The model reads it out, and that's the end of the interaction. For "what's the weather" or "summarise this file", perfect. The trouble starts the moment the result is something the user wants to *work with* rather than just read.

Ask for your sales numbers and the tool hands back a table. Now you want it sorted by revenue, so the model runs again. Then you filter to EMEA, then you narrow to Q2, and each time the whole request goes back through the model. That's a slow, lossy stand-in for what is really one click on a column header.

The model is doing work it's terrible at (re-rendering a table with a different sort) and not doing the work it's good at (reasoning). Better text was never going to fix that. What's missing is an interface.

# The proposal

That's the gap the **2026-07-28 release candidate** sets out to close. It's the largest revision since launch, and it bundles several things: a stateless core, a Tasks extension, elicitation, sampling, and caching. The one we care about here is **MCP Apps**, tracked as SEP-1865 and first written up in the [2026-01-26 MCP Apps spec](https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/).

The proposal in one sentence: a tool can return an **interactive UI** that the host renders directly in the conversation, and that UI can call back into the server on its own.

That last clause is the important half. Plenty of systems can already render HTML; what's new is that the interface, once rendered, can call tools itself without going back through the model. The column-header click sorts the table by talking straight to the server, and the model only gets pulled back in when there's something for it to reason about.

Everything below is how that works, and a small app you can run today to watch it happen.

# The mental model

An MCP App is a sandboxed iframe that can speak MCP. The rest of this section just fills that in.

The server does two pieces of work and ties them together. It registers a normal tool, and it registers a UI resource under a `ui://` URI holding a bundled HTML page. The tool carries one extra field, `_meta.ui.resourceUri`, pointing at that resource. When the model calls the tool, the host reads the field, fetches the HTML, and renders it in an iframe inside the chat.
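The host's side of that lookup can be sketched in a few lines. This is my own illustration of the idea, not code from any real host: `ToolDefinition` is trimmed to the one field that matters here, and `get_weather` is a made-up plain tool for contrast.

```typescript
// The slice of a tool definition the host cares about here.
interface ToolDefinition {
  name: string;
  _meta?: { ui?: { resourceUri?: string } };
}

// Return the UI resource URI a tool declares, or undefined for a plain tool.
function uiResourceUriFor(tool: ToolDefinition): string | undefined {
  const uri = tool._meta?.ui?.resourceUri;
  // Only `ui://` URIs identify UI resources; ignore anything else.
  return uri !== undefined && uri.startsWith("ui://") ? uri : undefined;
}

const withUi: ToolDefinition = {
  name: "get_sales",
  _meta: { ui: { resourceUri: "ui://sales-explorer/mcp-app.html" } },
};
const plainTool: ToolDefinition = { name: "get_weather" }; // hypothetical

console.log(uiResourceUriFor(withUi)); // ui://sales-explorer/mcp-app.html
console.log(uiResourceUriFor(plainTool)); // undefined
```

A tool without the field is still a perfectly valid tool. It just renders as text, the way tools always have.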

From there the iframe talks back over JSON-RPC on `postMessage`. It can call any tool on the server itself, with no model turn, and the host can push fresh results into it. The underlying MCP transport (stdio, HTTP) sits below all of this; the `ui/*` dialect runs inside the frame.
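If "JSON-RPC on `postMessage`" sounds abstract, the core of it is matching responses to requests by `id`. Below is a minimal sketch of that bookkeeping, assuming nothing about the real SDK internals. The `post` function stands in for `window.parent.postMessage`, so the sketch runs outside a browser, and `createRpcClient` is a name I made up.

```typescript
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
};

type Waiter = { resolve: (value: unknown) => void; reject: (error: Error) => void };

// `post` is injected; in an iframe it would wrap window.parent.postMessage.
function createRpcClient(post: (msg: JsonRpcMessage) => void) {
  let nextId = 1;
  const pending = new Map<number, Waiter>();

  return {
    // Send a request and return a promise for its result.
    request(method: string, params?: unknown): Promise<unknown> {
      const id = nextId++;
      const promise = new Promise<unknown>((resolve, reject) => {
        pending.set(id, { resolve, reject });
      });
      post({ jsonrpc: "2.0", id, method, params });
      return promise;
    },
    // Feed every incoming message here (in a browser: the `message` event's data).
    receive(msg: JsonRpcMessage): void {
      if (msg.id === undefined || msg.method !== undefined) return; // not a response
      const waiter = pending.get(msg.id);
      if (!waiter) return; // unknown or already settled id
      pending.delete(msg.id);
      if (msg.error) waiter.reject(new Error(msg.error.message));
      else waiter.resolve(msg.result);
    },
    pendingCount: (): number => pending.size,
  };
}

// Usage: record outgoing messages, then simulate the host's reply.
const sent: JsonRpcMessage[] = [];
const client = createRpcClient((msg) => {
  sent.push(msg);
});

const reply = client.request("tools/call", { name: "get_sales", arguments: { region: "EMEA" } });
reply.then((result) => console.log("result:", result));
client.receive({ jsonrpc: "2.0", id: 1, result: { ok: true } });
```

In practice the SDK's `App` class does this for you, which is why the demo never touches `postMessage` directly.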

# Why not just build a web app and send a link

Reasonable question, and sometimes a plain web app is the right answer. Four things make the in-chat version different.

The app lives in the conversation, so there's no tab to lose and no "which thread had that dashboard". It gets bidirectional data flow for free: a standalone page would need its own API, its own auth and its own state, whereas the iframe calls server tools through the channel the host already manages. And it can borrow the host's capabilities. Rather than building an email integration, the app can ask the host to "send this", and the host routes it through whatever the user has already connected, with their consent.

The sandbox is the fourth reason. The frame can't reach the parent page, read cookies, or escape its container, so the host can render a third-party app without trusting its author. If none of that buys you anything, write a web app. If you want the thing wired into the conversation, this is the better tool.

# Building a small one

Let's build the canonical example from the docs, because it isolates the one moment that matters: you click a bar and the chart drills down with no model turn, and the assistant still knows what you're looking at.

![The Sales Explorer chart rendered inside Claude](https://res.cloudinary.com/tamas/image/upload/f_auto,q_auto,w_900/v1782834290/claude-mcp-apps_jcslkk)

Here's what we're building, rendered in Claude. The whole thing lives in the conversation, and the bars are clickable.

The stack is plain web tech. The server is the official `@modelcontextprotocol/sdk` plus `@modelcontextprotocol/ext-apps`. The iframe is vanilla JS, so every line is about the protocol rather than a framework. Node 24 runs the TypeScript directly, no `tsx` or `ts-node`.

The full project is on GitHub at [tpiros/mcp-apps](https://github.com/tpiros/mcp-apps) if you'd rather clone it and run it than read snippets. The snippets below are trimmed to the parts that matter.

## The server

Two registrations, tied together by the resource URI. The tool declares its UI in `_meta.ui.resourceUri`; the resource serves the bundled HTML.

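```ts
import { z } from "zod";
import {
  registerAppResource,
  registerAppTool,
  RESOURCE_MIME_TYPE,
} from "@modelcontextprotocol/ext-apps/server";

// `server`, `SalesPayload`, `REGION_MONTHLY`, `overviewRows` and
// `readBundledHtml` are defined elsewhere in the project.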

const resourceUri = "ui://sales-explorer/mcp-app.html";

registerAppTool(server, "get_sales", {
  title: "Sales Explorer",
  description: "Show sales as an interactive chart. Omit `region` for the overview, or pass a region to see its monthly breakdown.",
  inputSchema: {
    region: z.enum(["EMEA", "AMER", "APAC", "LATAM"]).optional(),
  },
  outputSchema: SalesPayload.shape,
  _meta: { ui: { resourceUri } }, // this line makes it an MCP App
}, async ({ region }) => {
  const payload = region
    ? { level: "region", region, rows: REGION_MONTHLY[region] }
    : { level: "overview", rows: overviewRows() };

  return {
    content: [{ type: "text", text: region ? `Monthly sales for ${region}.` : "Sales by region." }],
    structuredContent: payload, // what the UI reads to draw the chart
  };
});

registerAppResource(server, resourceUri, resourceUri,
  { mimeType: RESOURCE_MIME_TYPE },
  async () => ({
    contents: [{ uri: resourceUri, mimeType: RESOURCE_MIME_TYPE, text: await readBundledHtml() }],
  }),
);
```

Two things worth pointing at. `content` is the text a non-UI client or the model would see; `structuredContent` is the payload the iframe reads. And `RESOURCE_MIME_TYPE` is the SDK's constant for the one mime type the spec allows here, which we'll come back to when we check the wire.
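The snippet leans on `SalesPayload`, `REGION_MONTHLY` and `overviewRows()` without showing them. The repository has the real definitions. The stand-in below is a sketch with invented figures and plain TypeScript types instead of a Zod schema, just to show the shape `structuredContent` carries. The names `SalesRegion`, `SalesRow`, `SalesPayloadShape` and `buildPayload` are mine.

```typescript
type SalesRegion = "EMEA" | "AMER" | "APAC" | "LATAM";

interface SalesRow {
  label: string;
  value: number;
}

interface SalesPayloadShape {
  level: "overview" | "region";
  region?: SalesRegion;
  rows: SalesRow[];
}

// Invented figures, three months per region to keep the sketch short.
const REGION_MONTHLY: Record<SalesRegion, SalesRow[]> = {
  EMEA: [{ label: "Jan", value: 120 }, { label: "Feb", value: 135 }, { label: "Mar", value: 150 }],
  AMER: [{ label: "Jan", value: 200 }, { label: "Feb", value: 210 }, { label: "Mar", value: 190 }],
  APAC: [{ label: "Jan", value: 90 }, { label: "Feb", value: 95 }, { label: "Mar", value: 110 }],
  LATAM: [{ label: "Jan", value: 60 }, { label: "Feb", value: 70 }, { label: "Mar", value: 65 }],
};

// One row per region: the sum of its monthly values.
function overviewRows(): SalesRow[] {
  return (Object.keys(REGION_MONTHLY) as SalesRegion[]).map((region) => ({
    label: region,
    value: REGION_MONTHLY[region].reduce((sum, row) => sum + row.value, 0),
  }));
}

// Same branching as the tool handler: overview by default, monthly rows for a region.
function buildPayload(region?: SalesRegion): SalesPayloadShape {
  return region
    ? { level: "region", region, rows: REGION_MONTHLY[region] }
    : { level: "overview", rows: overviewRows() };
}

console.log(buildPayload());
console.log(buildPayload("EMEA"));
```

The overview and the drill-down share one shape, so the iframe needs a single `render` function and can switch on `level`.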

## The iframe

The client does three things, one per beat of the demo. It renders the chart from the result the host pushes in. On a bar click it calls the tool itself. And it tells the host what the user is now looking at.

```ts
import { App, applyDocumentTheme } from "@modelcontextprotocol/ext-apps";

const app = new App({ name: "Sales Explorer", version: "1.0.0" });

// 1. The host pushes the first result here when the model calls get_sales.
app.ontoolresult = (result) => render(result.structuredContent);

// 2. The key moment: a click calls the tool with no model turn.
async function drillInto(region: string) {
  const result = await app.callServerTool({ name: "get_sales", arguments: { region } });
  render(result.structuredContent);

  // 3. Keep the model in the loop so "email this" knows what "this" is.
  await app.sendMessage({
    role: "user",
    content: [{ type: "text", text: `(Now viewing the monthly breakdown for ${region}.)` }],
  });
}

app.onhostcontextchanged = (ctx) => ctx.theme && applyDocumentTheme(ctx.theme);
app.connect();
```

`render` is just DOM. I drew the bars with `div`s and a CSS width rather than pulling in a chart library, which keeps the whole UI legible and makes the click handler the most interesting line in the file. The overview bars are buttons that call `drillInto`; the region view is plain.
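For a sense of how little `render` needs to do, here is a sketch of the bar maths and markup as pure functions. It is not the code from the repository: `barWidths` and `barsHtml` are hypothetical names, and the real version writes to the DOM rather than returning a string.

```typescript
interface BarRow {
  label: string;
  value: number;
}

// Width of each bar as a percentage of the largest value.
function barWidths(rows: BarRow[]): number[] {
  const max = Math.max(0, ...rows.map((row) => row.value));
  return rows.map((row) => (max === 0 ? 0 : Math.round((row.value / max) * 100)));
}

// Build the markup for one chart. Overview bars are buttons (clickable);
// region bars are plain divs. Labels are trusted constants in this sketch;
// escape them if they ever come from user data.
function barsHtml(rows: BarRow[], clickable: boolean): string {
  const tag = clickable ? "button" : "div";
  const widths = barWidths(rows);
  return rows
    .map(
      (row, i) =>
        `<${tag} class="bar" data-label="${row.label}" style="width:${widths[i]}%">${row.label}: ${row.value}</${tag}>`,
    )
    .join("\n");
}

const sampleRows: BarRow[] = [
  { label: "EMEA", value: 50 },
  { label: "AMER", value: 200 },
];
console.log(barWidths(sampleRows)); // [ 25, 100 ]
console.log(barsHtml(sampleRows, true));
```

A click listener on the container can then read `data-label` from the clicked button and hand it to `drillInto`.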

The UI gets bundled into a single self-contained HTML file with Vite and `vite-plugin-singlefile`, so the server returns one resource with no external asset requests for the host to police.

# Following the round trip

Here's the whole sequence, end to end. Watch the fourth step in particular (the click on the EMEA bar), because it's the one thing a static result can't do.

```
User: "show me sales by region"
   │
   ▼
Model ── tools/call get_sales ──▶ Server
                                    │ returns data + ui:// resource
   ┌────────────────────────────────┘
   ▼
Host renders the HTML in a sandboxed iframe, in the chat
   │
   ▼
User clicks the EMEA bar
   │
   ▼
iframe ── tools/call {region:"EMEA"} ──▶ Host ──▶ Server     (no model turn)
   ◀── fresh data ──
   │
   ▼
iframe re-renders, and sends a context message
   │
   ▼
User: "summarise what I'm viewing"  → model knows it's EMEA monthly
```

Steps 1 to 3 (the request, the tool call, the render) are a normal tool call with one extra payload. Step 4, the click, is the new thing: the user changed what they're looking at, and the model never ran. That's the difference between a static table and an interface.

![The drilled-down monthly view after clicking a region](https://res.cloudinary.com/tamas/image/upload/f_auto,q_auto,w_900/v1782834876/claude-mcp-apps)

That second view is the result of step 4: one click, the chart swapped to a region's monthly breakdown, no round trip through the model.
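The bookkeeping behind that claim fits in a toy simulation. This is a sketch of the idea only: `createSketchHost` is invented, real hosts decide for themselves when the model runs, and the tool here returns a stub instead of sales data.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => unknown;

// A toy host: it counts model turns and keeps a transcript the model reads.
function createSketchHost(tools: Record<string, ToolHandler>) {
  let modelTurns = 0;
  const transcript: string[] = [];

  function callTool(toolName: string, args: ToolArgs): unknown {
    const handler = tools[toolName];
    if (!handler) throw new Error(`Unknown tool: ${toolName}`);
    return handler(args);
  }

  return {
    // Steps 1 to 3: the user's message triggers a model turn, and the model calls a tool.
    userAsks(text: string, toolName: string, args: ToolArgs): unknown {
      transcript.push(`user: ${text}`);
      modelTurns += 1;
      return callTool(toolName, args);
    },
    // Step 4: the iframe's tools/call is forwarded to the server. No model turn.
    iframeCallsTool: callTool,
    // The iframe's context message is recorded so the next model turn can see it.
    iframeSendsMessage(text: string): void {
      transcript.push(`app: ${text}`);
    },
    modelTurns: (): number => modelTurns,
    transcript: (): string[] => [...transcript],
  };
}

const host = createSketchHost({
  // Stub tool: reports which view it would return.
  get_sales: (args) => ({ level: args["region"] ? "region" : "overview", region: args["region"] }),
});

host.userAsks("show me sales by region", "get_sales", {});
const drilled = host.iframeCallsTool("get_sales", { region: "EMEA" });
host.iframeSendsMessage("(Now viewing the monthly breakdown for EMEA.)");

console.log(host.modelTurns()); // 1: the click did not add a model turn
console.log(host.transcript());
```

The transcript is what makes a follow-up like "summarise what I'm viewing" answerable, because the context note is already there when the model next runs.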

# Running it in Claude Desktop

Claude Desktop only accepts `https://` URLs for remote MCP servers, so a local `http://localhost:3001/mcp` gets rejected with "URL must start with https". For local development you don't use the URL field at all. You wire it up over stdio in the config file, which has no URL rule.

On macOS the file is `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "sales-explorer": {
      "command": "/Users/you/.nvm/versions/node/v24.15.0/bin/node",
      "args": ["/abs/path/to/main.ts", "--stdio"]
    }
  }
}
```

Use the absolute path to `node`. Claude Desktop spawns the command with a minimal `PATH`, so a bare `node` (especially under nvm) won't be found. Build the UI once so `dist/mcp-app.html` exists, fully quit Claude Desktop (it only reads the config on launch), reopen, and ask for sales by region. If it errors, the log at `~/Library/Logs/Claude/mcp-server-sales-explorer.log` usually points straight at the wrong node path.
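Since the bare-`node` mistake is so easy to make, a small generator that refuses relative paths can save a restart or two. This is a sketch: `stdioConfig` is a hypothetical helper, and the leading-slash check assumes macOS or Linux paths. In a real script you might pass Node's `process.execPath` as `nodePath`.

```typescript
interface StdioServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, StdioServerEntry>;
}

// Build the config for one stdio server, rejecting anything that is not an
// absolute POSIX path (Claude Desktop spawns the command with a minimal PATH).
function stdioConfig(serverName: string, nodePath: string, entryPath: string): ClaudeDesktopConfig {
  for (const candidate of [nodePath, entryPath]) {
    if (!candidate.startsWith("/")) {
      throw new Error(`Expected an absolute path, got "${candidate}"`);
    }
  }
  return {
    mcpServers: {
      [serverName]: { command: nodePath, args: [entryPath, "--stdio"] },
    },
  };
}

const desktopConfig = stdioConfig(
  "sales-explorer",
  "/Users/you/.nvm/versions/node/v24.15.0/bin/node",
  "/abs/path/to/main.ts",
);
console.log(JSON.stringify(desktopConfig, null, 2));
```

The printed JSON matches the config above and can be pasted into `claude_desktop_config.json`.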

# Checking it against the spec

I don't like taking the SDK's word for it, so here's the same server over HTTP, poked with `curl`. List the tools and the UI link is right there:

```bash
curl -s -H 'Content-Type: application/json' \
     -H 'Accept: application/json, text/event-stream' \
     -X POST http://localhost:3001/mcp \
     --data '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
# ... "name":"get_sales" ... "resourceUri":"ui://sales-explorer/mcp-app.html"
```

Read the resource and the mime type is the one the spec mandates:

```bash
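curl -s -H 'Content-Type: application/json' -H 'Accept: application/json, text/event-stream' \
     -X POST http://localhost:3001/mcp \
     --data '{"jsonrpc":"2.0","id":2,"method":"resources/read","params":{"uri":"ui://sales-explorer/mcp-app.html"}}'
# ... "mimeType":"text/html;profile=mcp-app" ... <!DOCTYPE html> ...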
```

Lined up against the [2026-01-26 spec](https://github.com/modelcontextprotocol/ext-apps/blob/main/specification/2026-01-26/apps.mdx), every normative bit checks out:

| Spec says | The demo does |
| --- | --- |
| UI resources MUST use the `ui://` scheme | `ui://sales-explorer/mcp-app.html` |
| mimeType MUST be `text/html;profile=mcp-app` | exactly that, confirmed on the wire |
| Tool links UI via `_meta.ui.resourceUri` | `_meta: { ui: { resourceUri } }` |
| Iframe handshake `ui/initialize` | `app.connect()` |
| Iframe calls tools via `tools/call` | `app.callServerTool(...)` |
| Host pushes results via `ui/notifications/tool-result` | `app.ontoolresult = ...` |
| Transport: JSON-RPC 2.0 over `postMessage` | handled by the `App` class |

Nothing is hand-rolled. `registerAppTool` and the `App` class are the SDK's conformant wrappers around those methods, which is the whole reason to use them.
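If you'd rather automate the first two rows of that table than eyeball `curl` output, the check is tiny. A sketch, assuming you've already parsed one entry of the `contents` array from a `resources/read` response. `uiResourceViolations` is a hypothetical helper, not part of any SDK.

```typescript
// The one MIME type the table above lists for UI resources.
const UI_MIME_TYPE = "text/html;profile=mcp-app";

interface ResourceContent {
  uri: string;
  mimeType?: string;
  text?: string;
}

// Return a list of human-readable problems; an empty list means both checks passed.
function uiResourceViolations(content: ResourceContent): string[] {
  const problems: string[] = [];
  if (!content.uri.startsWith("ui://")) {
    problems.push(`URI must use the ui:// scheme, got "${content.uri}"`);
  }
  if (content.mimeType !== UI_MIME_TYPE) {
    problems.push(`mimeType must be "${UI_MIME_TYPE}", got "${content.mimeType ?? "(missing)"}"`);
  }
  return problems;
}

const conformant: ResourceContent = {
  uri: "ui://sales-explorer/mcp-app.html",
  mimeType: "text/html;profile=mcp-app",
  text: "<!DOCTYPE html>",
};
const nonConformant: ResourceContent = {
  uri: "https://example.com/app.html",
  mimeType: "text/html",
};

console.log(uiResourceViolations(conformant)); // []
console.log(uiResourceViolations(nonConformant)); // two problems
```

It only covers the two rules that are visible on the wire. The handshake and notification rows still need a real host to exercise them.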

# A word on WebMCP

There's a second standard from 2026 that's easy to confuse with this one. [WebMCP](https://www.webfuse.com/blog/what-is-webmcp-the-practical-guide-to-the-web-model-context-protocol) lets a website expose its own functions to the browser's agent through a `navigator.modelContext` API. So: can an MCP App talk to a WebMCP page?

No, and the reason is instructive. The iframe is sandboxed and speaks only to its host. WebMCP isn't a server you dial anyway; it's an in-page API for the browser's own agent. Different agent, different runtime, and the sandbox blocks the frame from reaching the page's `navigator.modelContext` on purpose. You can bridge the two with extra plumbing (a WebMCP-to-MCP shim plus your server acting as a client), but neither spec gives you the link for free. MCP Apps bring UI into the agent. WebMCP makes the open web legible to the agent. They're solving different problems.

# When to reach for this

The chart is a toy, but the shape generalises to anything where reading isn't enough and the user needs to manipulate. Data exploration with drill-down and filters. Configuration with dozens of interdependent options, shown as a form instead of a twenty-message interview. Document and media review with the actual viewer embedded. Live dashboards that update without you asking "what's the status now". Multi-step approvals where you step through items with buttons.

All of them lean on the same trick the round trip showed: the user drives the interface directly, and the model only steps in when it has something to add. So the rule of thumb is simple enough. If your tool's result is something people read once, keep it text; the moment they want to poke at it, ship the UI.

MCP spent its first year and a half teaching models to *call* things. This is the part where it learns to *show* you things, and let you act on what it shows.
