
I built a read-only Cosmos DB MCP so Claude can help debug BabyPlaybook

I’ve been building BabyPlaybook on nights and weekends. It’s a mobile app for tracking the chaos of having a newborn. Bottles, diapers, sleep, milestones, the works. Offline-first SQLite on the device, Cosmos DB in the cloud, with a sync layer in the middle that has, generously speaking, kept me busy.

The frustrating part of debugging BabyPlaybook isn’t the code. The code I can read. The frustrating part is the data. Half the bugs I chase look like:

“Why is this user’s bottle event showing up locally but not after the cloud pull?”

“The entitlement says tier=premium but isActive=false — what does Cosmos actually have?”

To answer those questions, I’d query my way through the az CLI, copy JSON around, and lose my train of thought. Or I’d write a throwaway script, run it, and three hours later realize I’d written four nearly-identical throwaway scripts that day.

So I built a small Cosmos DB MCP server. Now I can just ask Claude:

“Show me the last 5 events for baby 01H….”

Claude doesn’t get a magic database backdoor. It gets a handful of narrow, mostly read-only diagnostic tools that I control. Those tools query Cosmos, return JSON, and keep me in the debugging flow. That’s the real win.

That’s the post. But let me back up and explain what an MCP actually is, because I had to figure that out too.

What’s MCP?

Model Context Protocol. A small open spec from Anthropic for hooking external capabilities into an LLM-based app. (The spec defines tools, resources, and prompts. This post is about tools, because that’s what I built.) Claude Code speaks it natively, as do Claude Desktop and a bunch of other clients.

The shorthand I use: it’s a standard way to teach Claude new verbs.

Claude Code already knows how to read files, edit files, run commands, and search code.

But Claude Code does not automatically know how to query my BabyPlaybook Cosmos DB in a safe, app-specific way.

That is why I built an MCP server.

The mechanics are simple:

  1. You write a small program (a “server”) that exposes some tools.
  2. The client (Claude Code) launches your program as a subprocess and talks to it over stdin/stdout using JSON-RPC.
  3. When Claude wants to use one of your tools, the client sends a request, your server runs it, you return JSON. Claude reads the JSON like any other tool result.
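Concretely, step 3 on the wire is one JSON-RPC exchange. It looks roughly like this; the envelope shape comes from the MCP spec, while the `id` and the toy `add` tool here are invented for illustration:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": { "name": "add", "arguments": { "a": 2, "b": 2 } }
}
```

and the server replies with a result carrying content blocks:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "result": { "content": [{ "type": "text", "text": "4" }] }
}
```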

For a local Claude Code setup like this one, there’s no HTTP server, no hosted deployment, no separate auth flow. Claude launches the process locally and talks to it over stdin/stdout. Your “server” is a Node script. The “protocol” is JSON over a pipe. The official SDK does the boring parts.

(MCP can also run as a remote server with proper auth. That’s a different post.)

Why bother?

MCP starts paying off when:

  • The data isn’t in the repo (a database, an issue tracker, a metrics dashboard).
  • The data changes faster than your context window (live prod state).
  • The same task comes up over and over and “shell out to the CLI each time” feels dumb.

For me that’s Cosmos. A lot of the interesting BabyPlaybook debugging questions are about state (what’s actually in the database right now), and that state lives in nine Cosmos containers. Logs answer the “what happened” questions. Cosmos answers the “what’s true” ones.

Building the thing

I’m not going to walk through every line, but the shape is small enough to fit on a napkin.

A BabyPlaybook Cosmos MCP needs to:

  1. Connect to Cosmos using the connection string my Azure Function App already uses.
  2. Expose a handful of read tools so Claude can query without me writing SQL by hand each time.
  3. Optionally expose write tools for the rare “fix this one bad row in prod” case, but gated, so writes aren’t a default capability.

The whole thing ended up being ~200 lines of TypeScript across five files:

packages/mcp-cosmos/
├── package.json
├── tsconfig.json
└── src/
    ├── server.ts        # MCP entry point
    ├── cosmos.ts        # CosmosClient factory + env switching
    ├── schemas.ts       # Zod input schemas, container metadata
    └── tools/
        ├── read.ts      # query, get_document, list_containers, count, describe_schema
        └── write.ts     # upsert_document, delete_document (gated)

(Yes, TypeScript. I live in C# at the day job, but the MCP SDK is most mature in TS and Python, and a Node script is the easiest thing for Claude Code to launch as a subprocess. There’s a .NET SDK now too, but for a one-evening hack, TS won.)

The minimum viable MCP server

Forget Cosmos for a moment. Here’s a complete, working MCP server in one file. It exposes one tool that does literal arithmetic, because nothing kills a tutorial faster than pretending the toy example is interesting.

// hello-mcp/src/server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "hello-mcp", version: "0.1.0" });

server.tool(
  "add",
  "Add two numbers and return the result.",
  { a: z.number(), b: z.number() },
  async ({ a, b }) => ({
    content: [{ type: "text", text: String(a + b) }],
  }),
);

await server.connect(new StdioServerTransport());

Every MCP server has the same three pieces:

  1. An McpServer, the container.
  2. One or more server.tool(...) calls. Each registers a verb with a name, a description, an input schema (Zod), and a handler that returns content.
  3. A transport. For local servers, StdioServerTransport. The client launches you, talks over stdin/stdout, and that’s the whole connection.

To wire it into Claude Code, you drop a .mcp.json at the repo root:

{
  "mcpServers": {
    "hello": {
      "command": "node",
      "args": ["./hello-mcp/dist/server.js"]
    }
  }
}

Restart Claude Code. Ask “what’s 2 + 2?” Claude will (hopefully) decide to call your add tool and report 4. You’ve made an LLM use a calculator. Congratulations, you are a 1970s AI researcher.

The leap from “add two numbers” to “query my production database” is mostly about three things. Replacing the trivial handler body with a real client (Cosmos SDK, in my case). Being thoughtful about the input schema, because typos and bad inputs hit prod. And returning results as JSON in the text field, which Claude reads like any other tool output.
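That third piece is a three-line wrapper. Here’s a sketch of the `jsonResult` helper the later snippets use — my helper, not SDK API:

```typescript
// Sketch of my jsonResult helper -- not part of the MCP SDK. Tool results
// carry text content; serializing the payload to JSON keeps it structured
// enough for Claude to reason over.
function jsonResult(data: unknown) {
  return {
    content: [{ type: "text" as const, text: JSON.stringify(data, null, 2) }],
  };
}
```

Every read tool’s handler bottoms out in something like `return jsonResult(rows)`.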

Where I made it BabyPlaybook-flavored

Four decisions I made specifically because this is my app, not a generic Cosmos MCP.

1. Container names are a Zod enum, not a free-form string

BabyPlaybook has nine containers. If I let Claude pass an arbitrary string, a typo like caregiverMembership (missing the s) fails as a Cosmos 404 with a useless message. The enum surfaces the typo at the MCP boundary.

const ContainerName = z.enum([
  "families",
  "babies",
  "events",
  "caregiverMemberships",
  "familyEntitlements",
  "webhookEvents",
  "syncCheckpoints",
  "auditLog",
  "notifications",
]);

server.tool(
  "query",
  "Run a parameterized SQL query against a BabyPlaybook container.",
  {
    container: ContainerName,
    sql: z.string(),
    parameters: z
      .array(z.object({ name: z.string(), value: z.unknown() }))
      .optional(),
    partitionKey: z.string().optional(),
  },
  async ({ container, sql, parameters, partitionKey }) => {
    /* ... */
  },
);

Small detail, but the kind of thing that compounds when you’re debugging at 11pm. The optional partitionKey is there because cross-partition queries in Cosmos cost real RUs, and I’d rather make that opt-in than accidentally fan out a 50-RU query across every partition.
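The elided handler body is mostly a translation layer from those validated inputs to Cosmos SDK arguments. A sketch of that translation, with my own names and a made-up default page size; the real handler hands these to `container.items.query(spec, options).fetchAll()`:

```typescript
// Sketch of the input-to-SDK translation inside the query handler.
// Types and names here are mine; the default page size is illustrative.
type QueryInput = {
  container: string;
  sql: string;
  parameters?: { name: string; value: unknown }[];
  partitionKey?: string;
};

// Cosmos SQL queries take a spec: the query text plus named @parameters.
function buildQuerySpec(input: QueryInput) {
  return { query: input.sql, parameters: input.parameters ?? [] };
}

// Feed options: cap the page size, and pin a partition key only when the
// caller supplied one. Omitting it is what lets the SDK fan the query out
// across partitions, so the fan-out (and its RU bill) stays deliberate.
function buildFeedOptions(input: QueryInput, maxItemCount = 50) {
  return {
    maxItemCount,
    ...(input.partitionKey !== undefined
      ? { partitionKey: input.partitionKey }
      : {}),
  };
}
```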

2. Schemas are mirrored in the MCP source, not derived at runtime

I expose a describe_schema tool that returns the field list for a container, pulled from a static map I keep in sync with the C# models. No “discover the schema” mode, no extra Cosmos round-trip. The map lives next to the models in the same repo, so it never goes stale silently.

const SCHEMA: Record<z.infer<typeof ContainerName>, FieldSpec[]> = {
  babies: [
    { name: "id",          type: "string", note: "GUID" },
    { name: "familyId",    type: "string", note: "partition key" },
    { name: "displayName", type: "string" },
    { name: "birthDate",   type: "string", note: "ISO 8601" },
    { name: "createdAt",   type: "string", note: "ISO 8601 UTC" },
  ],
  events: [
    { name: "id",         type: "string" },
    { name: "babyId",     type: "string", note: "partition key" },
    { name: "type",       type: "enum",   note: "bottle | diaper | sleep | pump | solids | milestone" },
    { name: "occurredAt", type: "string", note: "ISO 8601 UTC" },
    { name: "payload",    type: "object", note: "shape varies by type" },
  ],
  // ...seven more
};

When the C# Baby model gets a new field, this map gets the same field. It’s annoying enough to maintain that I notice, and small enough that I actually do it.
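The describe_schema tool on top of that map is then a dictionary lookup with a loud failure for unknown names. A sketch — the `FieldSpec` type matches the map above, but the two-entry map here is a stub:

```typescript
// Sketch of the describe_schema lookup: static map, no Cosmos round-trip.
// The map here is a two-entry stub; the real one covers all nine containers.
type FieldSpec = { name: string; type: string; note?: string };

const SCHEMA: Record<string, FieldSpec[]> = {
  babies: [
    { name: "id", type: "string", note: "GUID" },
    { name: "familyId", type: "string", note: "partition key" },
  ],
  // ...stub; the real map lists every container
};

function describeSchema(container: string) {
  const fields = SCHEMA[container];
  // Fail loudly at the MCP boundary instead of letting a typo become a 404.
  if (!fields) throw new Error(`unknown container: ${container}`);
  return { container, fields };
}
```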

3. Tools encode debugging workflows, not just SQL access

The first version of this MCP had one query tool plus the metadata helpers. Functional, but boring. Every debugging session looked the same: I’d ask Claude to query container A, then container B, then mentally join the results. Claude was doing the SQL, but I was still doing the thinking.

So I added tools shaped like the questions I actually ask. The most useful so far is diagnose_sync_state:

server.tool(
  "diagnose_sync_state",
  "Compare local and cloud sync checkpoints for a baby and surface deltas.",
  {
    babyId: z.string(),
    sinceHours: z.number().default(24),
  },
  async ({ babyId, sinceHours }) => {
    const checkpoint = await readCheckpoint(babyId);
    const events = await queryEvents({
      babyId,
      sinceTs: nowMinusHours(sinceHours),
    });
    return jsonResult({
      checkpoint,
      eventCount: events.items.length,
      latestEventTs: events.items.at(-1)?._ts ?? null,
      checkpointLagSeconds:
        checkpoint?.lastSyncedTs && events.items.at(-1)
          ? events.items.at(-1)!._ts - checkpoint.lastSyncedTs
          : null,
      requestCharge: events.requestCharge, // RU cost, useful when called often
    });
  },
);

That’s the actual question I’m asking when I open Cosmos at 11pm: is this baby’s checkpoint behind the events I see? Encoding it as a tool means Claude answers it in one call instead of three, and the response shape is built for the question, not for the database.

A few others in the same spirit:

  • get_baby_timeline(babyId, since) returns events across containers in render order.
  • find_orphaned_events() returns events whose babyId no longer resolves.
  • count_active_entitlements() returns the number I keep manually computing.

These tools run the same underlying queries the generic query tool would. The difference is that the intent lives in the MCP, not in Claude’s head. When I’m tired, I want my tools doing more of the thinking.
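Two of the helpers diagnose_sync_state leans on ride on one Cosmos detail: the system `_ts` field is a last-modified timestamp in unix epoch seconds, so the lag math is plain subtraction, no date parsing. Sketches, with my names:

```typescript
// Sketch of two diagnose_sync_state helpers. Cosmos stamps every document
// with _ts (last-modified time, unix epoch *seconds*), so both helpers
// work in integer seconds.
function nowMinusHours(hours: number): number {
  return Math.floor(Date.now() / 1000) - hours * 3600;
}

// Lag between the newest event and the checkpoint, or null when either
// side is missing -- the same ternary as in the handler, extracted.
function checkpointLagSeconds(
  latestEventTs: number | null,
  lastSyncedTs: number | null,
): number | null {
  return latestEventTs !== null && lastSyncedTs !== null
    ? latestEventTs - lastSyncedTs
    : null;
}
```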

4. Writes are gated, and the gate is the point

Read tools register unconditionally. Write tools only register when COSMOS_MCP_ALLOW_WRITES=true. I want “Claude can mutate prod data” to be a deliberate per-session choice, not a default. Flipping the gate is a one-line shell command, but the explicit step is the point. Destructive operations should feel destructive.
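The gate itself is a few lines. A sketch, with the registration bodies elided; `writesEnabled` and `registerWriteTools` are my names for it:

```typescript
// Sketch of the write gate. Exact-match on "true" so values like "1" or
// "yes" keep writes off: the gate should fail closed on anything but the
// documented value.
function writesEnabled(env: Record<string, string | undefined>): boolean {
  return env.COSMOS_MCP_ALLOW_WRITES === "true";
}

// In server.ts: reads always register, writes only behind the gate.
// registerReadTools(server);
// if (writesEnabled(process.env)) registerWriteTools(server);
```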

The bigger frame: this isn’t “Claude has database access.” It’s a small diagnostic interface I wrote, with known containers, validated inputs, and read-only as the floor. The MCP is the guardrail. If I wanted Claude to have raw Cosmos access, I’d have given it the connection string.

Local vs. prod is not the same setup

The same MCP behaves differently across environments. Locally and in dev I let it have wider latitude: bigger result sets, looser limits, both read and write tools registered. In production I want boring constraints: read-only by default, known containers only, small result sets, no secrets in tool output, and writes opt-in per session. Same code, different env vars. The point isn’t that the code is bulletproof, it’s that the production version of the tool is intentionally smaller than the dev version.
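The env split boils down to one limits object derived from an environment name. A sketch with illustrative knobs and numbers, not the app’s real configuration:

```typescript
// Sketch: per-environment limits derived from an env name. The knobs and
// numbers here are illustrative, not BabyPlaybook's real values.
type Limits = { maxItemCount: number; redactSecrets: boolean };

function limitsFor(envName: string): Limits {
  const prod = envName === "production";
  return {
    maxItemCount: prod ? 25 : 200, // smaller result pages in prod
    redactSecrets: prod,           // strip secret-ish fields from tool output
  };
}
```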

What it actually feels like to use

The first thing I asked Claude after wiring it up:

“Count how many premium families are in Cosmos.”

Claude called count on familyEntitlements with where: c.tier = @t AND c.isActive = true. Got a number back. No az-cli incantation, no script, no copying JSON between windows.

The second thing I asked:

“Show me one event document so I remember the shape.”

query with SELECT TOP 1 * FROM c. Done.

The friction of “go ask the database a quick question” drops to roughly zero, which means I ask more questions, which means I get to the actual bug faster.

Why not just use Azure MCP?

The official Azure MCP (@azure/mcp) is the right starting point if you want broad Azure access. It covers Cosmos, Storage, Key Vault, the works. If you’re poking around Azure resources in general, use it.

I wanted something narrower. BabyPlaybook container names. BabyPlaybook schema hints. BabyPlaybook-shaped tools like diagnose_sync_state. Write tools that aren’t even registered unless I opt in. In other words, not a generic Cosmos assistant but a debugging assistant that speaks my app’s domain.

The two approaches aren’t competing. Use Azure MCP for the platform. Build a thin domain-aware MCP next to whatever specific app you’re trying to debug.

Should you build one?

Rough heuristic: if you’ve written the same throwaway debugging script three times, you should have built an MCP instead. The investment is small (an evening), the SDK handles the protocol grunt work, and you get back something that pays dividends every time you debug.

A few honest caveats:

Don’t build an MCP for stuff Claude can already do. If your data is files, just use Read and Grep. Don’t build a read_files_with_extra_steps tool.

Expose intent, not a tunnel. “Run any SQL against the database” is a tunnel. get_recent_events, count_active_entitlements, diagnose_sync_state are intent. Tunnels are easy to write and easy to misuse. Intent-shaped tools take slightly more work and hold up at 11pm.

Be paranoid about writes. Default to read-only. If you need writes, gate them, log them, and make destructive operations feel destructive.

Mirror, don’t introspect, when the domain is small. For BabyPlaybook, the C# models live right next to the MCP source, the schema map is short, and keeping it explicit makes Claude’s view of the data match mine. For larger or more dynamic schemas, live introspection might earn its place.

That’s the post. Now if you’ll excuse me, there’s a sync bug with my name on it. With help, this time.