---
name: build-a-chatbot
description: Use when the user wants to build, add, or wire up a chatbot, AI assistant, conversational interface, or LLM tool-calling layer in any codebase — including phrases like "build a chatbot", "add a chatbot to this app", "wire up an AI assistant", "add LLM tool calling", "let users chat with our app", "make an AI helper", "add a Cmd+K assistant", "integrate MCP into the chat", "build an AI assistant for this codebase". Trigger this skill whenever the user is starting from zero or extending an existing chatbot — it walks through codebase survey, vendor selection (OpenAI/Anthropic), architecture choice (native tools / MCP / hybrid), the multi-turn tool-calling loop, dispatch layer with feature gating, optional embedding-based intent routing, optional MCP integration, frontend wiring, and verification. Works in any backend (Django, FastAPI, Express, Rails, Next.js API routes, Spring Boot, etc.) and any frontend (React, Vue, Svelte, mobile). Copy this skill's bundled scaffolds rather than writing from scratch. Use even when the user describes the work in domain terms ("let users ask questions about their orders", "add a support bot", "AI search for our app") — those are all chatbot tasks.
---

# Build a Chatbot

This skill walks you through standing up a production-grade chatbot in any codebase. It distills the architecture of a working production chatbot (39 tools, multi-turn GPT-4o, cosine-similarity intent routing, per-org RBAC) into a vendor-neutral, framework-neutral playbook plus copy-paste scaffolds.

**Before you start coding, read this whole file.** Then jump into references/ and scripts/ as the workflow directs.

---

## When this skill applies

You're being asked to add conversational AI to an app — anything from a Cmd+K palette to a support bot to "let users ask questions about their data." The user has APIs (their own), and possibly MCP servers (third-party). Your job is to wire a chat UI to an LLM that can call those APIs/MCP tools on behalf of the user.

If the user only wants pure RAG (search over docs, no actions), this skill is overkill — point them at a vector-store recipe instead. This skill is for **chatbots that take actions**.

---

## The five-part architecture (always)

```
┌──────────────────────────────────────────────────────┐
│ 1. CHAT UI (frontend or terminal)                    │
│        ↓ HTTP + auth                                 │
│ 2. CHAT ENDPOINT (validate + rate limit + RBAC)      │
│        ↓                                             │
│ 3. (optional) INTENT ROUTER (embedding centroids)    │
│        ↓                                             │
│ 4. MULTI-TURN LOOP (LLM ↔ tool_calls ↔ results)      │
│        ↓                                             │
│ 5. TOOL DISPATCH (single entry point, feature gate)  │
│     ├─ NATIVE handlers (your DB, Celery, etc.)       │
│     └─ MCP-bridged tools (third-party data)          │
└──────────────────────────────────────────────────────┘
```

Three of these — the multi-turn loop, the dispatch layer, and the tool handlers — are mandatory. Intent router and MCP bridge are optional but recommended for nontrivial chatbots.

---

## Workflow

Follow these steps in order. Each step has a "verify" checkpoint — don't move on until it passes. The whole flow takes a senior engineer ~1–2 days for a basic chatbot, ~1 week including intent router + MCP.

### Step 1 — Survey the codebase

Before writing anything, find out what you're building inside of. Use `Explore` (or read directly) to determine:

- **Backend framework** — Django, FastAPI, Flask, Express, Rails, Spring, Next.js API routes? This tells you where the chat endpoint lives and how request lifecycle / auth / RBAC work.
- **Auth model** — JWT? Session cookies? OAuth?
  Per-user, per-org, per-tenant? The dispatch layer needs to know "who is calling" to scope queries.
- **Existing AI usage** — is there already an OpenAI/Anthropic client? A chatbot? A vector store? Reuse before reinventing.
- **API surface** — what are the highest-value 5–10 endpoints? Those are your first chat tools. Don't try to expose everything on day one.
- **Frontend stack** — React / Vue / Svelte / mobile / Electron / terminal? The UI scaffold differs but the request shape is identical.
- **Secrets management** — `.env`, AWS Secrets Manager, Doppler, Vault? You need this for the LLM API key and any MCP server credentials.

Write a short summary of what you found before proceeding. It should fit in ~10 lines and answer: _Where does the chat endpoint go? Who can call it? What 5 tools do I expose first? Which LLM vendor and why?_

### Step 2 — Pick the architecture

Three patterns. Default to **Pattern C (Hybrid)** for any nontrivial app.

- **Pattern A — Native tools only.** Tools live in your repo, dispatch in-process. Best when all tools touch your own DB and you don't share with other LLM hosts. Lowest latency.
- **Pattern B — MCP only.** All tools come from MCP servers. Best when the same tools need to work in Claude Desktop, Cursor, _and_ your app. Higher latency.
- **Pattern C — Hybrid (recommended).** Native handlers for tightly coupled domain ops (create/update/delete on your DB, where transactional safety matters). MCP servers for third-party data (Stripe, GitHub, Slack, QuickBooks, internal data warehouses).

For more nuance see `references/architecture.md`.

### Step 3 — Pick the vendor and scaffold the LLM client

Both OpenAI and Anthropic work. The scaffolds support both. Pick based on:

- **OpenAI** — broader ecosystem, automatic prompt caching, strict mode for guaranteed JSON. Default to GPT-4o for quality, GPT-4o-mini for cost.
- **Anthropic** — better citation API, explicit `cache_control` for fine control, often stronger long-context reasoning. Default to Claude Opus 4.5 for quality, Claude Haiku 4.5 for cost.

The wire formats differ; see `references/vendor-cheatsheet.md` for a side-by-side. The scaffold in `scripts/python/chat_service.py` is OpenAI-shaped; `scripts/typescript/chat-service.ts` is Anthropic-shaped. Mix and match.

Initialize the client in a singleton (`@lru_cache`d in Python, module-scoped in TS) so you reuse the HTTP connection pool. Read the API key from your existing secrets pipeline; never hardcode it.

### Step 4 — Copy the dispatch layer

This is the centralized entry point for every tool call. **Copy `scripts/python/dispatch_tool.py`** (or the TS equivalent) into your codebase and adapt the imports. It does three things:

1. Looks up the handler by name.
2. Checks feature gating (Pro-only, role-based, etc.) **before** the handler runs. Returns a structured `{"error": "requires_pro", "feature": ..., "message": ...}` envelope on denial.
3. Calls the handler with `(user, args)`.

**Why centralized:** the model sees the full tool catalog; the dispatcher enforces RBAC server-side. Never gate tools by hiding them from the model — that's bypassable. Gate at dispatch.

### Step 5 — Define first-party tools

Group tools by domain (e.g. `customer_tools.py`, `order_tools.py`, `billing_tools.py`). Each module exports:

```python
HANDLERS = {
    "tool_name": handler_function,  # signature: (user, args: dict) -> dict
}
```
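For concreteness, here's a minimal sketch of what one entry might look like. It assumes a hypothetical `Customer` model with Django-style ORM calls — adapt the import, model, and fields to your stack. Note how it follows the handler rules listed next:

```python
# customer_tools.py — illustrative sketch; model and fields are hypothetical
from myapp.models import Customer  # assumption: your ORM model lives here

def create_customer(user, args: dict) -> dict:
    """Create a customer scoped to the caller's organization."""
    name = (args.get("name") or "").strip()
    if not name:
        return {"error": "name is required"}  # user-level failure → error envelope
    customer = Customer.objects.create(
        organization_id=user.organization_id,  # tenant scoping on every query
        name=name,
        email=args.get("email", ""),
    )
    return {"customer_id": customer.id, "name": customer.name, "success": True}

HANDLERS = {
    "create_customer": create_customer,
}
```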
Each handler:

- Takes `(user, args: dict)` and returns a JSON-serializable dict.
- Scopes every DB query by `user.organization_id` (or your tenant key).
- Returns a clean envelope: `{...result..., "success": True}` on write, `[...]` or `{...}` on read, `{"error": "..."}` on user-level failures.
- Lets programming bugs (`AttributeError`, `KeyError`) propagate to your error tracker — only catch domain exceptions.

Write 3–5 tools first. Validate the loop end-to-end before adding more. The full recipe (schema → handler → registration → parity check) is in `references/adding-tools.md`. The starter scaffold is in `scripts/python/chat_tools_template/`.

### Step 6 — Wire the multi-turn loop

**Copy `scripts/python/chat_service.py`.** This is the canonical multi-turn tool-calling loop with the _conditional `tools` kwarg_ fix (lines marked with `### CONDITIONAL ###`). The fix matters — explicit `tools=None` is rejected by the OpenAI API, so on the final turn you must omit the kwarg entirely.

The loop:

1. First call: `messages + tools + tool_choice="auto"`.
2. While the model emits `tool_calls` and `turn < MAX_TURNS`:
   - Append the assistant's tool_calls message.
   - For each tool_call: dispatch, append the tool result.
   - On the final turn (`turn == MAX_TURNS - 1`): omit `tools` to force a text summary.
3. Return `{response, actions}`.

Bound `MAX_TURNS` (5 is a good default). Add an anti-hedge instruction to the system prompt — see `assets/system-prompt-template.md`.

### Step 7 — Add the chat endpoint

A single POST endpoint. Required behavior:

- Auth required (whatever your auth is — JWT, session, etc.).
- Rate limit: 20 req/min per user is a reasonable start.
- Validate `message` length (≤ 2000 chars).
- Sanitize `history` — only allow `role ∈ {"user", "assistant"}`, reject `"system"` (prompt injection).
- Optional `context` dict for FE state ("user is on quote page Q-...").
- Optional daily quota check (so a runaway loop doesn't burn $$$).

Request/response shape is in `references/architecture.md` § "Endpoint contract".

### Step 8 — Verify the loop end-to-end (don't skip)

Before moving on, prove the loop works:

1. **Parity check** — every schema name has a handler:

   ```python
   from chat_tools import ALL_TOOLS, ALL_HANDLERS
   defs = {t["function"]["name"] for t in ALL_TOOLS}
   handlers = set(ALL_HANDLERS)
   assert defs == handlers, defs ^ handlers
   ```

2. **Curl smoke test** — hit the endpoint with a known intent ("create a test customer named Ada"). Expect `200`, an `actions` array with the tool call, and a DB row.
3. **Multi-turn smoke test** — give a prompt that requires chaining ("create a quote for the customer Ada and add a $500 line item"). Watch the logs — you should see ≥ 2 tool calls in one request. If the model says "I'll do that for you" in plain text and stops, your anti-hedge prompt is missing or weak.

### Step 9 — (Optional but recommended) Add the intent router

Worth it when you have ≥ 5 high-frequency intents. It skips the LLM entirely on common requests, saving 80%+ of API costs and ~1s of latency.

Copy `scripts/python/intent_matcher.py` and `scripts/python/intent_extractors.py`. Define your intents (~5–10 example phrases each), let the matcher pre-compute centroids on first use, and route by confidence band (a sketch of the banding logic follows this list):

- HIGH (≥ 0.85): execute directly via `dispatch_tool`.
- MEDIUM (0.70–0.84): execute with a "Going to do X" prefix.
- AMBIGUOUS (top-2 Δ < 0.05): ask the user to clarify.
- LOW / NONE: fall back to the multi-turn LLM loop.

For destructive intents (`delete_customer`, `cancel_subscription`), require a confirmation word ("yes", "confirm", "proceed") before executing. Deeper guidance in `references/intent-routing.md`.
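Here's a minimal sketch of that banding logic. It assumes a `match(message)` helper returning the top-2 `(intent, score)` pairs, plus `extract_args`, `dispatch_tool`, and `run_multi_turn_loop` from the scaffolds — those names and return shapes are assumptions, so check the scaffold's actual API:

```python
def route(user, message: str, history: list) -> dict:
    """Fast-path a confident intent match; otherwise fall back to the LLM loop."""
    (top, score), (_, runner_up) = match(message)  # assumed helper: top-2 (intent, score)

    if score >= 0.85 and score - runner_up >= 0.05:          # HIGH: execute directly
        return dispatch_tool(user, top, extract_args(top, message))
    if 0.70 <= score < 0.85 and score - runner_up >= 0.05:   # MEDIUM: execute, but say so
        result = dispatch_tool(user, top, extract_args(top, message))
        return {"prefix": f"Going to {top.replace('_', ' ')}.", **result}
    if score >= 0.70:                                        # AMBIGUOUS: top two too close
        return {"response": f"Did you mean {top.replace('_', ' ')}, or something else?"}
    return run_multi_turn_loop(user, message, history)       # LOW / NONE: full LLM loop
```

The destructive-intent confirmation gate from the list above would sit in front of the two execute branches.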
### Step 10 — (Optional) Add MCP integration

Add MCP servers when you need third-party data (Stripe, GitHub, Slack, internal data lake) and the data lives outside your transactional DB.

- For third parties with official MCP servers: install and configure them.
- For your internal data: build a server with `scripts/python/fastmcp_server.py`.
- Bridge MCP tools into the same `dispatch_tool` surface using `scripts/python/mcp_bridge.py` — list MCP tools at boot, convert schemas to the LLM's tool format, register alongside native tools with an `mcp:` prefix in the name.

This is Pattern C in action. To `chat_service.py`, MCP tools look identical to native tools — same dispatch, same logging, same gating. Full primer + transports + OAuth in `references/mcp-integration.md`.

### Step 11 — Frontend

A chat component needs to:

- POST `{message, history[≤20], context}` to the chat endpoint with auth.
- Auto-refresh the auth token on 401, then retry.
- Render the assistant response, plus any _suggestion chips_ the backend returned (3 short follow-up prompts; click → new request).
- If the backend returns `navigate_to`, route to it _without_ rendering an assistant message. This avoids "I'm taking you there" clutter.
- On `error_code: "requires_pro"`, render the upsell payload as an inline CTA, not as raw error text.

The TS scaffold in `scripts/typescript/chat-component.tsx` is React-shaped but trivial to port. The request/response contract is identical regardless of framework.

### Step 12 — Verify and ship

Before declaring victory:

- Run the parity check (Step 8).
- Run a 5-prompt golden conversation test — record real prompts, assert the model emitted tool calls from a known allowlist, and never assert the model's exact text. Save these for regression.
- Check the logs for _every_ tool call: `(conversation_id, turn, tool_name, args, result_or_error, latency_ms)`. This is your debugging log + audit trail + golden corpus.
- Hit the endpoint with a free user and a Pro user (or your equivalent tiers) and confirm gating works.

`references/verification-checklist.md` has the full list.

---

## Production gotchas (read these — they bite)

1. **The `tools=None` trap.** Both vendors reject explicit `tools=None`. To omit tools on the final turn, omit the kwarg entirely. The scaffold does this; a minimal sketch follows this list.
2. **The hedge-and-stop trap.** Without a strong anti-hedge prompt, the model says "I'll do that for you" and stops _without calling the tool_. Use the system prompt in `assets/system-prompt-template.md`.
3. **Infinite loops.** Cap `MAX_TURNS` (3–6) and dedupe identical `(name, normalized_args)` calls. A documented Claude Code recursion incident burned 1.67B tokens before someone hit Ctrl+C.
4. **Idempotency.** Every write tool gets a server-generated idempotency key the LLM never sees: a hash of `(user_id, tool_name, normalized_args, conversation_id)`, Stripe-style (see the second sketch after this list). Otherwise a transient retry duplicates work.
5. **Prompt caching.** Put the system prompt + tool list at the top of the message array. OpenAI caches it automatically; Anthropic needs explicit `cache_control: {type: "ephemeral", ttl: "1h"}`. With ~30+ tools the prefix is 3–5k tokens — caching it pays for itself within 2 turns.
6. **Don't trust the model for auth.** Gate at dispatch. Returning a `requires_pro` envelope from the dispatcher is fine; expecting the model to "know" not to call a Pro tool is not.
7. **History role allowlist.** Only accept `user` and `assistant` in submitted history. Rejecting `system` blocks a class of prompt injection.
8. **Sanitize args before logging.** PII in args (emails, phones) shouldn't land in cleartext logs. Hash or redact.
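Two sketches for the gotchas above. First, the conditional-kwarg pattern from gotcha 1, OpenAI-shaped and sitting inside the multi-turn loop of `chat_service.py` (the Anthropic call differs in shape, but the build-kwargs-conditionally principle is identical):

```python
# Inside the multi-turn loop: build the request kwargs per turn.
kwargs = {"model": MODEL, "messages": messages}
if turn < MAX_TURNS - 1:
    kwargs["tools"] = ALL_TOOLS        # offer tools on every turn but the last
    kwargs["tool_choice"] = "auto"
# Final turn: no `tools` key at all — passing tools=None explicitly is rejected.
response = client.chat.completions.create(**kwargs)
```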
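Second, the server-generated idempotency key from gotcha 4 — stdlib hashing only; the `seen_keys` store and the `dispatch_write` wrapper are stand-ins for whatever cache or DB table you actually use:

```python
import hashlib
import json

def idempotency_key(user_id, tool_name: str, args: dict, conversation_id: str) -> str:
    """Server-generated key the LLM never sees; identical retries hash to the same value."""
    normalized = json.dumps(args, sort_keys=True, separators=(",", ":"))
    raw = f"{user_id}:{tool_name}:{normalized}:{conversation_id}"
    return hashlib.sha256(raw.encode()).hexdigest()

def dispatch_write(user, tool_name, args, conversation_id, handler, seen_keys):
    """Wrap a write handler: replay the cached result instead of re-executing on retry."""
    key = idempotency_key(user.id, tool_name, args, conversation_id)
    if key in seen_keys:               # stand-in store: use Redis or a DB table in practice
        return seen_keys[key]
    result = seen_keys[key] = handler(user, args)
    return result
```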
---

## Reference files (read as needed)

- `references/architecture.md` — full ASCII flows, request/response contract, observability schema.
- `references/vendor-cheatsheet.md` — OpenAI vs Anthropic schema diffs, `tool_choice`, parallel calls, caching TTLs.
- `references/intent-routing.md` — embedding centroid pre-pass, when to use it, threshold tuning, fast-path arg extraction.
- `references/mcp-integration.md` — MCP primer, transports, OAuth 2.1, bridging MCP tools into your existing dispatcher.
- `references/adding-tools.md` — the recipe for adding a new tool (schema → handler → register → parity check → smoke test).
- `references/verification-checklist.md` — the pre-ship checklist.

## Scaffolds (copy and adapt)

- `scripts/python/chat_service.py` — multi-turn loop with the conditional `tools` kwarg fix.
- `scripts/python/dispatch_tool.py` — central dispatch + feature gating.
- `scripts/python/intent_matcher.py` — cosine centroid intent matcher.
- `scripts/python/intent_extractors.py` — regex fast-path arg extraction.
- `scripts/python/mcp_bridge.py` — MCP client → tool list adapter.
- `scripts/python/fastmcp_server.py` — minimal MCP server template.
- `scripts/python/chat_tools_template/` — domain-module starter (`__init__.py`, `definitions.py`, `example_tools.py`).
- `scripts/typescript/chat-service.ts` — Anthropic-shaped multi-turn loop.
- `scripts/typescript/mcp-bridge.ts` — MCP → Anthropic tools bridge.
- `scripts/typescript/chat-component.tsx` — React chat UI starter.
- `assets/system-prompt-template.md` — anti-hedge system prompt.

These are starting points, not final code. Copy them into your codebase, adapt imports/types/auth, and edit freely. The skeletons exist to save you from rediscovering the gotchas above.