How We Structure Every AI API Route

The 7-step pattern we use at MillerAI Innovations for every production AI endpoint — from auth to Claude API to structured response parsing. No shortcuts, no vibes.

Every AI API route we ship follows the same pattern. Not because we love boilerplate — because we got burned early by routes that worked in dev and exploded in production.

After building 50+ API endpoints across BitcoinRMF, SatsLegacy, and our other applications, we settled on a 7-step flow that handles the hard parts before they become incidents.

The 7-Step Flow

Every route that touches Claude follows this exact sequence:

  1. Authenticate
  2. Rate limit
  3. Validate input
  4. Check environment
  5. Call Claude with a structured system prompt
  6. Parse the JSON response
  7. Add security headers and return

Skip any step and you'll regret it. Here's why each one matters.

Step 1: Authenticate First

Before touching anything else, verify the user has a valid session. This is a hard gate — not a soft check.

const session = await getServerSession(authOptions);
if (!session?.user) {
  return addSecurityHeaders(
    NextResponse.json({ error: 'Authentication required' }, { status: 401 })
  );
}

We use NextAuth with Twitter OAuth and JWT sessions. The key decision: authentication happens at the route level, not in middleware. Middleware is tempting but makes it harder to have mixed auth strategies — some of our routes use bearer tokens for cron jobs instead of session auth.
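For illustration, a bearer-token gate for those cron routes can be a few lines. This is a sketch, not our actual code — the `CRON_SECRET` env var name and the helper are assumptions:

```typescript
// Sketch of a bearer-token check for cron routes (hypothetical helper).
// CRON_SECRET is an assumed env var name, not necessarily ours.
function isAuthorizedCron(
  authHeader: string | null,
  secret: string | undefined
): boolean {
  if (!authHeader || !secret) return false;
  const [scheme, token] = authHeader.split(' ');
  return scheme === 'Bearer' && token === secret;
}
```

In the route, that becomes `if (!isAuthorizedCron(request.headers.get('authorization'), process.env.CRON_SECRET))` followed by a 401 — same hard gate, different credential.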

Step 2: Rate Limit by Endpoint Type

Not all endpoints deserve the same rate limit. An AI analysis that costs real money per call gets a tighter limit than a vote action.

const clientId = getClientId(request);
const { allowed, remaining, resetIn } = checkRateLimit(
  `analysis:${clientId}`,
  'analysis'
);

if (!allowed) {
  return addSecurityHeaders(rateLimitResponse(resetIn));
}

Our rate limit tiers:

| Endpoint Type | Limit |
|---|---|
| AI analysis | 10 req/min |
| Voting | 30 req/min |
| Auth attempts | 5 req/15 min |
| Pipeline operations | 20 req/min |
| External API calls | 5 req/30 sec |

The rate limiter is in-memory with a sliding window. For a solo founder's scale, this is simpler and faster than Redis. We'll move to Upstash when we need distributed rate limiting.
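The core of an in-memory sliding-window limiter fits in one function. This is an illustrative sketch, not our exact implementation — the tier names and limits mirror the table above:

```typescript
// Illustrative in-memory sliding-window rate limiter (not the exact
// production code). Two example tiers from the table above.
const WINDOWS: Record<string, { limit: number; windowMs: number }> = {
  analysis: { limit: 10, windowMs: 60_000 },
  vote: { limit: 30, windowMs: 60_000 },
};

const hits = new Map<string, number[]>(); // key -> request timestamps

function checkRateLimit(
  key: string,
  type: keyof typeof WINDOWS,
  now: number = Date.now()
): { allowed: boolean; remaining: number; resetIn: number } {
  const { limit, windowMs } = WINDOWS[type];
  // Keep only timestamps still inside the sliding window.
  const recent = (hits.get(key) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= limit) {
    hits.set(key, recent);
    // Time until the oldest request falls out of the window.
    return { allowed: false, remaining: 0, resetIn: windowMs - (now - recent[0]) };
  }
  recent.push(now);
  hits.set(key, recent);
  return { allowed: true, remaining: limit - recent.length, resetIn: windowMs };
}
```

The `now` parameter defaults to the clock but is injectable, which makes the window behavior trivially testable.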

Step 3: Validate and Constrain Input

Every field gets checked for type, presence, and length. This isn't paranoia — it's cost control. Sending a 50KB string to Claude when you expected a paragraph is an expensive mistake.

if (!description || typeof description !== 'string') {
  return addSecurityHeaders(
    NextResponse.json(
      { error: 'Threat description is required' },
      { status: 400 }
    )
  );
}

if (description.length > 5000) {
  return addSecurityHeaders(
    NextResponse.json(
      { error: 'Description too long (max 5,000 characters)' },
      { status: 400 }
    )
  );
}

For CRUD routes with complex schemas, we use Zod for runtime validation:

const parsed = threatInputSchema.safeParse(body);
if (!parsed.success) {
  return addSecurityHeaders(
    NextResponse.json(
      { error: 'Invalid input', details: parsed.error.flatten() },
      { status: 400 }
    )
  );
}

Step 4: Check Environment Before Calling

This catches deployment misconfigurations before they hit the API. It's one line that saves you from cryptic Anthropic SDK errors.

if (!process.env.ANTHROPIC_API_KEY) {
  return addSecurityHeaders(
    NextResponse.json(
      { error: 'AI service not configured' },
      { status: 503 }
    )
  );
}

Return 503 (Service Unavailable), not 500. The distinction matters — 503 tells clients the issue is temporary and retriable.

Step 5: Call Claude with Structured System Prompts

This is where most developers under-invest. The system prompt is your contract with the model. Ours are specific about the exact JSON structure we expect, including field names, types, and valid ranges.

const client = getAnthropicClient();
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  system: THREAT_ANALYSIS_PROMPT,
  messages: [
    {
      role: 'user',
      content: `Analyze this threat:\n\n${sanitizeInput(description)}`,
    },
  ],
});

Two critical details here:

Input sanitization. User input gets HTML-entity-escaped before it reaches Claude. This isn't about prompt injection paranoia — it's about preventing malformed output when user input contains characters that could break JSON.
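The core of that escaping is a handful of replacements. A minimal sketch — the production `sanitizeInput` may cover more cases:

```typescript
// Minimal HTML-entity escaping for user input before it reaches the
// prompt. A sketch of the idea; production code may do more.
function sanitizeInput(input: string): string {
  return input
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```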

Singleton client. We instantiate the Anthropic client once and reuse it. The SDK handles connection pooling internally.
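The pattern behind `getAnthropicClient` is a lazy singleton: run the factory once, reuse the result forever. A generic sketch (the commented Anthropic usage assumes the `@anthropic-ai/sdk` package and is illustrative):

```typescript
// Generic lazy-singleton helper: the factory runs once, and every later
// call returns the cached instance. Assumes the factory never returns
// undefined.
function once<T>(factory: () => T): () => T {
  let instance: T | undefined;
  return () => (instance ??= factory());
}

// Illustrative usage, assuming @anthropic-ai/sdk:
// const getAnthropicClient = once(
//   () => new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })
// );
```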

The system prompt for our threat analysis endpoint specifies the exact schema:

Return a JSON object with:
- "name": string
- "strideCategory": "SPOOFING" | "TAMPERING" | ...
- "likelihood": 1-5
- "fairEstimates": {
    "threatEventFrequency": number,
    "vulnerability": 0.0-1.0,
    "annualizedLossExpectancy": number
  }
Return ONLY valid JSON. No markdown, no explanation.

The more specific your prompt, the more reliable your parsing.
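On the TypeScript side, the same contract can be mirrored as a type plus a narrow runtime check before the parsed object is trusted. A sketch — the field names follow the prompt above, while the full STRIDE category list (elided in the prompt excerpt) and the guard itself are assumptions:

```typescript
// Type mirroring the prompt's JSON contract. The full STRIDE category
// list is assumed here; the prompt excerpt above elides it.
type StrideCategory =
  | 'SPOOFING'
  | 'TAMPERING'
  | 'REPUDIATION'
  | 'INFORMATION_DISCLOSURE'
  | 'DENIAL_OF_SERVICE'
  | 'ELEVATION_OF_PRIVILEGE';

interface ThreatAnalysis {
  name: string;
  strideCategory: StrideCategory;
  likelihood: number; // 1-5
  fairEstimates: {
    threatEventFrequency: number;
    vulnerability: number; // 0.0-1.0
    annualizedLossExpectancy: number;
  };
}

// Illustrative runtime guard: checks shape and ranges before trusting
// the model's output.
function isThreatAnalysis(x: unknown): x is ThreatAnalysis {
  const o = x as ThreatAnalysis;
  return (
    typeof o?.name === 'string' &&
    typeof o?.strideCategory === 'string' &&
    typeof o?.likelihood === 'number' &&
    o.likelihood >= 1 &&
    o.likelihood <= 5 &&
    typeof o?.fairEstimates?.vulnerability === 'number' &&
    o.fairEstimates.vulnerability >= 0 &&
    o.fairEstimates.vulnerability <= 1
  );
}
```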

Step 6: Parse with a Safety Net

Claude is reliable, but not infallible. Sometimes it wraps JSON in markdown code fences. Sometimes it adds a preamble sentence. Our extractJSON utility handles both:

const text =
  response.content[0].type === 'text'
    ? response.content[0].text
    : '';

let analysis;
try {
  analysis = JSON.parse(extractJSON(text));
} catch (err) {
  console.error('Failed to parse AI response', err);
  return addSecurityHeaders(
    NextResponse.json(
      { error: 'Failed to parse AI response' },
      { status: 500 }
    )
  );
}

The extractJSON function strips markdown fences and finds the first { to last } in the response. It's 10 lines of code that saves hours of debugging.
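A condensed sketch of the idea (not the exact utility): taking everything from the first `{` to the last `}` also disposes of code fences and preambles, since those sit outside the braces.

```typescript
// Sketch of extractJSON. Slicing from the first '{' to the last '}'
// discards markdown fences and preamble sentences in one move, because
// they sit outside the outermost braces.
function extractJSON(text: string): string {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end < start) {
    throw new Error('No JSON object found in response');
  }
  return text.slice(start, end + 1);
}
```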

Step 7: Security Headers on Every Response

Every response — success or error — gets security headers. No exceptions.

const jsonResponse = NextResponse.json({
  analysis,
  usage: {
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
  },
});

jsonResponse.headers.set('X-RateLimit-Remaining', String(remaining));
return addSecurityHeaders(jsonResponse);
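For reference, `addSecurityHeaders` is along these lines. A sketch — the exact header set shown is illustrative, and the helper only assumes the response exposes a standard `Headers` object (which `NextResponse` does):

```typescript
// Sketch of addSecurityHeaders. Works on anything exposing a standard
// Headers object (NextResponse does). The header set is illustrative.
function addSecurityHeaders<T extends { headers: Headers }>(response: T): T {
  response.headers.set('X-Content-Type-Options', 'nosniff');
  response.headers.set('X-Frame-Options', 'DENY');
  response.headers.set('Referrer-Policy', 'strict-origin-when-cross-origin');
  response.headers.set('Cache-Control', 'no-store');
  return response;
}
```

Because it returns the response it was given, it wraps any return statement without changing the route's shape.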

We also return token usage in every AI response. This isn't for the user — it's for us. When your Claude bill spikes, you want to know which endpoint is responsible.

Error Handling: Distinguish API Errors from Everything Else

The Anthropic SDK throws typed errors. Catch them separately so you can return appropriate status codes:

catch (error) {
  if (error instanceof Anthropic.APIError) {
    return addSecurityHeaders(
      NextResponse.json(
        { error: 'AI service temporarily unavailable' },
        { status: error.status || 503 }
      )
    );
  }
  return addSecurityHeaders(
    NextResponse.json(
      { error: 'Internal server error' },
      { status: 500 }
    )
  );
}

Never expose internal error messages to the client. Log them server-side, return a generic message to the user.

The Pattern in Practice

This 7-step flow scales from simple analysis endpoints to complex orchestration. Our report generation route extends it with Supabase persistence. Our cron-triggered routes swap session auth for bearer token auth. Our community voting routes add cascade effects — when a vote crosses a threshold, the item auto-publishes and posts to X.

The skeleton stays the same. The details change per endpoint.

Why This Matters

Most AI tutorials show you how to call the API. They don't show you what happens when 50 users hit your endpoint simultaneously, or when Claude returns markdown instead of JSON, or when someone sends a 200KB payload to an endpoint expecting a paragraph.

Production AI isn't about the model. It's about everything around the model. Get the plumbing right and the AI part is the easy part.