Ship Reliable AI: How Zod and Schema‑Driven Development Harden LLM Outputs
In 2026, the honeymoon of "talking" to AI is over. As senior software engineers, we've learned a hard truth: software is not a conversation; it's a contract.
The hype insists that a 'perfect prompt' is all you need to ship AI features. That is dangerously misleading. In production, Large Language Models (LLMs) are non-deterministic, get creative when you need them to be rigid, and can return prose when your Node.js backend expects pristine JSON. When that happens, systems don't just misbehave; they crash.
Relying on a single LLM call to succeed every time is building a house of cards. Here's how to move from "vibes-based development" to responsible AI engineering.
Note: You can find the full, production-ready source code for these examples in my GitHub repository: Pawl0/llm-outputs-with-zod.
When LLMs Get Fuzzy — Make Schemas the Source of Truth
The biggest mistake I see is developers treating an LLM like a predictable function. It’s not. It’s an untrusted third-party API.
The first step to sanity is Schema-Driven Development. Before you even think about the prompt, you define the contract using Zod. If the data doesn't pass the Zod audit, it doesn't exist to the rest of your system. Period.
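To make the "doesn't pass the audit, doesn't exist" rule concrete, here is a minimal, dependency-free sketch of a boundary gate. The names `Contract`, `parseAtBoundary`, and `priorityContract` are hypothetical. `Contract` is only a structural stand-in modeled on the result shape of a Zod schema's `safeParse`, so a real schema should be usable in place of the hand-written one, but treat that as an assumption to verify in your own setup.

```typescript
// Structural stand-in for a schema: anything exposing a compatible safeParse().
// (Assumption: modeled on Zod's safeParse result shape; nothing is imported from Zod.)
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { issues: { message: string }[] } };

interface Contract<T> {
  safeParse(input: unknown): SafeParseResult<T>;
}

type BoundaryResult<T> =
  | { ok: true; data: T }
  | { ok: false; reason: "not_json" | "contract_violation"; details: string[] };

// Hypothetical helper: the only door through which LLM text may enter the system.
function parseAtBoundary<T>(raw: string | null, contract: Contract<T>): BoundaryResult<T> {
  let candidate: unknown;
  try {
    candidate = JSON.parse(raw ?? "");
  } catch {
    return { ok: false, reason: "not_json", details: ["response was not valid JSON"] };
  }
  const result = contract.safeParse(candidate);
  if (result.success) {
    return { ok: true, data: result.data };
  }
  return {
    ok: false,
    reason: "contract_violation",
    details: result.error.issues.map(issue => issue.message),
  };
}

// Tiny hand-written contract so the sketch runs without dependencies.
const priorityContract: Contract<{ priority: "low" | "medium" | "high" }> = {
  safeParse(input) {
    const p = (input as { priority?: unknown } | null)?.priority;
    if (p === "low" || p === "medium" || p === "high") {
      return { success: true, data: { priority: p } };
    }
    return { success: false, error: { issues: [{ message: "expected low | medium | high" }] } };
  },
};

console.log(parseAtBoundary('{"priority":"high"}', priorityContract));      // ok: true
console.log(parseAtBoundary("Sure! Here is your JSON:", priorityContract)); // reason: not_json
console.log(parseAtBoundary('{"priority":"urgent"}', priorityContract));    // reason: contract_violation
```

Returning a result object instead of throwing keeps the decision (retry, fall back, or reject) with the caller, which is a design choice of this sketch rather than something the contract idea requires.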
Phase 1: The "Happy Path" (And why it fails)
This is the code that looks great in a LinkedIn post but dies in production. It assumes the LLM will follow instructions because you asked nicely.
import 'dotenv/config';
import { z } from 'zod';
import OpenAI from 'openai';
// 1. The Contract: If it's not here, it doesn't enter the system
const TicketSchema = z.object({
  category: z.enum(['billing', 'technical', 'feature_request']),
  priority: z.enum(['low', 'medium', 'high']),
  tags: z.array(z.string()).max(3),
  summary: z.string().min(10)
});
interface RawSupportRequest {
  from: string;
  subject: string;
  body: string;
}
const messyInput: RawSupportRequest = {
  from: "john.doe@gmail.com",
  subject: "HELP!!! PLS",
  body: "Hey, I've been trying to log in for 3 hours and it keeps saying 'invalid credentials' even though I reset my password twice. Also, your checkout page is super slow. Fix this or I'm canceling my Pro subscription!!"
};
type Ticket = z.infer<typeof TicketSchema>;
async function classifyTicket(input: RawSupportRequest): Promise<Ticket> {
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
  const prompt = `
    Analyze this support request and extract structured data.
    SENDER: ${input.from}
    SUBJECT: ${input.subject}
    MESSAGE BODY: "${input.body}"
    INSTRUCTIONS:
    1. Determine the primary category.
    2. Assess priority (look for urgency keywords like "canceling", "hours", "HELP").
    3. Extract up to 3 relevant tags.
    4. Write a 1-sentence technical summary.
    Return ONLY JSON matching the schema.
  `;
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
    response_format: { type: "json_object" }
  });
  const rawContent = response.choices[0].message.content;
  try {
    const json = JSON.parse(rawContent || '{}');
    // Our guardrail
    return TicketSchema.parse(json);
  } catch (error) {
    console.error("The AI hallucinated and the contract broke:", error);
    throw new Error("Failed to process ticket");
  }
}
classifyTicket(messyInput).then(ticket => {
  console.log(ticket);
}).catch(error => {
  console.error(error);
});
The Result: The "Brittle" Outcome
When you run this with messy real-world data, the LLM often fails the contract. It might return invalid enum values or omit fields entirely. Here is the actual crash log:
The AI hallucinated and the contract broke: ZodError: [
  {
    "code": "invalid_value",
    "values": ["billing", "technical", "feature_request"],
    "path": ["category"],
    "message": "Invalid option: expected one of \"billing\"|\"technical\"|\"feature_request\""
  },
  {
    "expected": "string",
    "code": "invalid_type",
    "path": ["summary"],
    "message": "Invalid input: expected string, received undefined"
  }
]
Error: Failed to process ticket
Execution stops. Your backend returns a 500. The user is frustrated.
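A raw issue dump like the one above is hard to scan in logs and wasteful to paste into a prompt. As a small sketch, the helper below collapses issues into one short line per field. `IssueLike` and `summarizeIssues` are hypothetical names, and the issue shape is modeled only on the `path` and `message` fields visible in the crash log.

```typescript
// Minimal shape of a validation issue, modeled on the crash log above.
// (Assumption: only `path` and `message` are needed; other fields are ignored.)
interface IssueLike {
  path: PropertyKey[];
  message: string;
}

// Hypothetical helper: one "field: problem" line per issue.
function summarizeIssues(issues: IssueLike[]): string[] {
  return issues.map(issue => {
    const field = issue.path.length > 0 ? issue.path.map(String).join(".") : "(root)";
    return `${field}: ${issue.message}`;
  });
}

// The two issues from the crash log, reduced to the fields the helper uses.
const issuesFromLog: IssueLike[] = [
  { path: ["category"], message: 'Invalid option: expected one of "billing"|"technical"|"feature_request"' },
  { path: ["summary"], message: "Invalid input: expected string, received undefined" },
];

console.log(summarizeIssues(issuesFromLog).join("\n"));
// category: Invalid option: expected one of "billing"|"technical"|"feature_request"
// summary: Invalid input: expected string, received undefined
```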
Life After Deploy: Why One-Shot is a Myth
In a production environment, the model will fail you. It will invent a category that doesn’t exist or skip the summary requirement. If your code just throws an error and gives up, you’re generating technical debt and ruining the user experience.
Software is never "done," and AI-integrated systems need Self-Healing mechanisms.
Phase 2: The Self-Correction Loop
A Senior Developer doesn't just catch errors; they use them as feedback. If Zod tells us exactly what's wrong, we send that error back to the LLM and demand a fix. This is how you turn a 90% success rate into 99.9%: if each attempt succeeds about 90% of the time and attempts are roughly independent, three attempts in a row all fail only about 0.1% of the time.
We also need Observability. If it takes three tries to get valid JSON, your costs roughly triple. You need to know that.
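The cost point can be made concrete with a little arithmetic. The sketch below sums token usage across attempts and reports how much more was spent than on the first attempt alone. `AttemptUsage`, `CostReport`, and `summarizeCost` are hypothetical names, and the token multiplier is only a proxy for cost, since input and output tokens are usually priced differently.

```typescript
// Hypothetical per-attempt record. OpenAI chat completions report comparable
// numbers under `response.usage` (prompt_tokens, completion_tokens).
interface AttemptUsage {
  promptTokens: number;
  completionTokens: number;
}

interface CostReport {
  attempts: number;
  totalTokens: number;
  // How many times more tokens were spent than on the first attempt alone.
  tokenMultiplier: number;
}

function summarizeCost(usages: AttemptUsage[]): CostReport {
  const [first] = usages;
  if (first === undefined) {
    throw new Error("at least one attempt is required");
  }
  const tokensOf = (u: AttemptUsage) => u.promptTokens + u.completionTokens;
  const totalTokens = usages.reduce((sum, u) => sum + tokensOf(u), 0);
  return {
    attempts: usages.length,
    totalTokens,
    tokenMultiplier: totalTokens / tokensOf(first),
  };
}

// Three identical attempts: exactly three times the tokens of the happy path.
const report = summarizeCost([
  { promptTokens: 400, completionTokens: 100 },
  { promptTokens: 400, completionTokens: 100 },
  { promptTokens: 400, completionTokens: 100 },
]);
console.log(report); // { attempts: 3, totalTokens: 1500, tokenMultiplier: 3 }
```

In a real retry loop the multiplier tends to be higher than the attempt count, because each retry prompt also carries the error feedback from the previous attempt.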
// Define a wrapper for the result to include metadata
interface ClassificationResult {
  data: Ticket;
  attempts: number;
}
async function classifyTicketWithRetry(input: RawSupportRequest): Promise<ClassificationResult> {
  // 1 initial attempt + up to maxRetries retries
  const maxRetries = 3;
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
  let lastError = "";
  const prompt = `
    Analyze this support request: "${input.body}"
    STRICT REQUIREMENTS:
    1. You must return a JSON object.
    2. "category" MUST be one of: ["billing", "technical", "feature_request"].
    3. "priority" MUST be one of: ["low", "medium", "high"].
    4. "summary" MUST be a string of at least 10 characters.
    5. "tags" MUST be an array of at most 3 strings.
    Example Output:
    {"category": "technical", "priority": "high", "summary": "User cannot log in...", "tags": ["auth"]}
  `;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // On retries, append the previous validation errors so the model can correct itself
    const feedback = attempt > 0 ? `\n\nERROR FROM PREVIOUS ATTEMPT: ${lastError}. Please fix the JSON so it matches the schema.` : "";
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        { role: "system", content: "You are a precise data extractor that only outputs valid JSON." },
        { role: "user", content: prompt + feedback }
      ],
      response_format: { type: "json_object" }
    });
    try {
      const rawContent = response.choices[0].message.content;
      const json = JSON.parse(rawContent || '{}');
      const validated = TicketSchema.parse(json);
      return {
        data: validated,
        attempts: attempt + 1
      };
    } catch (error) {
      if (error instanceof z.ZodError) {
        // Flatten the errors so the LLM gets a clear "Fix-it" list
        lastError = JSON.stringify(error.flatten().fieldErrors);
      } else {
        // Anything else (typically JSON.parse failing) still needs feedback,
        // otherwise the retry would resend stale or empty error text
        lastError = "The response was not valid JSON";
      }
      console.warn(`Attempt ${attempt + 1} failed. Feedback sent to LLM.`);
    }
  }
  throw new Error("LLM failed to produce a valid schema after multiple retries.");
}
classifyTicketWithRetry(messyInput).then(result => {
  console.log(`✅ Success after ${result.attempts} attempt(s):`, result.data);
}).catch(error => {
  console.error(error);
});
The Result: The "Resilient" Outcome
Instead of a crash, the system heals itself by explaining its own validation rules to the AI.
Attempt 1 failed. Feedback sent to LLM.
✅ Success after 2 attempt(s): {
  category: 'technical',
  priority: 'high',
  tags: [ 'authentication', 'slow performance', 'account' ],
  summary: 'User experiencing login issues and slow checkout process.'
}
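The loop is easier to trust if you can test it without calling a model. The sketch below is one possible way to do that, not the repository's implementation: the model call and the validation step are injected as plain functions, so a fake model can stand in during tests. `retryWithFeedback`, `fakeModel`, and `validatePriority` are hypothetical names, and the hand-written validator replaces the Zod schema only to keep the example dependency-free.

```typescript
// Outcome of validating one raw model response.
type Validation<T> = { ok: true; data: T } | { ok: false; feedback: string };

interface RetryResult<T> {
  data: T;
  attempts: number;
}

// Hypothetical helper: `generate` wraps the model call, `validate` wraps JSON parsing
// plus the schema check. Both are injected so the loop can run without a network.
async function retryWithFeedback<T>(
  generate: (feedback: string | null) => Promise<string>,
  validate: (raw: string) => Validation<T>,
  maxAttempts: number,
): Promise<RetryResult<T>> {
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(feedback);
    const result = validate(raw);
    if (result.ok) {
      return { data: result.data, attempts: attempt };
    }
    feedback = result.feedback; // handed to the next generate() call
  }
  throw new Error(`No valid output after ${maxAttempts} attempt(s)`);
}

// Fake model: wrong on the first call, corrected once it receives feedback.
const fakeModel = async (feedback: string | null): Promise<string> =>
  feedback === null ? '{"priority":"urgent"}' : '{"priority":"high"}';

// Hand-written stand-in for a schema check on a single field.
const validatePriority = (raw: string): Validation<{ priority: string }> => {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, feedback: "Response was not valid JSON." };
  }
  const priority = (parsed as { priority?: unknown } | null)?.priority;
  return priority === "low" || priority === "medium" || priority === "high"
    ? { ok: true, data: { priority } }
    : { ok: false, feedback: 'priority must be one of "low", "medium", "high".' };
};

retryWithFeedback(fakeModel, validatePriority, 3).then(result => {
  console.log(result); // { data: { priority: 'high' }, attempts: 2 }
});
```

With this shape, the production version only has to supply a `generate` that calls the LLM and a `validate` that wraps the schema, while the retry behavior itself stays covered by fast, deterministic tests.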
Conclusion: Engineering vs. Luck
Dealing with AI in 2026 requires the same discipline as any other mission-critical system. If you don't have type validation, you don't have a system; you have a hope.
Using Zod + TypeScript turns the chaos of natural language into concrete data. Using Retry Loops turns stochastic failures into resilient features.
Stop searching for the "magic prompt." It doesn't exist. Focus on building the guardrails that survive the AI's inevitable imperfections. That is the difference between juniors playing with APIs and seniors shipping real software.
For a deep dive into the code and to run these tests yourself, check out the repository: https://github.com/Pawl0/llm-outputs-with-zod
