AI · 7 min read

Stop parsing prose: structured output patterns that survive production

How to get an LLM to return data you can actually use: schema-first instead of regex, constrained decoding, validation at the boundary with Zod, retry-with-error-feedback, and when a clean second pass beats one more retry.

By Tristan

It always starts the same way. You ask the model for "the priority and the category," it answers in a tidy sentence, and you reach for a regex to pull the fields out. It works in the demo. It works for a week. Then the model rephrases ("I'd classify this as high priority" becomes "This looks urgent to me"), your /(\w+) priority/ match comes back null, the downstream code throws, and a support ticket silently routes to nowhere.

The instinct is to make the regex smarter. That's the wrong layer. You don't have a parsing problem; you have a contract problem. The model produces prose by default because prose is what it was trained to produce, and prose is the one thing your pipeline can't depend on. The fix isn't a better parser downstream — it's deciding, up front, the exact shape of the data you'll accept, and refusing everything else. Four techniques get you there, and they stack. Here they are in the order you should reach for them.

1. Define the contract first

Before you write a single line of the prompt, write the schema. Not as documentation, not as an afterthought once the model "mostly works" — as the spec the rest of the pipeline is built against. If the downstream code needs a priority of low | medium | high | urgent and a category from a fixed list, that constraint is a fact about your system, and the schema is where it lives.

This inverts the usual flow. The naive version prompts first, sees what comes back, then writes code to cope with it — which means the model's whims define your data model. Schema-first reverses the dependency: the schema is the authority, the prompt's job is to satisfy it, and everything after the boundary gets to assume it held. A good schema makes invalid states unrepresentable. There is no priority: "kind of high" because the type doesn't allow it to exist.
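To make "unrepresentable" concrete, here is a minimal sketch at the type level. The `responseHours` function and its numbers are hypothetical, invented only to show downstream code leaning on the contract; they are not part of any real triage system.

```typescript
// The contract, as a type: only these four values exist.
type Priority = "low" | "medium" | "high" | "urgent";

// Hypothetical downstream consumer. The hour values are made up for illustration.
function responseHours(priority: Priority): number {
  switch (priority) {
    case "urgent":
      return 1;
    case "high":
      return 4;
    case "medium":
      return 24;
    case "low":
      return 72;
    default: {
      // If a new member is added to Priority and not handled above,
      // this assignment stops compiling, so the gap is caught at build time.
      const unreachable: never = priority;
      throw new Error(`Unhandled priority: ${String(unreachable)}`);
    }
  }
}

// responseHours("kind of high");
// ^ does not compile: '"kind of high"' is not assignable to type 'Priority'.

const hours = responseHours("urgent");
```

The compiler only protects code you wrote. Values arriving from the model at runtime still have to be checked before they are allowed to carry the `Priority` type.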

When to skip it: almost never. Even when you want a freeform paragraph back — a summary, a draft reply — wrap it in a one-field object. The cost is one line; the payoff is that "the output" is always addressable, versionable, and testable.
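A one-field wrapper might look like the sketch below. It is deliberately dependency-free; the `SummaryOutput` name and the non-empty rule are assumptions made for illustration, and in a real pipeline you would more likely express the same thing in your schema library.

```typescript
// Hypothetical contract for a freeform reply: one addressable field.
interface SummaryOutput {
  summary: string;
}

function parseSummary(raw: string): SummaryOutput {
  const value: unknown = JSON.parse(raw); // throws on non-JSON
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("Expected a JSON object with a summary field");
  }
  const summary = (value as Record<string, unknown>).summary;
  if (typeof summary !== "string" || summary.trim() === "") {
    throw new Error("summary must be a non-empty string");
  }
  return { summary: summary.trim() };
}

const parsed = parseSummary('{"summary": "  Refund issued, ticket closed.  "}');
```

Because the paragraph now lives at `parsed.summary`, you can add a second field later (a language tag, a confidence flag) without changing how callers find the text.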

2. Constrain generation — don't just hope

A schema you only check after the fact still lets the model emit garbage; you just catch it later. The stronger move is to make invalid output hard to generate in the first place. Three rungs, in rough order of strength:

  • JSON mode — the provider guarantees syntactically valid JSON. It stops unparseable text and stray prose, but says nothing about your fields. You can still get well-formed JSON with the wrong keys.
  • Tool / function calling — you hand the model a JSON Schema for the arguments, and it fills them in. Now the shape is constrained, not just the syntax. For most production work this is the sweet spot, and it's the path a library like Instructor leans on under the hood — see the glue layer in the 2026 open-source AI stack.
  • Grammar-constrained decoding — the decoder is masked at each step so only tokens that fit your grammar can be sampled. Invalid output becomes literally impossible to emit. This is the strongest guarantee and it's open-weight territory: Outlines and similar run against models you serve yourself.
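The masking idea behind that last rung can be sketched in a few lines. This is a toy, not how Outlines or any real decoder is implemented: the "grammar" is just a list of allowed strings, the vocabulary is hypothetical, and the "model" is a scoring function that would prefer to say "critical".

```typescript
// Toy grammar: the output must be exactly one of these strings.
const ALLOWED = ["low", "medium", "high", "urgent"];

// Hypothetical token vocabulary; real tokenizers are vastly larger.
const VOCAB = ["cri", "tical", "ur", "gent", "hi", "gh", "low", "med", "ium"];

// Keep only tokens for which prefix + token is still a prefix of an allowed value.
function allowedTokens(prefix: string, vocab: string[], allowed: string[]): string[] {
  return vocab.filter(
    (tok) => tok.length > 0 && allowed.some((value) => value.startsWith(prefix + tok)),
  );
}

function constrainedDecode(
  score: (prefix: string, token: string) => number,
  vocab: string[] = VOCAB,
  allowed: string[] = ALLOWED,
): string {
  let output = "";
  while (!allowed.includes(output)) {
    const prefix = output;
    const candidates = allowedTokens(prefix, vocab, allowed);
    if (candidates.length === 0) {
      throw new Error(`No token can extend "${prefix}" toward an allowed value`);
    }
    // Greedy pick, but only among the tokens that survived the mask.
    const best = candidates.reduce((a, b) => (score(prefix, b) > score(prefix, a) ? b : a));
    output = prefix + best;
  }
  return output;
}

// Stand-in model: scores "cri" highest, "ur" second, everything else equally low.
const wantsCritical = (_prefix: string, token: string): number =>
  token === "cri" ? 10 : token === "ur" ? 5 : 1;

const decoded = constrainedDecode(wantsCritical); // "urgent": "cri" was masked out
```

The model's favourite token never gets a chance, so the decoder falls through to its next preference that still fits. The toy stops at the first complete allowed value, which would need more care if one allowed string were a prefix of another.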

A tool-call schema is just your contract, expressed in the provider's dialect:

triage-tool.ts
const triageTool = {
  name: "triage_ticket",
  description: "Classify a support ticket.",
  parameters: {
    type: "object",
    properties: {
      priority: { type: "string", enum: ["low", "medium", "high", "urgent"] },
      category: { type: "string", enum: ["billing", "bug", "howto", "other"] },
      needsHuman: { type: "boolean" },
    },
    required: ["priority", "category", "needsHuman"],
    additionalProperties: false,
  },
} as const;
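One hazard with hand-written tool schemas is drift: the enum list in the tool definition and the enum list in your validator are two copies of the same fact. A possible way to avoid that, sketched here under the assumption that you control both, is to declare the value lists once and build everything else from them. The `buildTriageTool` helper and the guard functions are illustrative names, not part of any provider SDK.

```typescript
// Single source of truth for the allowed values.
const PRIORITIES = ["low", "medium", "high", "urgent"] as const;
const CATEGORIES = ["billing", "bug", "howto", "other"] as const;

type Priority = (typeof PRIORITIES)[number];
type Category = (typeof CATEGORIES)[number];

// Hypothetical helper: builds the tool definition from the shared lists.
function buildTriageTool() {
  return {
    name: "triage_ticket",
    description: "Classify a support ticket.",
    parameters: {
      type: "object",
      properties: {
        priority: { type: "string", enum: [...PRIORITIES] },
        category: { type: "string", enum: [...CATEGORIES] },
        needsHuman: { type: "boolean" },
      },
      required: ["priority", "category", "needsHuman"],
      additionalProperties: false,
    },
  };
}

// Runtime guards derived from the same lists.
function isPriority(value: unknown): value is Priority {
  return typeof value === "string" && (PRIORITIES as readonly string[]).includes(value);
}

function isCategory(value: unknown): value is Category {
  return typeof value === "string" && (CATEGORIES as readonly string[]).includes(value);
}

const tool = buildTriageTool();
```

Adding a fifth priority then means editing one array, and the tool schema, the TypeScript type, and the runtime check all move together.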

When NOT to reach for it: constrained decoding isn't free or universal. Provider and model support varies — grammar-level constraints often aren't available on hosted APIs at all. Complex or deeply nested schemas cost latency, and over-constraining can quietly degrade answer quality, because you're narrowing the model's path while it's still reasoning. If your provider only offers plain JSON mode, take it and lean harder on the next step. Constraint reduces the failure rate; it doesn't replace validation.

3. Validate at the boundary

Here is the rule that makes everything downstream safe: treat model output as untrusted external input. Not "usually fine," not "the tool call guarantees it" — untrusted, the same way you'd treat a request body or a third-party API response. The boundary is the single place where your code earns the right to trust the data, and it earns it by parsing, not by hoping.

Zod is the clean way to do this in TypeScript. You define the contract once, parse the raw output through it, and on the other side you have a fully typed value or a precise error — never a half-trusted blob:

import { z } from "zod";
 
const Triage = z.object({
  priority: z.enum(["low", "medium", "high", "urgent"]),
  category: z.enum(["billing", "bug", "howto", "other"]),
  needsHuman: z.boolean(),
  // Normalize deliberately: trim the free-text note and default it to "".
  // max(280) rejects a longer note outright; it does not truncate.
  note: z.string().trim().max(280).default(""),
});
 
type Triage = z.infer<typeof Triage>;
 
function parseTriage(raw: string): Triage {
  const json = JSON.parse(raw); // throws on non-JSON — that's fine, catch upstream
  return Triage.parse(json); // throws ZodError with field-level detail
}

Three things are doing work here. The enums reject any value outside the contract, so a hallucinated "critical" priority fails loudly instead of slipping through. The .trim().max(280).default("") chain is deliberate: trim and default normalize the note, while max(280) rejects an over-long one outright (it does not truncate), so the boundary catches a 4,000-character note before downstream code discovers it. And Triage.parse fails fast with a typed error: when it throws, you know exactly which field broke and why. That turns out to be the most valuable thing in the whole pipeline, because it's what you feed back in the next step.

When NOT to over-do it: validate, but don't re-validate the same value at every function three layers deep. Parse once, at the boundary, then pass the typed value around and trust it. The boundary is a line you cross exactly once.

4. Retry with the error fed back

When validation fails, most code throws and moves on. But the ZodError you just caught is the most useful signal you have — it says, in plain terms, what was wrong with the output. The model that produced the bad answer is fully capable of producing a good one if you simply tell it what broke. So hand the error back as the next turn's input:

async function triageWithRetry(
  ticket: string,
  maxAttempts = 3,
): Promise<Triage> {
  let lastError = "";
 
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const correction = lastError
      ? `\n\nYour previous reply failed validation:\n${lastError}\nReturn only valid JSON matching the schema.`
      : "";
 
    // callModel and triagePrompt are stand-ins for your own API call and
    // prompt builder — swap in whatever your stack uses.
    const raw = await callModel(triagePrompt(ticket) + correction);
 
    // safeJsonParse: a non-throwing JSON.parse that returns undefined on bad
    // JSON, so a malformed reply becomes a retryable failure, not a throw.
    const result = Triage.safeParse(safeJsonParse(raw));
    if (result.success) return result.data;
 
    // Feed the exact validation failure back into the next attempt.
    // An empty path means the top-level value itself was wrong (e.g. not JSON).
    lastError = result.error.issues
      .map((i) => `- ${i.path.join(".") || "(root)"}: ${i.message}`)
      .join("\n");
  }
 
  throw new Error(`Triage failed after ${maxAttempts} attempts: ${lastError}`);
}

The loop is bounded on purpose — an unbounded retry is just a way to spend money and rack up latency while the model fails the same way forever. Three attempts is a sane default; pick yours from the cost and the criticality of the call.
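The loop calls a safeJsonParse helper without defining it. A minimal version, assuming all you want is "parsed value or undefined", could be as small as this:

```typescript
// Non-throwing JSON.parse: returns undefined when the input is not valid JSON.
// Note that the literal JSON text "null" still parses to null, not undefined.
function safeJsonParse(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return undefined;
  }
}

const good = safeJsonParse('{"priority": "high"}');
const bad = safeJsonParse("Sure! Here is the JSON you asked for:");
```

Returning `unknown` rather than `any` is a deliberate choice in this sketch: it forces the result through the schema before anything downstream can touch its fields.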

When NOT to lean on it: two cautions worth stating plainly. First, cost: every retry is another full inference, so a 10% failure rate with retries is real spend, not a rounding error — measure it. Second, idempotency: a retry has to be safe to repeat, so the model step must not have already triggered a side effect. Classify, then act; never act mid-loop.
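To put a rough number on that spend, here is a back-of-envelope sketch. It assumes every attempt fails independently with the same probability, which is optimistic: failures that repeat the same way are correlated, and retry prompts are longer than the first one. Treat it as an estimate to check against your own logs, not a prediction.

```typescript
// Expected number of model calls per ticket when each attempt fails
// independently with probability `failureRate` and the loop stops at
// the first success or after `maxAttempts` tries.
function expectedCalls(failureRate: number, maxAttempts: number): number {
  if (failureRate < 0 || failureRate > 1) {
    throw new RangeError("failureRate must be between 0 and 1");
  }
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError("maxAttempts must be a positive integer");
  }
  let calls = 0;
  let reachProbability = 1; // chance that attempt k is needed at all
  for (let k = 1; k <= maxAttempts; k++) {
    calls += reachProbability;
    reachProbability *= failureRate;
  }
  return calls;
}

// Share of tickets that still fail after every attempt is used up.
function exhaustionRate(failureRate: number, maxAttempts: number): number {
  return failureRate ** maxAttempts;
}

const callsPerTicket = expectedCalls(0.1, 3); // 1 + 0.1 + 0.01, about 1.11
const stillFailing = exhaustionRate(0.1, 3); // about 0.001, one ticket in a thousand
```

Under those assumptions a 10% failure rate with three attempts costs about 11% more calls than no retries at all, and leaves roughly one ticket in a thousand unresolved.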

This is the same closed-loop discipline that separates a real coding agent from a confident guesser — generate, check against ground truth, feed the failure back, try again. It's the throughline of the field guide to AI coding agents: the loop is only as good as the check that closes it.

When to fall back to a second pass

Retries have a ceiling, and it's important to recognize when you've hit it. A retry re-rolls the same approach — same prompt, same schema, same model, hoping the dice land differently. A second pass changes the approach. Those are different tools, and reaching for the wrong one wastes both money and time.

The heuristic is simple: watch how the retries fail. If they fail differently each time — valid once, missing a field the next, malformed after that — you're fighting noise, and one more retry is genuinely the cheaper fix. But if they fail the same way every time — the model keeps refusing to emit one nested field, or keeps inventing an enum value that isn't there — retrying is just paying to watch it lose the same fight. That's your signal to change the approach.
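That heuristic is easy to mechanize once each failed attempt is reduced to a short signature. The sketch below assumes you build signatures yourself, for example from the failing field plus an error code; the signature strings and the threshold of two repeats are illustrative choices, not a standard.

```typescript
type NextMove = "retry" | "second-pass";

// Decide what to do next from the signatures of the failures seen so far.
// If any single signature has occurred `repeatThreshold` times or more,
// the model is losing the same fight and a re-roll is unlikely to help.
function nextMove(failureSignatures: string[], repeatThreshold = 2): NextMove {
  const counts = new Map<string, number>();
  for (const signature of failureSignatures) {
    counts.set(signature, (counts.get(signature) ?? 0) + 1);
  }
  const mostRepeated = Math.max(0, ...Array.from(counts.values()));
  return mostRepeated >= repeatThreshold ? "second-pass" : "retry";
}

// Different failure each time: noise, so one more retry is reasonable.
const noisy = nextMove(["(root):not_json", "note:too_long"]);

// Same failure twice: stop re-rolling and change the approach.
const stuck = nextMove(["priority:invalid_enum", "priority:invalid_enum"]);
```

Logging the signatures alongside the decision also gives you the evidence you need later to judge whether a second-pass pipeline is worth building.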

A second pass usually means one of three moves. Shrink the schema — ask for the two fields the model can reliably produce, derive the rest in code. Split extract-then-format — one cheap call to pull the raw facts, a second to shape them into the contract, so neither call has to do two hard things at once. Or escalate the model — send the cases that keep failing to a stronger or simply different model, and keep the cheap one for the easy majority. The unifying idea: when retries plateau, stop re-rolling and reduce the difficulty of the thing you're asking for.
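As one illustration of the first move, suppose the model keeps stumbling on needsHuman but reliably produces priority and category. A possible second pass asks only for those two fields and derives the third in code. The routing rule below is entirely hypothetical; yours would come from your own support policy.

```typescript
type Priority = "low" | "medium" | "high" | "urgent";
type Category = "billing" | "bug" | "howto" | "other";

// The shrunken contract: only what the model produces reliably.
interface CoreTriage {
  priority: Priority;
  category: Category;
}

// The full contract the rest of the pipeline expects.
interface Triage extends CoreTriage {
  needsHuman: boolean;
}

// Hypothetical rule: urgent tickets and billing issues always go to a person.
function deriveNeedsHuman(core: CoreTriage): boolean {
  return core.priority === "urgent" || core.category === "billing";
}

// Rebuild the full value from the smaller model output plus the derived field.
function completeTriage(core: CoreTriage): Triage {
  return { ...core, needsHuman: deriveNeedsHuman(core) };
}

const escalated = completeTriage({ priority: "urgent", category: "bug" });
const automated = completeTriage({ priority: "low", category: "howto" });
```

Downstream code still receives the same `Triage` shape, so the fallback stays invisible to everything after the boundary.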

When NOT to bother: if a single retry already clears 99% of cases, a second-pass pipeline is complexity you don't need yet. Build the fallback when the failure logs tell you to, not on speculation.

The cache

A few things worth keeping — principles, not prescriptions, the way we keep everything else here:

  • Make invalid states unrepresentable. The best bug is the one the type system won't let you write. A tight schema with enums and required fields does more for reliability than any amount of defensive parsing bolted on afterward.
  • Validate at the boundary, once. Model output is untrusted external input. Parse it through the schema at the edge, fail fast with a typed error, and let everything downstream trust the result because you checked it exactly where it entered.
  • The validator's error is your retry prompt. Don't discard the failure — it's the clearest correction signal you'll ever get. Feed it back, bound the loop, and measure what the retries cost you.
  • Know retry from second pass. Different-every-time failures want one more re-roll; same-every-time failures want a simpler schema or a different model. Match the tool to the failure mode.

Strip it all the way down and there's one idea underneath: the schema is the contract, and the model is a fallible external system. Wrap it the way you'd wrap any unreliable dependency — constrain what you can, validate everything, correct with the error in hand, and keep a simpler path ready for when it won't cooperate. Do that, and you stop parsing prose and start consuming data.

Write the schema first. Everything else is downstream of that one decision.