On August 6, OpenAI released structured outputs in its API. Is structured outputs a replacement for instructor, outlines, and other libraries that provide structured output from language models? Let’s compare them.
OpenAI’s structured outputs makes the following code possible:
```python
import json
from pydantic import BaseModel
from openai import OpenAI

class Ingredient(BaseModel):
    name: str
    amount: str
    kcal: int

class Recipe(BaseModel):
    ingredients: list[Ingredient]
    instructions: str

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Write an apple pie recipe"}],
    response_format=Recipe,
)

apple_pie_recipe = Recipe(**json.loads(completion.choices[0].message.content))
```
It’s guaranteed that the output will be JSON that can be parsed into a Recipe object. The code looks very similar to what you’d write with any of the 10 libraries I compared in May.
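As an aside, recent versions of the openai Python package can skip the manual json.loads step: the parse helper already returns a validated object via the parsed attribute. A minimal sketch, assuming the same client and Recipe model as above:

```python
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Write an apple pie recipe"}],
    response_format=Recipe,
)

# .parsed is already a validated Recipe instance, no json.loads needed
apple_pie_recipe = completion.choices[0].message.parsed
```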
Besides removing the need for a library, structured outputs works quite differently from function calling under the hood. With function calling, the model is trained to follow an instruction given as a JSON schema and is likely, but not guaranteed, to follow it. At any token position it’s still free to output a token that doesn’t fit the schema. With structured outputs, the output of the model is constrained to fit the schema. This is the same approach the outlines library uses for open source models.
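For comparison, this is roughly what the constrained-decoding approach looks like with outlines on a self-hosted model. A sketch against the outlines 0.x API; the model name is only an example:

```python
import outlines
from pydantic import BaseModel

class Ingredient(BaseModel):
    name: str
    amount: str
    kcal: int

class Recipe(BaseModel):
    ingredients: list[Ingredient]
    instructions: str

# Load a local model; the generator masks out any token that would make
# the output invalid under the schema, so the result always parses.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.json(model, Recipe)

recipe = generator("Write an apple pie recipe")  # returns a Recipe instance
```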
Pros and cons
The structured output feature has several advantages over function calling:
- ✅ The definition of the output format doesn’t count as input tokens, making it significantly cheaper, especially for short input messages and complex output formats.
- ✅ The output is 100% guaranteed to follow the structure, in contrast to JSON mode and function calling which are just very likely to follow the structure.
- ✅ It doesn’t slow down generation; if anything it speeds it up, because tokens with no alternatives can be filled in automatically rather than generated by the model.
But also some downsides:
- ❌ OpenAI’s implementation only works with its own models.
- ❌ It only supports a subset of JSON Schema. In particular, `minLength` and `maxLength` constraints are not supported (see their docs). These are supported by outlines and instructor.
- ❌ The first API call with a new schema has higher latency than subsequent calls because the schema has to be compiled.
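To make the schema-subset limitation concrete: in Pydantic, length constraints on a string field are emitted as `minLength`/`maxLength` in the JSON schema, which is exactly what OpenAI’s structured outputs rejects. The Review model below is a made-up example:

```python
from pydantic import BaseModel, Field

class Review(BaseModel):
    # Emits {"type": "string", "minLength": 10, "maxLength": 280} in the
    # JSON schema. OpenAI's structured outputs rejects these keywords;
    # outlines and instructor can enforce them.
    text: str = Field(min_length=10, max_length=280)

print(Review.model_json_schema())
```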
I expect that the first two downsides will be addressed in the future. Thanks to the outlines library, the implementation of structured outputs is already available for open source models. Perhaps providers like Fireworks AI and Groq will adopt it with the same API specification as OpenAI. They’ve done this with function calling. In turn, platform-agnostic libraries like mirascope, marvin and instructor may adopt it as well.
Are instructor and other structured output libraries obsolete?
Right after the announcement, Jason Liu, author of instructor, posted on X:

> They solved instructor.

Later he added a longer post with his thoughts.
Yes, the core value proposition of “give me a Pydantic model and I’ll use function calling to guarantee the output fits the schema” is now covered, but only for OpenAI models. If you’re using other models or want to stay flexible, structured output libraries are still useful. Each library also comes with additional features, as I’ve covered in my comparison: multiple provider support, error handling, caching, chaining and more.
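For example, instructor’s retry handling re-prompts the model with the validation errors when an output doesn’t fit the schema. A minimal sketch, assuming instructor 1.x and the Recipe model from above:

```python
import instructor
from openai import OpenAI

# Patch the client so create() accepts a response_model
client = instructor.from_openai(OpenAI())

recipe = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Write an apple pie recipe"}],
    response_model=Recipe,
    max_retries=2,  # on validation failure, re-ask with the error message
)
```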
So in short: no, they’re not obsolete, but their space is getting squeezed.
Conclusion
If you’re exclusively using OpenAI models and only need basic structured responses, I recommend using OpenAI’s structured outputs. It’s the most convenient, secure and cheapest method. If you prefer other LLM providers or want your code to be provider-agnostic, I recommend sticking with outlines (if self-hosting) or instructor (if using API providers).