codezone

OpenAI cures structured data

2024-08-07

Ryan Daws is a senior editor at TechForge Media with over a decade of experience in crafting compelling narratives and making complex topics accessible. His articles and interviews with industry leaders have earned him recognition as a key influencer by organisations like Onalytica. Under his leadership, publications have been praised by analyst firms such as Forrester for their excellence and performance. Connect with him on X (@gadget_ry) or Mastodon (@gadgetry@techhub.social)

OpenAI has unveiled “Structured Outputs”, a new API feature designed to address the long-standing challenge of reliably generating structured data from large language models (LLMs). The feature, available now, guarantees that model-generated outputs will adhere to developer-defined JSON Schemas.

Generating structured data from unstructured input is a cornerstone of many AI applications today. Developers leverage the OpenAI API to build sophisticated assistants capable of fetching data, answering complex questions via function calling, extracting structured data for seamless data entry, and enabling multi-step workflows where LLMs can take specific actions.

However, the inherent limitations of LLMs in consistently producing structured output have led developers to employ workarounds such as open-source tooling, intricate prompting techniques, and repeated request retries. These workarounds, while functional, add complexity and compromise efficiency.

OpenAI’s Structured Outputs promises to eliminate these workarounds. It achieves this by constraining OpenAI models to match developer-supplied schemas and by training models to better understand and adhere to complex data structures.

“Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas,” OpenAI said in a blog post.

Internal evaluations using complex JSON schemas have shown remarkable results. The latest model, gpt-4o-2024-08-06, achieved a perfect 100% score in adherence to structured outputs, a significant improvement over the previous gpt-4-0613, which scored less than 40%.