OpenAPI 3.2 brings native, first‑class ways to describe APIs that send data as a sequence of events instead of a single, monolithic payload. This is a big deal for real‑time apps—LLMs, analytics feeds, chat, logs, and anything that benefits from progressive rendering or lower perceived latency. At a high level, OpenAPI 3.2 formalizes “sequential” media types and lets you specify the schema of each item in the stream using a new itemSchema on a response media type. That means your documentation can be explicit about the shape of each event, not just the overall connection.
- New capability: Use itemSchema under content to define the structure of each streamed event.
Supported sequential media types:
- SSE: text/event-stream
- JSON Lines: application/jsonl
- JSON Sequences: application/json-seq
- Multipart Mixed: multipart/mixed
This unlocks consistent tooling, clearer client expectations, and better validation for streaming APIs.
A quick primer: how streaming differs from “normal” responses
- Normal response: One payload, one schema.
- Streaming response: Many items over time; each item conforms to the itemSchema. The transport stays open while the server emits items; the client processes incrementally.
Common patterns you’ll see:
- SSE for text/token streams in browsers
- JSONL for structured event logs and incremental model outputs
- Multipart for mixed binary/text chunks (e.g., speech + text)
- Sentinel events like [DONE] to cleanly signal the end of a stream (see the wire sketch below)
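For example, a token stream over SSE might look like this on the wire. The event names and payloads are illustrative (they mirror the spec example below), not mandated by SSE itself:

```
event: token
data: {"text": "Hello", "index": 0}

event: token
data: {"text": " world", "index": 1}

event: done
data: [DONE]
```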
Example: describing an LLM’s streaming API (SSE)
Here’s a minimal but realistic OpenAPI 3.2 spec for a text‑generation endpoint that streams tokens via Server‑Sent Events. The server emits events as they become available; clients render tokens progressively.
```yaml
openapi: 3.2.0
info:
  title: LLM Streaming API
  version: 1.0.0
paths:
  /generate:
    post:
      summary: Stream generated text from the LLM
      description: |
        Streams model output incrementally using Server-Sent Events (SSE).
        Each event contains a token chunk; a sentinel signals the end.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [model, prompt]
              properties:
                model:
                  type: string
                  description: Model identifier (e.g., "gpt-4o-mini")
                prompt:
                  type: string
                  description: The input prompt for text generation
                max_tokens:
                  type: integer
                  minimum: 1
                  default: 256
                temperature:
                  type: number
                  minimum: 0
                  maximum: 2
                  default: 0.7
      responses:
        "200":
          description: Stream of generation events via SSE
          content:
            text/event-stream:
              itemSchema:
                oneOf:
                  - type: object
                    required: [event, data]
                    properties:
                      event:
                        type: string
                        enum: ["token"]
                        description: Event type
                      data:
                        type: object
                        required: [text, index]
                        properties:
                          text:
                            type: string
                            description: Token or text chunk
                          index:
                            type: integer
                            minimum: 0
                            description: Incrementing token index
                          logprobs:
                            type: ["number", "null"]
                            description: Optional per-token log prob
                          finish_reason:
                            type: ["string", "null"]
                            enum: ["stop", "length", "content_filter", null]
                  - type: object
                    required: [event, data]
                    properties:
                      event:
                        type: string
                        enum: ["summary"]
                      data:
                        type: object
                        properties:
                          usage:
                            type: object
                            properties:
                              prompt_tokens: { type: integer, minimum: 0 }
                              completion_tokens: { type: integer, minimum: 0 }
                              total_tokens: { type: integer, minimum: 0 }
                          model:
                            type: string
                  - type: object
                    required: [event, data]
                    properties:
                      event:
                        type: string
                        enum: ["done"]
                      data:
                        type: string
                        enum: ["[DONE]"]
        "400":
          description: Invalid request
          content:
            application/json:
              schema:
                type: object
                required: [error]
                properties:
                  error:
                    type: string
        "429":
          description: Rate limited
        "500":
          description: Server error
```

Notes:
- text/event-stream matches the SSE transport browsers understand.
- itemSchema with oneOf captures normal token events, an optional final summary, and the explicit end‑of‑stream sentinel.
- Errors follow regular non‑streaming JSON shapes with standard HTTP codes.
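To show the client side of this contract, here is a minimal TypeScript consumer, assuming a fetch implementation with streaming bodies (modern browsers, Node 18+). The parser is deliberately simplified: it expects single-line event: and data: fields with frames separated by a blank line, and ignores id:, retry:, and comments. The endpoint path and request fields come from the spec above; everything else is illustrative.

```typescript
// Minimal SSE consumer for the POST /generate endpoint above (illustrative).
// EventSource only supports GET, so a POST endpoint is read via fetch.
async function streamGenerate(
  prompt: string,
  onToken: (text: string) => void,
): Promise<void> {
  const res = await fetch("/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify({ model: "gpt-4o-mini", prompt }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) { // blank line ends a frame
      const frame = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      const event = /^event: (.*)$/m.exec(frame)?.[1] ?? "message";
      const data = /^data: (.*)$/m.exec(frame)?.[1] ?? "";
      if (event === "done" && data === "[DONE]") return; // sentinel: stop reading
      if (event === "token") onToken(JSON.parse(data).text);
    }
  }
}
```

Usage is a single call, e.g. streamGenerate("Tell me a story", chunk => render(chunk)), where render is whatever appends tokens to your UI.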
Variant: JSON Lines (JSONL) streaming for SDKs and backend clients
If your clients prefer framed JSON instead of SSE, you can offer application/jsonl, described with itemSchema in the same way. Each line of the response body is a single JSON object.
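On the wire, such a stream is just newline-delimited JSON; an illustrative exchange matching the schema below:

```
{"type": "token", "token": {"text": "Hello", "index": 0}}
{"type": "token", "token": {"text": " world", "index": 1}}
{"type": "done", "data": "[DONE]"}
```

The corresponding response definition: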
```yaml
responses:
  "200":
    description: Stream of generation events via JSON Lines
    content:
      application/jsonl:
        itemSchema:
          oneOf:
            - type: object
              required: [type, token]
              properties:
                type:
                  type: string
                  enum: ["token"]
                token:
                  type: object
                  required: [text, index]
                  properties:
                    text: { type: string }
                    index: { type: integer, minimum: 0 }
            - type: object
              required: [type, data]
              properties:
                type:
                  type: string
                  enum: ["done"]
                data:
                  type: string
                  enum: ["[DONE]"]
```

Designing a great streaming contract
- Be explicit about termination: Include a clear sentinel (e.g., event: "done" with data: "[DONE]") so clients can stop reading deterministically.
- Separate event kinds: Use oneOf to model different event shapes (tokens vs. summary vs. control).
- Carry minimal context: Include a token index, optional finish_reason, and a final summary with usage for billing/UX.
- Document timeouts and reconnection: SSE lets the server suggest a reconnection delay via the retry field; for clients, document expected timeouts and backoff.
- Error semantics: Prefer failing fast (non‑200) before starting the stream. If you must signal errors mid‑stream, reserve a control event type (e.g., event: "error") and describe it in oneOf, as sketched below.
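A sketch of what such a control event could look like as an extra branch of the oneOf; the event name and data fields are illustrative, not defined by OpenAPI:

```yaml
- type: object
  required: [event, data]
  properties:
    event:
      type: string
      enum: ["error"]
    data:
      type: object
      required: [code, message]
      properties:
        code: { type: string }
        message: { type: string }
```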
Client expectations
- SSE clients: Use EventSource in browsers (GET endpoints only) or an SSE parser over fetch for other methods; parse the event and data fields.
- JSONL clients: Read line‑delimited JSON; process each complete line as one event (see the sketch after this list).
- Backpressure: HTTP streaming inherits TCP flow control, so slow readers naturally throttle the server; keep events small and frequent for smoother UX.
- Idempotency: If you support retries, include id/cursor fields so clients can resume safely.
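For the JSONL case, a minimal TypeScript line reader in the same vein. It assumes a streaming Response (e.g., from fetch) and buffers partial lines across chunks, which is the main correctness concern for JSONL clients:

```typescript
// Minimal JSONL consumer (illustrative): split on newlines, parse each
// complete line as one event, and keep any trailing partial line buffered.
async function streamJsonl(
  res: Response,
  onEvent: (event: unknown) => void,
): Promise<void> {
  if (!res.body) throw new Error("No response body");
  const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // hold back the unfinished last line
    for (const line of lines) {
      if (line.trim()) onEvent(JSON.parse(line));
    }
  }
  if (buffer.trim()) onEvent(JSON.parse(buffer)); // flush any final line
}
```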
Why this is useful for LLMs
- Faster first token: Users see output immediately, improving perceived latency.
- Progressive rendering: Stream tokens as they’re generated; UIs don’t block.
- Richer telemetry: Emit intermediary signals (e.g., tool_call, reasoning, usage) as distinct event types.
- Interoperability: A standardized spec makes SDKs and gateways simpler to build and maintain.
See the OpenAPI 3.2 specification for the full details of itemSchema and the sequential media types.