OpenAPI 3.2 brings native, first‑class ways to describe APIs that send data as a sequence of events instead of a single, monolithic payload. This is a big deal for real‑time apps—LLMs, analytics feeds, chat, logs, and anything that benefits from progressive rendering or lower perceived latency. At a high level, OpenAPI 3.2 formalizes “sequential” media types and lets you specify the schema of each item in the stream using a new itemSchema on a response media type. That means your documentation can be explicit about the shape of each event, not just the overall connection.

  • New capability: Use itemSchema under content to define the structure of each streamed event.

Supported sequential media types:

  • SSE: text/event-stream
  • JSON Lines: application/jsonl
  • JSON Sequences: application/json-seq
  • Multipart Mixed: multipart/mixed

This unlocks consistent tooling, clearer client expectations, and better validation for streaming APIs.

A quick primer: how streaming differs from “normal” responses

  • Normal response: One payload, one schema.
  • Streaming response: Many items over time; each item conforms to the itemSchema. The transport stays open while the server emits items; the client processes incrementally.

Common patterns you’ll see:

  • SSE for text/token streams in browsers
  • JSONL for structured event logs and incremental model outputs
  • Multipart for mixed binary/text chunks (e.g., speech + text)
  • Sentinel events like [DONE] to cleanly signal the end of a stream

Example: describing an LLM’s streaming API (SSE)

Here’s a minimal but realistic OpenAPI 3.2 spec for a text‑generation endpoint that streams tokens via Server‑Sent Events. The server emits events as they become available; clients render tokens progressively.

openapi: 3.2.0
info:
  title: LLM Streaming API
  version: 1.0.0
paths:
  /generate:
    post:
      summary: Stream generated text from the LLM
      description: |
        Streams model output incrementally using Server-Sent Events (SSE).
        Each event contains a token chunk; a sentinel signals the end.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [model, prompt]
              properties:
                model:
                  type: string
                  description: Model identifier (e.g., "gpt-4o-mini")
                prompt:
                  type: string
                  description: The input prompt for text generation
                max_tokens:
                  type: integer
                  minimum: 1
                  default: 256
                temperature:
                  type: number
                  minimum: 0
                  maximum: 2
                  default: 0.7
      responses:
        "200":
          description: Stream of generation events via SSE
          content:
            text/event-stream:
              itemSchema:
                oneOf:
                  - type: object
                    required: [event, data]
                    properties:
                      event:
                        type: string
                        enum: ["token"]
                        description: Event type
                      data:
                        type: object
                        required: [text, index]
                        properties:
                          text:
                            type: string
                            description: Token or text chunk
                          index:
                            type: integer
                            minimum: 0
                            description: Incrementing token index
                          logprobs:
                            type: ["number", "null"]
                            description: Optional per-token log probability
                          finish_reason:
                            type: ["string", "null"]
                            enum: ["stop", "length", "content_filter", null]
                  - type: object
                    required: [event, data]
                    properties:
                      event:
                        type: string
                        enum: ["summary"]
                      data:
                        type: object
                        properties:
                          usage:
                            type: object
                            properties:
                              prompt_tokens: { type: integer, minimum: 0 }
                              completion_tokens: { type: integer, minimum: 0 }
                              total_tokens: { type: integer, minimum: 0 }
                          model:
                            type: string
                  - type: object
                    required: [event, data]
                    properties:
                      event:
                        type: string
                        enum: ["done"]
                      data:
                        type: string
                        enum: ["[DONE]"]
        "400":
          description: Invalid request
          content:
            application/json:
              schema:
                type: object
                required: [error]
                properties:
                  error:
                    type: string
        "429":
          description: Rate limited
        "500":
          description: Server error
  

Notes:

  • text/event-stream matches the SSE transport browsers understand.
  • itemSchema with oneOf captures normal token events, an optional final summary, and the explicit end‑of‑stream sentinel.
  • Errors follow regular non‑streaming JSON shapes with standard HTTP codes.
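
For reference, a response following this schema might look roughly like the following on the wire. This is an illustrative sketch: the token text, usage numbers, and model name are made up, but the event names and JSON fields mirror the itemSchema above.

event: token
data: {"text": "Hello", "index": 0}

event: token
data: {"text": " world", "index": 1, "finish_reason": "stop"}

event: summary
data: {"usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14}, "model": "gpt-4o-mini"}

event: done
data: [DONE]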

Variant: JSON Lines (JSONL) streaming for SDKs and backend clients

If your clients prefer framed JSON instead of SSE, you can offer application/jsonl and describe each line with itemSchema in the same way. Each line is one JSON object.

responses:
  "200":
    description: Stream of generation events via JSON Lines
    content:
      application/jsonl:
        itemSchema:
          oneOf:
            - type: object
              required: [type, token]
              properties:
                type:
                  type: string
                  enum: ["token"]
                token:
                  type: object
                  required: [text, index]
                  properties:
                    text: { type: string }
                    index: { type: integer, minimum: 0 }
            - type: object
              required: [type, data]
              properties:
                type:
                  type: string
                  enum: ["done"]
                data:
                  type: string
                  enum: ["[DONE]"]
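
On the wire, each event is a single line of JSON terminated by a newline. The values below are illustrative; the field names match the itemSchema above.

{"type": "token", "token": {"text": "Hello", "index": 0}}
{"type": "token", "token": {"text": " world", "index": 1}}
{"type": "done", "data": "[DONE]"}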
  

Designing a great streaming contract

  • Be explicit about termination: Include a clear sentinel (e.g., event: "done" with data: "[DONE]") so clients can stop reading deterministically.
  • Separate event kinds: Use oneOf to model different event shapes (tokens vs. summary vs. control).
  • Carry minimal context: Include a token index, optional finish_reason, and a final summary with usage for billing/UX.
  • Document timeouts and reconnection: SSE lets the server suggest a reconnection delay via the retry field; for clients, document expected timeouts and backoff behavior.
  • Error semantics: Prefer failing fast (non‑200) before starting the stream. If you must signal errors mid‑stream, reserve a control event type (e.g., event: "error") and describe it in oneOf.

Client expectations

  • SSE clients: EventSource works in browsers for GET endpoints; for POST endpoints like the example above, use fetch with a streamed response body or an SSE client library, and parse the event and data fields (see the sketch after this list).
  • JSONL clients: Read line‑delimited JSON; process each line as one event.
  • Backpressure: HTTP streaming is flow‑controlled by the underlying TCP connection; keep events small and frequent for smoother UX.
  • Idempotency: If you support retries, include id/cursor fields so clients can resume safely.
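
Below is a minimal TypeScript sketch of an SSE consumer for the POST /generate endpoint described earlier. It is an illustration, not a production client: the event names (token, summary, done) and the [DONE] sentinel come from the example spec above, it uses fetch with a streamed response body because EventSource cannot send a POST, and the SSE parsing is deliberately simplified (it assumes "\n\n" frame separators and single-line data fields).

// Minimal sketch: consume the POST /generate SSE stream from the example spec.
async function streamGeneration(prompt: string): Promise<string> {
  const res = await fetch("/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify({ model: "gpt-4o-mini", prompt }), // model name is illustrative
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let output = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE frames are separated by a blank line; keep any trailing partial frame.
    const frames = buffer.split("\n\n");
    buffer = frames.pop() ?? "";

    for (const frame of frames) {
      let eventType = "message";
      let data = "";
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) eventType = line.slice(6).trim();
        else if (line.startsWith("data:")) data += line.slice(5).trim();
      }
      if (eventType === "done" || data === "[DONE]") return output; // end-of-stream sentinel
      if (eventType === "token") {
        const chunk = JSON.parse(data) as { text: string; index: number };
        output += chunk.text; // a real UI would render this progressively
      }
      // A "summary" frame could be parsed here for usage/billing telemetry.
    }
  }
  return output;
}

A JSONL client under the variant above is even simpler: read the body line by line and JSON.parse each non-empty line.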

Why this is useful for LLMs

  • Faster first token: Users see output immediately, improving perceived latency.
  • Progressive rendering: Stream tokens as they’re generated; UIs don’t block.
  • Richer telemetry: Emit intermediary signals (e.g., tool_call, reasoning, usage) as distinct event types.
  • Interoperability: A standardized spec makes SDKs and gateways simpler to build and maintain.

For full details, see the official OpenAPI 3.2 specification.