Chat Completions

The chat completions endpoint provides OpenAI-compatible LLM inference. It supports both streaming (Server-Sent Events) and non-streaming (JSON) responses.

This endpoint is protected -- you must authenticate with a JWT token or provide an x402 payment proof.

Endpoint

`POST /v1/chat/completions`

Authentication

| Method | Header | Description |
|--------|--------|-------------|
| API Token | `Authorization: Bearer <token>` | Use the token from `/v1/authorize` |
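
For example, with the token sent as a bearer credential (token value truncated and illustrative):

```http
POST /v1/chat/completions HTTP/1.1
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Content-Type: application/json
```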

Request Body

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `messages` | `Message[]` | Yes | -- | Array of conversation messages (min 1) |
| `model` | string | No | -- | Model identifier (e.g., `anthropic/claude-sonnet-4.5`) |
| `stream` | boolean | No | `false` | Enable Server-Sent Events streaming |
| `temperature` | number | No | -- | Sampling temperature (0-2) |
| `max_tokens` | number | No | -- | Maximum tokens to generate |
| `top_p` | number | No | -- | Nucleus sampling threshold (0-1) |
| `top_k` | number | No | -- | Top-K sampling |
| `frequency_penalty` | number | No | -- | Frequency penalty (-2 to 2) |
| `presence_penalty` | number | No | -- | Presence penalty (-2 to 2) |
| `stop` | `string \| string[]` | No | -- | Stop sequence(s) |
| `response_format` | object | No | -- | `{ type: "text" }` or `{ type: "json_object" }` |
| `tools` | `Tool[]` | No | -- | Function definitions for tool calling |
| `tool_choice` | `ToolChoice` | No | -- | Tool selection strategy |

Message Format
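
A minimal sketch of the message shape in TypeScript, assuming the standard OpenAI-compatible field names this endpoint emulates (not confirmed field-by-field on this page):

```typescript
// Sketch only: role values and field names assumed from OpenAI compatibility.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  // Plain text, or an array of content parts for multimodal messages
  // (ContentPart is sketched under Content Parts below).
  content: string | ContentPart[];
}
```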

Content Parts (for multimodal messages):
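
A sketch of the content part union, assuming the OpenAI multimodal format (the part names are assumptions):

```typescript
// Sketch only: "text" and "image_url" parts assumed from OpenAI compatibility.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };
```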

Tool Definition
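
A sketch of a tool definition, assuming the OpenAI function-calling format referenced by the `tools` field above:

```typescript
// Sketch only: shape assumed from OpenAI-style function calling.
interface Tool {
  type: "function";
  function: {
    name: string;
    description?: string;
    // JSON Schema describing the function's arguments.
    parameters?: Record<string, unknown>;
  };
}
```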

Tool Choice

| Value | Description |
|-------|-------------|
| `"none"` | Do not call any tools |
| `"auto"` | Let the model decide |
| `"required"` | The model must call a tool |
| `{ type: "function", function: { name: "..." } }` | Force a specific tool |


Non-Streaming Response

When `stream` is false or omitted, the endpoint returns a single JSON response.

Example Request
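
A minimal request body (the model identifier is the example from the table above; the message content is illustrative). POST it as JSON with the Authorization header described under Authentication:

```json
{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    { "role": "user", "content": "Say hello in one sentence." }
  ]
}
```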

Example Response
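
An illustrative response, assuming the standard OpenAI chat completion shape this endpoint emulates (IDs, timestamps, and token counts are placeholders):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21 }
}
```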


Streaming Response

When `stream` is true, the endpoint returns Server-Sent Events (SSE).

Example Request
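
The same illustrative body with streaming enabled:

```json
{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    { "role": "user", "content": "Say hello in one sentence." }
  ],
  "stream": true
}
```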

Response Headers
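
The response carries event-stream headers. `Content-Type: text/event-stream` is standard for SSE; the other values shown are typical but assumed:

```http
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
```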

Stream Format

Each chunk is sent as `data: {json}\n\n`.
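
An illustrative stream (IDs, timestamps, token counts, and the exact chunk boundaries are placeholders; note the role in the first chunk, the finish reason in the final content chunk, and usage at the end, per the table below):

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4.5","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":9,"total_tokens":21}}

data: [DONE]
```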

Chunk Fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique ID, the same across all chunks |
| `object` | string | Always `"chat.completion.chunk"` |
| `created` | number | Unix timestamp |
| `model` | string | Model identifier |
| `choices[].delta.role` | string | `"assistant"` (first chunk only) |
| `choices[].delta.content` | string | Incremental text content |
| `choices[].finish_reason` | string? | Set in the final content chunk (`"stop"`, `"length"`, `"tool_calls"`) |
| `usage` | object | Token usage (final chunk only) |

The stream terminates with `data: [DONE]\n\n`.


Error Responses

Validation Error (400)
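
The exact error body is server-specific; an illustrative OpenAI-style validation error might look like:

```json
{
  "error": {
    "message": "messages: array must contain at least 1 element(s)",
    "type": "invalid_request_error"
  }
}
```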

Rate Limit (429)

Includes the headers `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`, and `Retry-After`.
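
An illustrative 429 response (the header names are the ones listed above; the limit, reset, and retry values are placeholders):

```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000060
Retry-After: 30
```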

Payment Required (402)

When no JWT token or x402 payment is provided, the server returns HTTP 402 with payment requirements specifying accepted schemes, networks, and amounts.
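
A sketch of a 402 body, assuming the shape defined by the x402 specification (the scheme, network, amount, and addresses shown are placeholders; the exact fields depend on the server's configuration):

```json
{
  "x402Version": 1,
  "error": "Payment required",
  "accepts": [
    {
      "scheme": "exact",
      "network": "base",
      "maxAmountRequired": "10000",
      "resource": "/v1/chat/completions",
      "payTo": "0x...",
      "asset": "0x..."
    }
  ]
}
```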

Streaming Errors

If an error occurs after streaming has started, it is sent as an SSE event.
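
For example (the error fields shown are illustrative; the server defines the exact shape):

```text
data: {"error":{"message":"Upstream provider error","type":"server_error"}}
```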


Usage Tracking

Token usage is automatically tracked for authenticated users. After each completion, the server records:

  • Prompt tokens and completion tokens

  • Cost breakdown (base cost + 10% commission)

  • Wallet address association

This data is used for debt accumulation and billing through the smart account payment system.
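
For example, a completion with a base cost of $0.0100 would be recorded with a $0.0010 commission, for a total of $0.0110 attributed to the caller's wallet.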
