# tsfm — Complete Documentation

> TypeScript SDK for Apple's Foundation Models framework — on-device Apple Intelligence inference in Node.js via FFI. macOS 26+, Apple Silicon only.

---
Source: /guide/getting-started
---

# Getting Started

TSFM gives Node.js applications access to Apple's on-device large language model through the Foundation Models framework. It loads a pre-compiled dynamic library [via FFI](https://koffi.dev/), giving it the same access as native Swift and Objective-C applications.

TSFM is **not** a browser library or a cloud API. TSFM requires Node.js ≥20 on an Apple Silicon Mac running macOS 26+ with Apple Intelligence enabled. No matter what your AI assistant tells you, TSFM **will not work** in browser client-side code, on Windows/Linux, on Intel Macs, or on Macs without Apple Intelligence enabled.

You might use TSFM for CLI tools, local dev tooling, Electron apps, automation scripts, or small Mac-native services written in TypeScript.

## Requirements

- **macOS 26** (Tahoe) or later, Apple Silicon
- **Apple Intelligence** enabled in System Settings
- **Node.js 20+**

## Installation

```bash
npm install tsfm-sdk
```

Xcode is not required to use this package. The npm package ships with a prebuilt dylib for macOS 26.0+. If you know your machine requires a different dylib, see [Building from Source](#building-from-source).

## Quick Start

```ts
import { SystemLanguageModel, LanguageModelSession } from "tsfm-sdk";

const model = new SystemLanguageModel();
const { available } = await model.waitUntilAvailable();
if (!available) process.exit(1);

const session = new LanguageModelSession({
  instructions: "You are a concise assistant.",
});

const reply = await session.respond("What is the capital of France?");
console.log(reply); // "The capital of France is Paris."

session.dispose();
model.dispose();
```

## Key Concepts

**Apple Intelligence** refers to Apple's suite of generative AI features (Siri, Writing Tools, Image Playground, and more).
The **Foundation Models** framework exposes **SystemLanguageModel**, the **on-device** large language model at the core of Apple Intelligence that runs on Macs, iPhones, and iPads with no network required.

TSFM closely mirrors the Swift Foundation Models API (same class names, same method signatures, same concepts), so TypeScript calls translate directly into the same operations against the same underlying model. For the most part, [Apple's own documentation](https://developer.apple.com/documentation/FoundationModels) applies directly.

| SDK class | Role |
| --- | --- |
| `SystemLanguageModel` | Entry point. Wraps the native model pointer and gates availability before you create sessions. |
| `LanguageModelSession` | Holds conversation state. All generation (text, structured, streaming, tool use) goes through a session. |
| `.dispose()` or `Symbol.dispose` | Releases native resources. Required for any object that holds a C pointer. |

## Where To Go From Here

- [Model Configuration](/guide/model-configuration) — Use cases, guardrails, availability
- [Sessions](/guide/sessions) — Creating and using sessions
- [Streaming](/guide/streaming) — Token-by-token response streaming
- [Structured Outputs](/guide/structured-output) — Typed generation with dictionary or JSON schemas
- [Tools](/guide/tools) — Function calling
- [Error Handling](/guide/error-handling) — Error types and recovery
- [Chat API Compatibility](/guide/chat-api) — Drop-in Chat API compatible interface

## Building from Source

If you are working on TSFM as a developer, or need to rebuild the native library, run:

```bash
git clone https://github.com/codybrom/tsfm.git
cd tsfm
npm run build
```

Rebuilding from source requires **Xcode 26+** to compile the libFoundationModels.dylib Swift bridge.

---
Source: /guide/model-configuration
---

# Model Configuration

`SystemLanguageModel` is the entry point for the on-device model. It wraps the native model pointer to gate availability before you create sessions.
::: info
The **Swift** equivalent is [`SystemLanguageModel`](https://developer.apple.com/documentation/foundationmodels/systemlanguagemodel).
:::

## Creating a Model

```ts
import {
  SystemLanguageModel,
  SystemLanguageModelUseCase,
  SystemLanguageModelGuardrails,
} from "tsfm-sdk";

const model = new SystemLanguageModel({
  useCase: SystemLanguageModelUseCase.GENERAL,
  guardrails: SystemLanguageModelGuardrails.DEFAULT,
});
```

Both options are optional and default to the values shown above.

## Guardrails

Guardrails control how the model handles potentially unsafe content in prompts and responses.

::: info
The **Swift** equivalent is [`SystemLanguageModel.Guardrails`](https://developer.apple.com/documentation/foundationmodels/systemlanguagemodel/guardrails).
:::

| Value | Description |
| --- | --- |
| `DEFAULT` | Blocks unsafe content in both prompts and responses. Use this for most applications. |
| `PERMISSIVE_CONTENT_TRANSFORMATIONS` | Allows transforming potentially unsafe text input into text responses. Use this when your app needs to process user-generated content that may contain sensitive material (e.g., content moderation tools, text rewriting). |

```ts
const model = new SystemLanguageModel({
  guardrails: SystemLanguageModelGuardrails.PERMISSIVE_CONTENT_TRANSFORMATIONS,
});
```

With `DEFAULT` guardrails, unsafe content may trigger a `GuardrailViolationError`. With `PERMISSIVE_CONTENT_TRANSFORMATIONS`, the model may attempt to transform the content instead of rejecting it outright.

## Use Cases

Use cases hint to the model what kind of task you're performing.

::: info
The **Swift** equivalent is [`SystemLanguageModel.UseCase`](https://developer.apple.com/documentation/foundationmodels/systemlanguagemodel/usecase).
:::

| Value | Description |
| --- | --- |
| `GENERAL` | General-purpose text generation (default) |
| `CONTENT_TAGGING` | Optimized for classification and labeling tasks |

```ts
const tagger = new SystemLanguageModel({
  useCase: SystemLanguageModelUseCase.CONTENT_TAGGING,
});
```

## Checking Availability

The on-device model may not be available if Apple Intelligence is disabled, assets haven't finished downloading, or the hardware doesn't support it. Always check before creating a session.

### Synchronous Check

```ts
const { available, reason } = model.isAvailable();
if (!available) {
  console.log("Unavailable:", reason);
}
```

### Waiting for Availability

`waitUntilAvailable()` polls until the model is ready, with a default timeout of 30 seconds. If the failure is permanent (`DEVICE_NOT_ELIGIBLE` or `APPLE_INTELLIGENCE_NOT_ENABLED`), it returns immediately rather than waiting the full timeout. It only retries when the reason is `MODEL_NOT_READY`.

```ts
const { available } = await model.waitUntilAvailable();

// With a custom timeout in milliseconds:
const readiness = await model.waitUntilAvailable(10_000);
```

## Unavailability Reasons

When `available` is `false`, the `reason` field indicates why:

| Reason | Description |
| --- | --- |
| `APPLE_INTELLIGENCE_NOT_ENABLED` | Apple Intelligence is turned off in Settings |
| `MODEL_NOT_READY` | Model assets are still downloading |
| `DEVICE_NOT_ELIGIBLE` | Hardware doesn't support Foundation Models |

## Cleanup

Release native resources when you're done with the model:

```ts
model.dispose();
```

---
Source: /guide/sessions
---

# Sessions

`LanguageModelSession` manages conversation state and provides all generation methods. Each session maintains its own context window and [transcript](/guide/transcripts).

::: info
The **Swift** equivalent is [`LanguageModelSession`](https://developer.apple.com/documentation/foundationmodels/languagemodelsession).
:::

## Creating a Session

```ts
import { LanguageModelSession } from "tsfm-sdk";

const session = new LanguageModelSession({
  instructions: "You are a concise assistant.",
});
```

### With a Specific Model

```ts
const model = new SystemLanguageModel({ useCase: SystemLanguageModelUseCase.CONTENT_TAGGING });
const session = new LanguageModelSession({ model });
```

### With Tools

```ts
const session = new LanguageModelSession({
  tools: [weatherTool, calculatorTool],
});
```

## Generating Responses

### Text Response

```ts
const reply = await session.respond("What is the capital of France?");
console.log(reply); // "The capital of France is Paris."
```

### With Generation Options

```ts
const reply = await session.respond("Write a poem", {
  options: {
    temperature: 0.9,
    maximumResponseTokens: 200,
  },
});
```

See [Generation Options](/guide/generation-options) for all available options.

## Concurrency

Sessions serialize concurrent calls automatically. If you call `respond()` while another request is in progress, it queues and runs after the first completes:

```ts
// These run sequentially, not in parallel
const [a, b] = await Promise.all([
  session.respond("First question"),
  session.respond("Second question"),
]);
```

## Cancellation

Cancel an in-progress request with `cancel()`:

```ts
const promise = session.respond("Tell me a long story");
session.cancel();
```

Cancellation is advisory — the response may still complete if the model finishes before the cancel is processed. After cancellation, the session resets to idle and is ready for new requests.
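One common use of advisory cancellation is capping a request's runtime. The helper below is a minimal sketch, not part of the SDK — `respondWithDeadline` is a hypothetical name, and because how the pending promise settles after a cancel isn't guaranteed, it handles completion and rejection alike:

```ts
// Sketch (hypothetical helper, not in the SDK): bound a request's runtime
// by scheduling a cancel. `respond` and `cancel` stand in for
// session.respond / session.cancel.
async function respondWithDeadline(
  respond: () => Promise<string>,
  cancel: () => void,
  deadlineMs: number,
): Promise<string | undefined> {
  const timer = setTimeout(cancel, deadlineMs);
  try {
    return await respond(); // model finished before the cancel took effect
  } catch {
    return undefined; // treat a cancelled/failed request as "no reply"
  } finally {
    clearTimeout(timer);
  }
}

// Usage with a real session might look like:
// const reply = await respondWithDeadline(
//   () => session.respond("Tell me a long story"),
//   () => session.cancel(),
//   5_000,
// );
```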
## Checking State

`isResponding` tells you whether the session is currently processing a request:

```ts
if (session.isResponding) {
  // A generation call is in flight
}
```

## Cleanup

Always dispose sessions when done to release native memory:

```ts
session.dispose();
```

::: tip
If you prefer a higher-level interface, the [Chat API compatibility layer](/guide/chat-api) manages sessions automatically behind a more standard `chat.completions.create()` interface.
:::

---
Source: /guide/streaming
---

# Streaming

TSFM can stream responses token-by-token using an async iterator. The on-device model produces cumulative snapshots, and the SDK diffs them internally so you receive only the new tokens on each iteration.

::: info
The **Swift** equivalent is [`LanguageModelSession.ResponseStream`](https://developer.apple.com/documentation/foundationmodels/languagemodelsession/responsestream).
:::

## Basic Streaming

```ts
import { LanguageModelSession } from "tsfm-sdk";

const session = new LanguageModelSession();

for await (const chunk of session.streamResponse("Tell me a joke")) {
  process.stdout.write(chunk);
}
console.log();

session.dispose();
```

Each `chunk` is a string containing only the **new** tokens since the last iteration.
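Conceptually, the snapshot-to-chunk diffing works like this. This is a simplified sketch, not the SDK's actual implementation; it assumes each cumulative snapshot extends the previous one:

```ts
// Simplified sketch of cumulative-snapshot diffing: each snapshot holds the
// full response so far, and only the newly added suffix is yielded.
function* diffSnapshots(snapshots: Iterable<string>): Generator<string> {
  let previous = "";
  for (const snapshot of snapshots) {
    const chunk = snapshot.slice(previous.length); // text added since last snapshot
    previous = snapshot;
    if (chunk) yield chunk;
  }
}

// ["Why", "Why did", "Why did the…"] yields "Why", " did", " the…"
```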
## With Options

```ts
for await (const chunk of session.streamResponse("Write a story", {
  options: { temperature: 0.8, maximumResponseTokens: 500 },
})) {
  process.stdout.write(chunk);
}
```

## Collecting the Full Response

If you want both streaming output and the complete text:

```ts
let full = "";
for await (const chunk of session.streamResponse("Explain TypeScript")) {
  process.stdout.write(chunk);
  full += chunk;
}
console.log("\n\nFull response length:", full.length);
```

## Chat API Streaming

If you prefer the Chat API streaming interface, the [compatibility layer](/guide/chat-api#streaming) provides `stream: true` with `ChatCompletionChunk` objects:

```ts
import Client from "tsfm-sdk/chat";

const client = new Client();
const stream = await client.chat.completions.create({
  messages: [{ role: "user", content: "Tell me a joke" }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0].delta.content;
  if (delta) process.stdout.write(delta);
}

client.close();
```

## Cleanup

The stream reference is released automatically when iteration completes or the session is disposed. The SDK keeps the Node.js event loop alive while streaming, so the process won't exit mid-stream.

---
Source: /guide/structured-output
---

# Structured Output

When you provide a schema, the on-device model uses constrained sampling to guarantee its output matches your types and structure, with no string parsing needed.

::: info
The **Swift** equivalents are the [`@Generable`](https://developer.apple.com/documentation/foundationmodels/generable) macro for compile-time schemas and [`DynamicGenerationSchema`](https://developer.apple.com/documentation/foundationmodels/dynamicgenerationschema) for runtime schemas. TSFM's `GenerationSchema` maps to the same underlying dictionary format.
:::

If you already have or prefer to use JSON Schema objects, you can use `respondWithJsonSchema` instead and the SDK will convert it at runtime.
If you're unsure which schema format you should use, see [Picking a Schema Format](#picking-a-schema-format).

## Defining a Schema (Native Format)

```ts
import { GenerationSchema, GenerationGuide } from "tsfm-sdk";

const schema = new GenerationSchema("Person", "A person profile")
  .property("name", "string", { description: "Full name" })
  .property("age", "integer", {
    description: "Age in years",
    guides: [GenerationGuide.range(0, 120)],
  })
  .property("tags", "array", {
    guides: [GenerationGuide.maxItems(5)],
    optional: true,
  });
```

### Property Types

`"string"` | `"integer"` | `"number"` | `"boolean"` | `"array"` | `"object"`

## Generation Guides

Guides constrain the model's output for a property.

::: info
The **Swift** equivalent is Foundation Models' [`@Guide`](https://developer.apple.com/documentation/foundationmodels/guide()) annotations. See Apple's [Generating Swift Data Structures with Guided Generation](https://developer.apple.com/documentation/foundationmodels/generating-swift-data-structures-with-guided-generation) guide.
:::

| Method | Constrains |
| --- | --- |
| `GenerationGuide.anyOf(["a", "b"])` | Enumerated string values |
| `GenerationGuide.constant("fixed")` | Exact string value |
| `GenerationGuide.range(min, max)` | Numeric range (inclusive) |
| `GenerationGuide.minimum(n)` | Numeric lower bound |
| `GenerationGuide.maximum(n)` | Numeric upper bound |
| `GenerationGuide.regex(pattern)` | String pattern |
| `GenerationGuide.count(n)` | Exact array length |
| `GenerationGuide.minItems(n)` | Minimum array length |
| `GenerationGuide.maxItems(n)` | Maximum array length |
| `GenerationGuide.element(guide)` | Applies a guide to array elements |

## Generating Structured Output

```ts
const session = new LanguageModelSession();
const content = await session.respondWithSchema("Describe a software engineer", schema);
```

### Extracting Values

Use `content.value(key)` to extract typed values:

```ts
const name = content.value("name");
const age = content.value("age");
```

### Full Example

```ts
interface Cat {
  name: string;
  age: number;
  breed: string;
}

const schema = new GenerationSchema("Cat", "A rescue cat")
  .property("name", "string", { description: "The cat's name" })
  .property("age", "integer", {
    description: "Age in years",
    guides: [GenerationGuide.range(0, 20)],
  })
  .property("breed", "string", { description: "The cat's breed" });

const content = await session.respondWithSchema("Generate a rescue cat", schema);

const cat: Cat = {
  name: content.value("name"),
  age: content.value("age"),
  breed: content.value("breed"),
};
```

## Generating Structured Output with JSON Schema

If you already have a JSON Schema definition, or are porting from OpenAI or another API, you can pass it directly with `respondWithJsonSchema` instead of building a `GenerationSchema` first:

```ts
const content = await session.respondWithJsonSchema("Generate a person profile", {
  type: "object",
  properties: {
    name: { type: "string", description: "Full name" },
    age: { type: "integer", description: "Age in years" },
    occupation: { type: "string", description: "Job title" },
  },
  required: ["name", "age", "occupation"],
});

const person = content.toObject();
// { name: "Ada Lovelace", age: 36, occupation: "Mathematician" }
```

The SDK converts JSON Schema to Apple's native format automatically. Use `toObject()` to get the full result as a plain object instead of extracting properties individually.

## Picking a Schema Format

Both methods produce constrained output. The choice comes down to whether you need [generation guides](#generation-guides) and what format you already have.

Use **`respondWithSchema`** when you need the extra constraints that only generation guides provide. It takes a `GenerationSchema` built with TSFM, which is the native [dictionary](https://developer.apple.com/documentation/swift/dictionary) format that Foundation Models uses internally, and it is the only path that supports [generation guides](#generation-guides). Guides like `constant`, `anyOf`, and `element` have no JSON Schema equivalent, and guides constrain token selection at generation time rather than validating output after.

Use **`respondWithJsonSchema`** when you already have JSON schemas or don't need those extra constraints. It accepts a standard JSON Schema object and TSFM converts it to the model's dictionary format at runtime. Standard constraints like `enum`, `minimum`/`maximum`, and `pattern` all work, but the model's more specific generation guides aren't available if you pass a JSON Schema.

If you don't need guides, either works. If you already have JSON schemas or are porting from another API, `respondWithJsonSchema` is the faster path.

::: tip
The [Chat API compatibility layer](/guide/chat-api#structured-output) also supports structured output via `response_format: { type: "json_schema" }`, using the same JSON Schema format as the Chat Completions API.
:::

---
Source: /guide/tools
---

# Tools

Tools let the model call your functions during generation. The model decides when a tool can help, generates arguments matching your schema, calls the tool, receives the result, and continues generating.

::: info
The **Swift** equivalent is the Foundation Models [`Tool`](https://developer.apple.com/documentation/foundationmodels/tool) protocol.
:::

## Defining a Tool

Extend the abstract `Tool` class:

```ts
import { Tool, GenerationSchema, GeneratedContent, GenerationGuide } from "tsfm-sdk";

class WeatherTool extends Tool {
  readonly name = "get_weather";
  readonly description = "Gets current weather for a city.";
  readonly argumentsSchema = new GenerationSchema("WeatherParams", "")
    .property("city", "string", { description: "City name" })
    .property("units", "string", {
      description: "Temperature units",
      guides: [GenerationGuide.anyOf(["celsius", "fahrenheit"])],
    });

  async call(args: GeneratedContent): Promise<string> {
    const city = args.value("city");
    const units = args.value("units");
    return `Sunny, 22°C in ${city} (${units})`;
  }
}
```

### Required Members

| Member | Type | Description |
| --- | --- | --- |
| `name` | `string` | Unique tool identifier |
| `description` | `string` | What the tool does (shown to the model) |
| `argumentsSchema` | `GenerationSchema` | Schema for the tool's arguments |
| `call(args)` | `async (GeneratedContent) => string` | Handler that returns a string result |

## Using Tools in a Session

Pass tools when creating a session:

```ts
const tool = new WeatherTool();
const session = new LanguageModelSession({
  instructions: "You are a helpful assistant.",
  tools: [tool],
});

const reply = await session.respond("What's the weather in Tokyo?");
// The model calls get_weather, receives the result, and formulates a response
```

## Error Handling

If `call()` throws, it's wrapped in a `ToolCallError`:

```ts
try {
  await session.respond("...");
} catch (e) {
  if (e instanceof ToolCallError) {
    console.log(e.message); // includes tool name and original error
  }
}
```

## Cleanup

Tools register a native callback that must be released:

```ts
session.dispose();
tool.dispose();
```

Tools can be reused across sessions — just dispose after all sessions are done.

## Best Practices

The Foundation Models [`Tool` documentation](https://developer.apple.com/documentation/foundationmodels/tool) recommends:

- **Limit to 3–5 tools per session.** Tool schemas and descriptions consume context window space. More tools means less room for conversation. If your session exceeds the context size, split work across new sessions.
- **Keep descriptions short.** A brief phrase is enough. Long descriptions add latency and use up context.
- **Pre-run essential tools.** If a tool's output is always needed, call it yourself and include the result in the prompt or instructions rather than waiting for the model to discover it needs the tool.

## Tool Chaining

The model can call multiple tools in sequence within a single `respond()` call. If the first tool's output informs a second tool call, the model handles the chaining automatically — you don't need to loop.

## Chat API Tool Calling

If you prefer the Chat API tool calling interface, the [compatibility layer](/guide/chat-api#tool-calling) supports `tools` with the standard `ChatCompletionTool` format. You define tools as JSON objects instead of extending the `Tool` class, and handle tool execution yourself between requests.

---
Source: /guide/transcripts
---

# Transcripts

Transcripts let you save and restore session history, enabling persistent conversations across process restarts. The transcript records instructions, user prompts, responses, and tool results as a linear history.

::: info
The **Swift** equivalent is Foundation Models' [`Transcript`](https://developer.apple.com/documentation/foundationmodels/transcript).
:::

## Entry Types

A transcript is a linear sequence of entries.
::: info
The **Swift** equivalent is [`Transcript.Entry`](https://developer.apple.com/documentation/foundationmodels/transcript).
:::

| Role | Description |
| --- | --- |
| `instructions` | Behavioral directives provided to the model when creating the session. |
| `user` | User input passed to `respond()` or `streamResponse()`. |
| `response` | Model-generated output (text, structured content, or tool calls). |
| `tool` | Results returned from executed tools. |

## Inspecting Entries

Use `entries()` to access typed transcript entries without manually parsing JSON:

```ts
const entries = session.transcript.entries();

for (const entry of entries) {
  if (entry.role === "response" && entry.contents) {
    for (const content of entry.contents) {
      if (content.type === "text") console.log(content.text);
    }
  }
}
```

Each entry has a `role` (`"instructions"`, `"user"`, `"response"`, or `"tool"`) and role-specific fields:

| Field | Roles | Description |
| --- | --- | --- |
| `contents` | all | Array of text or structured content items. |
| `tools` | `instructions` | Tool definitions registered with the session. |
| `options` | `user` | Generation options for this prompt. |
| `responseFormat` | `user` | Schema constraint for structured output. |
| `toolCalls` | `response` | Tool invocations with name and arguments. |
| `assets` | `response` | Asset references in the response. |
| `toolName` | `tool` | Name of the tool that produced this output. |
| `toolCallID` | `tool` | ID linking this output to its tool call. |
## Exporting a Transcript

Every session has a `transcript` property:

```ts
const session = new LanguageModelSession();
await session.respond("My name is Cody.");
await session.respond("I work on open source.");

// Export as JSON string
const json = session.transcript.toJson();

// Or as a dictionary object
const dict = session.transcript.toDict();
```

## Restoring a Session

Create a new session from a saved transcript:

```ts
import { Transcript, LanguageModelSession } from "tsfm-sdk";

// From JSON string
const transcript = Transcript.fromJson(json);
const resumed = LanguageModelSession.fromTranscript(transcript);
```

```ts
// From dictionary object
const transcript = Transcript.fromDict(dict);
const resumed = LanguageModelSession.fromTranscript(transcript);
```

The restored session has full context of the previous conversation:

```ts
const reply = await resumed.respond("What's my name?");
// The model remembers: "Your name is Cody."
```

## Full Example

```ts
// First session
const session = new LanguageModelSession();
await session.respond("My name is Cody.");
const json = session.transcript.toJson();
session.dispose();

// Later — resume from saved transcript
const resumed = LanguageModelSession.fromTranscript(Transcript.fromJson(json));
const recall = await resumed.respond("What's my name?");
console.log(recall); // References "Cody"
resumed.dispose();
```

::: warning
You must access `session.transcript` *before* calling `session.dispose()`. Transcripts are read from the native session pointer and will be lost when dispose runs.
:::

---
Source: /guide/generation-options
---

# Generation Options

Control temperature, token limits, and sampling strategy for any generation method.

::: info
The **Swift** equivalent is Foundation Models' [`GenerationOptions`](https://developer.apple.com/documentation/foundationmodels/generationoptions).
:::

## Usage

Pass `options` as part of the second argument to any generation method:

```ts
import { SamplingMode } from "tsfm-sdk";

const reply = await session.respond("Write a haiku about rain", {
  options: {
    temperature: 0.9,
    maximumResponseTokens: 100,
    sampling: SamplingMode.random({ top: 50, seed: 42 }),
  },
});
```

## Options

| Option | Type | Description |
| --- | --- | --- |
| `temperature` | `number` | Influences the confidence of the model's response. Higher values produce more varied output; lower values produce more deterministic output. |
| `maximumResponseTokens` | `number` | Maximum tokens the model is allowed to produce. Enforcing a strict limit can lead to truncated or grammatically incorrect responses. |
| `sampling` | `SamplingMode` | Controls how the model picks tokens from its probability distribution (see below). |

## Sampling Modes

The model builds its response token by token. At each step it produces a probability distribution over its vocabulary. The sampling mode controls how a token is selected from that distribution.

::: info
The **Swift** equivalent is Foundation Models' [`SamplingMode`](https://developer.apple.com/documentation/foundationmodels/generationoptions/samplingmode).
:::

### Greedy (Most Deterministic)

Always chooses the most likely token. The same prompt should always produce the same output.

```ts
SamplingMode.greedy()
```

### Random

Samples from a subset of likely tokens. You must choose **one** of `top` or `probabilityThreshold`, but not both. Either can be combined with `seed` for reproducibility:

| Parameter | Description |
| --- | --- |
| `top` | Pick from the K most likely tokens (fixed set). Cannot be combined with `probabilityThreshold`. Maps to Apple's `random(top:seed:)`. |
| `probabilityThreshold` | Pick from the smallest set of tokens whose probabilities sum to this threshold. Cannot be combined with `top`. Maps to Apple's `random(probabilityThreshold:seed:)`. |
| `seed` | Random seed for reproducible output. Works with either constraint. |
```ts
// Top-K: pick from the 50 most likely tokens
SamplingMode.random({ top: 50, seed: 42 })

// Top-P (nucleus): pick from the smallest set of tokens whose probabilities add up to 0.9
SamplingMode.random({ probabilityThreshold: 0.9 })
```

---
Source: /guide/error-handling
---

# Error Handling

All SDK errors extend `FoundationModelsError`. Generation-specific errors extend `GenerationError`, which itself extends `FoundationModelsError`. TSFM also adds `ServiceCrashedError` and `ToolCallError`.

::: info
The **Swift** equivalent is [`LanguageModelSession.GenerationError`](https://developer.apple.com/documentation/foundationmodels/languagemodelsession/generationerror).
:::

## Error Hierarchy

::: info FoundationModelsError
All errors inherit from `FoundationModelsError`.

**GenerationError** — errors during generation:

- `ExceededContextWindowSizeError`
- `AssetsUnavailableError`
- `GuardrailViolationError`
- `UnsupportedGuideError`
- `UnsupportedLanguageOrLocaleError`
- `DecodingFailureError`
- `RateLimitedError`
- `ConcurrentRequestsError`
- `RefusalError`
- `InvalidGenerationSchemaError`
- `ServiceCrashedError`

**ToolCallError** — a tool's `call()` method threw
:::

## Catching Errors

```ts
import {
  ExceededContextWindowSizeError,
  GuardrailViolationError,
  RateLimitedError,
} from "tsfm-sdk";

try {
  await session.respond("...");
} catch (e) {
  if (e instanceof ExceededContextWindowSizeError) {
    // Start a new session — context window is full
  } else if (e instanceof GuardrailViolationError) {
    // Content policy was triggered
  } else if (e instanceof RateLimitedError) {
    // Too many requests — wait and retry
  }
}
```

## Error Reference

### ExceededContextWindowSizeError

The session's accumulated context has exceeded the model's limit. All content (instructions, prompts, responses, tool schemas, tool calls, and tool output) shares one context window. Long conversations or large tool outputs will eventually hit this.
Dispose the session and start a new one, optionally seeding it with a trimmed [transcript](/guide/transcripts). Apple recommends splitting large tasks across multiple sessions.

### AssetsUnavailableError

The on-device model files haven't finished downloading. This typically happens right after enabling Apple Intelligence or after a macOS update. Call `model.waitUntilAvailable()` before creating a session — it will resolve once the assets are ready.

### GuardrailViolationError

The model's safety [guardrails](/guide/model-configuration#guardrails) flagged the prompt or the generated response. With `DEFAULT` guardrails, this means unsafe content was detected and blocked. With `PERMISSIVE_CONTENT_TRANSFORMATIONS`, you should see this less often, as the model will attempt to transform content instead of rejecting it outright. Either way, catch this error and surface a user-friendly message.

### UnsupportedGuideError

A `GenerationGuide` on one of your schema properties isn't supported by the current model version. This can happen if you use a guide that was introduced in a newer OS version than the user is running. Check your guide types against the [guides reference](/guide/structured-output#generation-guides).

### UnsupportedLanguageOrLocaleError

The system locale or the language of the prompt isn't supported by the on-device model. Foundation Models supports a subset of languages — this error means you've hit one it can't handle.

### DecodingFailureError

The model generated output during structured generation, but it couldn't be decoded into your schema. This can happen with complex or deeply nested schemas. Simplify the schema or add more descriptive property descriptions to guide the model.

### RateLimitedError

Too many requests to the on-device model in a short window. This is an OS-level rate limit, not a network API limit. Back off and retry after a short delay.
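A simple recovery pattern for rate limiting is to back off exponentially between retries. The helper below is a generic sketch, not part of the SDK; in practice the `isRetryable` predicate would typically be `(e) => e instanceof RateLimitedError`:

```ts
// Sketch (not in the SDK): retry a call with exponential backoff.
// The caller decides which errors are worth retrying.
async function withBackoff<T>(
  fn: () => Promise<T>,
  isRetryable: (e: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (!isRetryable(e) || attempt + 1 >= maxAttempts) throw e;
      // Wait 250ms, 500ms, 1000ms, ... before the next attempt
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}

// Usage might look like:
// const reply = await withBackoff(
//   () => session.respond("..."),
//   (e) => e instanceof RateLimitedError,
// );
```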
### ConcurrentRequestsError

You called a generation method on a session that's already processing a request. The SDK serializes calls internally via `_enqueue()`, so you shouldn't normally hit this. If you do, check that you're `await`ing calls, or use `session.isResponding` to check state before calling.

### RefusalError

The model declined to generate a response. This is distinct from `GuardrailViolationError` — refusal means the model chose not to answer (e.g., the prompt asks for something outside its capabilities), not that a content filter triggered.

### InvalidGenerationSchemaError

Your `GenerationSchema` is malformed or was rejected by the on-device model. Common causes: unsupported property types, conflicting guides, or schemas that are too complex for the model to constrain. Also thrown when the native layer returns a `ModelManagerError Code=1041` rejection.

### ServiceCrashedError

The Apple Intelligence background service (`generativeexperiencesd`) has crashed. This is an OS-level issue, not an SDK bug. The error message includes the restart command:

```bash
launchctl kickstart -k gui/$(id -u)/com.apple.generativeexperiencesd
```

After restarting the service, create a new session and retry.

### ToolCallError

Your tool's `call()` method threw during execution. The SDK wraps the original error with the tool name so you can identify which tool failed and why. Access the original error via `err.cause`.

## Catching All SDK Errors

```ts
import { FoundationModelsError } from "tsfm-sdk";

try {
  await session.respond("...");
} catch (e) {
  if (e instanceof FoundationModelsError) {
    console.error("SDK error:", e.message);
  }
}
```

---
Source: /guide/chat-api
---

# Chat & Responses APIs

## SDK Options

TSFM offers two ways to interact with the on-device Foundation Model:

1. The **Native SDK** covered in the rest of this guide
(mostly mirrors [the original Swift FoundationModels API](https://developer.apple.com/documentation/foundationmodels)) 2. **Compatibility APIs** that mirror popular cloud interfaces The `tsfm-sdk/chat` module translates familiar OpenAI-style calls into native Foundation Models operations, so you can swap in on-device Apple Intelligence with minimal code changes. For full control over sessions, schemas, and tools, use the [native SDK](/guide/sessions) instead. ## Tradeoffs If you use `tsfm-sdk/chat` you will lose access to some features: - Each `create()` call builds and tears down a session (slightly higher overhead) - No direct access to [generation guides](/guide/structured-output#generation-guides) (e.g. `anyOf`, `regex`, `range` constraints) - No persistent sessions (you manage conversation history yourself and pass it each call) - No direct access to the underlying [transcript](/guide/transcripts) (though you can build your own from the messages array) ## When to Use - Migrating an existing OpenAI-based codebase to on-device inference - Building apps that switch between cloud and on-device models - Prototyping quickly with a familiar interface ```ts import Client from "tsfm-sdk/chat"; const client = new Client(); // Responses API (recommended) const response = await client.responses.create({ model: "SystemLanguageModel", instructions: "You are a helpful assistant.", input: "What is the capital of France?", }); console.log(response.output_text); // Chat Completions API const completion = await client.chat.completions.create({ model: "SystemLanguageModel", messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "What is the capital of France?" }, ], }); console.log(completion.choices[0].message.content); client.close(); ``` If you've used the OpenAI Node SDK or similar APIs, the interface should feel familiar. 
The biggest difference is that the `model` parameter can be omitted or set to `"SystemLanguageModel"`.

## What TSFM Supports

Both APIs support the same core capabilities:

| Feature | Responses API | Chat Completions API | tsfm Support |
| --- | --- | --- | --- |
| Text generation | `input: "..."` | `messages: [...]` | Full |
| Multi-turn conversations | `input: [...]` (message array) | `messages: [...]` | Full |
| Streaming | `stream: true` | `stream: true` | Full |
| Structured output | `text: { format: { type: "json_schema" } }` | `response_format: { type: "json_schema" }` | Full |
| Tool calling | `tools: [{ type: "function", name, ... }]` | `tools: [{ type: "function", function: { name, ... } }]` | Full |
| `temperature`, `max_output_tokens` | `temperature`, `max_output_tokens` | `temperature`, `max_tokens` / `max_completion_tokens` | Full |
| `top_p`, `seed` | `top_p`, `seed` | `top_p`, `seed` | Full |
| Image/audio content | `input_image`, `input_file` | Image URLs | Not supported (warns) |
| `usage` / token counts | `usage` | `usage` | Always `null` |

---

## Responses API

The Responses-style API uses a `client.responses.create()` function with a simpler input model and richer output structure.

### Basic Usage

The simplest `responses.create()` call takes a string `input`:

```ts
const response = await client.responses.create({
  input: "What is the capital of France?",
});

// Outputs are available on the response object
console.log(response.output_text);
```

### Instructions

In the Responses API, system instructions are a top-level parameter rather than a message role:

```ts
const response = await client.responses.create({
  instructions: "You are a concise math tutor.",
  input: "What is 2 + 2?",
});
```

### Multi-turn Conversations

For multi-turn conversations with `responses.create()`, pass an array of input items:

```ts
const response = await client.responses.create({
  instructions: "You are a math tutor.",
  input: [
    { role: "user", content: "What is 2 + 2?"
}, { role: "assistant", content: "4" }, { role: "user", content: "Multiply that by 3" }, ], }); ``` ### Streaming Pass `stream: true` to `responses.create()` to get a `ResponseStream` of typed events: ```ts const stream = await client.responses.create({ input: "Tell me a story", stream: true, }); for await (const event of stream) { if (event.type === "response.output_text.delta") { process.stdout.write(event.delta); } } ``` Key event types: | Event type | Description | | --- | --- | | `response.created` | Response object created | | `response.in_progress` | Generation started | | `response.output_item.added` | New output item (message or function call) | | `response.output_text.delta` | Text token | | `response.output_text.done` | Full text complete | | `response.function_call_arguments.delta` | Function arguments chunk | | `response.function_call_arguments.done` | Full function call complete | | `response.output_item.done` | Output item complete | | `response.completed` | Full response complete | | `response.incomplete` | Generation stopped early | ::: warning When streaming structured output or tool calls, the full response is generated before any events are emitted. This is because Foundation Models uses constrained generation (a grammar that forces valid JSON), which cannot be interrupted mid-token. Plain text generation is the only mode that streams incrementally as tokens are produced. 
::: ### Structured Output The Responses API uses `text.format` with `type: "json_schema"` for structured output: ```ts const response = await client.responses.create({ input: "Extract: Alice is 28 and lives in Seattle", text: { format: { type: "json_schema", name: "Person", schema: { type: "object", properties: { name: { type: "string" }, age: { type: "integer" }, city: { type: "string" }, }, required: ["name", "age", "city"], }, }, }, }); const person = JSON.parse(response.output_text); // { name: "Alice", age: 28, city: "Seattle" } ``` ### Tool Calling The Responses API uses a flat tool format with `name` and `parameters` at the top level (not nested under `function` like Chat Completions): ```ts const response = await client.responses.create({ input: "What's the weather in Tokyo?", tools: [ { type: "function", name: "get_weather", description: "Get current weather for a city", parameters: { type: "object", properties: { city: { type: "string", description: "City name" }, }, required: ["city"], }, }, ], }); // Check for function calls in the output for (const item of response.output) { if (item.type === "function_call") { console.log(item.name); // "get_weather" console.log(item.arguments); // '{"city":"Tokyo"}' console.log(item.call_id); // "call_" — use this to send results back } } ``` ### Sending Tool Results Back Send results using `function_call_output` input items. Pass back the original `function_call` item alongside its output: ```ts const fc = response.output.find((item) => item.type === "function_call")!; const followUp = await client.responses.create({ input: [ { role: "user", content: "What's the weather in Tokyo?" }, fc, // pass the function_call back { type: "function_call_output", call_id: fc.call_id, output: JSON.stringify({ temp: 22, condition: "Sunny" }), }, ], tools: [/* same tools */], }); console.log(followUp.output_text); // "It's currently 22°C and sunny in Tokyo." 
``` ### Generation Options ```ts const response = await client.responses.create({ input: "Write a creative haiku", temperature: 0.8, max_output_tokens: 50, seed: 42, }); ``` ### Error Mapping | Native error | Responses API equivalent | | --- | --- | | `ExceededContextWindowSizeError` | `status: "incomplete"`, `incomplete_details.reason: "max_output_tokens"` | | `RefusalError` | Output contains `{ type: "refusal", refusal: "..." }` | | `GuardrailViolationError` | `status: "incomplete"`, `incomplete_details.reason: "content_filter"` | | `RateLimitedError` | Thrown as error with status `429` | ### Response Object ```ts { id: "resp_...", object: "response", created_at: 1710000000, model: "SystemLanguageModel", output: [ { id: "msg_...", type: "message", role: "assistant", status: "completed", content: [{ type: "output_text", text: "...", annotations: [] }] } ], output_text: "...", // convenience: concatenated text status: "completed", // "completed" | "failed" | "incomplete" error: null, incomplete_details: null, // { reason: "max_output_tokens" | "content_filter" } instructions: "...", usage: null // not tracked } ``` --- ## Chat Completions API The Chat Completions API uses the classic `client.chat.completions.create()` interface. ### Messages The Chat Completions API accepts all standard message roles: | Role | Behavior | | --- | --- | | `system` | Mapped to the session's `instructions`. Only the first system message becomes instructions — subsequent ones are treated as user messages with a `[System]` prefix. | | `developer` | Same as `system`. | | `user` | Mapped to a user transcript entry. The last user message becomes the prompt. | | `assistant` | Mapped to a response transcript entry. Tool calls are preserved. | | `tool` | Mapped to a user message formatted as `[Tool result for toolName]: content`. | #### Chat: Multi-turn Conversations Pass the full conversation history in the `messages` array. 
The client converts it to a native Foundation Models [transcript](/guide/transcripts) behind the scenes — each `create()` call builds a fresh session from the messages you provide. ```ts const response = await client.chat.completions.create({ messages: [ { role: "system", content: "You are a math tutor." }, { role: "user", content: "What is 2 + 2?" }, { role: "assistant", content: "4" }, { role: "user", content: "Multiply that by 3" }, ], }); ``` ### Chat: Streaming Pass `stream: true` to get an async iterable of `ChatCompletionChunk` objects: ```ts const stream = await client.chat.completions.create({ messages: [{ role: "user", content: "Tell me a story" }], stream: true, }); for await (const chunk of stream) { const delta = chunk.choices[0].delta.content; if (delta) process.stdout.write(delta); } ``` The `Stream` object supports: - **`for await...of`** — iterates chunks, auto-closes on completion or `break` - **`stream.close()`** — eagerly release resources without finishing iteration - **`stream.toReadableStream()`** — convert to a Web `ReadableStream` for HTTP responses ::: warning Structured output and tool call responses are buffered — the model must finish constrained generation before the response is emitted. Only plain text streams token-by-token. ::: ### Chat: Structured Output Use `response_format` with `type: "json_schema"` to get guaranteed JSON output: ```ts const response = await client.chat.completions.create({ messages: [{ role: "user", content: "Extract: Alice is 28 and lives in Seattle" }], response_format: { type: "json_schema", json_schema: { name: "Person", schema: { type: "object", properties: { name: { type: "string" }, age: { type: "integer" }, city: { type: "string" }, }, required: ["name", "age", "city"], }, }, }, }); const person = JSON.parse(response.choices[0].message.content!); // { name: "Alice", age: 28, city: "Seattle" } ``` The JSON schema is converted to Apple's native generation schema format at runtime. 
The model uses constrained sampling to guarantee valid output — no retry or validation needed. ### Chat: Tool Calling Define tools using the standard function tool format: ```ts const tools = [ { type: "function" as const, function: { name: "get_weather", description: "Get current weather for a city", parameters: { type: "object", properties: { city: { type: "string", description: "City name" }, }, required: ["city"], }, }, }, ]; const response = await client.chat.completions.create({ messages: [{ role: "user", content: "What's the weather in Tokyo?" }], tools, }); ``` When the model decides to call a tool, the response has `finish_reason: "tool_calls"` and `message.tool_calls` contains the calls: ```ts const choice = response.choices[0]; if (choice.finish_reason === "tool_calls" && choice.message.tool_calls) { const call = choice.message.tool_calls[0]; console.log(call.function.name); // "get_weather" console.log(call.function.arguments); // '{"city":"Tokyo"}' } ``` #### Chat: Sending Tool Results Back After executing the tool, send the result back with a follow-up request that includes the full conversation: ```ts const followUp = await client.chat.completions.create({ messages: [ { role: "user", content: "What's the weather in Tokyo?" }, { role: "assistant", content: null, tool_calls: [call] }, { role: "tool", tool_call_id: call.id, content: JSON.stringify({ temp: 22, condition: "Sunny" }), }, ], tools, }); console.log(followUp.choices[0].message.content); // "It's currently 22°C and sunny in Tokyo." ``` ::: info Under the hood, tool calling uses structured output with a discriminated schema. The model chooses between `"text"` and `"tool_call"` as the first generated token, then fills in the tool name and arguments. Tools are suppressed when the last message is a tool result to prevent the model from re-calling the same tool. 
::: ### Chat: Generation Options | Param | Maps to | | --- | --- | | `temperature` | `GenerationOptions.temperature` | | `max_tokens` / `max_completion_tokens` | `GenerationOptions.maximumResponseTokens` (`max_completion_tokens` takes priority) | | `top_p` | `SamplingMode.random({ probabilityThreshold })` | | `seed` | `SamplingMode.random({ seed })` | ```ts const response = await client.chat.completions.create({ messages: [{ role: "user", content: "Say hello" }], temperature: 0, max_tokens: 50, seed: 42, }); ``` ## Chat Completions Error Mapping | Native error | Chat Completions equivalent | | --- | --- | | `ExceededContextWindowSizeError` | `finish_reason: "length"` | | `RefusalError` | `message.refusal` set, `content: null` | | `GuardrailViolationError` | `finish_reason: "content_filter"` | | `RateLimitedError` | Thrown as error with status `429` | ## Chat Completions Response Format ```ts { id: "chatcmpl-...", // Unique ID object: "chat.completion", // Or "chat.completion.chunk" for streaming created: 1710000000, // Unix timestamp (seconds) model: "SystemLanguageModel", choices: [{ index: 0, message: { role: "assistant", content: "...", // null when tool_calls present refusal: null, // Set on RefusalError tool_calls: [...] // Present when finish_reason is "tool_calls" }, finish_reason: "stop" // "stop" | "length" | "tool_calls" | "content_filter" }], usage: null, // Not tracked system_fingerprint: null } ``` --- ## Cleanup Call `client.close()` when you're done to release the native model pointer: ```ts const client = new Client(); // ... use client ... client.close(); ``` Each `create()` call manages its own session lifecycle internally — sessions are created from the messages array and disposed after the response completes (or after streaming finishes). 
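Because `close()` must run even when a request throws, a try/finally wrapper is a reasonable pattern. The sketch below uses a stub client so it stands alone; in real code you would construct `new Client()` from `tsfm-sdk/chat` instead of `makeClient()`, which is a hypothetical stand-in:

```typescript
// Sketch: guarantee cleanup of the native model pointer with try/finally.
// `ClosableClient` and `makeClient` are stand-ins for the real Client so this
// example is self-contained.
interface ClosableClient {
  closed: boolean;
  close(): void;
}

function makeClient(): ClosableClient {
  return {
    closed: false,
    close() {
      this.closed = true; // the real Client.close() releases the native pointer
    },
  };
}

async function withClient<T>(work: (client: ClosableClient) => Promise<T>): Promise<T> {
  const client = makeClient();
  try {
    return await work(client);
  } finally {
    client.close(); // runs whether work() resolved or threw
  }
}
```

The same shape works with the real client: create it, await the work, and close in `finally` so the pointer is released on every code path.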
## What's Next

- [Structured Output](/guide/structured-output) — Schema-based generation with the native SDK
- [Tools](/guide/tools) — Native tool calling with the `Tool` class
- [Streaming](/guide/streaming) — Native streaming API
- [Error Handling](/guide/error-handling) — Full error reference

---
Source: /api/
---

# API Reference

Complete reference for all public exports from `tsfm-sdk`.

## Classes

| Class | Description |
| --- | --- |
| [SystemLanguageModel](/api/system-language-model) | On-device model access and availability |
| [LanguageModelSession](/api/language-model-session) | Conversation session with all generation methods |
| [GenerationSchema](/api/generation-schema) | Schema builder for structured output |
| [Tool](/api/tool) | Abstract base class for tool calling |
| [Transcript](/api/transcript) | Session history export and import |

## Types and Enums

| Export | Description |
| --- | --- |
| [GenerationOptions](/api/generation-options) | Options for temperature, tokens, sampling |
| [SamplingMode](/api/generation-options#samplingmode) | Greedy or random sampling strategies |
| [GenerationGuide](/api/generation-schema#generationguide) | Output constraints for schema properties |
| [GeneratedContent](/api/generation-schema#generatedcontent) | Structured generation result |
| [Errors](/api/errors) | Error hierarchy and error codes |

## Chat & Responses APIs

| Export | Description |
| --- | --- |
| [Client](/api/chat) | Chat-style and Responses-style API client backed by on-device Apple Intelligence |

```ts
import Client from "tsfm-sdk/chat";
```

See the [Chat & Responses API reference](/api/chat) for full type documentation.
## Installation

```ts
import {
  SystemLanguageModel,
  LanguageModelSession,
  GenerationSchema,
  GenerationGuide,
  Tool,
  Transcript,
  SamplingMode,
} from "tsfm-sdk";

// Chat API compatible interface
import Client from "tsfm-sdk/chat";
```

---
Source: /api/system-language-model
---

# SystemLanguageModel

Represents the on-device Foundation Models language model. Provides availability checking and model configuration.

## Constructor

```ts
new SystemLanguageModel(options?: {
  useCase?: SystemLanguageModelUseCase;
  guardrails?: SystemLanguageModelGuardrails;
})
```

| Parameter | Default | Description |
| --- | --- | --- |
| `useCase` | `GENERAL` | Model use case |
| `guardrails` | `DEFAULT` | Guardrail configuration |

## Methods

### `isAvailable()`

Synchronously checks if the model is ready.

```ts
isAvailable(): AvailabilityResult
```

Returns `{ available: true }` or `{ available: false, reason: SystemLanguageModelUnavailableReason }`.

### `waitUntilAvailable()`

Polls until the model is available or the timeout expires.

```ts
waitUntilAvailable(timeoutMs?: number): Promise<AvailabilityResult>
```

| Parameter | Default | Description |
| --- | --- | --- |
| `timeoutMs` | `30000` | Maximum wait time in milliseconds |

### `dispose()`

Releases the native model reference.
```ts
dispose(): void
```

## Enums

### `SystemLanguageModelUseCase`

| Value | Description |
| --- | --- |
| `GENERAL` | General-purpose generation |
| `CONTENT_TAGGING` | Classification and labeling |

### `SystemLanguageModelGuardrails`

| Value | Description |
| --- | --- |
| `DEFAULT` | Standard content safety guardrails |

### `SystemLanguageModelUnavailableReason`

| Value | Description |
| --- | --- |
| `APPLE_INTELLIGENCE_NOT_ENABLED` | Apple Intelligence is off |
| `MODEL_NOT_READY` | Model assets still downloading |
| `DEVICE_NOT_ELIGIBLE` | Hardware not supported |

## Types

### `AvailabilityResult`

```ts
type AvailabilityResult =
  | { available: true }
  | { available: false; reason: SystemLanguageModelUnavailableReason };
```

---
Source: /api/language-model-session
---

# LanguageModelSession

Manages conversation state and provides all generation methods — text, streaming, structured, and JSON Schema.

## Constructor

```ts
new LanguageModelSession(options?: {
  instructions?: string;
  model?: SystemLanguageModel;
  tools?: Tool[];
})
```

| Parameter | Default | Description |
| --- | --- | --- |
| `instructions` | `undefined` | System prompt for the session |
| `model` | Default model | A configured `SystemLanguageModel` |
| `tools` | `[]` | Tools available during generation |

## Methods

### `respond()`

Generate a text response.

```ts
respond(prompt: string, options?: { options?: GenerationOptions }): Promise<string>
```

### `respondWithSchema()`

Generate structured output matching a `GenerationSchema`.

```ts
respondWithSchema(prompt: string, schema: GenerationSchema, options?: { options?: GenerationOptions }): Promise<GeneratedContent>
```

Returns a [`GeneratedContent`](/api/generation-schema#generatedcontent) with typed property access.

### `respondWithJsonSchema()`

Generate structured output from a JSON Schema object.
```ts
respondWithJsonSchema(prompt: string, schema: object, options?: { options?: GenerationOptions }): Promise<GeneratedContent>
```

Returns a [`GeneratedContent`](/api/generation-schema#generatedcontent) with `toObject()` for the full result.

### `streamResponse()`

Stream a response token-by-token.

```ts
streamResponse(prompt: string, options?: { options?: GenerationOptions }): AsyncIterable<string>
```

Each yielded string contains only the new tokens since the last iteration.

### `cancel()`

Cancel an in-progress request. Advisory — the response may complete before cancellation takes effect.

```ts
cancel(): void
```

### `dispose()`

Release the native session. Access `transcript` before calling this.

```ts
dispose(): void
```

## Properties

### `isResponding`

```ts
readonly isResponding: boolean
```

`true` while a generation request is in progress.

### `transcript`

```ts
readonly transcript: Transcript
```

The session's conversation history. See [Transcript](/api/transcript).

## Static Methods

### `fromTranscript()`

Create a new session from a saved transcript.

```ts
static fromTranscript(transcript: Transcript, options?: {
  instructions?: string;
  model?: SystemLanguageModel;
  tools?: Tool[];
}): LanguageModelSession
```

---
Source: /api/generation-schema
---

# GenerationSchema

Builder for typed schemas that constrain structured generation output.

## Constructor

```ts
new GenerationSchema(name: string, description: string)
```

## Methods

### `property()`

Add a property to the schema. Returns `this` for chaining.

```ts
property(name: string, type: PropertyType, options?: {
  description?: string;
  guides?: GenerationGuide[];
  optional?: boolean;
}): this
```

### `toDict()`

Export the schema as a JSON Schema-compatible dictionary.

```ts
toDict(): object
```

## Types

### `PropertyType`

```ts
type PropertyType = "string" | "integer" | "number" | "boolean" | "array" | "object"
```

## GenerationGuide

Factory methods that create output constraints for schema properties.
### String Guides

```ts
GenerationGuide.anyOf(values: string[])   // enumerated values
GenerationGuide.constant(value: string)   // exact value
GenerationGuide.regex(pattern: string)    // regex pattern
```

### Numeric Guides

```ts
GenerationGuide.range(min: number, max: number)  // inclusive range
GenerationGuide.minimum(n: number)               // lower bound
GenerationGuide.maximum(n: number)               // upper bound
```

### Array Guides

```ts
GenerationGuide.count(n: number)                 // exact length
GenerationGuide.minItems(n: number)              // minimum length
GenerationGuide.maxItems(n: number)              // maximum length
GenerationGuide.element(guide: GenerationGuide)  // constrain elements
```

## GeneratedContent

Returned by `respondWithSchema()` and `respondWithJsonSchema()`.

### `value()`

Extract a typed property value:

```ts
value<T>(key: string): T
```

### `toObject()`

Get the full result as a plain object:

```ts
toObject(): Record<string, unknown>
```

## GenerationSchemaProperty

Represents a single property in a schema. Created internally by `GenerationSchema.property()`.

## GuideType

Enum of guide types used internally:

| Value | Description |
| --- | --- |
| `ANY_OF` | Enumerated values |
| `CONSTANT` | Fixed value |
| `RANGE` | Numeric range |
| `MINIMUM` | Lower bound |
| `MAXIMUM` | Upper bound |
| `REGEX` | Pattern match |
| `COUNT` | Exact array length |
| `MIN_ITEMS` | Minimum array length |
| `MAX_ITEMS` | Maximum array length |
| `ELEMENT` | Element constraint |

---
Source: /api/generation-options
---

# GenerationOptions

Options that control generation behavior across all response methods.

## Interface

```ts
interface GenerationOptions {
  temperature?: number;
  maximumResponseTokens?: number;
  sampling?: SamplingMode;
}
```

| Property | Type | Description |
| --- | --- | --- |
| `temperature` | `number` | Controls randomness. Higher = more varied. |
| `maximumResponseTokens` | `number` | Max tokens in the response. |
| `sampling` | `SamplingMode` | Sampling strategy. |

## Usage

```ts
await session.respond("prompt", {
  options: {
    temperature: 0.8,
    maximumResponseTokens: 500,
    sampling: SamplingMode.greedy(),
  },
});
```

## SamplingMode

### `SamplingMode.greedy()`

Deterministic sampling — always picks the most likely token.

```ts
static greedy(): SamplingMode
```

### `SamplingMode.random()`

Stochastic sampling with optional constraints.

```ts
static random(options?: {
  top?: number;
  seed?: number;
  probabilityThreshold?: number;
}): SamplingMode
```

| Parameter | Description |
| --- | --- |
| `top` | Top-K: only consider the K most likely tokens |
| `seed` | Random seed for reproducible output |
| `probabilityThreshold` | Top-P / nucleus: cumulative probability threshold |

### `SamplingModeType`

```ts
type SamplingModeType = "greedy" | "random"
```

---
Source: /api/tool
---

# Tool

Abstract base class for defining tools the model can call during generation.

## Abstract Members

Subclasses must implement:

```ts
abstract class Tool {
  abstract readonly name: string;
  abstract readonly description: string;
  abstract readonly argumentsSchema: GenerationSchema;
  abstract call(args: GeneratedContent): Promise<string>;
}
```

| Member | Type | Description |
| --- | --- | --- |
| `name` | `string` | Unique tool identifier |
| `description` | `string` | What the tool does (visible to the model) |
| `argumentsSchema` | `GenerationSchema` | Schema defining the tool's arguments |
| `call(args)` | `async (GeneratedContent) => string` | Handler invoked when the model calls this tool |

## Properties

### `onCall`

Optional callback fired at the start of each tool invocation, before `call()` runs. Useful for showing UI indicators (e.g. "Using tool: search") while the model waits for the tool result.

```ts
onCall?: (toolName: string) => void;
```

```ts
const tool = new WeatherTool();
tool.onCall = (name) => console.log(`Tool invoked: ${name}`);
```

## Methods

### `dispose()`

Release the native callback. Call after all sessions using this tool are done.
```ts
dispose(): void
```

## Example

```ts
import { Tool, GenerationSchema, GeneratedContent, GenerationGuide } from "tsfm-sdk";

class WeatherTool extends Tool {
  readonly name = "get_weather";
  readonly description = "Gets current weather for a city.";
  readonly argumentsSchema = new GenerationSchema("WeatherParams", "")
    .property("city", "string", { description: "City name" })
    .property("units", "string", {
      description: "Temperature units",
      guides: [GenerationGuide.anyOf(["celsius", "fahrenheit"])],
    });

  async call(args: GeneratedContent): Promise<string> {
    const city = args.value<string>("city");
    const units = args.value<string>("units");
    return `Sunny, 22°C in ${city} (${units})`;
  }
}
```

## Lifecycle

1. Create the tool instance
2. Pass to `LanguageModelSession({ tools: [tool] })`
3. The tool's callback is registered internally when the session is created
4. After all sessions are disposed, call `tool.dispose()`

Tools can be shared across multiple sessions. The native callback remains registered until `dispose()` is called.

---
Source: /api/transcript
---

# Transcript

Represents a session's conversation history. Used to export and restore sessions.

## Accessing

Every session exposes its transcript:

```ts
const transcript = session.transcript;
```

::: warning
Access the transcript before calling `session.dispose()`. The transcript reads from the native session pointer.
:::

## Methods

### `toJson()`

Export the transcript as a JSON string.

```ts
toJson(): string
```

### `toDict()`

Export the transcript as a dictionary object.

```ts
toDict(): object
```

## Static Methods

### `fromJson()`

Create a transcript from a JSON string.

```ts
static fromJson(json: string): Transcript
```

### `fromDict()`

Create a transcript from a dictionary object.
```ts static fromDict(dict: object): Transcript ``` ## Restoring a Session ```ts const transcript = Transcript.fromJson(savedJson); const session = LanguageModelSession.fromTranscript(transcript); ``` See [LanguageModelSession.fromTranscript()](/api/language-model-session#fromtranscript) for full options. --- Source: /api/errors --- # Errors All SDK errors extend `FoundationModelsError`. Import specific error classes to handle them individually. ## Hierarchy ``` FoundationModelsError ├── GenerationError │ ├── ExceededContextWindowSizeError │ ├── AssetsUnavailableError │ ├── GuardrailViolationError │ ├── UnsupportedGuideError │ ├── UnsupportedLanguageOrLocaleError │ ├── DecodingFailureError │ ├── RateLimitedError │ ├── ConcurrentRequestsError │ ├── RefusalError │ └── InvalidGenerationSchemaError └── ToolCallError ``` ## Error Reference | Error | Code | When | | --- | --- | --- | | `ExceededContextWindowSizeError` | 1 | Session history too long | | `AssetsUnavailableError` | 2 | Model not downloaded | | `GuardrailViolationError` | 3 | Content policy violation | | `UnsupportedGuideError` | 4 | Unsupported generation guide | | `UnsupportedLanguageOrLocaleError` | 5 | Language not supported | | `DecodingFailureError` | 6 | Structured output parse failure | | `RateLimitedError` | 7 | Too many requests | | `ConcurrentRequestsError` | 8 | Session already responding | | `RefusalError` | 9 | Model declined to answer | | `InvalidGenerationSchemaError` | 10 | Malformed schema | | `ToolCallError` | — | Tool's `call()` threw | ## GenerationErrorCode Enum mapping status codes to error types: ```ts enum GenerationErrorCode { SUCCESS = 0, EXCEEDED_CONTEXT_WINDOW_SIZE = 1, ASSETS_UNAVAILABLE = 2, GUARDRAIL_VIOLATION = 3, UNSUPPORTED_GUIDE = 4, UNSUPPORTED_LANGUAGE_OR_LOCALE = 5, DECODING_FAILURE = 6, RATE_LIMITED = 7, CONCURRENT_REQUESTS = 8, REFUSAL = 9, INVALID_SCHEMA = 10, UNKNOWN_ERROR = 255, } ``` ## Usage ```ts import { FoundationModelsError, GenerationError, 
  ExceededContextWindowSizeError,
  GuardrailViolationError,
  ToolCallError,
} from "tsfm-sdk";

try {
  await session.respond("...");
} catch (e) {
  if (e instanceof ToolCallError) {
    // Tool handler threw
  } else if (e instanceof GenerationError) {
    // Any generation error
  } else if (e instanceof FoundationModelsError) {
    // Any SDK error
  }
}
```

---
Source: /api/chat
---

# Chat & Responses API Reference

API reference for `tsfm-sdk/chat`. This module provides a compatibility layer with a Responses API and Chat Completions API backed by on-device Apple Intelligence.

```ts
import Client, { Stream, ResponseStream, MODEL_DEFAULT } from "tsfm-sdk/chat";
```

## Client

Main client class. Provides Chat-style and Responses-style API interfaces backed by on-device Apple Intelligence.

### Constructor

```ts
const client = new Client();
```

No arguments. No API key needed.

### Properties

| Property | Type | Description |
| --- | --- | --- |
| `responses` | `Responses` | Responses API endpoint |
| `chat.completions` | `Completions` | Chat Completions API endpoint |

### Methods

#### `close()`

Releases the native model pointer. Call when you're done with the client.

```ts
client.close(): void
```

---

## Responses API

### Responses

Accessed via `client.responses`. Similar to the modern Responses API interface used by OpenAI.

#### `responses.create(params)`

Creates a response.

```ts
// Non-streaming
create(params: ResponseCreateParams & { stream?: false | null }): Promise<Response>

// Streaming
create(params: ResponseCreateParams & { stream: true }): Promise<ResponseStream>
```

---

### ResponseCreateParams

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `input` | `string \| ResponseInputItem[]` | Yes | Text prompt or array of input items |
| `model` | `string` | No | Ignored. Always uses on-device model. |
| `instructions` | `string` | No | System instructions |
| `stream` | `boolean` | No | Enable streaming |
| `temperature` | `number` | No | Sampling temperature |
| `max_output_tokens` | `number` | No | Maximum response tokens |
| `top_p` | `number` | No | Probability threshold for sampling |
| `seed` | `number` | No | Random seed for reproducibility |
| `tools` | `FunctionTool[]` | No | Tool definitions |
| `tool_choice` | `string \| object` | No | Accepted but ignored |
| `text` | `ResponseTextConfig` | No | Structured output configuration |

All other params (`previous_response_id`, `conversation`, `store`, `truncation`, `metadata`, `reasoning`, etc.) are accepted but ignored with a runtime warning.

---

### Input Types

#### ResponseInputItem

```ts
type ResponseInputItem = EasyInputMessage | ResponseFunctionToolCall | FunctionCallOutput;
```

#### EasyInputMessage

```ts
{
  role: "user" | "assistant" | "system" | "developer";
  content: string | ResponseInputContent[];
  type?: "message";
}
```

#### ResponseFunctionToolCall

Passed back to continue a conversation after a function call:

```ts
{
  type: "function_call";
  name: string;
  arguments: string;
  call_id: string;
  status?: "in_progress" | "completed" | "incomplete";
}
```

#### FunctionCallOutput

Provides the result of a function call:

```ts
{
  type: "function_call_output";
  call_id: string;
  output: string;
}
```

#### ResponseInputContent

```ts
type ResponseInputContent =
  | { type: "input_text"; text: string }
  | { type: "input_image"; image_url?: string }  // not supported
  | { type: "input_file"; file_data?: string };  // not supported
```

Only `input_text` is supported. Other types log a warning and are skipped.
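As an illustration of the rule above, here is a sketch of the equivalent user-side filtering. `extractText` is hypothetical (not an SDK export), and joining the kept parts with newlines is an assumption made for the example:

```typescript
// Mirrors the documented behavior: keep input_text parts, warn on and skip the rest.
type ResponseInputContent =
  | { type: "input_text"; text: string }
  | { type: "input_image"; image_url?: string }
  | { type: "input_file"; file_data?: string };

function extractText(content: string | ResponseInputContent[]): string {
  if (typeof content === "string") return content;
  const parts: string[] = [];
  for (const part of content) {
    if (part.type === "input_text") {
      parts.push(part.text);
    } else {
      console.warn(`Skipping unsupported content part: ${part.type}`);
    }
  }
  return parts.join("\n"); // join strategy is an assumption for illustration
}
```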
---

### Tool Types (Responses API)

#### FunctionTool

Flat format — `name` and `parameters` are top-level (not nested under `function`):

```ts
{
  type: "function";
  name: string;
  parameters: Record<string, unknown> | null; // JSON Schema
  description?: string;
  strict?: boolean | null;
}
```

---

### Structured Output (Responses API)

#### ResponseTextConfig

```ts
{ format?: ResponseFormatConfig }
```

#### ResponseFormatConfig

```ts
type ResponseFormatConfig =
  | { type: "text" }
  | { type: "json_object" }
  | {
      type: "json_schema";
      name: string;
      schema: Record<string, unknown>;
      description?: string;
      strict?: boolean | null;
    };
```

Only `json_schema` triggers constrained generation.

---

### Response Object

```ts
{
  id: string; // "resp_"
  object: "response";
  created_at: number; // Unix timestamp (seconds)
  model: string; // "SystemLanguageModel"
  output: ResponseOutputItem[];
  output_text: string; // convenience: concatenated text from output messages
  status: "completed" | "failed" | "incomplete";
  error: ResponseError | null;
  incomplete_details: { reason?: "max_output_tokens" | "content_filter" } | null;
  instructions: string | null;
  metadata: Record<string, unknown> | null;
  temperature: number | null;
  top_p: number | null;
  max_output_tokens: number | null;
  tool_choice: "none" | "auto" | "required" | { type: "function"; name: string };
  tools: FunctionTool[];
  parallel_tool_calls: boolean;
  text: ResponseTextConfig;
  truncation: "auto" | "disabled" | null;
  usage: null; // not tracked
}
```

### ResponseOutputItem

```ts
type ResponseOutputItem = ResponseOutputMessage | ResponseOutputFunctionToolCall;
```

### ResponseOutputMessage

```ts
{
  id: string;
  type: "message";
  role: "assistant";
  status: "completed" | "incomplete" | "in_progress";
  content: Array<ResponseOutputText | ResponseOutputRefusal>;
}
```

### ResponseOutputText

```ts
{ type: "output_text"; text: string; annotations: unknown[] }
```

### ResponseOutputRefusal

```ts
{ type: "refusal"; refusal: string }
```

### ResponseOutputFunctionToolCall

```ts
{
  type: "function_call";
  id: string;
  call_id: string; // use this in FunctionCallOutput
  name: string;
  arguments: string; // JSON string
  status: "completed";
}
```

---

### ResponseStream

Async iterable wrapper for Responses API streaming events.

```ts
class ResponseStream implements AsyncIterable<ResponseStreamEvent>
```

| Method | Description |
| --- | --- |
| `[Symbol.asyncIterator]()` | Iterate events with `for await...of` |
| `close()` | Eagerly release resources |
| `toReadableStream()` | Convert to Web `ReadableStream` |

### ResponseStreamEvent

Union of all event types. See [Streaming Events](#streaming-events-reference) for the full list.

---

### Streaming Events Reference

| Event type | Key fields |
| --- | --- |
| `response.created` | `response: Response` |
| `response.in_progress` | `response: Response` |
| `response.completed` | `response: Response` |
| `response.failed` | `response: Response` |
| `response.incomplete` | `response: Response` |
| `response.output_item.added` | `item: ResponseOutputItem`, `output_index` |
| `response.output_item.done` | `item: ResponseOutputItem`, `output_index` |
| `response.content_part.added` | `part`, `item_id`, `output_index`, `content_index` |
| `response.content_part.done` | `part`, `item_id`, `output_index`, `content_index` |
| `response.output_text.delta` | `delta: string`, `item_id`, `output_index`, `content_index` |
| `response.output_text.done` | `text: string`, `item_id`, `output_index`, `content_index` |
| `response.refusal.delta` | `delta: string`, `item_id` |
| `response.refusal.done` | `refusal: string`, `item_id` |
| `response.function_call_arguments.delta` | `delta: string`, `item_id`, `output_index` |
| `response.function_call_arguments.done` | `arguments: string`, `name`, `call_id`, `item_id` |

All events include a `sequence_number` field.

---

## Chat Completions API

### Completions

Accessed via `client.chat.completions`.

#### `chat.completions.create(params)`

Creates a chat completion.
```ts
// Non-streaming
create(params: ChatCompletionCreateParams & { stream?: false | null }): Promise<ChatCompletion>

// Streaming
create(params: ChatCompletionCreateParams & { stream: true }): Promise<Stream>
```

---

## ChatCompletionCreateParams

Request parameters for `create()`.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `messages` | `ChatCompletionMessageParam[]` | Yes | Conversation messages |
| `model` | `string` | No | Ignored. Always uses the on-device model. |
| `stream` | `boolean` | No | Enable streaming |
| `temperature` | `number` | No | Sampling temperature |
| `max_tokens` | `number` | No | Maximum response tokens |
| `max_completion_tokens` | `number` | No | Same as `max_tokens` (takes priority) |
| `top_p` | `number` | No | Probability threshold for sampling |
| `seed` | `number` | No | Random seed for reproducibility |
| `tools` | `ChatCompletionTool[]` | No | Tool definitions |
| `response_format` | `ResponseFormat` | No | Output format constraint |

All other Chat Completions parameters (`n`, `stop`, `logprobs`, `frequency_penalty`, `presence_penalty`, `logit_bias`, `tool_choice`, `parallel_tool_calls`, etc.) are accepted but ignored. A warning is logged at runtime for each unsupported parameter that has a non-null value.
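The `max_tokens` / `max_completion_tokens` precedence in the table can be expressed as a small standalone helper. This is a hypothetical sketch of the documented behavior, not the SDK's internals — `effectiveMaxTokens` is not an exported function:

```typescript
// Hypothetical illustration of the documented precedence:
// when both limits are set, max_completion_tokens wins.
interface TokenLimitParams {
  max_tokens?: number;
  max_completion_tokens?: number;
}

function effectiveMaxTokens(params: TokenLimitParams): number | undefined {
  // Nullish coalescing: prefer max_completion_tokens when present.
  return params.max_completion_tokens ?? params.max_tokens;
}

console.log(effectiveMaxTokens({ max_tokens: 100, max_completion_tokens: 50 })); // 50
console.log(effectiveMaxTokens({ max_tokens: 100 })); // 100
```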
---

## Message Types

### ChatCompletionMessageParam

Union of all message types:

```ts
type ChatCompletionMessageParam =
  | ChatCompletionSystemMessageParam
  | ChatCompletionDeveloperMessageParam
  | ChatCompletionUserMessageParam
  | ChatCompletionAssistantMessageParam
  | ChatCompletionToolMessageParam;
```

### ChatCompletionSystemMessageParam

```ts
{ role: "system"; content: string; name?: string }
```

### ChatCompletionDeveloperMessageParam

```ts
{ role: "developer"; content: string; name?: string }
```

### ChatCompletionUserMessageParam

```ts
{ role: "user"; content: string | ChatCompletionContentPart[]; name?: string }
```

### ChatCompletionAssistantMessageParam

```ts
{
  role: "assistant";
  content?: string | null;
  tool_calls?: ChatCompletionMessageToolCall[];
  refusal?: string | null;
  name?: string;
}
```

### ChatCompletionToolMessageParam

```ts
{ role: "tool"; content: string; tool_call_id: string }
```

### ChatCompletionContentPart

```ts
type ChatCompletionContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }
  | { type: "input_audio"; input_audio: { data: string; format: string } }
  | { type: "file"; file: { file_data: string; filename: string } }
  | { type: "refusal"; refusal: string };
```

Only `text` parts are supported. Other content types log a warning and are skipped.

---

## Tool Types

### ChatCompletionTool

```ts
{
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters?: Record<string, unknown>; // JSON Schema
    strict?: boolean | null;
  };
}
```

### ChatCompletionMessageToolCall

```ts
{
  id: string; // "call_"
  type: "function";
  function: {
    name: string;
    arguments: string; // JSON string
  };
}
```

---

## Response Format

### ResponseFormat

```ts
type ResponseFormat =
  | { type: "text" }
  | { type: "json_object" }
  | {
      type: "json_schema";
      json_schema: {
        name: string;
        description?: string;
        schema?: Record<string, unknown>;
        strict?: boolean | null;
      };
    };
```

Only `json_schema` triggers constrained generation. `text` and `json_object` are treated as plain text generation.

---

## Response Types

### ChatCompletion

```ts
{
  id: string; // "chatcmpl-"
  object: "chat.completion";
  created: number; // Unix timestamp (seconds)
  model: string; // "SystemLanguageModel"
  choices: ChatCompletionChoice[];
  usage: null;
  system_fingerprint: null;
}
```

### ChatCompletionChoice

```ts
{
  index: number;
  message: ChatCompletionMessage;
  finish_reason: "stop" | "length" | "tool_calls" | "content_filter";
}
```

### ChatCompletionMessage

```ts
{
  role: "assistant";
  content: string | null;
  refusal: string | null;
  tool_calls?: ChatCompletionMessageToolCall[];
}
```

---

## Streaming Types

### Stream

Async iterable wrapper with resource cleanup.

```ts
class Stream implements AsyncIterable<ChatCompletionChunk>
```

| Method | Description |
| --- | --- |
| `[Symbol.asyncIterator]()` | Iterate chunks with `for await...of` |
| `close()` | Eagerly release resources |
| `toReadableStream()` | Convert to Web `ReadableStream` |

The stream auto-closes on iteration completion, `break`, or error. A `FinalizationRegistry` ensures cleanup if the stream is abandoned without being fully consumed.

### ChatCompletionChunk

```ts
{
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: ChatCompletionChunkChoice[];
  usage: null;
  system_fingerprint: null;
}
```

### ChatCompletionChunkDelta

```ts
{
  role?: "assistant";
  content?: string | null;
  tool_calls?: Array<{
    index: number;
    id?: string;
    type?: "function";
    function?: { name?: string; arguments?: string };
  }>;
  refusal?: string | null;
}
```

---

## Constants

### MODEL_DEFAULT

```ts
const MODEL_DEFAULT = "SystemLanguageModel";
```

Placeholder model identifier for the on-device foundation model. It can be omitted, since only one model is available.

---

## CompatError

Error class with an HTTP-style status code. Thrown when the underlying SDK raises a `RateLimitedError` (surfaced with status 429).
```ts
class CompatError extends Error {
  status: number;
}
```

---
Source: /examples/
---

# Examples

Runnable examples demonstrating each feature of the SDK. All examples are in the [`examples/`](https://github.com/codybrom/tsfm/tree/main/examples) directory.

## Available Examples

| Example | Description |
| --- | --- |
| [Basic](/examples/basic) | Simple prompt and response |
| [Streaming](/examples/streaming) | Token-by-token streaming |
| [Structured Output](/examples/structured-output) | Typed schemas with `GenerationSchema` |
| [JSON Schema](/examples/json-schema) | JSON Schema-based structured output |
| [Tools](/examples/tools) | Tool calling with a calculator |
| [Generation Options](/examples/generation-options) | Temperature, sampling, token limits |
| [Transcripts](/examples/transcript) | Session history persistence |
| [Content Tagging](/examples/content-tagging) | Content tagging use case |
| [Chat & Responses APIs](/examples/chat-api) | Chat-style and Responses-style API interfaces |

---
Source: /examples/basic
---

# Basic

A simple prompt and response using `SystemLanguageModel` and `LanguageModelSession`.

<<< @/../examples/basic/basic.ts

## What This Shows

1. Create a `SystemLanguageModel` and wait for availability
2. Create a session with system instructions
3. Generate a text response with `respond()`
4. Dispose both session and model to free native resources

---
Source: /examples/streaming
---

# Streaming

Token-by-token streaming using `streamResponse()`.

<<< @/../examples/streaming/streaming.ts

## What This Shows

1. Create a session (the model availability check can be skipped for brevity)
2. Use `for await...of` to iterate over response chunks
3. Each chunk contains only new tokens — write directly to stdout

---
Source: /examples/structured-output
---

# Structured Output

Generate typed objects using `GenerationSchema` and `respondWithSchema()`.

<<< @/../examples/structured-output/structured-output.ts

## What This Shows

1. Define a schema with `GenerationSchema` and typed properties
2. Use `GenerationGuide.range()` to constrain numeric values
3. Extract typed values with `content.value(key)`
4. Map results to a TypeScript interface

---
Source: /examples/json-schema
---

# JSON Schema

Generate structured output using a standard JSON Schema object.

<<< @/../examples/json-schema/json-schema.ts

## What This Shows

1. Build a schema using the `GenerationSchema` builder, then export with `toDict()`
2. Pass the schema object to `respondWithJsonSchema()`
3. Get the full result as a plain object with `content.toObject()`

---
Source: /examples/tools
---

# Tools

Tool calling with a calculator that performs arithmetic operations.

<<< @/../examples/tools/tools.ts

## What This Shows

1. Extend `Tool` with `name`, `description`, `argumentsSchema`, and `call()`
2. Use `GenerationGuide.anyOf()` to constrain argument values
3. Pass tools to a session via `{ tools: [calculator] }`
4. The model decides when to call the tool and incorporates the result
5. Dispose both the session and tool when done

---
Source: /examples/generation-options
---

# Generation Options

Control temperature, sampling strategy, and token limits.

<<< @/../examples/generation-options/generation-options.ts

## What This Shows

1. Set `temperature` for creative output
2. Use `SamplingMode.random()` with top-K and a seed for reproducible randomness
3. Limit response length with `maximumResponseTokens`

---
Source: /examples/transcript
---

# Transcripts

Save and restore session history across process restarts.

<<< @/../examples/transcript/transcript.ts

## What This Shows

1. Build context in a session with multiple `respond()` calls
2. Export the transcript with `toJson()`
3. Restore a session with `Transcript.fromJson()` and `LanguageModelSession.fromTranscript()`
4. The restored session retains full conversation context

---
Source: /examples/content-tagging
---

# Content Tagging

Use the `CONTENT_TAGGING` use case for classification tasks.
<<< @/../examples/content-tagging/content-tagging.ts

## What This Shows

1. Create a model with `SystemLanguageModelUseCase.CONTENT_TAGGING`
2. Pass the model to a session
3. The model is optimized for classification rather than general chat

---
Source: /examples/chat-api
---

# Chat & Responses APIs

Examples using the Chat-style and Responses-style API interfaces at `tsfm-sdk/chat`. Both the Responses API and Chat Completions API are shown.

## Responses API

### Basic Text Generation

```ts
import Client from "tsfm-sdk/chat";

const client = new Client();

// String input — the simplest form
const response = await client.responses.create({
  instructions: "You are a helpful assistant. Be concise.",
  input: "What is the capital of France?",
});

console.log(response.output_text);
// "The capital of France is Paris."

client.close();
```

### Multi-turn Conversation

```ts
const response = await client.responses.create({
  instructions: "You are a math tutor. Be concise.",
  input: [
    { role: "user", content: "What is 2 + 2?" },
    { role: "assistant", content: "4" },
    { role: "user", content: "Multiply that by 3" },
  ],
});

console.log(response.output_text); // "12"
```

### Streaming

```ts
const stream = await client.responses.create({
  input: "Count from 1 to 5, one per line.",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
console.log();
```

### Structured Output

```ts
const response = await client.responses.create({
  input: "Extract: Alice is 28 years old and lives in Seattle",
  text: {
    format: {
      type: "json_schema",
      name: "Person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "integer" },
          city: { type: "string" },
        },
        required: ["name", "age", "city"],
      },
    },
  },
});

const person = JSON.parse(response.output_text);
console.log(person); // { name: "Alice", age: 28, city: "Seattle" }
```

### Tool Calling

```ts
const tools = [
  {
    type: "function" as const,
    name: "get_weather",
    description: "Get current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name" },
      },
      required: ["city"],
    },
  },
];

// Step 1: Model decides to call a tool
const response = await client.responses.create({
  input: "What's the weather in Tokyo?",
  tools,
});

const fc = response.output.find((item) => item.type === "function_call");
if (fc && fc.type === "function_call") {
  console.log("Tool:", fc.name);
  console.log("Args:", fc.arguments);

  // Step 2: Execute the tool and send results back
  const result = JSON.stringify({ temp: 22, condition: "Sunny" });
  const followUp = await client.responses.create({
    input: [
      { role: "user", content: "What's the weather in Tokyo?" },
      fc, // pass the function_call back
      { type: "function_call_output", call_id: fc.call_id, output: result },
    ],
    tools,
  });

  console.log(followUp.output_text);
  // "It's currently 22°C and sunny in Tokyo."
}
```

### Generation Options

```ts
const response = await client.responses.create({
  input: "Write a creative haiku",
  temperature: 0.8,
  max_output_tokens: 50,
  seed: 42,
});

console.log(response.output_text);
```

### Handling Errors

```ts
const response = await client.responses.create({
  input: "...",
});

if (response.status === "incomplete") {
  console.log("Incomplete:", response.incomplete_details?.reason);
  // "max_output_tokens" or "content_filter"
}

// Check for refusals
for (const item of response.output) {
  if (item.type === "message") {
    for (const content of item.content) {
      if (content.type === "refusal") {
        console.log("Refused:", content.refusal);
      }
    }
  }
}
```

---

## Chat Completions API

### Chat: Basic Text Generation

```ts
import Client from "tsfm-sdk/chat";

const client = new Client();

const response = await client.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant. Be concise." },
    { role: "user", content: "What is the capital of France?" },
  ],
});

console.log(response.choices[0].message.content);
// "The capital of France is Paris."

client.close();
```

### Chat: Multi-turn Conversation

```ts
const response = await client.chat.completions.create({
  messages: [
    { role: "system", content: "You are a math tutor. Be concise." },
    { role: "user", content: "What is 2 + 2?" },
    { role: "assistant", content: "4" },
    { role: "user", content: "Multiply that by 3" },
  ],
});

console.log(response.choices[0].message.content); // "12"
```

### Chat: Streaming

```ts
const stream = await client.chat.completions.create({
  messages: [{ role: "user", content: "Count from 1 to 5, one per line." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0].delta.content;
  if (delta) process.stdout.write(delta);
}
console.log();
```

### Chat: Structured Output

```ts
const response = await client.chat.completions.create({
  messages: [
    { role: "user", content: "Extract: Alice is 28 years old and lives in Seattle" },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "Person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "integer" },
          city: { type: "string" },
        },
        required: ["name", "age", "city"],
      },
    },
  },
});

const person = JSON.parse(response.choices[0].message.content!);
console.log(person); // { name: "Alice", age: 28, city: "Seattle" }
```

### Chat: Tool Calling

```ts
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name" },
        },
        required: ["city"],
      },
    },
  },
];

// Step 1: Model decides to call a tool
const response = await client.chat.completions.create({
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
  tools,
});

const choice = response.choices[0];
if (choice.finish_reason === "tool_calls" && choice.message.tool_calls) {
  const call = choice.message.tool_calls[0];
  console.log("Tool:", call.function.name);
  console.log("Args:", call.function.arguments);

  // Step 2: Execute the tool and send results back
  const result = JSON.stringify({ temp: 22, condition: "Sunny" });
  const followUp = await client.chat.completions.create({
    messages: [
      { role: "user", content: "What's the weather in Tokyo?" },
      { role: "assistant", content: null, tool_calls: [call] },
      { role: "tool", tool_call_id: call.id, content: result },
    ],
    tools,
  });

  console.log(followUp.choices[0].message.content);
  // "It's currently 22°C and sunny in Tokyo."
}
```

---
Source: /changelog
---

# Changelog