Streaming

TSFM streams responses token by token through an async iterator. The on-device model produces cumulative snapshots; the SDK diffs them internally, so each iteration yields only the tokens added since the previous one.

INFO

The Swift equivalent is LanguageModelSession.ResponseStream.
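To make the snapshot-to-delta idea concrete, here is a minimal sketch of how cumulative snapshots could be turned into incremental chunks. It is not the SDK's internal code: `deltasFromSnapshots` and `fakeSnapshots` are hypothetical names, and the sketch assumes each snapshot normally extends the previous one.

```typescript
// Illustrative only: converts cumulative snapshots into incremental deltas.
// `deltasFromSnapshots` is a hypothetical helper, not part of tsfm-sdk.
async function* deltasFromSnapshots(
  snapshots: AsyncIterable<string>,
): AsyncGenerator<string> {
  let previous = "";
  for await (const snapshot of snapshots) {
    // Assumption: a snapshot usually extends the previous one,
    // so the new text is the suffix after the previous length.
    const delta = snapshot.startsWith(previous)
      ? snapshot.slice(previous.length)
      : snapshot; // fall back to the whole snapshot if earlier text changed
    previous = snapshot;
    if (delta.length > 0) yield delta;
  }
}

// Stand-in for the model: emits cumulative snapshots.
async function* fakeSnapshots(): AsyncGenerator<string> {
  yield "Why";
  yield "Why did";
  yield "Why did the chicken";
}
```

Iterating `deltasFromSnapshots(fakeSnapshots())` yields only the newly added text on each step, which is the behavior `streamResponse` is described as providing.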

Basic Streaming

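import { LanguageModelSession } from "tsfm-sdk";

const session = new LanguageModelSession();

// Each iteration yields only the newly generated text.
for await (const chunk of session.streamResponse("Tell me a joke")) {
  process.stdout.write(chunk);
}
console.log(); // end the line once the stream finishes

// Release the session when you are done with it.
session.dispose();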

Each chunk is a string containing only the new tokens since the last iteration.

With Options

for await (const chunk of session.streamResponse("Write a story", {
  options: { temperature: 0.8, maximumResponseTokens: 500 },
})) {
  process.stdout.write(chunk);
}

Collecting the Full Response

If you want both streaming output and the complete text:

let full = "";
for await (const chunk of session.streamResponse("Explain TypeScript")) {
  process.stdout.write(chunk);
  full += chunk;
}
console.log("\n\nFull response length:", full.length);
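If you need this pattern in several places, the loop above can be wrapped in a small helper. The sketch below is an assumption about how you might structure it, not an SDK function: `collectStream` and `fakeTextStream` are hypothetical names, and the helper only relies on the stream being an async iterable of strings.

```typescript
// Hypothetical helper: forwards each chunk to `onChunk` and returns the joined text.
async function collectStream(
  stream: AsyncIterable<string>,
  onChunk: (chunk: string) => void = () => {},
): Promise<string> {
  const parts: string[] = [];
  for await (const chunk of stream) {
    onChunk(chunk); // e.g. write to stdout as the text arrives
    parts.push(chunk);
  }
  return parts.join("");
}

// Stand-in for session.streamResponse(...) so the sketch runs on its own.
async function* fakeTextStream(): AsyncGenerator<string> {
  yield "Type";
  yield "Script ";
  yield "is typed.";
}
```

With a real session, the call would look like `await collectStream(session.streamResponse("Explain TypeScript"), (c) => process.stdout.write(c))`. Collecting parts in an array and joining once avoids repeated string concatenation on long responses.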

Chat API Streaming

If you prefer the Chat API interface, the compatibility layer accepts stream: true and yields ChatCompletionChunk objects:

import Client from "tsfm-sdk/chat";
const client = new Client();

const stream = await client.chat.completions.create({
  messages: [{ role: "user", content: "Tell me a joke" }],
  stream: true,
});

for await (const chunk of stream) {
  // Some chunks may carry no text, so guard before writing.
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
client.close();
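The same collect-while-streaming pattern applies to chat chunks. The sketch below assumes only the chunk shape used in the example above (`choices[0].delta.content`); `ChatChunkLike`, `collectChatDeltas`, and `fakeChatStream` are hypothetical names introduced for illustration, not SDK exports.

```typescript
// Minimal chunk shape, inferred from the example above (an assumption).
interface ChatChunkLike {
  choices: Array<{ delta: { content?: string | null } }>;
}

// Hypothetical helper: gathers the text deltas of a chat stream into one string.
async function collectChatDeltas(
  stream: AsyncIterable<ChatChunkLike>,
  onDelta: (delta: string) => void = () => {},
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    // Skip chunks with no choices or no text content.
    const delta = chunk.choices[0]?.delta.content;
    if (!delta) continue;
    onDelta(delta);
    full += delta;
  }
  return full;
}

// Stand-in for the stream returned by client.chat.completions.create(...).
async function* fakeChatStream(): AsyncGenerator<ChatChunkLike> {
  yield { choices: [{ delta: { content: "Knock" } }] };
  yield { choices: [{ delta: {} }] };
  yield { choices: [{ delta: { content: " knock" } }] };
  yield { choices: [] };
}
```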

Cleanup

The stream reference is released automatically when iteration completes or the session is disposed. The SDK keeps the Node.js event loop alive while streaming, so the process won't exit mid-stream.
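The paragraph above covers normal completion and explicit disposal. If your loop can exit early, for example because a chunk handler throws, one cautious option is to dispose the session in a `finally` block. The sketch below uses a hypothetical `DisposableSession` interface that mirrors only the two methods shown on this page; `streamAndDispose` is likewise an illustrative name, not part of the SDK.

```typescript
// Hypothetical interface covering only the methods used on this page.
interface DisposableSession {
  streamResponse(prompt: string): AsyncIterable<string>;
  dispose(): void;
}

// Streams a response and always disposes the session afterwards,
// whether the loop completes or an error interrupts it.
async function streamAndDispose(
  session: DisposableSession,
  prompt: string,
  onChunk: (chunk: string) => void,
): Promise<void> {
  try {
    for await (const chunk of session.streamResponse(prompt)) {
      onChunk(chunk);
    }
  } finally {
    session.dispose();
  }
}
```

This suits one-shot use. If you reuse a session across several prompts, keep the `dispose()` call at the point where the session is no longer needed instead.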
