Overview
FoundationModels is Apple's framework for accessing the on-device large language model that powers Apple Intelligence. Introduced at WWDC 2025, it gives apps direct access to the same model behind Writing Tools, Smart Replies, and Mail Summaries — running entirely on-device, with no network requests and no data leaving the device.
Key characteristics:
- On-device only — no cloud fallback, no API key, no latency from network round-trips
- Privacy-first — all inference happens locally; Apple never sees your prompts or responses
- Availability-gated — requires Apple Intelligence to be enabled; not all devices qualify
- iOS 26+ only — requires iPhone 15 Pro / iPhone 15 Pro Max or later (or equivalent iPad)
- Shared resource — the model serves all apps; system may rate-limit under load
What it excels at:
- Text correction, normalisation, and reformatting
- Entity extraction and classification
- Summarisation of short-to-medium content
- Structured output generation (via guided generation)
- Context-aware suggestions and completions
What it is not:
- A replacement for frontier models (GPT-4, Claude, Gemini) for complex reasoning
- A cloud API — if the model is unavailable, there is no fallback infrastructure
- A general-purpose search or retrieval system
Minimum requirements:
- iOS 26.0+, iPadOS 26.0+, macOS Tahoe 26.0+
- Xcode 26.0+
- Device must support Apple Intelligence (iPhone 15 Pro or later)
- Apple Intelligence must be enabled in Settings
Contents
- Availability & Setup: SystemLanguageModel, availability cases, AnyObject? pattern
- Sessions & Basic Prompting: LanguageModelSession, Instructions, Prompt, .content gotcha
- Prompt Engineering for On-Device Models: what works, what doesn't, #Playground
- Guided Generation (@Generable): @Guide, constraints, PartiallyGenerated
- Streaming: streamResponse(), ResponseStream, .collect()
- Generation Options: temperature, SamplingMode, maximumResponseTokens
- Tool Calling: Tool protocol, pre-fetch vs inject, context cost
- Token Budget: tokenUsage(for:), contextSize, overflow strategies
- The Transcript: Transcript.Entry, saving/resuming sessions
- Failure Modes & Graceful Degradation: GenerationError, never-throws pattern
- Testing: four test categories, .disabled() on-device tests
- Example Use Cases: 10 concrete patterns across app domains
- Quick Reference & Anti-Patterns: cheatsheet + 10 things not to do
- Context Engineering: select/inject/compress/pre-summarise
- Advanced Patterns: actor isolation, @Generable enums with associated values, Observable monitoring, PromptRepresentable chaining, bounded domain injection
Part 1: Availability & Setup
SystemLanguageModel.default
SystemLanguageModel.default is the singleton entry point for the on-device language model. You do not initialise it — it is a static property you reference directly. Everything in FoundationModels starts here.
```swift
let model = SystemLanguageModel.default

switch model.availability {
case .available:
    // model is ready — create a session and run prompts
    break
case .unavailable(let reason):
    // handle the specific reason
    break
@unknown default:
    break
}
```
SystemLanguageModel is an Observable final class, so you can observe .availability changes in SwiftUI via @State or inside .task {} blocks without any special wiring.
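A minimal sketch of that wiring, assuming a view that only needs to branch on availability (the view name and label text here are illustrative, not from the framework):

```swift
import SwiftUI
import FoundationModels

// Illustrative view: reading .availability in body registers observation,
// so SwiftUI re-renders whenever the Observable model's availability changes.
struct AssistantGateView: View {
    private let model = SystemLanguageModel.default

    var body: some View {
        switch model.availability {
        case .available:
            Text("AI features ready")
        case .unavailable(.modelNotReady):
            Text("Model downloading, check back later")
        case .unavailable:
            Text("AI features unavailable on this device")
        @unknown default:
            Text("AI features unavailable")
        }
    }
}
```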
Availability Cases
SystemLanguageModel.Availability is an enum with two top-level cases: .available and .unavailable(UnavailableReason). Always handle @unknown default regardless — Apple may add cases in future OS versions.
.available
The model is downloaded, Apple Intelligence is enabled, and the device is eligible. Create a LanguageModelSession and proceed.
.unavailable(.deviceNotEligible)
The hardware does not support Apple Intelligence. This applies to iPhone 14 and earlier, and equivalent iPad/Mac models. This is permanent for the lifetime of the device — no amount of waiting or retrying will change it. When you see this case, remove the AI code path from your UI entirely and show a permanent alternative experience.
.unavailable(.appleIntelligenceNotEnabled)
The device is eligible but the user has not turned on Apple Intelligence in Settings > Apple Intelligence & Siri. This is a user choice, not a hardware limitation. You can optionally prompt the user to enable it:
```swift
// Optionally deep-link to Settings
if let url = URL(string: UIApplication.openSettingsURLString) {
    UIApplication.shared.open(url)
}
```
Respect the user's decision. If they choose not to enable it, show the non-AI path without nagging.
.unavailable(.modelNotReady)
This is the most misunderstood case. It does not mean the model is permanently unavailable — it means the model weights are currently downloading. There is no programmatic download API. You cannot trigger the download, request it, or track its progress. The OS manages download timing based on network conditions, battery level, device temperature, and system load. Download can take minutes to hours.
Treat .modelNotReady as a transient state. Do not show a permanent "not supported" message. Instead, show a softer "not available right now — check back later" state and retry on the next app launch or session.
```swift
func checkAvailability() -> String {
    switch SystemLanguageModel.default.availability {
    case .available:
        return "Ready"
    case .unavailable(let reason):
        switch reason {
        case .deviceNotEligible:
            return "Device not supported"
        case .appleIntelligenceNotEnabled:
            return "Enable Apple Intelligence in Settings"
        case .modelNotReady:
            return "Downloading... check back soon"
        @unknown default:
            return "Unavailable"
        }
    @unknown default:
        return "Unknown"
    }
}
```
isAvailable Convenience Property
SystemLanguageModel.default.isAvailable is a Bool shorthand. Use it when you only need to gate a code path and don't need to distinguish between unavailability reasons:
```swift
guard SystemLanguageModel.default.isAvailable else { return }
// proceed with AI code path
```
If you need to communicate why the model is unavailable to the user, use the full .availability switch instead.
UseCase — .general vs .contentTagging
SystemLanguageModel.UseCase selects a specialised version of the model. There are two options:
SystemLanguageModel.UseCase.general (the default for SystemLanguageModel.default) — a general-purpose model for writing assistance, analysis, correction, extraction, and summarisation. This is what you get when you use SystemLanguageModel.default.
SystemLanguageModel.UseCase.contentTagging — specialised for classification and extraction tasks. When you use this model, it always responds with tags — it is tuned to identify topics, emotions, actions, and objects. Use this when you want to categorise or label content rather than transform or generate it.
```swift
// General model (default — used for most tasks)
let model = SystemLanguageModel.default

// Content tagging model — for classification/extraction
let taggingModel = SystemLanguageModel(useCase: .contentTagging)
let session = LanguageModelSession(model: taggingModel)
```
Do not use .contentTagging for text correction or generation tasks. The model will produce tags rather than prose, regardless of your instructions.
Guardrails
SystemLanguageModel.Guardrails controls content safety filtering on model inputs and outputs. There are two presets:
SystemLanguageModel.Guardrails.default — the standard setting. Blocks unsafe content in both prompts and responses. When triggered, throws LanguageModelSession.GenerationError.guardrailViolation(_:).
SystemLanguageModel.Guardrails.permissiveContentTransformations — allows potentially sensitive source material to pass through for string generation tasks. Use this when your app legitimately processes user-generated content that might incidentally contain sensitive words (e.g., a chat moderation tool, a study app covering difficult topics). This mode only applies to string output — guided generation (@Generable) always uses default guardrails.
```swift
// Default guardrails (most apps)
let model = SystemLanguageModel.default

// Permissive mode — for apps that must process sensitive source text
let permissiveModel = SystemLanguageModel(guardrails: .permissiveContentTransformations)
```
Even in permissive mode, the model may still refuse certain content — it retains its own layer of safety separate from the guardrail system.
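As a sketch of what handling a guardrail rejection looks like at a call site (the fallback behaviour here, returning nil, is an app-level choice, not framework behaviour):

```swift
import FoundationModels

// Illustrative helper: on a guardrail violation, fall back to the non-AI path
// rather than retrying, since the same input will be rejected again.
func summariseSafely(_ text: String) async -> String? {
    let session = LanguageModelSession {
        "Summarise the user's note in two sentences."
    }
    do {
        return try await session.respond(to: Prompt { text }).content
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        // Content was flagged by the safety layer
        return nil
    } catch {
        // Any other failure also degrades to the non-AI path
        return nil
    }
}
```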
The AnyObject? Pattern for SwiftUI
iOS 26 types require @available(iOS 26, *) annotations. Annotating a @State property with @available propagates that constraint to the entire containing view struct — meaning the whole view requires iOS 26, which is likely not what you want.
The solution is to store iOS 26-only service instances as AnyObject? and cast them back inside #available guards:
```swift
// DON'T do this — @available propagates to the whole view
@available(iOS 26, *)
@State private var service: MyAI26Service?   // ❌ forces the view to require iOS 26

// DO this instead — no @available constraint on the view struct
@State private var service: AnyObject?        // ✅ clean — AnyObject has no availability

// Store the service (inside an #available guard)
if #available(iOS 26, *) {
    self.service = MyAI26Service()
}

// Use the service (inside an #available guard)
if #available(iOS 26, *), let s = self.service as? MyAI26Service {
    let result = try await s.process(text)
}
```
This pattern lets you write a single view struct that gracefully degrades on older OS versions without any @available annotation on the view itself.
Part 2: Sessions & Basic Prompting
LanguageModelSession — Init Variants
LanguageModelSession is the object you interact with to send prompts and receive responses. The two most common init patterns are:
Fresh session (most common):
```swift
// With builder-style instructions
let session = LanguageModelSession {
    "You are a BJJ terminology corrector."
    "Fix misrecognised terms to their canonical spellings."
}

// With a specific model
let session = LanguageModelSession(model: SystemLanguageModel.default) {
    "You are a motivational coach."
}

// With string instructions — also valid
let session = LanguageModelSession(
    instructions: "You are a code review assistant."
)
```
Resume from transcript (multi-turn):
```swift
// Rehydrate a session from a saved transcript to continue a conversation
let session = LanguageModelSession(
    model: SystemLanguageModel.default,
    tools: [],
    transcript: savedTranscript
)
```
The session is an Observable final class. It is also Sendable, so you can safely hold a reference from a @MainActor context and call its methods from async tasks.
Instructions
Instructions defines the model's persona, rules, and domain — what the model is and how it behaves. Set it once at session creation. Instructions apply to every prompt in that session.
Use @InstructionsBuilder (result builder syntax) to compose instructions from multiple strings:
```swift
let instructions = Instructions {
    "You are a BJJ terminology corrector."
    "Fix misrecognised BJJ terms to their canonical spellings."
    "Common corrections: kimora→Kimura, half card→Half Guard, darce→D'Arce"
}
```
Or pass a plain String directly:
```swift
let session = LanguageModelSession(
    instructions: "You are a concise summariser. Respond in three sentences maximum."
)
```
Instructions are not the user's question — that is the Prompt. Instructions define the container; the prompt fills it.
The framework injects instructions as the system-level context for the model. The model follows instructions at higher priority than prompt content, so put your constraints and rules in instructions, not in the prompt itself.
Prompt
Prompt is the user's input — the actual question, text, or content you want the model to process. Use @PromptBuilder for dynamic construction:
```swift
// Builder style — for dynamic prompts
let prompt = Prompt {
    "Correct this transcript: \(rawText)"
}

// String literal — also valid
let response = try await session.respond(to: "Summarise the following: \(article)")
```
Prompt strings accept string interpolation. Keep prompts concise — every token in a prompt consumes context budget that competes with the response.
The Critical .content Gotcha
respond(to:options:) returns LanguageModelSession.Response<T>, not T directly. Response<T> is a wrapper struct. The actual generated value is at .content.
This is the single most common mistake when first using the framework:
```swift
// WRONG — response is Response<String>, not String
let text = try await session.respond(to: prompt)
print(text.uppercased())   // compile error: Response<String> has no uppercased()

// RIGHT
let response = try await session.respond(to: prompt)
let text = response.content   // String
print(text.uppercased())

// With typed guided generation
let response = try await session.respond(
    to: prompt,
    generating: MyOutputType.self
)
let value = response.content   // MyOutputType
```
Internalise this: respond() always returns Response<T>. Always unwrap .content before using the value.
Response.rawContent
response.rawContent gives you the unprocessed GeneratedContent before guided generation parsing. This is the raw structured output the model produced, before it was decoded into your @Generable type. Use it for debugging when a response fails to parse or produces unexpected values — it shows you exactly what the model generated.
Session-Per-Call vs Persistent Sessions
This is a key architectural decision. Get it right at design time.
Session-per-call — create a new LanguageModelSession for each request. No conversation history accumulates. This is the correct pattern for the vast majority of use cases: text correction, extraction, summarisation, classification, entity detection. Each request is independent.
```swift
// Session-per-call — correct for stateless tasks
func normalise(_ text: String) async throws -> String {
    let session = LanguageModelSession {
        "Fix speech-to-text errors in BJJ transcripts."
        "Corrections: kimora→Kimura, half card→Half Guard, darce→D'Arce"
    }
    let response = try await session.respond(to: Prompt { text })
    return response.content
}
```
Persistent session — keep the LanguageModelSession alive across multiple respond() calls. The session accumulates its Transcript as you go, so the model remembers previous exchanges. Use this only when the model needs that history to answer correctly — for example, a coaching chatbot where the user refers to something they said three turns ago.
```swift
// Persistent session — for multi-turn conversation
@Observable
class ChatAssistant {
    private let session = LanguageModelSession {
        "You are a BJJ coach assistant."
        "Help the user analyse and improve their game based on their training logs."
    }

    func chat(_ message: String) async throws -> String {
        let response = try await session.respond(to: Prompt { message })
        return response.content   // transcript accumulates automatically
    }
}
```
The risk with persistent sessions: the transcript grows with each exchange and eventually hits the context window limit, throwing LanguageModelSession.GenerationError.exceededContextWindowSize. For long-running conversations, you need a strategy for trimming or summarising history. Session-per-call has no such risk.
Default to session-per-call. Only reach for persistent sessions when you have a concrete requirement for cross-turn memory.
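One possible recovery sketch for the persistent case, assuming a hypothetical summariseHistory() helper that condenses prior turns (for example via a separate one-shot summarisation call):

```swift
import FoundationModels

// Hypothetical names: `session` is a stored var so it can be replaced,
// and summariseHistory() is your own condensation step.
var session = LanguageModelSession {
    "You are a BJJ coach assistant."
}

func send(_ message: String) async throws -> String {
    do {
        return try await session.respond(to: Prompt { message }).content
    } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
        // Transcript overflowed: restart with a condensed seed and retry once
        let summary = try await summariseHistory()
        session = LanguageModelSession {
            "You are a BJJ coach assistant."
            "Conversation so far, summarised: \(summary)"
        }
        return try await session.respond(to: Prompt { message }).content
    }
}
```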
Part 3: Prompt Engineering for On-Device Models
The On-Device Model is Smaller — This Changes Everything
The model powering FoundationModels is Apple’s private on-device LLM — not GPT-4, not Claude, not Gemini. It is significantly smaller (estimated ~3B parameters) than frontier cloud models. This is a feature, not a bug — it runs entirely on your device with sub-second latency — but it fundamentally changes how you should write prompts.
Techniques that work reliably on frontier models can actively degrade performance on the on-device model. Treat every prompt engineering heuristic you have learned from cloud models as a starting point to validate, not a rule to apply.
Principle 1: Short, Direct Instructions
Keep instructions under approximately 200 words total. Longer instructions dilute the signal — the model struggles to prioritise which parts matter most and may partially ignore sections buried deep in a long system prompt.
Every sentence in your instructions should earn its place. If you can remove a sentence without changing the model’s behaviour, remove it.
```swift
// WEAK — verbose, repetitive
let session = LanguageModelSession {
    "You are a helpful assistant specialising in Brazilian Jiu-Jitsu."
    "Your primary purpose is to help users with BJJ-related queries."
    "When you see text from speech recognition, carefully examine it."
    "Your goal is to correct any speech recognition errors in the text."
    "Please make sure to handle common BJJ terminology correctly."
}

// STRONG — dense, direct
let session = LanguageModelSession {
    "Fix speech-to-text errors in BJJ transcripts."
    "Correct misrecognised terms. Return only the corrected text."
}
```
Principle 2: Explicit Corrections Beat Implied Inference
If you have known domain-specific misrecognitions or corrections, list them explicitly. Do not rely on the model inferring what “fix BJJ terms” means — it may not know the canonical spellings for niche vocabulary.
```swift
// WEAK — relies on the model knowing BJJ terminology
"Fix any incorrectly transcribed Brazilian Jiu-Jitsu terminology."

// STRONG — explicit correction table
let session = LanguageModelSession {
    "Fix speech-to-text errors in BJJ transcripts."
    "Common misrecognitions: kimora/kimura -> Kimura, half card/half god -> Half Guard,"
    "darce/dart -> D'Arce, rnc/arnc -> Rear Naked Choke, omoa plata -> Omoplata."
}
```
The on-device model does not have the deep BJJ domain knowledge that a frontier model trained on vast internet corpora might have. Make your domain knowledge explicit in the prompt rather than hoping the model already knows it.
Principle 3: Include a Domain Vocabulary in Instructions
For niche domains — BJJ, medicine, legal, finance, specialised engineering — include a vocabulary list or canonical term glossary in your instructions. This gives the model the reference it needs to make correct corrections or use correct terminology in its output.
```swift
let session = LanguageModelSession {
    "You are a BJJ transcript corrector."
    "Canonical terms: Guard, Half Guard, Mount, Back Mount, Side Control,"
    "North-South, Turtle, Closed Guard, Open Guard, De La Riva, X-Guard,"
    "Kimura, Armbar, Triangle, Rear Naked Choke, D'Arce, Anaconda,"
    "Omoplata, Heel Hook, Kneebar, Toe Hold."
    "Correct misrecognised terms to their canonical forms."
}
```
This is more token-efficient than hoping for inference, and significantly more reliable.
Principle 4: One Task Per Session
Do not ask the model to perform multiple distinct tasks in one session. Correction AND summarisation AND extraction in a single prompt will produce worse results on the on-device model than running them as separate sessions.
```swift
// WEAK — three tasks in one call
let response = try await session.respond(to: Prompt {
    "Correct BJJ terms, summarise the session, and extract techniques used."
    rawText
})

// STRONG — one focused task per session
let corrected = try await correctSession.respond(to: Prompt { rawText })
let summary = try await summarySession.respond(to: Prompt { corrected.content })
let techniques = try await extractSession.respond(to: Prompt { corrected.content })
```
The overhead of running multiple sessions is minimal compared to the reliability gain from focused, single-task prompts.
Principle 5: Avoid Chain-of-Thought Prompting
"Think step by step", "Let’s reason through this", and similar chain-of-thought prompts improve performance on large models but add noise on smaller on-device models. The model produces reasoning tokens that consume context budget without materially improving the final answer — and can sometimes cause the model to talk itself into a worse answer.
Do not use CoT prompting for on-device tasks. Give direct instructions and ask for direct output.
```swift
// WEAK — chain-of-thought on a small model
"Think step by step about what BJJ terms might have been misrecognised, then correct them."

// STRONG — direct instruction
"Correct misrecognised BJJ terms. Return only the corrected text."
```
Frontier Model vs On-Device: Comparison
| Technique | Frontier Model | On-Device Model |
|---|---|---|
| Chain-of-thought prompting | Works well ✅ | Degrades performance ❌ |
| Long, elaborate instructions | Fine ✅ | Unreliable ⚠️ |
| Implicit domain inference | Often works ✅ | Unreliable for niche domains ⚠️ |
| Explicit correction lists | Helpful ✅ | Critical ✅✅ |
| Multi-task instructions | Usually works ✅ | Fails ❌ |
| Short, direct instructions | Works ✅ | Works best ✅✅ |
| CoT / "think step by step" | Major boost ✅ | Noise and overhead ❌ |
| Few-shot examples in prompt | Works ✅ | Works, watch token budget ⚠️ |
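The last row deserves a sketch: a small number of short input/output pairs can anchor the expected format without blowing the token budget. The example pairs below are illustrative, and rawText stands for your actual input:

```swift
import FoundationModels

// Few-shot sketch: two short input/output pairs, then the real input.
// Keep examples minimal: each pair spends context budget.
let session = LanguageModelSession {
    "Fix speech-to-text errors in BJJ transcripts. Return only the corrected text."
    "Example input: worked kimora from half card"
    "Example output: Worked Kimura from Half Guard"
    "Example input: hit a darce off the scramble"
    "Example output: Hit a D'Arce off the scramble"
}
let response = try await session.respond(to: Prompt { rawText })
```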
The #Playground Macro — Fast Prompt Iteration
Available from iOS 26.4+ (February 2026 Foundation Models update), the #Playground macro lets you iterate on prompts directly in Xcode without building and running the full app. Write a #Playground block in a Swift file, run it from the Xcode canvas, and see the response inline.
When you run the canvas, the output shows Input Token Count and Response Token Count separately — useful for understanding your prompt’s cost against the ~4,096 token context window estimate shown in canvas.
```swift
import FoundationModels

#Playground {
    let session = LanguageModelSession {
        "Fix BJJ transcript errors."
        "kimora -> Kimura, half card -> Half Guard, darce -> D'Arce"
    }
    let response = try await session.respond(
        to: "worked kimora from half card today, finished with darce"
    )
    response.content   // displayed in Xcode canvas
}
```
This is the fastest feedback loop for prompt engineering. Iterate on your instructions in the playground before wiring them into the app. Test with the exact on-device model, not a frontier proxy — behaviour differs significantly, and a prompt that works on GPT-4 may not work well on the Apple on-device model.
Part 4: Guided Generation (@Generable)
What @Generable Does
@Generable is an attached macro that synthesises Generable protocol conformance on a struct or enum. At compile time it does three things:
- Generates a PartiallyGenerated associated type — a mirror of the struct where every stored property is Optional. This is the type you receive when iterating a stream mid-generation.
- Infers a JSON schema from the struct's property types and any @Guide annotations. That schema drives constrained sampling, which guarantees the output is always structurally valid — no parsing, no runtime crashes from malformed responses.
- Synthesises ConvertibleFromGeneratedContent and ConvertibleToGeneratedContent conformances, which handle encoding and decoding between the model's internal representation and your Swift type.
The model generates properties in the order they are declared, so put properties that should influence later ones first.
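For example, a sketch that exploits this ordering by declaring a short justification before the score it should inform (the type and property names here are illustrative):

```swift
import FoundationModels

// Declaration order is generation order: the model writes `reasoning` first,
// which can ground the `rating` that follows it.
@Generable
struct GradedAnswer {
    @Guide(description: "One-sentence justification for the grade")
    var reasoning: String

    @Guide(description: "Grade from 1 (poor) to 5 (excellent)", .range(1...5))
    var rating: Int
}
```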
Basic Usage
```swift
@Generable
struct BookReview {
    var title: String
    var rating: Int
    var summary: String
}

let session = LanguageModelSession()
let response = try await session.respond(
    to: "Review this book: \(bookTitle)",
    generating: BookReview.self
)
let review = response.content   // BookReview — fully populated, no parsing needed
```
@Guide — Descriptions
@Guide(description:) tells the model what a property means. Include descriptions for any property where the name alone is ambiguous. Keep them concise — long descriptions consume context and add latency.
```swift
@Generable
struct NormalisedTranscript {
    @Guide(description: "The full transcript with BJJ terms corrected and properly cased")
    var normalisedText: String

    @Guide(description: "BJJ terms found in the transcript, each in canonical form e.g. 'Kimura', 'Half Guard'")
    var extractedTerms: [String]
}
```
You can also annotate the struct itself via @Generable(description:):
```swift
@Generable(description: "A classified support ticket with priority and routing metadata")
struct TicketClassification {
    @Guide(description: "Urgency level for routing decisions")
    var priority: Int
}
```
@Guide — Constraints with GenerationGuide
@Guide also accepts one or more GenerationGuide<T> values to enforce numeric bounds and array sizes. All bounds are inclusive.
```swift
@Generable
struct ProductReview {
    @Guide(description: "Star rating", .range(1...5))
    var rating: Int

    @Guide(description: "Key selling points, at most three", .maximumCount(3))
    var keyPoints: [String]

    @Guide(description: "Topics addressed, at least one", .minimumCount(1))
    var topics: [String]

    @Guide(description: "Quality score", .minimum(0), .maximum(100))
    var qualityScore: Double
}
```
Available GenerationGuide constraints:
| Constraint | Applies To | Behaviour |
|---|---|---|
| .range(n...m) | Numeric types | Value must fall within the closed range (inclusive both ends) |
| .minimum(n) | Numeric types | Value must be ≥ n |
| .maximum(n) | Numeric types | Value must be ≤ n |
| .minimumCount(n) | [T] arrays | Array must contain ≥ n elements |
| .maximumCount(n) | [T] arrays | Array must contain ≤ n elements |
Multiple guides can be combined on a single property as variadic arguments — .minimum(0), .maximum(100) is valid.
Enums as @Generable Types
Mark enums with @Generable to use them as property types inside other @Generable structs. The constrained sampler restricts output to valid case names only:
```swift
@Generable
enum Sentiment {
    case positive
    case neutral
    case negative
}

@Generable
struct MessageClassification {
    @Guide(description: "Overall tone of the message")
    var sentiment: Sentiment

    @Guide(description: "Urgency, 1 = routine, 5 = escalate immediately", .range(1...5))
    var urgency: Int
}
```
Enums with associated values are also supported — the @Generable macro ensures all associated and nested values are themselves generable.
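A sketch of what that can look like (the cases and payloads here are illustrative, not from the framework):

```swift
import FoundationModels

// Each associated value type must itself be generable
// (String and Int conform automatically).
@Generable
enum FollowUp {
    case none
    case reminder(date: String, note: String)
    case escalate(priority: Int)
}

@Generable
struct TriageResult {
    @Guide(description: "Recommended follow-up action for this item")
    var followUp: FollowUp
}
```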
PartiallyGenerated — Streaming Snapshots
Every @Generable type gets a synthesised PartiallyGenerated associated type. It is a version of the struct where all stored properties are Optional, representing work-in-progress output during streaming:
```swift
for try await snapshot in session.streamResponse(
    to: "Review: \(bookTitle)",
    generating: BookReview.self
) {
    let partial = snapshot.content   // BookReview.PartiallyGenerated
    // partial.title might be "The G..." while still generating
    // partial.rating is nil until the model has written that property
    if let title = partial.title {
        titleLabel.text = title
    }
}
// After the loop completes, collect() gives a Response<BookReview> with all properties set
```
PartiallyGenerated is a streaming-only concern. When you call respond() (non-streaming), the response's .content is the completed Content type — no optionals, no partial states to handle.
GeneratedContent — Untyped Escape Hatch
GeneratedContent is the framework's internal structured representation of model output. You normally never interact with it — @Generable handles encoding and decoding automatically.
When you need raw access, every Response exposes:
```swift
let response = try await session.respond(to: prompt, generating: BookReview.self)
response.content     // BookReview — your typed result
response.rawContent  // GeneratedContent — the underlying parsed value
```
rawContent is useful for debugging when model output does not match your type. You can inspect it to see exactly what the model produced before your ConvertibleFromGeneratedContent init ran.
For fully dynamic schemas (where the type is not known at compile time), use respond(schema:) with a GenerationSchema built from DynamicGenerationSchema. The response will have Content == GeneratedContent, and you decode manually via value(_:forProperty:):
```swift
let response = try await session.respond(to: prompt, schema: schema)
let soup: String = try response.content.value(forProperty: "dailySoup")
```
Independent Constructability — Critical for Testing
@Generable types must be constructable via their memberwise initialiser without running the model. This is the property that makes them unit-testable:
```swift
import Testing

// Your output type
@Generable
struct NormalisedTranscript {
    @Guide(description: "Corrected transcript text")
    var normalisedText: String

    @Guide(description: "Extracted BJJ terms in canonical form")
    var extractedTerms: [String]
}

// Tests run on any machine — no Apple Intelligence required
@Test func normalisationOutputType() {
    let result = NormalisedTranscript(
        normalisedText: "Worked Kimura from Half Guard",
        extractedTerms: ["Kimura", "Half Guard"]
    )
    #expect(result.normalisedText.contains("Kimura"))
    #expect(result.extractedTerms.count == 2)
}
```
If your @Generable type has custom initialisers that depend on model output, or computed properties with side effects, you have broken this contract. Keep output types as plain data containers — structs with stored properties and no embedded behaviour.
Protocol Hierarchy
You rarely interact with these directly — @Generable wires everything up — but understanding the hierarchy helps when debugging conformance errors or writing manual implementations:
| Protocol | Role |
|---|---|
| Generable | Synthesised by @Generable. Requires PartiallyGenerated associated type, generationSchema, and ConvertibleFromGeneratedContent init. Inherits from both Convertible* protocols. |
| ConvertibleFromGeneratedContent | Types constructable from model output. Int, String, Bool, Float, Double, Decimal, Array, enums, and @Generable structs all conform automatically. |
| ConvertibleToGeneratedContent | Types that can be serialised back to GeneratedContent. Used for tool output and prompt injection. Inherits from PromptRepresentable. |
| PromptRepresentable | Types that can appear inside a @PromptBuilder closure. @Generable types conform, so you can pass model output directly back as prompt input in a subsequent call. |
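That last conformance enables chaining, sketched below with the NormalisedTranscript type from earlier (the session names are illustrative):

```swift
import FoundationModels

// Pass one call's typed output straight into the next prompt:
// @Generable types conform to PromptRepresentable.
let extraction = try await extractSession.respond(
    to: Prompt { rawText },
    generating: NormalisedTranscript.self
)

let summary = try await summarySession.respond(to: Prompt {
    "Summarise this training session in two sentences:"
    extraction.content   // a @Generable value used directly as prompt input
})
```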
Part 5: Streaming
The Core Decision: Stream or Not?
| Use Case | Method | Reason |
|---|---|---|
| Live text appearing for the user (typing effect) | streamResponse() | User sees progress, engagement increases |
| Processing output programmatically | respond() | Simpler — no partial state handling |
| Background pipeline (normalisation, extraction) | respond() | No UI benefit; streaming increases rate-limit risk in background |
| Long-form generation the user is watching | streamResponse() | Progress feedback reduces perceived latency |
| Structured @Generable output | respond() preferred | Partial structs with all-Optional properties add complexity for no gain |
Apple's own docs note that background tasks should use the non-streaming respond() to reduce the likelihood of encountering GenerationError.rateLimited errors.
String Streaming
```swift
let stream = session.streamResponse(to: "Summarise: \(text)")

for try await snapshot in stream {
    let partial: String = snapshot.content   // String grows with each chunk
    await MainActor.run { self.displayText = partial }
}

// Or skip the loop entirely and just collect the final result
let fullResponse = try await stream.collect()
let finalText = fullResponse.content   // String — complete
```
Typed (@Generable) Streaming
```swift
let stream = session.streamResponse(
    to: "Review: \(text)",
    generating: BookReview.self
)

for try await snapshot in stream {
    let partial = snapshot.content   // BookReview.PartiallyGenerated
    // All properties are Optional — may be nil while the model generates earlier properties
    if let title = partial.title { titleLabel.text = title }
    if let rating = partial.rating { updateStars(rating) }
}

// Collect to receive the complete, fully-typed result
let response = try await stream.collect()
let review = response.content   // BookReview — all properties non-nil
```
ResponseStream<Content>
streamResponse() returns a ResponseStream<Content>, which is an AsyncSequence of ResponseStream.Snapshot<Content> values. The type parameter matches what you would get from the equivalent respond() call.
```swift
// Type relationships
session.streamResponse(to: prompt)
// → ResponseStream<String>

session.streamResponse(to: prompt, generating: BookReview.self)
// → ResponseStream<BookReview>

// Each snapshot during the stream:
snapshot.content
// → String (for a string stream)
// → BookReview.PartiallyGenerated (for a typed stream — all properties Optional)

// After .collect():
response.content
// → String (complete)
// → BookReview (complete, all properties set)
```
ResponseStream<Content> conforms to AsyncSequence, so you get the full suite of async sequence operators — map, filter, prefix, etc.
Progressive UI Update Pattern
The natural pattern for SwiftUI is to assign each snapshot's content to an observable property that the view reads:
```swift
@Observable
final class SummaryViewModel {
    var generatedText = ""

    func generate(prompt: Prompt) async throws {
        let stream = session.streamResponse(to: prompt)
        for try await snapshot in stream {
            generatedText = snapshot.content  // @Observable triggers view update per chunk
        }
    }
}

// In the view:
Text(viewModel.generatedText)
    .animation(.default, value: viewModel.generatedText)
```
For @Generable types, update individual UI elements as their backing properties become available:
```swift
for try await snapshot in stream {
    let partial = snapshot.content  // BookReview.PartiallyGenerated
    titleLabel.text = partial.title ?? titleLabel.text        // retain last known value
    summaryLabel.text = partial.summary ?? summaryLabel.text
}
```
collect() — From Stream to Full Response
collect() is an async method on ResponseStream that waits for the stream to finish and returns a complete Response<Content>:
```swift
let stream = session.streamResponse(to: prompt)

// Option A: observe snapshots AND get the final result
for try await snapshot in stream {
    updateProgressUI(snapshot.content)
}
// Stream is exhausted — collect() returns immediately since the stream is done
let finalResponse = try await stream.collect()

// Option B: skip observation, just get the final result
let finalResponse = try await stream.collect()
```
If the stream finished with an error before collect() is called, collect() propagates that error. If the stream completed successfully, collect() returns immediately with the cached result.
Error Handling in Streams
Errors are thrown during iteration, not at stream creation (the stream object itself is always returned, even if the model will fail):
```swift
do {
    for try await snapshot in stream {
        // process snapshot
    }
} catch LanguageModelSession.GenerationError.rateLimited(let retryAfter) {
    // system under load — retry after the given delay
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // prompt + history too long — trim the input
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // content flagged — show alternative UX
} catch {
    // unexpected error
}
```
The same error types apply to streamResponse() as to respond() — the difference is only in when they surface during your async call.
Part 6: Generation Options
GenerationOptions is a struct you pass to respond() or streamResponse() to control how the model generates output. All properties are optional — omitting them leaves the model at its defaults, which are usually correct.
```swift
let options = GenerationOptions(
    temperature: 0.1,
    maximumResponseTokens: 200
)
let response = try await session.respond(to: prompt, options: options)
```
temperature
Temperature controls how "creative" or "random" the model's output is, on a scale from 0.0 to 1.0. nil (the default) lets the model use its own calibrated default, which is appropriate for most tasks.
| Temperature | Behaviour | Best For |
|---|---|---|
| `nil` | Model default (typically ~0.7) | General use — let the model decide |
| 0.0–0.2 | Near-deterministic, consistent | Corrections, extraction, classification |
| 0.3–0.6 | Balanced | Summarisation, analysis |
| 0.7–1.0 | Creative, varied | Brainstorming, dialogue, story generation |
The most common mistake: setting a high temperature for a correction or extraction task. If you are normalising speech-to-text errors using @Generable, you want the model to produce the same correct answer every time — not a creatively varied interpretation. Use nil or set it low.
```swift
// ❌ High temperature for a structured correction task — produces inconsistent output
let options = GenerationOptions(temperature: 0.9)
let response = try await session.respond(
    to: Prompt { rawTranscript },
    generating: NormalisedTranscript.self,
    options: options
)

// ✅ Low temperature — deterministic, reliable corrections
let options = GenerationOptions(temperature: 0.1)
// or just omit options entirely — the schema constraints already reduce variance
```
For @Generable output, the constrained sampler that enforces your schema also reduces variance regardless of temperature. But setting temperature low is still good practice to signal your intent and produce maximally consistent output.
GenerationOptions.SamplingMode
SamplingMode gives you control over the underlying sampling algorithm. The two modes are:
.greedy — always selects the single most probable token at each step. Maximally deterministic. Best for tasks with one correct answer (grammar correction, structured extraction).
.random(temperature:) — samples from the probability distribution, with temperature scaling how broadly. This is the mode behind the temperature parameter.
```swift
// Explicit greedy sampling — maximum determinism
let options = GenerationOptions(
    sampling: .greedy
)

// Random sampling at a specific temperature
let options = GenerationOptions(
    sampling: .random(temperature: 0.7)
)
```
The temperature property on GenerationOptions is a convenience shorthand for .random(temperature:). Setting temperature: 0.0 is equivalent to .greedy.
maximumResponseTokens
Sets an upper bound on how many tokens the model can generate in its response. Useful for:
- Capping costs (on-device, this is latency rather than money) when you know responses should be short
- Preventing runaway generation in summary tasks where you want concise output
- Enforcing length constraints the instructions alone can't reliably enforce
```swift
// Limit to a short summary (~100 tokens ≈ ~75 words)
let options = GenerationOptions(maximumResponseTokens: 100)
let response = try await session.respond(
    to: "Summarise this training session in one paragraph: \(notes)",
    options: options
)
```
Be careful not to set maximumResponseTokens too low for @Generable types — if the model runs out of tokens before completing your struct, it will throw GenerationError.exceededContextWindowSize.
Part 7: Tool Calling
Tools let the model call back into your Swift code to fetch data or perform actions during generation. The model autonomously decides whether and when to call a tool — you provide the definitions; it decides whether they are relevant to the current prompt.
The Tool Protocol
Conform to Tool to define a callable function the model can invoke:
```swift
@available(iOS 26, *)
struct CurrentDateTool: Tool {
    let name = "getCurrentDate"
    let description = "Returns today's date in ISO 8601 format (YYYY-MM-DD)."

    // Arguments the model will pass — a @Generable struct
    @Generable
    struct Arguments {
        @Guide(description: "Optional timezone identifier, e.g. 'Europe/Dublin'")
        var timezone: String?
    }

    // Return type — any PromptRepresentable (String is simplest)
    func call(arguments: Arguments) async -> String {
        let formatter = ISO8601DateFormatter()
        if let tz = arguments.timezone, let zone = TimeZone(identifier: tz) {
            formatter.timeZone = zone
        }
        return formatter.string(from: Date())
    }
}
```
Key constraints:
- `Arguments` must conform to `ConvertibleFromGeneratedContent`. A `@Generable` struct is the standard approach — the macro handles conformance automatically.
- `Output` (the return type) must conform to `PromptRepresentable`. `String` always works. `@Generable` types also work.
- `call(arguments:)` is implicitly `@concurrent` — it runs off the main actor. Make it `async` if you need to do async work.
Registering Tools With a Session
Pass tools in the tools parameter when creating a session:
```swift
// (Inside an iOS 26-available context)
let session = LanguageModelSession(
    tools: [CurrentDateTool(), UserProfileTool()]
) {
    "You are a task scheduling assistant."
    "Use getCurrentDate to determine today's date before scheduling."
}

let response = try await session.respond(
    to: "Schedule a reminder for two weeks from today"
)
let text = response.content  // model called getCurrentDate internally
```
The model receives each tool's name, description, and the JSON schema derived from Arguments. It uses the name and description to decide whether calling the tool is relevant to the prompt. Name and description are the primary signals — write them as short, specific phrases.
How the Model Decides to Call Tools
You cannot force the model to call a specific tool. It decides autonomously based on:
- Whether the tool's name and description match the intent of the prompt
- Whether it already has the information it needs without a tool call
- Whether the prompt semantically requires external data
The model may call zero tools (if it can answer from its knowledge), call one tool, or call multiple tools before producing its final response.
Critical Performance Insight: Pre-Fetch vs Tool
This is Apple's own guidance from the documentation, and it matters for performance:
If you ALWAYS need data from a source, inject it directly into instructions rather than defining a tool.
```swift
// ❌ Tool for data you always need — adds latency on every call
struct UserPreferencesTool: Tool { ... }

// ✅ Pre-fetch and inject — one fetch, zero tool overhead
let preferences = await loadUserPreferences()
let session = LanguageModelSession {
    "User preferences: \(preferences.serialised)"
    "Use these preferences when making recommendations."
}
```
Tools have two costs:
- Token cost — each tool definition (name + description + arguments schema) consumes context budget. A tool with a complex `Arguments` struct can cost 50–100 tokens just for its definition.
- Latency cost — each tool call is a model inference round-trip: the model generates a call, your code runs, the result is injected back, the model continues. This adds meaningful latency.
Reserve tools for data that is conditionally needed — data you might need depending on what the user asks.
Context Window Cost
Define tools concisely. The model sees name + description + arguments schema for every tool, every call, whether it uses them or not.
```swift
// ❌ Verbose tool definition — each call consumes more context
struct FetchUserTrainingHistoryForTheLastSixMonthsTool: Tool {
    let name = "fetchUserTrainingHistoryForTheLastSixMonths"
    let description = "This tool fetches the complete training history of the current user for the past six calendar months, including all session notes, techniques practised, and time spent..."
    // ...
}

// ✅ Concise — same capability, fraction of the tokens
struct TrainingHistoryTool: Tool {
    let name = "getTrainingHistory"
    let description = "Returns recent training sessions with notes and techniques."
    // ...
}
```
A practical limit is 3–5 tools per session. Beyond that, the definitions alone consume a significant portion of context, leaving less room for the actual conversation.
Tool Calls in the Transcript
When the model calls a tool, it appears in the session's Transcript as two entries:
- `Transcript.Entry.toolCalls` — the model's request(s) to call tools
- `Transcript.Entry.toolOutput` — the results that were injected back
This is useful when debugging why the model produced a particular response — you can inspect the transcript to see exactly what tool calls were made and what data the model received. See Part 9 (The Transcript) for full Transcript coverage.
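A minimal inspection sketch using those two entry cases (assumes a `session` whose transcript already contains tool activity; the print format is illustrative):

```swift
// Walk the transcript and surface only tool activity.
for entry in session.transcript.entries {
    switch entry {
    case .toolCalls(let calls):
        print("Model requested tools: \(calls)")
    case .toolOutput(let output):
        print("Tool output injected: \(output)")
    default:
        break  // instructions, prompts, and responses skipped here
    }
}
```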
Part 8: Token Budget
The on-device model has a fixed context window shared by all inputs and outputs for a session. Understanding how that budget is consumed is essential for building reliable features — especially multi-turn conversations and tool-using sessions.
The Budget Breakdown
Every token in a session competes for the same fixed window:
```
Total Context Window
├── Instructions (system prompt)
├── Tool definitions (name + description + args schema × number of tools)
├── Transcript history (all previous turns)
├── Current prompt
└── Response (tokens generated)
```
Response tokens are not free — they come out of the same pool as input. A long system prompt and a long conversation history leave less room for both the current prompt and its response.
Measuring Token Usage
SystemLanguageModel exposes three tokenUsage(for:) overloads (added February 2026):
```swift
let model = SystemLanguageModel.default

// 1. Cost of Instructions + tool definitions
let instrUsage = try await model.tokenUsage(
    for: instructions,
    tools: [MyTool()]
)
print(instrUsage.tokenCount)  // e.g. 180

// 2. Cost of a single Prompt
let promptUsage = try await model.tokenUsage(for: prompt)
print(promptUsage.tokenCount)  // e.g. 45

// 3. Cost of a saved Transcript (conversation history)
let historyUsage = try await model.tokenUsage(for: transcript.entries)
print(historyUsage.tokenCount)  // e.g. 620
```
All three return SystemLanguageModel.TokenUsage, with a single tokenCount: Int property. Use these to profile your sessions during development rather than guessing.
The contextSize Property
SystemLanguageModel.contextSize is an async property that returns the total context window size in tokens. It is back-deployed to earlier OS versions via @backDeployed:
```swift
let totalWindow = await SystemLanguageModel.default.contextSize  // e.g. 4096
let available = totalWindow - instrUsage.tokenCount - historyUsage.tokenCount
print("Available for prompt + response: \(available) tokens")
```
Use contextSize to compute headroom before sending a prompt, particularly in multi-turn sessions where history accumulates.
GenerationError.exceededContextWindowSize
This error is thrown when the combined input (instructions + tools + history + prompt) exceeds the context window. Handle it gracefully:
```swift
do {
    let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Strategies:
    // 1. Summarise the conversation history and start a new session
    // 2. Trim the oldest transcript entries
    // 3. Remove tool definitions you don't strictly need
    // 4. Shorten the prompt
}
```
For multi-turn sessions, the most robust strategy is to detect when history is growing long and summarise it before continuing:
```swift
// When history exceeds a threshold, compress it
if historyTokenCount > contextSize / 2 {
    let summary = try await summariseHistory(session.transcript)
    // Start fresh session with summary in instructions
    session = LanguageModelSession {
        "Previous conversation summary: \(summary)"
    }
}
```
The #Playground Macro for Budget Profiling
The #Playground macro in Xcode (26.4+) shows Input Token Count and Response Token Count separately in the canvas after each run. This is the fastest way to profile token usage during development — no logging, no instrumentation, just iterate on the prompt and watch the counts update in real time.
Rules of Thumb
| Content | Approximate Token Cost |
|---|---|
| 1 word | ~1.3 tokens |
| 100 words | ~130 tokens |
| 1 page (250 words) | ~325 tokens |
| Simple `@Generable` struct (2 props) | ~50 tokens overhead |
| Tool definition (name + description + args) | ~50–100 tokens |
| Default context window | ~4,096 tokens |
A 4k window sounds large but fills up quickly in multi-turn sessions with tool-heavy prompts.
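For quick back-of-envelope checks before calling tokenUsage(for:), the word-based heuristic above can be folded into a small helper. The 1.3 multiplier is the approximation from the table, not an API value:

```swift
// Rough pre-flight estimate using the ~1.3 tokens-per-word rule of thumb.
// For real numbers, prefer SystemLanguageModel.tokenUsage(for:).
func estimatedTokenCount(for text: String) -> Int {
    let words = text.split(whereSeparator: \.isWhitespace).count
    return Int((Double(words) * 1.3).rounded(.up))
}
```

This is only useful for coarse decisions, such as "is this transcript obviously too long to send?" — profile with the real APIs before shipping thresholds.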
Part 9: The Transcript
Transcript is the linear record of everything that has happened in a LanguageModelSession. Every turn adds entries. The transcript is how the model "remembers" previous exchanges in a multi-turn conversation.
Transcript.Entry
The transcript is an array of Transcript.Entry values. Each entry is one of five cases:
| Entry | When It Appears |
|---|---|
| `.instructions(Transcript.Instructions)` | Session creation — the system prompt |
| `.prompt(Transcript.Prompt)` | Each time you call `respond()` or `streamResponse()` |
| `.response(Transcript.Response)` | Each model reply |
| `.toolCalls(Transcript.ToolCalls)` | When the model decides to invoke one or more tools |
| `.toolOutput(Transcript.ToolOutput)` | The result(s) returned from your tool's `call()` |
A simple two-turn conversation produces this entry sequence:
```
.instructions   ← session setup
.prompt         ← "What's the best sweep from Half Guard?"
.response       ← "The Hip Bump Sweep is..."
.prompt         ← "How do I set it up?"
.response       ← "Start by flattening your opponent..."
```
A tool-calling exchange adds two extra entries per tool call:
```
.prompt         ← "What techniques did I drill last Tuesday?"
.toolCalls      ← [getTrainingHistory(date: "2026-02-24")]
.toolOutput     ← [{ sessions: [...] }]
.response       ← "Last Tuesday you drilled..."
```
Reading the Transcript
Access the current session transcript via session.transcript:
```swift
let session = LanguageModelSession {
    "You are a BJJ coach."
}
_ = try await session.respond(to: "What is the Kimura?")
_ = try await session.respond(to: "How do I finish it from Guard?")

// Inspect the transcript
for entry in session.transcript.entries {
    switch entry {
    case .prompt(let p):
        print("User: \(p.segments.map(\.description).joined())")
    case .response(let r):
        print("Model: \(r.segments.map(\.description).joined())")
    default:
        break
    }
}
```
Saving and Resuming Sessions
Save the transcript to persist a conversation and resume it later — useful for a coaching assistant where the user expects the model to remember what they discussed in previous sessions:
```swift
// Save
let savedTranscript = session.transcript
// Persist to SwiftData, UserDefaults, or disk...

// Resume — new session with full history
let resumedSession = LanguageModelSession(
    model: SystemLanguageModel.default,
    tools: [],
    transcript: savedTranscript
)

// Model now has full context of the previous conversation
let response = try await resumedSession.respond(to: "Where were we?")
```
The resumed session is identical in behaviour to a session that never stopped — the model sees the full entry history.
When to Use the Transcript
Use transcript accumulation when:
- The model needs to refer back to something the user said earlier ("as I mentioned before...")
- You are building a multi-turn chatbot or coaching assistant
- Continuity across app sessions is a user-facing feature
Do NOT accumulate transcripts when:
- Each call is independent (normalisation, extraction, summarisation, classification)
- You are using session-per-call — there is no transcript to worry about
- The task is stateless — the model does not need to "remember" anything
Unnecessary transcript accumulation wastes context budget and eventually causes GenerationError.exceededContextWindowSize. Most FoundationModels use cases do not need cross-turn memory — use session-per-call by default (see Part 2).
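As a reminder of what session-per-call looks like in practice, here is a hedged sketch (the classification task and prompt wording are illustrative, not from the source):

```swift
// Each call builds a throwaway session: no transcript accumulates,
// so no context budget is ever spent on history.
@available(iOS 26, *)
func sentiment(of text: String) async throws -> String {
    let session = LanguageModelSession {
        "Classify the sentiment of the text as positive, neutral, or negative."
        "Respond with the single word only."
    }
    return try await session.respond(to: Prompt { text }).content
}
```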
Part 10: Failure Modes & Graceful Degradation
FoundationModels can fail in ways that are different from a typical network API. Most failures are environmental (device eligibility, model state, system load) rather than logic errors. The right response in almost every case is graceful degradation, not throwing errors up to the UI.
GenerationError Cases
LanguageModelSession.GenerationError is thrown from respond() and streamResponse():
.exceededContextWindowSize
The combined input (instructions + tools + history + prompt) exceeded the context window. Solutions in order of preference:
- Reduce the prompt — summarise or truncate the input text
- Trim the oldest transcript entries in a multi-turn session
- Remove tool definitions that aren't needed for this call
- Split into multiple sessions
.rateLimited
The system is under load. The on-device model is a shared resource — all apps use the same model, and the OS rate-limits when demand is high. Handle with simple exponential backoff:
```swift
func generateWithRetry(session: LanguageModelSession, prompt: Prompt) async throws -> String {
    var delay: UInt64 = 1_000_000_000  // 1 second
    for attempt in 1...3 {
        do {
            return try await session.respond(to: prompt).content
        } catch LanguageModelSession.GenerationError.rateLimited {
            if attempt < 3 {
                try await Task.sleep(nanoseconds: delay)
                delay *= 2
            }
        }
    }
    throw LanguageModelSession.GenerationError.rateLimited  // re-throw after 3 attempts
}
```
.guardrailViolation
The content triggered safety filtering. This can happen on the prompt (the input was flagged) or on the response (the model started generating something that triggered the filter). The error contains context on what was flagged.
.unsupportedGuide
A @Guide constraint on a @Generable type is not supported for the current model or OS version. This should not occur in production if your deployment target is correct, but handle it defensively.
LanguageModelSession.GenerationError.Refusal
When the model declines to answer a prompt, it throws a Refusal error. Refusal is special because it includes an explanation:
```swift
do {
    let response = try await session.respond(to: prompt)
} catch let refusal as LanguageModelSession.GenerationError.Refusal {
    // Get the explanation as a complete Response<String>
    let explanation = try await refusal.explanation
    print(explanation.content)  // "I can't help with that because..."

    // Or stream it
    for try await snapshot in refusal.explanationStream {
        print(snapshot.content)
    }
}
```
Production Pattern: The Never-Throws Service
The cleanest production pattern is a service method that never throws — it returns the raw input unchanged on any failure. Callers have zero error handling burden, and worst case equals current pre-AI behaviour:
```swift
@available(iOS 26, *)
final class TranscriptNormalisationService {
    func normalise(_ rawTranscript: String) async -> String {
        guard !rawTranscript.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else {
            return rawTranscript
        }
        do {
            let session = LanguageModelSession {
                "Fix speech-to-text errors in BJJ transcripts."
                "Return only the corrected text."
            }
            let response = try await session.respond(
                to: Prompt { rawTranscript },
                generating: NormalisedTranscript.self
            )
            return response.content.normalisedText
        } catch {
            // Log the error, return the raw transcript unchanged
            GraplaLogger.data.error("Normalisation failed: \(error)")
            return rawTranscript
        }
    }
}
```
This pattern means:
- The caller always gets a `String` back — no try/catch required
- If AI is unavailable, the app works exactly as before
- Errors are logged for debugging without surfacing to the user
Additional Production Patterns
Cache availability at setup, not per-call. SystemLanguageModel.default.availability has non-trivial overhead. Check it once when the view or service initialises and store the result. Availability doesn't change mid-session.
```swift
// ❌ Checking availability on every call
func normalise(_ text: String) async -> String {
    guard SystemLanguageModel.default.isAvailable else { return text }  // overhead each time
    ...
}

// ✅ Check once, cache
final class NormalisationService {
    private let isAvailable = SystemLanguageModel.default.isAvailable

    func normalise(_ text: String) async -> String {
        guard isAvailable else { return text }
        ...
    }
}
```
The fallback path is production code. On the majority of devices in 2026, Apple Intelligence will not be available (older hardware, non-supported regions, disabled in settings). Your non-AI code path is not a fallback — it is the primary path for most users. Test it as thoroughly as the AI path.
Use AnyObject? for iOS 26 services in SwiftUI views. Covered in Part 1, but worth repeating: avoid @available(iOS 26, *) on @State properties. Use AnyObject? and cast inside #available guards to prevent the constraint propagating to the whole view.
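A minimal sketch of that pattern, reusing the TranscriptNormalisationService from the never-throws example above (the view name and body are illustrative):

```swift
import SwiftUI

struct NotesView: View {
    // Untyped storage — no @available constraint leaks onto the view
    @State private var normaliser: AnyObject?

    var body: some View {
        Text("Notes")
            .task {
                if #available(iOS 26, *) {
                    normaliser = TranscriptNormalisationService()
                }
            }
    }

    private func normalise(_ text: String) async -> String {
        if #available(iOS 26, *),
           let service = normaliser as? TranscriptNormalisationService {
            return await service.normalise(text)
        }
        return text  // primary path on older OS versions
    }
}
```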
Part 11: Testing
The most important insight for testing FoundationModels code: most of your test suite should never touch the model. Well-structured FoundationModels code is testable at every layer without Apple Intelligence.
The Four Test Categories
1. Output Type Tests (No Model Required)
@Generable structs are plain data containers with memberwise initialisers. You can construct them directly in tests, verify Equatable conformance, and test edge cases without the model ever running:
```swift
@Suite("NormalisedTranscript")
struct NormalisedTranscriptTests {
    @Test func construction() {
        let result = NormalisedTranscript(
            normalisedText: "Worked Kimura from Half Guard",
            extractedTerms: ["Kimura", "Half Guard"]
        )
        #expect(result.normalisedText == "Worked Kimura from Half Guard")
        #expect(result.extractedTerms.count == 2)
        #expect(result.extractedTerms.contains("Kimura"))
    }

    @Test func equatable() {
        let a = NormalisedTranscript(normalisedText: "Test", extractedTerms: [])
        let b = NormalisedTranscript(normalisedText: "Test", extractedTerms: [])
        #expect(a == b)
    }

    @Test func emptyTerms() {
        let result = NormalisedTranscript(normalisedText: "Some text", extractedTerms: [])
        #expect(result.extractedTerms.isEmpty)
    }
}
```
These tests run in CI on any machine. No simulator required.
2. Service Fallback Tests (Works on All Simulators)
Test that your service returns the raw input unchanged when the model is unavailable. The simulator never has Apple Intelligence, so this path is always exercised:
```swift
@MainActor
@Suite("TranscriptNormalisationService")
struct TranscriptNormalisationServiceTests {
    @Test func emptyTranscriptReturnsEmpty() async {
        guard #available(iOS 26, *) else { return }
        let service = TranscriptNormalisationService()
        let result = await service.normalise("")
        #expect(result.isEmpty)
    }

    @Test func whitespaceOnlyReturnsUnchanged() async {
        guard #available(iOS 26, *) else { return }
        let service = TranscriptNormalisationService()
        let result = await service.normalise(" \n ")
        #expect(result.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty)
    }

    @Test func unavailableModelReturnsFallback() async {
        guard #available(iOS 26, *) else { return }
        // On simulator, model is unavailable — service must return raw transcript
        let service = TranscriptNormalisationService()
        let raw = "worked on kimora from half card today"
        let result = await service.normalise(raw)
        // On device: could be corrected. On simulator: must equal raw input.
        #expect(!result.isEmpty)  // just verify it doesn't crash
    }
}
```
3. Availability Tests (Works Everywhere)
Verify your availability checking code runs without crashing. Do not assert the specific availability state — it varies by machine, OS, and whether Apple Intelligence is enabled:
```swift
@Test func availabilityCheckDoesNotCrash() {
    guard #available(iOS 26, *) else { return }
    let availability = SystemLanguageModel.default.availability

    // Just verify we get a valid state — don't assert which state
    switch availability {
    case .available:
        break  // fine
    case .unavailable:
        break  // also fine — expected on simulator
    @unknown default:
        break
    }
}
```
4. On-Device Tests (Manual, .disabled() by Default)
Mark tests that require a real device with Apple Intelligence as .disabled(). They are skipped in CI but can be run manually on a real device:
```swift
@Test("Normalises BJJ terms on-device", .disabled("Requires device with Apple Intelligence"))
func normalisesTermsOnDevice() async throws {
    guard #available(iOS 26, *) else { return }
    let service = TranscriptNormalisationService()
    let raw = "rolled today, worked on my kimora from half card"
    let result = await service.normalise(raw)

    // On a real device with AI, these should be corrected
    #expect(result.contains("Kimura") || result.contains("kimura"))
    #expect(!result.contains("kimora"))
}
```
To run these locally: open the test plan in Xcode, filter by the test name, and run on a connected iPhone 15 Pro or later with Apple Intelligence enabled.
Testing Checklist
| Test | Runs in CI | Requires Apple Intelligence |
|---|---|---|
| `@Generable` type construction | ✅ | ❌ |
| `@Generable` equatable | ✅ | ❌ |
| Service empty input handling | ✅ | ❌ |
| Service fallback (model unavailable) | ✅ | ❌ |
| Availability check no-crash | ✅ | ❌ |
| End-to-end normalisation | ❌ (manual) | ✅ |
Aim for 100% automated coverage of everything above the model boundary. The on-device generation itself is integration-tested manually.
Part 12: Example Use Cases
These examples cover the range of tasks FoundationModels handles well. Each follows the same pattern: on-device, private, structured output, graceful fallback.
1. Sports / BJJ App — Domain-Specific Transcript Normalisation
Use case: Correct speech-to-text misrecognitions of BJJ terms before feeding into entity extraction.
Why FoundationModels: A regex can't handle "kimora" → "Kimura" contextually; a cloud API sends private training notes offsite. On-device gets both right.
```swift
@Generable
struct NormalisedTranscript {
    @Guide(description: "The transcript with BJJ terms corrected")
    var normalisedText: String

    @Guide(description: "Canonical BJJ terms extracted, e.g. ['Kimura', 'Half Guard']")
    var extractedTerms: [String]
}
```
Tools needed: None — pure text transformation. Session-per-call.
2. Recipe App — Ingredient Extraction From Voice
Use case: "I need some eggs, any kind of cheese, and that Italian herb" → structured shopping list.
Why FoundationModels: The model handles colloquial descriptions ("that Italian herb" → "basil"), vague quantities ("some"), and variety descriptions ("any kind of cheese") — none of which a regex can parse.
```swift
@Generable
struct Ingredient {
    @Guide(description: "Canonical ingredient name, e.g. 'basil', 'eggs'")
    var name: String

    @Guide(description: "Quantity as spoken, e.g. '2', 'some', 'a handful'")
    var quantity: String
}

@Generable
struct IngredientList {
    @Guide(description: "All ingredients mentioned", .minimumCount(1))
    var ingredients: [Ingredient]
}
```
Tools needed: None. Session-per-call.
3. Journaling App — Private Mood Tagging
Use case: Classify a journal entry's emotional tone without sending text to a cloud service.
Why FoundationModels: Journal entries are deeply personal. On-device is the only acceptable processing option — not a preference, a product requirement.
```swift
@Generable
enum PrimaryMood {
    case joyful, content, neutral, anxious, sad, angry, reflective
}

@Generable
struct MoodAnalysis {
    @Guide(description: "The dominant emotion in the entry")
    var primaryMood: PrimaryMood

    @Guide(description: "Intensity, 1 = mild, 5 = intense", .range(1...5))
    var intensity: Int

    @Guide(description: "Key themes, up to three", .maximumCount(3))
    var themes: [String]
}
```
Tools needed: None. Session-per-call.
4. Task Manager — Natural Language Task Parsing
Use case: "Remind me to call Mum next Tuesday afternoon" → structured task with date components and priority.
Why FoundationModels: Natural language date parsing ("next Tuesday"), intent extraction, and priority inference in a single call.
```swift
@Generable
struct ParsedTask {
    @Guide(description: "Clean task title, e.g. 'Call Mum'")
    var title: String

    @Guide(description: "Relative date reference as spoken, e.g. 'next Tuesday afternoon'")
    var dateReference: String

    @Guide(description: "Priority 1 (low) to 3 (high)", .range(1...3))
    var priority: Int
}
```
Tools needed: CurrentDateTool to anchor relative dates ("next Tuesday" needs to know what today is).
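Wiring the two together might look like this sketch (combining the CurrentDateTool from Part 7 with the ParsedTask type above; the function name and prompt wording are illustrative):

```swift
// Session-per-call parse: the tool anchors relative dates,
// guided generation guarantees a typed ParsedTask back.
@available(iOS 26, *)
func parseTask(_ spokenText: String) async throws -> ParsedTask {
    let session = LanguageModelSession(tools: [CurrentDateTool()]) {
        "Parse the user's request into a task."
        "Use getCurrentDate to anchor relative dates like 'next Tuesday'."
    }
    let response = try await session.respond(
        to: Prompt { spokenText },
        generating: ParsedTask.self
    )
    return response.content
}
```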
5. Fitness App — Workout Log Summarisation
Use case: After a training session, summarise a structured workout log into a human-readable weekly review.
Why FoundationModels: Summary generation from structured data into natural prose. Streaming makes it feel responsive.
```swift
// No @Generable needed — plain text output, streamed
let stream = session.streamResponse(
    to: "Summarise this week's training in 2 paragraphs: \(workoutLogJSON)"
)
for try await snapshot in stream {
    summaryView.text = snapshot.content  // live update as text generates
}
```
Tools needed: None. Session-per-call. Use streamResponse() for the typing effect.
6. Developer Tool — Conventional Commit Message Generation
Use case: Given a summary of changed files and diff, generate a conventional commit message.
Why FoundationModels: Requires understanding intent from code changes — beyond simple pattern matching, but doesn't need frontier reasoning. On-device keeps source code private.
```swift
@Generable
enum CommitType {
    case feat, fix, chore, docs, refactor, test, perf
}

@Generable
struct CommitMessage {
    @Guide(description: "Conventional commit type")
    var type: CommitType

    @Guide(description: "Affected scope, e.g. 'auth', 'ui', nil if unclear")
    var scope: String?

    @Guide(description: "Imperative subject line, 72 chars max")
    var subject: String

    @Guide(description: "Optional body with context on why this change was made")
    var body: String?
}
```
Tools needed: None. Session-per-call.
7. Language Learning App — Sentence Correction
Use case: Correct a learner's written sentence while preserving their intended meaning.
Why FoundationModels: Grammar correction requires semantic understanding — the model must know what the learner was trying to say. On-device matters here too: learners write embarrassing mistakes they would prefer not to send to a cloud API.
```swift
@Generable
struct CorrectedSentence {
    @Guide(description: "The corrected sentence with natural grammar")
    var correctedText: String

    @Guide(description: "Explanations of corrections made, e.g. ['Changed tense from past to present perfect']")
    var explanations: [String]

    @Guide(description: "Confidence the original meaning was preserved, 1-5", .range(1...5))
    var meaningPreservedConfidence: Int
}
```
Tools needed: None. Session-per-call.
8. E-Commerce — Product Attribute Extraction
Use case: Extract structured attributes (colour, size, material, style) from free-text product descriptions for catalogue indexing.
Why FoundationModels: Product descriptions are unstructured prose. Structured extraction via @Generable is more robust than regex for the variety of descriptions sellers write.
```swift
@Generable
struct ProductAttributes {
    @Guide(description: "Primary colour(s), e.g. ['navy', 'white']")
    var colours: [String]
    @Guide(description: "Material, e.g. 'cotton', 'polyester blend'")
    var material: String?
    @Guide(description: "Style keywords, e.g. ['casual', 'slim-fit']", .maximumCount(5))
    var styleKeywords: [String]
}
```
Tools needed: Optional ProductCatalogTool to canonicalise values against your taxonomy.
9. Health App — Symptom Log Structuring
Use case: User dictates how they're feeling → structured symptom entry for a health log.
Why FoundationModels: Privacy is non-negotiable. Health data is the most sensitive category — on-device is not a preference, it's a product and ethical requirement.
```swift
@Generable
enum BodyArea {
    case head, chest, abdomen, back, leftArm, rightArm, leftLeg, rightLeg, general
}

@Generable
struct SymptomEntry {
    @Guide(description: "Primary affected body area")
    var bodyArea: BodyArea
    @Guide(description: "Symptom description in normalised clinical language")
    var description: String
    @Guide(description: "Severity 1 (mild) to 10 (severe)", .range(1...10))
    var severity: Int
    @Guide(description: "Duration as spoken, e.g. 'since this morning', 'two days'")
    var duration: String
}
```
Tools needed: None. Session-per-call.
10. Customer Support — Ticket Triage
Use case: Classify incoming support tickets by category, urgency, and sentiment to route them to the right team.
Why FoundationModels: Classification with semantic understanding. A keyword-based classifier misroutes tickets with indirect language; the model understands context.
```swift
@Generable
enum TicketCategory {
    case billing, technicalSupport, accountAccess, featureRequest, complaint, other
}

@Generable
enum CustomerSentiment {
    case positive, neutral, frustrated, angry
}

@Generable
struct TicketClassification {
    @Guide(description: "Primary support category")
    var category: TicketCategory
    @Guide(description: "Urgency 1 (low) to 5 (escalate immediately)", .range(1...5))
    var urgency: Int
    @Guide(description: "Customer emotional tone")
    var sentiment: CustomerSentiment
    @Guide(description: "One-sentence routing note for the support agent")
    var routingNote: String
}
```
Tools needed: Optional KnowledgeBaseTool to check if similar issues have documented resolutions before routing.
Part 13: Quick Reference & Anti-Patterns
Quick Reference
Key Types
| Type | One-liner |
|---|---|
| `SystemLanguageModel` | Entry point — `SystemLanguageModel.default` |
| `SystemLanguageModel.Availability` | `.available` / `.unavailable(reason)` |
| `LanguageModelSession` | Manages one conversation thread; stateful |
| `Instructions` | System prompt — set once at session creation |
| `Prompt` | User input for a single turn |
| `Response<Content>` | Wrapper — always access `.content` |
| `ResponseStream<Content>` | `AsyncSequence` of `Snapshot<Content>` |
| `GenerationOptions` | `temperature`, `maximumResponseTokens`, sampling |
| `GenerationGuide<T>` | Constraints on `@Guide` properties |
| `Transcript` | Linear history of all session entries |
| `Tool` | Protocol for functions the model can call |
| `SystemLanguageModel.TokenUsage` | `.tokenCount` — cost of instructions/prompt/history |
Session Init Cheatsheet
```swift
// Fresh session, no tools
LanguageModelSession { "Instructions here" }

// With specific model
LanguageModelSession(model: SystemLanguageModel.default) { "..." }

// With tools
LanguageModelSession(tools: [MyTool()]) { "..." }

// Resume from saved transcript
LanguageModelSession(model: .default, tools: [], transcript: savedTranscript)
```
respond() vs streamResponse()
| | `respond()` | `streamResponse()` |
|---|---|---|
| Returns | `Response<Content>` | `ResponseStream<Content>` |
| Best for | Background processing, pipelines | Live UI with typing effect |
| Partial results | No | Yes (via `Snapshot<Content>`) |
| Rate limit risk | Lower | Higher in background tasks |
| Collect to full response | N/A | `.collect()` |
@Generable vs Raw String
Use @Generable when:
- You need structured, typed output (multiple fields)
- You want compile-time guarantees on output shape
- The response must be parsed/processed programmatically
- You need constraints (`@Guide`) on values
Use raw String when:
- Output is prose for display to the user
- You're summarising or generating a paragraph
- Streaming the output for a typing effect
Token Budget Formula
Total = instructions + tool definitions + transcript history + prompt + response
All compete for the same fixed window (~4,096 tokens). Response tokens come out of the same pool as input.
Tool vs Pre-Fetch vs Inject
| If you... | Do this |
|---|---|
| Always need the data | Pre-fetch, inject into instructions |
| Sometimes need the data | Define as Tool |
| Need data only when asked about it | Define as Tool |
| Have more than 5 tools | Split into multiple focused sessions |
Anti-Patterns
1. Accessing response instead of response.content
respond() returns Response<T>, not T. Always unwrap .content.
```swift
let text = try await session.respond(to: prompt)  // Response<String>, not String
text.uppercased()  // ❌ compile error

let text = try await session.respond(to: prompt).content  // ✅ String
```
2. Storing LanguageModelSession persistently when you don't need history
For stateless tasks (normalisation, extraction, classification), create a new session per call. Persistent sessions accumulate transcript and eventually hit the context limit.
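A minimal sketch of the session-per-call shape (the service name and instruction text are illustrative; the API usage follows the patterns shown earlier in this guide):

```swift
import FoundationModels

@available(iOS 26, *)
struct TermNormaliser {
    // No stored session: a long-lived session would accumulate transcript
    // across unrelated calls and eventually exceed the context window.
    func normalise(_ text: String) async throws -> String {
        // Fresh session per call; instructions are re-sent each time,
        // but for stateless tasks that cost is small and bounded.
        let session = LanguageModelSession { "Fix BJJ terminology in this transcript." }
        return try await session.respond(to: text).content
    }
}
```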
3. Defining too many tools
Each tool definition consumes ~50–100 tokens of context budget, whether used or not. Keep it to 3–5 tools per session. If you have 10 tools, split them across multiple focused sessions.
4. Calling isAvailable or checkAvailability() per-call
Availability checking has overhead and doesn't change mid-session. Check once at service/view init and cache the result.
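A sketch of check-once caching. The `AIGate` name is illustrative; `isAvailable` is the convenience used elsewhere in this guide:

```swift
import FoundationModels

@MainActor
final class AIGate {
    // Checked once at init and cached; availability does not change mid-session,
    // so re-checking before every call is wasted overhead.
    let aiEnabled: Bool

    init() {
        if #available(iOS 26, *) {
            aiEnabled = SystemLanguageModel.default.isAvailable
        } else {
            aiEnabled = false
        }
    }
}
```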
5. High temperature for structured/correction tasks
For @Generable types that correct or extract, leave temperature unset (the default) or use an explicit temperature of 0.0–0.2. High temperature produces creatively varied — but wrong — corrections.
6. Long, elaborate instructions modelled on frontier model prompts
On a ~3B parameter model, shorter is better. Instructions over ~200 words dilute signal. Explicit rules outperform discursive descriptions.
7. Not testing the fallback path
On most devices today, Apple Intelligence is unavailable. Your non-AI code path is the primary experience for the majority of users. Test it as thoroughly as the AI path.
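One way to keep the fallback honest is to put both paths behind a single protocol, so the same call sites and tests exercise either implementation. A sketch, where the protocol name and the tiny correction table are illustrative:

```swift
import Foundation

protocol TranscriptNormalising {
    func normalise(_ text: String) async -> String
}

// Deterministic fallback: on devices without Apple Intelligence,
// this is the primary experience, not an afterthought.
struct DictionaryNormaliser: TranscriptNormalising {
    let corrections = ["kimora": "Kimura", "arm bar": "armbar"]

    func normalise(_ text: String) async -> String {
        corrections.reduce(text) {
            $0.replacingOccurrences(of: $1.key, with: $1.value)
        }
    }
}
```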
8. Using FoundationModels where a regex or simple function would do
If the task is a known, fixed pattern (extract a UUID, validate an email, format a date), use a deterministic function. LLM overhead — latency, availability, complexity — is waste for these cases.
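For example, UUID extraction is a fixed pattern that Foundation handles deterministically, with zero latency and no availability check. A self-contained sketch:

```swift
import Foundation

// Deterministic, instant, always available: no model needed.
func extractUUIDs(from text: String) -> [UUID] {
    let pattern = #"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"#
    let regex = try! NSRegularExpression(pattern: pattern)
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { match in
        Range(match.range, in: text).flatMap { UUID(uuidString: String(text[$0])) }
    }
}
```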
9. Propagating @available(iOS 26, *) to SwiftUI views
Adding @available to a @State property forces the whole view to require iOS 26. Use the AnyObject? pattern instead and cast inside #available guards.
10. Treating .modelNotReady as permanent
.modelNotReady means the model is downloading. It's transient. Show "not available right now" UI and retry later. Do not show a permanent "unsupported" state for this case.
Part 14: Context Engineering for On-Device AI
The context window is the most important constraint in FoundationModels. Everything else — prompt engineering, temperature, tool design — happens within it. Understanding how to engineer what goes into that window is the difference between a feature that works reliably and one that fails silently on complex inputs.
The Fundamental Constraint
The on-device model has a fixed context window of approximately 4,096 tokens shared across:
instructions + tool definitions + transcript history + current prompt + response
This is roughly 3,000 words (about 12 pages) of total input and output. That sounds like a lot until you try to inject meaningful app data.
A BJJ training app with 116 positions, each with a 200-word description: ~30,000 tokens — 7x the entire context window. Injecting "all your app data" into instructions is not a strategy; it's a crash waiting to happen.
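A rough pre-flight estimate makes the arithmetic concrete. The ~1.3 tokens-per-word ratio is this guide's own rule of thumb, not the model's actual tokeniser (which is not public), so treat the numbers as order-of-magnitude:

```swift
// Heuristic only: ~1.3 tokens per English word.
func estimatedTokens(wordCount: Int) -> Int {
    Int(Double(wordCount) * 1.3)
}

let perPosition = estimatedTokens(wordCount: 200)  // 260 tokens per description
let total = perPosition * 116                      // 30,160 — the ~30,000 above
let windowMultiple = Double(total) / 4_096         // ≈ 7.4× the window
```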
What Breaks First
When you over-fill the context window you get GenerationError.exceededContextWindowSize. But the model also silently degrades before it throws — a model given 3,500 tokens of input in a 4,096 window has only 596 tokens for its response. For most tasks that's enough. For others it's not — and the failure mode is truncation, not an error.
Common over-injection mistakes:
| Data | Tokens (approx) | Problem |
|---|---|---|
| All SwiftData records (100+ items) | 10,000–50,000 | Massively exceeds window |
| Full JSON blob of one complex entity | 500–2,000 | May leave little room for response |
| Entire app configuration/preferences | 200–800 | Unnecessary; most not relevant |
| Complete conversation history (100 turns) | 2,000–5,000 | Pushes out current prompt |
Pattern 1: Select, Don't Dump
The simplest and most impactful change: fetch only what's relevant to the current request.
```swift
// ❌ Dumps all 116 positions into context — will throw
let allPositions = try await queryService.fetchAllPositions()
let session = LanguageModelSession {
    "Here are all BJJ positions: \(allPositions.map(\.description).joined(separator: "\n"))"
}

// ✅ Fetches only positions relevant to the current question
let relevantPositions = try await queryService.fetchPositions(
    matching: userQuery,
    limit: 5  // 5 positions × ~200 tokens = ~1,000 tokens — fits comfortably
)
let session = LanguageModelSession {
    "Relevant positions: \(relevantPositions.map(\.summary).joined(separator: "\n"))"
}
```
Use SwiftData predicates and fetchLimit to constrain what you load before it reaches the context.
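A sketch of the fetch-side constraint, assuming a `Position` @Model with a `name` property and a keyword already derived from the user query (both names are illustrative):

```swift
import SwiftData

func fetchRelevantPositions(matching keyword: String,
                            in context: ModelContext) throws -> [Position] {
    var descriptor = FetchDescriptor<Position>(
        predicate: #Predicate { $0.name.localizedStandardContains(keyword) }
    )
    descriptor.fetchLimit = 5  // bound the data before it ever reaches the context window
    return try context.fetch(descriptor)
}
```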
Pattern 2: Layered Injection
Inject summaries at the top level, with detail available on-demand via tools. The model sees the overview by default and only loads detail when it actually needs it:
```swift
// Layer 1 — always injected: position names only (~50 tokens for 116 positions)
let positionNames = positions.map(\.name).joined(separator: ", ")

// Layer 2 — injected only when needed via tool
struct PositionDetailTool: Tool {
    let name = "getPositionDetail"
    let description = "Returns full description and transitions for a named BJJ position."

    @Generable
    struct Arguments {
        var positionName: String
    }

    func call(arguments: Arguments) async -> String {
        // Fetch the full detail only when the model asks for it
        return await loadPositionDetail(arguments.positionName)
    }
}

let session = LanguageModelSession(tools: [PositionDetailTool()]) {
    "Available positions: \(positionNames)"
    "Use getPositionDetail to look up full information about any position."
}
```
This keeps the base context lean (~50 tokens for names vs 30,000 for all descriptions) while still giving the model access to full detail on demand.
Pattern 3: The Two-Step Compression Pipeline
For tasks that require reasoning over large datasets, compress first, then reason. This only makes sense on-device — with a cloud API you pay per token on both calls and gain nothing. On-device both calls are free and private:
```swift
// Step 1: Summarise the large dataset (fresh session, large input is fine)
func summariseTrainingHistory(_ sessions: [TrainingSession]) async throws -> String {
    let session = LanguageModelSession {
        "Summarise this training history in 150 words, highlighting patterns and progress."
    }
    let fullHistory = sessions.map(\.description).joined(separator: "\n\n")
    // fullHistory might be 5,000 tokens — fills most of the window, but that's fine
    // The output is ~150 tokens
    return try await session.respond(to: fullHistory).content
}

// Step 2: Reason with the summary (fresh context, compact input)
func answerWithHistory(question: String, summary: String) async throws -> String {
    let session = LanguageModelSession {
        "Training history summary: \(summary)"  // ~150 tokens
        "Answer questions about training progress based on this summary."
    }
    // Plenty of context headroom for question + answer
    return try await session.respond(to: question).content
}

// Usage
let summary = try await summariseTrainingHistory(recentSessions)
let answer = try await answerWithHistory(question: userQuestion, summary: summary)
```
The summary call uses most of its window for the raw data and produces a compact output. The reasoning call has clean context with just the summary. Each call is focused on a single task.
Pattern 4: Pre-Summarise at Write Time
For persistent app data (SwiftData entities), generate summaries when the data is saved and store them alongside the entity. The summary is computed once and reused for every future AI interaction:
```swift
@Model
final class TrainingSession {
    var rawNotes: String = ""
    var date: Date = Date()
    var techniques: [String] = []

    // Pre-generated — computed at save time, reused in every AI call
    var aiSummary: String = ""
}

// When saving a session
func saveSession(_ session: TrainingSession) async {
    // Generate summary once at write time
    if #available(iOS 26, *) {
        let model = LanguageModelSession {
            "Summarise this BJJ training session in 50 words."
        }
        let summary = try? await model.respond(
            to: session.rawNotes + "\nTechniques: \(session.techniques.joined(separator: ", "))"
        ).content
        session.aiSummary = summary ?? ""
    }
    modelContext.insert(session)
    try? modelContext.save()
}

// At query time — inject pre-built summaries, not raw notes
func buildSessionContext(recentSessions: [TrainingSession]) -> String {
    // Each summary: ~50 tokens × 10 sessions = 500 tokens — fits comfortably
    recentSessions
        .map { "[\($0.date.formatted())]: \($0.aiSummary)" }
        .joined(separator: "\n")
}
```
Pre-summarisation at write time means:
- Zero AI cost at query time — the summary is already there
- The context load is predictable and bounded by summary length
- The summary can be updated when the entity changes
Dataset Size Reference
| Content | Volume | Approx. Tokens | Fits in Context? |
|---|---|---|---|
| Single entity description | 1 | 200–500 | ✅ Yes |
| Entity names list | 100 | ~150 | ✅ Yes |
| Short entity summaries | 10 | ~500 | ✅ Yes |
| Short entity summaries | 50 | ~2,500 | ⚠️ Tight |
| Full entity descriptions | 10 | ~2,000 | ⚠️ Tight |
| Full entity descriptions | 50+ | 10,000+ | ❌ No |
| Full entity descriptions | 100+ | 20,000+ | ❌ No |
| Conversation (10 turns) | — | ~1,000 | ✅ Yes |
| Conversation (50 turns) | — | ~5,000 | ❌ No |
Decision Tree
```
Do you need to inject app data into context?
│
├── Yes → How much data?
│   │
│   ├── 1–5 entities, full detail
│   │   └── Inject directly into instructions
│   │
│   ├── 5–20 entities
│   │   ├── Always need all of them? → Inject summaries (pre-generated at write time)
│   │   └── Only need some? → Names in instructions + detail via Tool
│   │
│   └── 20+ entities
│       ├── Need to reason across all of them? → Two-step: summarise first, then reason
│       └── Need specific ones? → Select with predicate, inject summaries for matched
│
└── No → Standard session-per-call, no data injection needed
```
On-Device vs Cloud: Why This Pattern Is Different
With cloud APIs (OpenAI, Anthropic, Google), the two-step pattern is often not worth it: you pay per token on both calls, and the total cost may be similar to one call with the full data — especially if the summarisation model is also expensive.
On-device, the economics flip:
- No per-token cost — both calls are free
- No network latency — both calls run locally, typically in under a second each
- No privacy concern — data never leaves the device regardless of call count
- Shared resource — each call consumes system resources and may be rate-limited, so compact contexts are still preferred
This makes on-device AI uniquely suited to multi-step pipelines where cloud would be prohibitively expensive or slow.
Part 15: Advanced Patterns
This section covers patterns that don't fit neatly into any earlier part — actor isolation details, the non-obvious syntax for @Generable enums with associated values, reactive availability monitoring in SwiftUI, chaining model output back as prompt input via PromptRepresentable, and the bounded domain injection pattern for apps with curated entity datasets.
Actor Isolation and call(arguments:) — What Actor Does Your Code Run On?
Understanding actor isolation in FoundationModels matters when your tool or service touches @MainActor-bound state.
Tool.call(arguments:) Is @concurrent
The call(arguments:) method on the Tool protocol is implicitly @concurrent, which means it runs off the main actor — in a generic concurrent executor, not @MainActor. This is deliberate: the model calls your tool during inference, which itself is off the main actor. Calling back to the main actor mid-inference would require a hop, adding latency.
```swift
@available(iOS 26, *)
struct TrainingHistoryTool: Tool {
    let name = "getTrainingHistory"
    let description = "Returns recent training sessions."

    @Generable
    struct Arguments {
        var limit: Int
    }

    // This runs @concurrent — NOT on @MainActor
    func call(arguments: Arguments) async -> String {
        // ✅ Pure computation or actor-independent async work is fine here
        let sessions = await fetchSessions(limit: arguments.limit)
        return sessions.map(\.summary).joined(separator: "\n")

        // ❌ Accessing @MainActor-bound state directly will cause a data race warning
        // return self.someMainActorProperty  // won't compile
    }
}
```
If your tool genuinely needs main-actor state (e.g., reading from a @MainActor service), hop explicitly:
```swift
func call(arguments: Arguments) async -> String {
    // Hop to MainActor to read the value, then hop back
    let data = await MainActor.run { myMainActorService.currentData }
    return process(data)
}
```
What Actor Does respond() Run On?
LanguageModelSession.respond() is async but has no actor isolation requirement — it is safe to call from any actor context, including @MainActor. Internally, the framework dispatches inference to a background executor automatically.
```swift
// ✅ Calling respond() from @MainActor is fine — the framework handles the dispatch
@MainActor
final class NormalisationService {
    func normalise(_ text: String) async -> String {
        let session = LanguageModelSession { "Fix BJJ terms." }
        // respond() is async but not @MainActor — call is fine from here
        let response = try? await session.respond(to: Prompt { text })
        return response?.content ?? text
    }
}
```
You do not need to manually Task.detach or use Task { @concurrent in ... } before calling respond(). The framework does the right thing automatically.
@MainActor Services Calling Tools — The Safe Pattern
When a @MainActor service needs tools that access non-@MainActor data, the cleanest pattern is to make the tool capture any main-actor dependencies at session creation time (before inference begins), rather than accessing them from within call():
```swift
@MainActor
final class CoachingService {
    private let userProfile: UserProfile  // @MainActor bound

    func answer(_ question: String) async -> String {
        // Capture the profile value NOW, on MainActor, before the session runs
        let profileSummary = userProfile.summary  // safe — we're on MainActor

        // The tool closes over the already-captured value — no actor hop needed in call()
        struct ProfileContextTool: Tool {
            let name = "getUserProfile"
            let description = "Returns the user's training profile."

            @Generable
            struct Arguments {}

            let summary: String  // captured at creation time

            func call(arguments: Arguments) async -> String { summary }
        }

        let session = LanguageModelSession(tools: [ProfileContextTool(summary: profileSummary)]) {
            "Answer BJJ coaching questions using the user's profile."
        }
        return (try? await session.respond(to: question).content) ?? ""
    }
}
```
This is simpler than hopping to MainActor inside call() and avoids any potential race conditions.
@Generable Enums With Associated Values
The earlier enum examples in Part 4 showed simple case enums (.positive, .neutral, .negative). @Generable also supports enums with associated values — but the syntax has a specific constraint: all associated values must themselves conform to Generable (or be types that @Generable already knows how to handle: String, Int, Double, Bool, arrays of generable types).
Basic Associated Value Enum
```swift
@available(iOS 26, *)
@Generable
enum TranscriptCorrection {
    case termCorrection(original: String, corrected: String)
    case spellingFix(original: String, corrected: String)
    case noChange
}

@Generable
struct AnnotatedTranscript {
    @Guide(description: "The corrected transcript text")
    var correctedText: String
    @Guide(description: "Each correction made, with original and corrected forms")
    var corrections: [TranscriptCorrection]
}
```
The model generates each corrections element as a tagged union — it chooses the case name and then generates the associated values. This is significantly richer than a flat string array for corrections, because the output is fully typed.
Nested @Generable Structs as Associated Values
Associated values can also be @Generable structs:
```swift
@available(iOS 26, *)
@Generable
struct DateRange {
    @Guide(description: "Start date in YYYY-MM-DD format")
    var start: String
    @Guide(description: "End date in YYYY-MM-DD format")
    var end: String
}

@Generable
enum ScheduleIntent {
    case singleDay(date: String)
    case dateRange(range: DateRange)
    case recurring(dayOfWeek: String, startTime: String)
    case unspecified
}

@Generable
struct ParsedScheduleRequest {
    @Guide(description: "What the user wants to schedule")
    var activity: String
    @Guide(description: "When the user wants to schedule it")
    var timing: ScheduleIntent
}
```
When to Use Associated Value Enums vs Flat Structs
Use associated value enums when the output shape is fundamentally discriminated — the presence of one field makes others meaningless. In the ScheduleIntent example above, if the user said "every Monday at 9am", the .recurring case makes date and range meaningless, and a flat struct would leave those fields awkwardly nil.
Use flat @Generable structs with optional properties when most combinations of values are valid. The associated value enum excels when the cases are truly mutually exclusive and each has distinct associated data.
The Constraint: All Associated Values Must Be Generable
If you include a type that is not Generable-conformant as an associated value, the @Generable macro will emit a compile-time error. The fix is always one of:
- Add `@Generable` to the associated type
- Change the associated type to a primitive (`String`, `Int`, etc.)
- Represent it as a separate `@Generable` struct with its own properties
Observable Availability Monitoring — Reactive SwiftUI Pattern
SystemLanguageModel is an Observable final class. This means SwiftUI views can react to .availability changes without any additional wiring — the view re-renders automatically when availability changes.
This is useful when you want to show/hide AI features reactively, for example when the model finishes downloading (.modelNotReady → .available) while the user is already in the app.
Basic Reactive Availability View
```swift
@available(iOS 26, *)
struct AIFeatureBadge: View {
    var body: some View {
        // SwiftUI observes SystemLanguageModel.default automatically
        // because it's @Observable — no @StateObject, no manual subscription
        let model = SystemLanguageModel.default

        switch model.availability {
        case .available:
            Label("AI Ready", systemImage: "sparkles")
                .foregroundStyle(.green)
        case .unavailable(.modelNotReady):
            Label("AI Downloading...", systemImage: "arrow.down.circle")
                .foregroundStyle(.yellow)
        case .unavailable(.appleIntelligenceNotEnabled):
            Label("Enable Apple Intelligence", systemImage: "exclamationmark.circle")
                .foregroundStyle(.secondary)
        case .unavailable(.deviceNotEligible):
            EmptyView()  // Don't surface this — it's permanent
        @unknown default:
            EmptyView()
        }
    }
}
```
Because SystemLanguageModel is @Observable, SwiftUI tracks which properties the body reads and re-renders when they change. No .onReceive, no Combine, no explicit observation setup.
Watching for the Model Becoming Ready
The .task {} modifier is the right tool for reacting to an availability change and triggering a one-time action — for example, kicking off an initial data enrichment pass once the model becomes available:
```swift
@available(iOS 26, *)
struct TrainingDashboardView: View {
    @State private var hasRunInitialEnrichment = false

    var body: some View {
        // ... view content ...
        .task {
            // This task runs when the view appears and re-runs if availability changes
            for await _ in SystemLanguageModel.default.availabilityUpdates {
                guard !hasRunInitialEnrichment else { break }
                if SystemLanguageModel.default.isAvailable {
                    await runInitialEnrichment()
                    hasRunInitialEnrichment = true
                }
            }
        }
    }

    private func runInitialEnrichment() async {
        // Generate AI summaries for any entities that don't have them yet
    }
}
```
Note: If `availabilityUpdates` is not available on your OS target, use `.task(id: SystemLanguageModel.default.availability)` as an alternative — the task re-runs when `availability` changes since `Availability` is `Equatable`:
```swift
.task(id: SystemLanguageModel.default.availability) {
    guard SystemLanguageModel.default.isAvailable else { return }
    guard !hasRunInitialEnrichment else { return }
    await runInitialEnrichment()
    hasRunInitialEnrichment = true
}
```
Avoiding the Per-View @available Constraint
The reactive pattern works cleanly with the AnyObject? wrapping approach from Part 1. Keep the Observable observation inside a #available check, or confine it to a view that is itself conditionally shown:
```swift
// In the parent view (no iOS 26 requirement):
var body: some View {
    VStack {
        mainContent

        if #available(iOS 26, *) {
            AIFeatureBadge()  // only this view requires iOS 26
        }
    }
}
```
This way the availability-reactive logic is isolated to a specific subview, and the containing view has no version constraint.
PromptRepresentable — Chaining Model Output Back as Input
One of the cleaner architectural patterns enabled by the protocol hierarchy is output-as-input chaining: taking a @Generable type from one call and passing it directly as prompt input to the next call, without any serialisation step.
This works because @Generable types conform to PromptRepresentable (via ConvertibleToGeneratedContent), which means they can appear directly in a @PromptBuilder closure.
Basic Chaining Example
```swift
@available(iOS 26, *)
@Generable
struct NormalisedTranscript {
    @Guide(description: "Corrected transcript text")
    var normalisedText: String
    @Guide(description: "BJJ terms found, in canonical form")
    var extractedTerms: [String]
}

@Generable
struct SessionSummary {
    @Guide(description: "One-paragraph summary of the training session")
    var summary: String
    @Guide(description: "Techniques practiced, from the corrected terms")
    var techniquesWorked: [String]
}

// Two-step pipeline: correct → summarise
func processTranscript(_ raw: String) async throws -> SessionSummary {
    // Step 1: Correct BJJ terminology
    let correctionSession = LanguageModelSession {
        "Fix speech-to-text errors in BJJ transcripts. Return corrected text and term list."
    }
    let corrected = try await correctionSession.respond(
        to: Prompt { raw },
        generating: NormalisedTranscript.self
    )

    // Step 2: Summarise — pass the @Generable output directly as prompt input
    // No JSON encoding, no manual string building needed
    let summarySession = LanguageModelSession {
        "Summarise a BJJ training session given a corrected transcript."
    }
    let summary = try await summarySession.respond(
        to: Prompt {
            "Transcript: \(corrected.content)"  // NormalisedTranscript directly in @PromptBuilder
        },
        generating: SessionSummary.self
    )
    return summary.content
}
```
The \(corrected.content) interpolation works because NormalisedTranscript (a @Generable struct) conforms to PromptRepresentable. The framework serialises it appropriately for the model — you never touch the intermediate representation.
When Chaining Is Worth It
The chain pattern is most valuable when:
- Output type 1 contains richer structure than a plain string — passing the full `NormalisedTranscript` (with both `normalisedText` and `extractedTerms`) to the next session gives the model more signal than a plain corrected string
- Each step is a focused, single-task session — staying true to the "one task per session" principle (Part 3) while getting compound results
- You want typed output at every step — rather than a single sprawling `@Generable` struct trying to do everything, each step produces its own clean type
Avoid chaining when the first step's output is a plain String — in that case, just use string interpolation normally. The PromptRepresentable chaining is most valuable for multi-property structured output.
Bounded Domain Injection — The Names-Only Pattern
This is a specialised context engineering pattern for apps that have a fixed, curated, known domain — a set of entities whose names are meaningful and bounded. The insight is that entity names alone are remarkably compact while still giving the model strong domain grounding.
The Core Insight
In Grapla, there are 116 BJJ positions, 150 techniques, 118 submissions, and 141 movements — 525 total entities. Injecting all the descriptions for all 525 entities would require tens of thousands of tokens and overflow the context window many times over.
But injecting just the names is cheap:
```
Mount, Half Guard, Side Control, Back Mount, Turtle, North-South, Closed Guard,
Open Guard, De La Riva, X-Guard, Butterfly Guard, Single Leg X, ...
Kimura, Armbar, Triangle, Rear Naked Choke, D'Arce, Anaconda, Omoplata, ...
Hip Bump Sweep, Flower Sweep, Scissor Sweep, Pendulum Sweep, ...
```
A full list of ~525 entity names in CSV format uses approximately 700–900 tokens — well within a 4,096-token window, leaving ample room for instructions, prompt, and response.
Why Names Alone Are Sufficient for Correction Tasks
For a transcript correction service, the model's job is:
- Recognise that "kimora" is a garbled version of a known entity
- Replace it with the canonical form "Kimura"
The model doesn't need the description of a Kimura to know that "kimora" should be "Kimura". The name list acts as a canonical term index — the model can fuzzy-match against it and apply corrections.
```swift
@available(iOS 26, *)
struct BJJEntityNames {
    // Pre-built at app startup from the SwiftData store — reused for every normalisation call
    static let positions = [
        "Mount", "Half Guard", "Side Control", "Back Mount", "Turtle", "North-South",
        "Closed Guard", "Open Guard", "De La Riva", "X-Guard", "Butterfly Guard",
        "Single Leg X", "Full Guard", "Rubber Guard",
        // ... all 116 positions
    ]
    static let techniques: [String] = []   // all 150
    static let submissions: [String] = []  // all 118
    static let movements: [String] = []    // all 141

    static var allAsCSV: String {
        (positions + techniques + submissions + movements).joined(separator: ", ")
    }
}

@available(iOS 26, *)
final class TranscriptNormalisationService {
    func normalise(_ rawTranscript: String) async -> String {
        let entityNames = BJJEntityNames.allAsCSV  // ~700 tokens

        let session = LanguageModelSession {
            "Fix speech-to-text errors in BJJ training transcripts."
            "Canonical entity names: \(entityNames)"
            "Correct misrecognised terms to their canonical forms. Return only the corrected text."
        }
        // Total instructions: ~750 tokens — leaves ~3,300 tokens for prompt + response
        let response = try? await session.respond(to: Prompt { rawTranscript })
        return response?.content ?? rawTranscript
    }
}
```
Generalising the Pattern
The bounded domain pattern works whenever your app has a finite, knowable set of canonical terms. Some examples:
| App | Bounded Domain | Names-Only Size |
|---|---|---|
| BJJ app | 525 positions/techniques/submissions/movements | ~700 tokens |
| Recipe app | 500 common ingredients | ~600 tokens |
| Medical notes | 300 ICD-10 conditions (common subset) | ~400 tokens |
| Developer tool | 200 API method names | ~250 tokens |
| Music app | 400 instruments + musical terms | ~500 tokens |
The test for whether this pattern applies: Can you enumerate all the canonical terms your app cares about? If yes, inject the names list. The model will use it as a correction index without needing any descriptions.
Names-Only vs Names + Detail
Combine with the Layered Injection pattern (Part 14) when you sometimes need both correction and reasoning about entities:
```swift
let session = LanguageModelSession(tools: [PositionDetailTool()]) {
    // Layer 1: names always present (~700 tokens) — enables correction
    "Canonical BJJ entities: \(BJJEntityNames.allAsCSV)"
    // Layer 2: detail available on demand via tool — enables reasoning
    "Use getPositionDetail to look up descriptions, transitions, and techniques for any position."
}
```
This gives the model correction capability (names) plus on-demand depth (tool) while keeping the base context compact.
Experimental Directions
These patterns are worth exploring but untested at scale. They use only FoundationModels — no additional frameworks required.
Sharded parallel sessions. When your vocabulary corpus is too large for a single context but you need full coverage, split it across multiple sessions running concurrently. Each session holds a different shard of the names list. After all sessions return, merge results — prefer any correction over "unchanged", break ties by confidence or frequency. The on-device model's free-per-call economics make this viable in a way that would be expensive with a cloud API.
```swift
async let positions = normalise(rawText, vocabulary: BJJEntityNames.positions)
async let techniques = normalise(rawText, vocabulary: BJJEntityNames.techniques)
async let submissions = normalise(rawText, vocabulary: BJJEntityNames.submissions)

let (p, t, s) = try await (positions, techniques, submissions)
let merged = merge(p, t, s) // your logic for combining corrections
```
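One way to sketch the merge step described above — prefer any shard's correction over the unchanged raw word. This is a hypothetical `merge` implementation; it assumes each session returns the transcript with word count preserved, which the model does not guarantee, so a production version would need alignment logic:

```swift
// Word-level merge: for each position in the raw transcript, take the first
// shard output that changed the word; otherwise keep the original.
func merge(raw: String, variants: [String]) -> String {
    let rawWords = raw.split(separator: " ")
    let variantWords = variants.map { $0.split(separator: " ") }
    var out: [Substring] = []
    for (i, word) in rawWords.enumerated() {
        let corrected = variantWords
            .compactMap { $0.indices.contains(i) ? $0[i] : nil }
            .first { $0 != word }
        out.append(corrected ?? word)
    }
    return out.joined(separator: " ")
}
```

Frequency- or confidence-based tie-breaking would slot in where the first differing variant currently wins.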
Adaptive context budgeting. Before injecting data, measure how much headroom you have with tokenUsage(for:), then fill to a target percentage (e.g. 60% of the window, reserving 40% for prompt + response). Rank your entities by relevance and inject greedily until you hit the budget. This turns context injection from a static decision into a runtime one.
```swift
let instrTokens = try await model.tokenUsage(for: instructions).tokenCount
let window = await model.contextSize
let budget = Int(Double(window) * 0.6) - instrTokens // 60% target, minus instructions

var injected: [String] = []
var used = 0
for entity in rankedEntities {
    let cost = estimateTokens(entity.name) // ~1.3 tokens per word
    guard used + cost <= budget else { break }
    injected.append(entity.name)
    used += cost
}
```
Transcript as structured cache. Rather than rehydrating a conversation, use a saved Transcript as a compressed knowledge cache — pre-generate a transcript that contains a curated Q&A exchange about your domain (e.g. "what is a Kimura?" → model's answer), then resume from that transcript for every live session. The model starts with pre-baked domain knowledge already in its context, without spending live call tokens to establish it.
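A minimal sketch of the transcript-cache idea, assuming `LanguageModelSession` exposes its `transcript` and can be initialised from one (the curated questions here are illustrative):

```swift
import FoundationModels

// One-time priming step: run a curated Q&A once and keep the transcript.
@available(iOS 26, *)
func buildPrimedTranscript() async throws -> Transcript {
    let session = LanguageModelSession {
        "You are a BJJ training assistant."
    }
    // Curated exchange that bakes domain knowledge into the context.
    _ = try await session.respond(to: "What is a Kimura?")
    _ = try await session.respond(to: "Which positions transition into Mount?")
    return session.transcript
}

// Every live session resumes from the cached transcript, so the model
// starts with the Q&A already in context — no live tokens spent on it.
@available(iOS 26, *)
func makeLiveSession(resuming transcript: Transcript) -> LanguageModelSession {
    LanguageModelSession(transcript: transcript)
}
```

The priming step could run at first launch, or offline in a build-time tool, with the transcript persisted and shipped with the app.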
All three patterns are speculative — they depend on how the model handles parallel resource contention, whether adaptive sizing materially improves output quality, and whether transcript rehydration preserves semantic coherence. The #Playground macro is the fastest way to validate any of them before committing to an implementation.
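A validation harness for any of these patterns can be as small as a single `#Playground` block (assuming the macro's `import Playgrounds` form; the sample prompt is illustrative):

```swift
import FoundationModels
import Playgrounds

#Playground {
    // Quick check: does a names-only shard correct misrecognised terms?
    let session = LanguageModelSession {
        "Canonical BJJ positions: \(BJJEntityNames.positions.joined(separator: ", "))"
        "Correct misrecognised terms to their canonical forms. Return only the corrected text."
    }
    let response = try await session.respond(
        to: "I got swept from half gard into side controll"
    )
    print(response.content)
}
```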
Resources
Official Apple Documentation
WWDC 2025 Sessions:
- Session 286: Meet the Foundation Models framework
- Session 301: Deep dive into the Foundation Models framework
- Session 259: Code-along: Bring on-device AI to your app using the Foundation Models framework
Framework Updates:
- February 2026: Improved instruction-following, `tokenUsage(for:)`, `contextSize`, `#Playground` macro
Key Types at a Glance
| Type | Purpose |
|---|---|
| `SystemLanguageModel` | Entry point — access the model, check availability |
| `LanguageModelSession` | Manages a single conversation thread with the model |
| `Instructions` | System-level behaviour definition for a session |
| `Prompt` | User input to the model |
| `Response<Content>` | Wrapper around typed model output — use `.content` |
| `ResponseStream<Content>` | Async sequence of partial responses for streaming |
| `GenerationOptions` | Controls temperature, sampling, max tokens |
| `GenerationGuide<T>` | Constraint on `@Guide` properties (min/max/regex) |
| `GeneratedContent` | Untyped structured output — escape hatch |
| `Transcript` | Linear history of a multi-turn session |
| `Tool` | Protocol for functions the model can call during generation |
| `SystemLanguageModel.TokenUsage` | Token count for a prompt, instructions, or transcript |