This is the fast version. For full detail on any section, follow the deep-dive links to the complete reference.
The 15-Part Index
| Part | Covers | One thing to know | Deep dive |
|---|---|---|---|
| 1 | Availability & Setup | .modelNotReady is transient — model is downloading, not missing | → |
| 2 | Sessions & Basic Prompting | respond() returns Response<T> — always unwrap .content | → |
| 3 | Prompt Engineering | Short beats long. <200 words. Explicit rules beat prose descriptions | → |
| 4 | @Generable | Macro generates a structured output schema; @Guide adds constraints | → |
| 5 | Streaming | streamResponse() → AsyncSequence; use .collect() to finalise | → |
| 6 | Generation Options | temperature: nil or 0.0–0.2 for correction tasks; higher for creative | → |
| 7 | Tool Calling | Pre-fetch if always needed; define as Tool only for conditional data | → |
| 8 | Token Budget | All inputs + output share one fixed window (~4,096 tokens) | → |
| 9 | The Transcript | New session per call for stateless tasks — don't accumulate history | → |
| 10 | Failure Modes | normalise() should never throw — return raw input on any failure | → |
| 11 | Testing | @Generable types are unit-testable without the model (memberwise init) | → |
| 12 | Use Cases | 10 concrete patterns: BJJ, recipes, journaling, commits, triage... | → |
| 13 | Quick Reference | Full type table + anti-patterns (see below) | → |
| 14 | Context Engineering | 4,096 tokens = ~3,000 words shared. Select, don't dump | → |
| 15 | Advanced Patterns | call() runs @concurrent — hop to @MainActor for state access | → |
Key Types
| Type | Purpose |
|---|---|
| SystemLanguageModel | Singleton entry point — .default, .availability, .isAvailable |
| SystemLanguageModel.Availability | .available / .unavailable(reason) — always handle @unknown default |
| LanguageModelSession | One conversation thread. Stateful — holds Transcript |
| Instructions | System prompt — set once at session creation, not per-turn |
| Prompt | Per-turn user input to the model |
| Response<Content> | Wrapper around typed output — always access .content, not the wrapper itself |
| ResponseStream<Content> | AsyncSequence of Snapshot<Content> for streaming |
| GenerationOptions | temperature, maximumResponseTokens, SamplingMode |
| @Generable | Macro — synthesises guided generation schema for a struct or enum |
| @Guide | Property wrapper on @Generable fields — description + constraints |
| GenerationGuide<T> | Constraint type: .range(), .count(), .pattern() |
| Transcript | Linear history: .instructions, .prompt, .response, .toolCalls, .toolOutput |
| Tool | Protocol — Arguments (Generable), Output (PromptRepresentable), call() |
| SystemLanguageModel.TokenUsage | .tokenCount — measure cost before injection |
Session Init Variants
```swift
// Minimal — no tools, inline instructions
LanguageModelSession {
    "Correct BJJ terminology. kimora→Kimura, half card→Half Guard."
}

// Explicit model
LanguageModelSession(model: SystemLanguageModel.default) { "..." }

// With tools
LanguageModelSession(tools: [PositionLookupTool()]) { "..." }

// Resume from saved transcript
LanguageModelSession(model: .default, tools: [], transcript: savedTranscript)
```
respond() vs streamResponse()
| | respond() | streamResponse() |
|---|---|---|
| Returns | Response<Content> | ResponseStream<Content> |
| Best for | Background processing, pipelines | Live UI with typing effect |
| Partial results | No | Yes — snapshot.content returns PartiallyGenerated |
| Finalise stream | N/A | .collect() → Response<Content> |
Rule: if the output is going directly into a pipeline or SwiftData model, use respond(). If the user sees it appear on screen as it generates, use streamResponse().
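The streaming side as a minimal sketch. StoryIdea is a hypothetical @Generable type, and the exact streamResponse overload shown is an assumption; snapshot access follows the Snapshot<Content> shape described above.

```swift
// Sketch only: StoryIdea is a hypothetical @Generable struct.
let session = LanguageModelSession { "You write one-line story ideas." }

let stream = session.streamResponse(generating: StoryIdea.self) {
    "A mystery set in a BJJ gym."
}

// Each element's .content is StoryIdea.PartiallyGenerated:
// optional fields fill in as tokens arrive, so update UI per snapshot.
for try await snapshot in stream {
    title = snapshot.content.title ?? ""
}

// When only the final value matters, collapse the stream instead:
let response = try await stream.collect() // Response<StoryIdea>
```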
Token Budget Formula
Total window ≈ 4,096 tokens ≈ 3,000 words
instructions + tool definitions + transcript history + prompt + response
All five compete for the same pool. Response tokens are consumed from the same window as input tokens — a 500-token response leaves 3,596 tokens for everything else.
Measure before injecting:
```swift
let cost = try await model.tokenUsage(for: instructions).tokenCount
let window = await model.contextSize // back-deployed via @backDeployed
```
@Generable vs Raw String
Use @Generable when:
- You need multiple structured fields
- Output must be parsed or processed programmatically
- You want compile-time guarantees on shape
- You need constraints (@Guide) on values
Use raw String when:
- Output is prose for display to the user
- You're summarising or generating a paragraph
- Streaming a typing effect
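A minimal sketch of the structured side of that choice. Field names are illustrative; constraint arguments are omitted because only the description form is shown elsewhere in this document.

```swift
// Structured output: generation is guided to fit this shape exactly.
@Generable
struct NormalisedTranscript {
    @Guide(description: "The transcript with BJJ terms corrected")
    var normalisedText: String

    @Guide(description: "Distinct BJJ terms found in the transcript")
    var extractedTerms: [String]
}

// Raw String needs no schema at all:
// let reply = try await session.respond(to: "Summarise today's training.")
// reply.content is a plain String, ready for display.
```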
The AnyObject? Pattern (Availability Without @available Spread)
The problem: adding a @State private var session: LanguageModelSession? forces @available(iOS 26, *) onto the whole view.
The fix: use AnyObject? as the declared type and cast inside #available guards.
```swift
// In your view — no @available annotation needed on the view itself
@State private var normalisationService: AnyObject?

// In .onAppear or .task
if #available(iOS 26, *) {
    normalisationService = TranscriptNormalisationService()
}

// At call site
if #available(iOS 26, *),
   let service = normalisationService as? TranscriptNormalisationService {
    let result = await service.normalise(rawText)
}
```
Context Engineering — 4 Patterns
When app data is too large to inject directly:
| Pattern | When to use | How |
|---|---|---|
| Select, Don't Dump | Data is queryable | SwiftData predicate — fetch only relevant rows |
| Layered Injection | Hierarchical data | Inject summaries; load detail on demand via tools |
| Two-Step Compression | Large corpus, summary needed | Call 1 summarises → Call 2 reasons with summary |
| Pre-Summarise at Write Time | Rich entities with stable detail | Generate + store AI summary when entity is saved; reuse forever |
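"Select, Don't Dump" as a sketch, assuming a hypothetical TrainingSession SwiftData model with date and summary properties:

```swift
// Fetch only the rows the prompt needs, instead of serialising
// the whole store into the context window.
let cutoff = Calendar.current.date(byAdding: .day, value: -7, to: .now)!
let descriptor = FetchDescriptor<TrainingSession>(
    predicate: #Predicate { $0.date >= cutoff },
    sortBy: [SortDescriptor(\.date, order: .reverse)]
)
let recent = try modelContext.fetch(descriptor)

// Inject a compact rendering, not raw objects.
let context = recent
    .map { "\($0.date.formatted(date: .abbreviated, time: .omitted)): \($0.summary)" }
    .joined(separator: "\n")
```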
The 10 Anti-Patterns
1. Accessing response instead of response.content
respond() returns Response<T>, not T. Always unwrap .content.
2. Storing LanguageModelSession persistently when you don't need history
For stateless tasks (normalisation, extraction, classification), create a new session per call. History accumulates and eventually overflows the context window.
3. Too many tools
Each tool definition consumes ~50–100 tokens whether called or not. Keep to 3–5 per session. Split into multiple focused sessions if you have more.
4. Calling isAvailable / checkAvailability() in the hot path
Availability doesn't change mid-session. Check once at service init; cache the result.
5. High temperature for structured / correction tasks
@Generable correction types need nil or temperature: 0.0–0.2. High temperature produces creatively varied — and wrong — output.
6. Long, elaborate instructions modelled on frontier model prompts
The on-device model is ~3B parameters. Instructions over ~200 words dilute signal. Short, explicit rules outperform discursive prose every time.
7. Not testing the fallback path
On most devices today, Apple Intelligence is unavailable. Your non-AI path is the primary experience for most users. Test it as thoroughly as the AI path.
8. Using FoundationModels for regex-solvable tasks
If the task is a known, fixed pattern (extract a UUID, validate an email, format a date), use a deterministic function. LLM overhead — latency, availability, complexity — is waste.
9. Propagating @available(iOS 26, *) to SwiftUI views
Adding @available to a @State property forces the whole view to require iOS 26. Use the AnyObject? pattern instead.
10. Treating .modelNotReady as permanent
.modelNotReady means the model is downloading. It is transient. Show "not available right now" and retry on next app launch. Do not display a permanent "unsupported" message.
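Anti-patterns 2 and 5 combined in one sketch: a fresh session per stateless call, and near-zero temperature for a correction task. The respond overload with options is an assumption, and NormalisedTranscript is the hypothetical output type used for the correction task.

```swift
// Near-deterministic sampling for structured correction output.
let options = GenerationOptions(temperature: 0.1)

// Fresh session per call: no history to accumulate or overflow.
let session = LanguageModelSession {
    "Correct BJJ terminology. kimora→Kimura, half card→Half Guard."
}

let result = try await session.respond(
    to: Prompt { rawTranscript },
    generating: NormalisedTranscript.self,
    options: options
)
```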
Minimum Viable Service Pattern
The production-safe wrapper — never throws, falls back silently:
```swift
@available(iOS 26, *)
@MainActor
final class TranscriptNormalisationService {

    private func makeSession() -> LanguageModelSession {
        LanguageModelSession {
            """
            You are a BJJ transcript corrector. Fix misrecognised terms only.
            Common corrections: kimora→Kimura, half card→Half Guard, arm bar→armbar.
            Vocabulary: Kimura, Triangle, Armbar, Half Guard, Full Guard, Mount, Back Control.
            Return the corrected transcript and the BJJ terms found.
            """
        }
    }

    /// Never throws. Returns raw input unchanged on any failure.
    func normalise(_ rawTranscript: String) async -> NormalisedTranscript {
        guard !rawTranscript.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
        guard SystemLanguageModel.default.isAvailable else {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
        do {
            let session = makeSession()
            let result = try await session.respond(
                to: Prompt { rawTranscript },
                generating: NormalisedTranscript.self
            )
            return result.content
        } catch {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
    }
}
```
Availability Cases at a Glance
| Case | Meaning | What to do |
|---|---|---|
| .available | Ready to use | Create session, proceed |
| .unavailable(.deviceNotEligible) | Hardware doesn't support Apple Intelligence | Show permanent alternative UI; remove AI option |
| .unavailable(.appleIntelligenceNotEnabled) | User hasn't enabled it in Settings | Optionally prompt user; respect their choice |
| .unavailable(.modelNotReady) | Model weights downloading | Show "not available right now"; retry on next launch |
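The table as a single switch. This is a sketch: run it once at service init and cache the outcome, per anti-pattern 4.

```swift
switch SystemLanguageModel.default.availability {
case .available:
    break // create sessions and proceed
case .unavailable(.deviceNotEligible):
    break // permanent: hide or replace the AI feature
case .unavailable(.appleIntelligenceNotEnabled):
    break // user choice: optionally point at Settings, then respect it
case .unavailable(.modelNotReady):
    break // transient: show "not available right now", retry on a later launch
case .unavailable(_):
    break // any future reason: treat as unavailable
@unknown default:
    break // future availability cases: treat as unavailable
}
```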
Full 15-part reference: iOS 26 FoundationModels: Comprehensive Swift/SwiftUI Reference