This is the fast version. For full detail on any section, follow the deep-dive links to the complete reference.
The 15-Part Index
| Part | Covers | One thing to know | Deep dive |
|---|---|---|---|
| 1 | Availability & Setup | .modelNotReady is transient — model is downloading, not missing | → |
| 2 | Sessions & Basic Prompting | respond() returns Response<T> — always unwrap .content | → |
| 3 | Prompt Engineering | Short beats long. <200 words. Explicit rules beat prose descriptions | → |
| 4 | @Generable | Macro generates a structured output schema; @Guide adds constraints | → |
| 5 | Streaming | streamResponse() → AsyncSequence; use .collect() to finalise | → |
| 6 | Generation Options | temperature: nil or 0.0–0.2 for correction tasks; higher for creative | → |
| 7 | Tool Calling | Pre-fetch if always needed; define as Tool only for conditional data | → |
| 8 | Token Budget | All inputs + output share one fixed window (~4,096 tokens) | → |
| 9 | The Transcript | New session per call for stateless tasks — don't accumulate history | → |
| 10 | Failure Modes | normalise() should never throw — return raw input on any failure | → |
| 11 | Testing | @Generable types are unit-testable without the model (memberwise init) | → |
| 12 | Use Cases | 10 concrete patterns: BJJ, recipes, journaling, commits, triage... | → |
| 13 | Quick Reference | Full type table + anti-patterns (see below) | → |
| 14 | Context Engineering | 4,096 tokens = ~3,000 words shared. Select, don't dump | → |
| 15 | Advanced Patterns | call() runs @concurrent — hop to @MainActor for state access | → |
Key Types
| Type | Purpose |
|---|---|
| SystemLanguageModel | Singleton entry point — .default, .availability, .isAvailable |
| SystemLanguageModel.Availability | .available / .unavailable(reason) — always handle @unknown default |
| LanguageModelSession | One conversation thread. Stateful — holds Transcript |
| Instructions | System prompt — set once at session creation, not per-turn |
| Prompt | Per-turn user input to the model |
| Response<Content> | Wrapper around typed output — always access .content, not the wrapper itself |
| ResponseStream<Content> | AsyncSequence of Snapshot<Content> for streaming |
| GenerationOptions | temperature, maximumResponseTokens, SamplingMode |
| @Generable | Macro — synthesises guided generation schema for a struct or enum |
| @Guide | Property wrapper on @Generable fields — description + constraints |
| GenerationGuide<T> | Constraint type: .range(), .count(), .pattern() |
| Transcript | Linear history: .instructions, .prompt, .response, .toolCalls, .toolOutput |
| Tool | Protocol — Arguments (Generable), Output (PromptRepresentable), call() |
| SystemLanguageModel.TokenUsage | .tokenCount — measure cost before injection |
Session Init Variants
```swift
// Minimal — no tools, inline instructions
LanguageModelSession {
    "Correct BJJ terminology. kimora→Kimura, half card→Half Guard."
}

// Explicit model
LanguageModelSession(model: SystemLanguageModel.default) { "..." }

// With tools
LanguageModelSession(tools: [PositionLookupTool()]) { "..." }

// Resume from saved transcript
LanguageModelSession(model: .default, tools: [], transcript: savedTranscript)
```
respond() vs streamResponse()
| | respond() | streamResponse() |
|---|---|---|
| Returns | Response<Content> | ResponseStream<Content> |
| Best for | Background processing, pipelines | Live UI with typing effect |
| Partial results | No | Yes — snapshot.content returns PartiallyGenerated |
| Finalise stream | N/A | .collect() → Response<Content> |
Rule: if the output is going directly into a pipeline or SwiftData model, use respond(). If the user sees it appear on screen as it generates, use streamResponse().
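The streaming side as a minimal sketch. StoryIdea is a hypothetical @Generable type, and the exact streamResponse overload shown is an assumption; snapshot access follows the Snapshot<Content> shape described above.

```swift
// Sketch only: StoryIdea is a hypothetical @Generable struct.
let session = LanguageModelSession { "You write one-line story ideas." }

let stream = session.streamResponse(generating: StoryIdea.self) {
    "A mystery set in a BJJ gym."
}

// Each element's .content is StoryIdea.PartiallyGenerated:
// optional fields fill in as tokens arrive, so update UI per snapshot.
for try await snapshot in stream {
    title = snapshot.content.title ?? ""
}

// When only the final value matters, collapse the stream instead:
let response = try await stream.collect() // Response<StoryIdea>
```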
Token Budget Formula
Total window ≈ 4,096 tokens ≈ 3,000 words
instructions + tool definitions + transcript history + prompt + response
All five compete for the same pool. Response tokens are consumed from the same window as input tokens — a 500-token response leaves 3,596 tokens for everything else.
Measure before injecting:
```swift
let cost = try await model.tokenUsage(for: instructions).tokenCount
let window = await model.contextSize // back-deployed via @backDeployed
```
@Generable vs Raw String
Use @Generable when:
- You need multiple structured fields
- Output must be parsed or processed programmatically
- You want compile-time guarantees on shape
- You need constraints (@Guide) on values
Use raw String when:
- Output is prose for display to the user
- You're summarising or generating a paragraph
- Streaming a typing effect
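A minimal sketch of the structured side of that choice. Field names are illustrative; constraint arguments are omitted because only the description form is shown elsewhere in this document.

```swift
// Structured output: generation is guided to fit this shape exactly.
@Generable
struct NormalisedTranscript {
    @Guide(description: "The transcript with BJJ terms corrected")
    var normalisedText: String

    @Guide(description: "Distinct BJJ terms found in the transcript")
    var extractedTerms: [String]
}

// Raw String needs no schema at all:
// let reply = try await session.respond(to: "Summarise today's training.")
// reply.content is a plain String, ready for display.
```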
The AnyObject? Pattern (Availability Without @available Spread)
The problem: adding a @State private var session: LanguageModelSession? forces @available(iOS 26, *) onto the whole view.
The fix: use AnyObject? as the declared type and cast inside #available guards.
```swift
// In your view — no @available annotation needed on the view itself
@State private var normalisationService: AnyObject?

// In .onAppear or .task
if #available(iOS 26, *) {
    normalisationService = TranscriptNormalisationService()
}

// At call site
if #available(iOS 26, *),
   let service = normalisationService as? TranscriptNormalisationService {
    let result = await service.normalise(rawText)
}
```
Context Engineering — 4 Patterns
When app data is too large to inject directly:
| Pattern | When to use | How |
|---|---|---|
| Select, Don't Dump | Data is queryable | SwiftData predicate — fetch only relevant rows |
| Layered Injection | Hierarchical data | Inject summaries; load detail on demand via tools |
| Two-Step Compression | Large corpus, summary needed | Call 1 summarises → Call 2 reasons with summary |
| Pre-Summarise at Write Time | Rich entities with stable detail | Generate + store AI summary when entity is saved; reuse forever |
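"Select, Don't Dump" as a sketch, assuming a hypothetical TrainingSession SwiftData model with date and summary properties:

```swift
// Fetch only the rows the prompt needs, instead of serialising
// the whole store into the context window.
let cutoff = Calendar.current.date(byAdding: .day, value: -7, to: .now)!
let descriptor = FetchDescriptor<TrainingSession>(
    predicate: #Predicate { $0.date >= cutoff },
    sortBy: [SortDescriptor(\.date, order: .reverse)]
)
let recent = try modelContext.fetch(descriptor)

// Inject a compact rendering, not raw objects.
let context = recent
    .map { "\($0.date.formatted(date: .abbreviated, time: .omitted)): \($0.summary)" }
    .joined(separator: "\n")
```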
The 10 Anti-Patterns
1. Accessing response instead of response.content
respond() returns Response<T>, not T. Always unwrap .content.
2. Storing LanguageModelSession persistently when you don't need history
For stateless tasks (normalisation, extraction, classification), create a new session per call. History accumulates and eventually overflows the context window.
3. Too many tools
Each tool definition consumes ~50–100 tokens whether called or not. Keep to 3–5 per session. Split into multiple focused sessions if you have more.
4. Calling isAvailable / checkAvailability() in the hot path
Availability doesn't change mid-session. Check once at service init; cache the result.
5. High temperature for structured / correction tasks
@Generable correction types need nil or temperature: 0.0–0.2. High temperature produces creatively varied — and wrong — output.
6. Long, elaborate instructions modelled on frontier model prompts
The on-device model is ~3B parameters. Instructions over ~200 words dilute signal. Short, explicit rules outperform discursive prose every time.
7. Not testing the fallback path
On most devices today, Apple Intelligence is unavailable. Your non-AI path is the primary experience for most users. Test it as thoroughly as the AI path.
8. Using FoundationModels for regex-solvable tasks
If the task is a known, fixed pattern (extract a UUID, validate an email, format a date), use a deterministic function. LLM overhead — latency, availability, complexity — is waste.
9. Propagating @available(iOS 26, *) to SwiftUI views
Adding @available to a @State property forces the whole view to require iOS 26. Use the AnyObject? pattern instead.
10. Treating .modelNotReady as permanent
.modelNotReady means the model is downloading. It is transient. Show "not available right now" and retry on next app launch. Do not display a permanent "unsupported" message.
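Anti-patterns 2 and 5 combined in one sketch: a fresh session per stateless call, and near-zero temperature for a correction task. The respond overload with options is an assumption, and NormalisedTranscript is the hypothetical output type used for the correction task.

```swift
// Near-deterministic sampling for structured correction output.
let options = GenerationOptions(temperature: 0.1)

// Fresh session per call: no history to accumulate or overflow.
let session = LanguageModelSession {
    "Correct BJJ terminology. kimora→Kimura, half card→Half Guard."
}

let result = try await session.respond(
    to: Prompt { rawTranscript },
    generating: NormalisedTranscript.self,
    options: options
)
```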
Minimum Viable Service Pattern
The production-safe wrapper — never throws, falls back silently:
```swift
@available(iOS 26, *)
@MainActor
final class TranscriptNormalisationService {

    private func makeSession() -> LanguageModelSession {
        LanguageModelSession {
            """
            You are a BJJ transcript corrector. Fix misrecognised terms only.
            Common corrections: kimora→Kimura, half card→Half Guard, arm bar→armbar.
            Vocabulary: Kimura, Triangle, Armbar, Half Guard, Full Guard, Mount, Back Control.
            Return the corrected transcript and the BJJ terms found.
            """
        }
    }

    /// Never throws. Returns raw input unchanged on any failure.
    func normalise(_ rawTranscript: String) async -> NormalisedTranscript {
        guard !rawTranscript.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
        guard SystemLanguageModel.default.isAvailable else {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
        do {
            let session = makeSession()
            let result = try await session.respond(
                to: Prompt { rawTranscript },
                generating: NormalisedTranscript.self
            )
            return result.content
        } catch {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
    }
}
```
Availability Cases at a Glance
| Case | Meaning | What to do |
|---|---|---|
| .available | Ready to use | Create session, proceed |
| .unavailable(.deviceNotEligible) | Hardware doesn't support Apple Intelligence | Show permanent alternative UI; remove AI option |
| .unavailable(.appleIntelligenceNotEnabled) | User hasn't enabled it in Settings | Optionally prompt user; respect their choice |
| .unavailable(.modelNotReady) | Model weights downloading | Show "not available right now"; retry on next launch |
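The table as a single switch. This is a sketch: run it once at service init and cache the outcome, per anti-pattern 4.

```swift
switch SystemLanguageModel.default.availability {
case .available:
    break // create sessions and proceed
case .unavailable(.deviceNotEligible):
    break // permanent: hide or replace the AI feature
case .unavailable(.appleIntelligenceNotEnabled):
    break // user choice: optionally point at Settings, then respect it
case .unavailable(.modelNotReady):
    break // transient: show "not available right now", retry on a later launch
case .unavailable(_):
    break // any future reason: treat as unavailable
@unknown default:
    break // future availability cases: treat as unavailable
}
```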
Full 15-part reference: iOS 26 FoundationModels: Comprehensive Swift/SwiftUI Reference