# Fixing domain-specific speech recognition with FoundationModels

Speech-to-text fails predictably on niche vocabulary. On-device AI makes a clean fix — and the extractedTerms output is more useful than it first appears.

category: Engineering
date: 2026-03-02
reading-time: 8 min read
excerpt: Any app with voice input and niche vocabulary has this problem. Here's a clean pattern using Apple's on-device FoundationModels to silently correct domain terms before they reach your data layer — and how to pipe the extracted terms into entity matching downstream.

---

## The problem

Speech-to-text is impressively good — until you step outside everyday vocabulary. If your app lives in a niche domain, you've probably already seen it. Medical apps mishear procedure names. Climbing apps mishear route grades. Legal apps mishear case terminology. The speech model simply hasn't seen these words enough to transcribe them reliably, and the transcription errors are consistent — the same wrong words come out every time.

In [Grapla](https://grapla.app), a Brazilian Jiu-Jitsu training app, the vocabulary is dense with borrowed Portuguese, compound phrases, and proper nouns that the speech model mangles predictably: "kimura" becomes "kimora", "half guard" becomes "half card", "omoplata" becomes "omma plata". Stored verbatim, these transcripts break search, entity extraction, and anything downstream that expects canonical terms.

The naive fix is a substitution dictionary. Map "kimora" to "Kimura", "half card" to "Half Guard". This works until your dictionary has 200 entries and still misses half the variants the speech model produces.

A better fix: run the raw transcript through the on-device language model and let it understand and correct domain terminology in context.
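For reference, the dictionary approach is only a few lines. This is a sketch with illustrative mappings; `naiveCorrect` is a hypothetical helper, not code from the app:

```swift
import Foundation

// A substitution dictionary: every misrecognition variant needs its own entry.
let corrections: [String: String] = [
    "kimora": "Kimura",
    "half card": "Half Guard",
    "omma plata": "Omoplata"
]

func naiveCorrect(_ transcript: String) -> String {
    corrections.reduce(transcript) { text, pair in
        text.replacingOccurrences(of: pair.key, with: pair.value, options: .caseInsensitive)
    }
}
```

It handles exactly the variants it knows about; anything unseen ("kim mora", "kimorah") passes through silently, which is why the entry count keeps growing.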
---

## The pipeline

Three stages, left to right:

```
mic → SpeechAnalyzer + vocabulary hints
    → raw transcript ("worked my kimora from half card")
    → LanguageModelSession + system prompt with corrections
    → NormalisedTranscript { normalisedText, extractedTerms }
    → entity matching / storage
```

The middle stage is the interesting one. The system prompt is short — under 200 words, as recommended for on-device models — and contains two things: explicit correction pairs for the most common misrecognitions, and a vocabulary list of canonical domain terms. Everything else is inferred by the model.

The structured output carries two fields. Most examples focus on `normalisedText`. Don't discard `extractedTerms` — it's what makes the downstream pipeline precise.

---

## Four patterns worth keeping

### 1. `@Generable` for structured output

The output of a correction task isn't free-form prose — it's a fixed shape. `@Generable` makes this a compile-time guarantee rather than a parsing problem:

```swift
@available(iOS 26, *)
@Generable(description: "A normalised training transcript with corrected terminology")
struct NormalisedTranscript: Sendable, Equatable {
    @Guide(description: "Full transcript with domain terms corrected and properly cased")
    var normalisedText: String

    @Guide(description: "Domain terms found in the transcript, each in canonical form")
    var extractedTerms: [String]
}
```

The session then generates exactly this shape — no JSON parsing, no regex on the response, no prompt engineering for output format:

```swift
let session = LanguageModelSession(instructions: systemPrompt)
let response = try await session.respond(
    to: "Correct this transcript:\n\n\(rawText)",
    generating: NormalisedTranscript.self
)
let corrected = response.content.normalisedText
let terms = response.content.extractedTerms  // ["Kimura", "Half Guard"]
```

`extractedTerms` is produced as a side effect of the correction task. Don't throw it away — the next section explains why.

---

### 2. Design the degraded path first

Apple Intelligence isn't available on most devices today. The model may be downloading, disabled in Settings, or the hardware simply doesn't support it. A correction service that throws on unavailability pushes error handling into every caller.

The better design: `normalise()` always returns something usable. On unavailability, it returns the raw transcript unchanged. Callers never handle errors — they always get a result back, either corrected or not.

```swift
func normalise(
    _ rawTranscript: String,
    entityNames: [String] = []
) async -> NormalisedTranscript {
    let trimmed = rawTranscript.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else {
        return NormalisedTranscript(normalisedText: "", extractedTerms: [])
    }
    guard case .available = checkAvailability() else {
        // Raw transcript returned unchanged — callers are unaffected
        return NormalisedTranscript(normalisedText: trimmed, extractedTerms: [])
    }
    do {
        let session = LanguageModelSession(
            instructions: Self.sessionInstructions(entityNames: entityNames)
        )
        let result = try await session.respond(to: trimmed, generating: NormalisedTranscript.self)
        return result.content
    } catch {
        return NormalisedTranscript(normalisedText: trimmed, extractedTerms: [])
    }
}
```

The uncorrected path is the primary experience for most users right now. Worth testing it as carefully as the AI path.

---

### 3. Vocabulary hints at two layers

Domain correction can happen at two points in this pipeline — and using both makes them complementary.

**At the speech recogniser:** iOS 26's `SpeechAnalyzer` accepts `contextualStrings`, a list of terms to bias the STT model toward. Passing your domain vocabulary here makes the raw transcript cleaner before it reaches the language model.
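The same hook exists on the long-standing `SFSpeechRecognizer` stack, which matters for the pre-iOS 26 fallback path. A minimal sketch, assuming `BJJVocabularyHints.all` (the article's vocabulary type) as the source list:

```swift
import Speech

// Bias the legacy recogniser toward domain vocabulary.
// contextualStrings is an SFSpeechRecognitionRequest property (iOS 10+).
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = BJJVocabularyHints.all
request.requiresOnDeviceRecognition = true  // keep audio on device (iOS 13+)
```

Either way, the hint is just the string list you already maintain for the system prompt.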
**At the language model:** The system prompt includes the same vocabulary list, plus explicit correction pairs for the most common failures:

```
Known terms: [your domain vocabulary list]
Common corrections: kimora→Kimura, half card→Half Guard, arm bar→Armbar...
```

Two cheap injections. The first pass reduces the noise; the second corrects what slipped through. Neither is expensive — both are strings you already have.

---

### 4. Entity injection: precision over recall

The `extractedTerms` from Pattern 1 are useful but imprecise by default. The model knows the domain vocabulary and might return "Kimura", "Armbar", or "Half Guard" — but "Kimura" in your database has a specific UUID and canonical casing you care about.

The fix is to inject your known entity names into the prompt and instruct the model to constrain its output to exact matches:

```swift
private static func sessionInstructions(entityNames: [String] = []) -> String {
    var base = """
        You are a BJJ transcript corrector. Fix misrecognised terms...
        Known BJJ terms: \(BJJVocabularyHints.all.joined(separator: ", "))
        """
    if !entityNames.isEmpty {
        base += """
            Entity extraction: in extractedTerms, return ONLY names that exactly \
            match this list (preserve exact capitalisation): \
            \(entityNames.joined(separator: ", ")). \
            Return empty extractedTerms if no matches found.
            """
    }
    return base
}
```

Then on the coordinator, set the entity names before recording:

```swift
normalisationCoordinator.entityNames =
    positions.map(\.name) + submissions.map(\.name) + people.map(\.name)
```

When the transcript is normalised, `extractedTerms` comes back as a filtered list — only names that are in your database, with exact capitalisation preserved. This turns extraction from a fuzzy matching problem into a direct lookup.

**The tradeoff:** Higher precision, lower recall. If an entity isn't in the injected list, it won't appear in `extractedTerms`. For entity contexts where you've fetched the full set, this is fine.
For open-ended contexts where you want the model to surprise you, omit the list and fall back to Levenshtein matching downstream.

**Using extractedTerms downstream:**

```swift
let analysisText = normalised.extractedTerms.isEmpty
    ? normalised.normalisedText                          // fallback: full text, fuzzy match
    : normalised.extractedTerms.joined(separator: " ")   // precise: known entities only

let matches = entityAnalyzer.analyze(notes: analysisText, inputs: analysisInputs)
```

When the model is unavailable, `extractedTerms` will be empty and the fallback path kicks in automatically — Levenshtein matching over the full transcript still finds most entities.

---

## The `AnyObject?` availability gating pattern

One practical iOS 26 concern: adding `@State private var session: LanguageModelSession?` to a view forces `@available(iOS 26, *)` onto the whole view struct. A service class stored as `AnyObject?` and cast inside `#available` blocks avoids this.

The coordinator also needs a version-agnostic result type — `NormalisedTranscript` is `@Generable` and thus iOS 26-only. Define a plain mirror struct that carries the same fields without the availability constraint:

```swift
// No @available — works on all iOS versions
struct NormalisedTranscriptResult {
    let normalisedText: String
    let extractedTerms: [String]
}

@Observable @MainActor
final class VoiceNormalisationCoordinator {
    private var service: AnyObject?  // TranscriptNormalisationService on iOS 26+
    var entityNames: [String] = []   // injected before recording

    func setup() {
        if #available(iOS 26, *) {
            service = TranscriptNormalisationService()
        }
    }

    func normalise(_ rawText: String) async -> NormalisedTranscriptResult {
        guard #available(iOS 26, *),
              let s = service as? TranscriptNormalisationService else {
            return NormalisedTranscriptResult(normalisedText: rawText, extractedTerms: [])
        }
        let result = await s.normalise(rawText, entityNames: entityNames)
        return NormalisedTranscriptResult(
            normalisedText: result.normalisedText,
            extractedTerms: result.extractedTerms
        )
    }
}
```

Two things to notice: the coordinator now returns the full result (not just `normalisedText`), and `entityNames` is set externally before recording starts. The gating and availability handling are contained in one place. Callers work with a plain struct that compiles on any iOS version.

---

## When it's worth it

This pattern adds an on-device LLM call and a soft device requirement. It's worth it when:

- The vocabulary is large enough that a substitution dictionary becomes unmanageable
- Transcripts feed a downstream pipeline where bad terms cause actual failures
- You have a database of known entities and want precise extraction, not just correction
- Privacy is a concern — on-device means nothing leaves the phone

If you have a small fixed set of known corrections, a dictionary is simpler, faster, and works everywhere. Use the minimum viable tool.

---

Full FoundationModels API detail — `@Generable`, sessions, token budgets, streaming, availability cases — is in the [iOS 26 FoundationModels reference](/writing/foundation-models-reference).