LLM gateway: capability-based model routing.
@polaris/llm-gateway routes LLM requests to the best available model by matching declared capability requirements against a configured model registry — with PII type enforcement, intent catalog validation, and mandatory audit logging.
Overview
What the gateway does.
Callers declare what they need (capabilities) and what the data contains (PII classification). The gateway resolves which configured model satisfies those requirements, enforces hard constraints defined in the intent catalog, and writes an immutable audit entry for every dispatch. No model names leak into application code.
Application Code
      |
      |  ModelRequest { requires, prefers, tuning, intent, pii }
      v
┌─────────────────┐
│   llm-gateway   │  1. validate intent against catalog
│                 │  2. enforce PII constraints
│  routeRequest() │  3. score models by capability match
│                 │  4. write audit entry
└─────────────────┘
      |
      |  selected ModelConfig + AuditEntry
      v
Upstream LLM API (OpenAI-compatible)
Canonical reference
All architectural decisions documented here are derived from ADR-020 v2. ADR-020 is the authoritative source — when this page and the ADR conflict, the ADR wins.
Capability vocabulary
19 capabilities across 4 axes.
Every capability is a member of the KnownCapability union type exported from @polaris/llm-gateway. Declare capabilities as requires (must have) or prefers (scored bonus) in a ModelRequest.
import type { KnownCapability } from '@polaris/llm-gateway'
// KnownCapability is a union of all 19 capability strings:
type KnownCapability =
  // Modality
  | 'text' | 'audioIn' | 'audioOut' | 'vision' | 'computerUse' | 'streaming'
  // Reasoning
  | 'reasoning' | 'longContext' | 'jsonMode' | 'toolUse' | 'agentic'
  // Domain
  | 'germanLanguage' | 'medicalGermanLanguage' | 'medicalCoding' | 'multilingual' | 'simplifiedLanguage'
  // Deployment
  | 'local' | 'lowLatency' | 'batch'
Modality axis
| Capability | Description |
|---|---|
| text | Basic text generation; foundation capability required by all text-based intents |
| audioIn | Audio transcription/speech-to-text input processing |
| audioOut | Text-to-speech audio output generation |
| vision | Image and document visual understanding |
| computerUse | Autonomous GUI/computer interaction (PVS navigation) |
| streaming | Streaming token-by-token response delivery |
Reasoning axis
| Capability | Description |
|---|---|
| reasoning | Extended chain-of-thought reasoning for complex clinical decisions |
| longContext | Large context window (100k+ tokens) for long documents |
| jsonMode | Guaranteed JSON-structured output |
| toolUse | Function/tool calling capability |
| agentic | Autonomous multi-step task execution |
Domain axis
| Capability | Description |
|---|---|
| germanLanguage | German language fluency for administrative content |
| medicalGermanLanguage | Medical German: clinical terminology, Arztbriefe, Anamnesedokumentation |
| medicalCoding | ICD-10, EBM, GOÄ, GOZ coding knowledge |
| multilingual | Multi-language support beyond German |
| simplifiedLanguage | Simplified/plain language output (patient-facing) |
Deployment axis
| Capability | Description |
|---|---|
| local | On-premises/self-hosted deployment (no data leaves the network) |
| lowLatency | Optimized for real-time interactive use (<500ms first token) |
| batch | High-throughput batch processing mode |
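The requires/prefers semantics can be illustrated with a small scoring function. This is a minimal sketch, not the gateway's actual routeRequest() implementation: the SketchModel and SketchRequest shapes, the pickModel name, and the one-point-per-matched-preference scoring are assumptions made for illustration.

```typescript
// Hypothetical, simplified shapes; the real ModelConfig and ModelRequest may differ.
interface SketchModel {
  id: string;
  capabilities: string[];
}

interface SketchRequest {
  requires: string[];
  prefers: string[];
}

// Returns the best-scoring model that has every required capability,
// or null when no model qualifies. One point per matched preference is
// an assumed scoring rule; ties go to the model listed first.
function pickModel(models: SketchModel[], request: SketchRequest): SketchModel | null {
  let best: SketchModel | null = null;
  let bestScore = -1;
  for (const model of models) {
    const hasAllRequired = request.requires.every(
      (cap) => model.capabilities.indexOf(cap) !== -1
    );
    if (!hasAllRequired) continue;
    const score = request.prefers.filter(
      (cap) => model.capabilities.indexOf(cap) !== -1
    ).length;
    if (score > bestScore) {
      best = model;
      bestScore = score;
    }
  }
  return best;
}

const sketchModels: SketchModel[] = [
  { id: "model-a", capabilities: ["text", "germanLanguage"] },
  { id: "model-b", capabilities: ["text", "germanLanguage", "medicalCoding", "reasoning"] },
];

const chosen = pickModel(sketchModels, {
  requires: ["text", "medicalCoding"],
  prefers: ["reasoning"],
});
console.log(chosen === null ? "no model" : chosen.id);
```

A required capability acts as a filter, while a preferred one only reorders the models that survive the filter. A request whose requirements no model covers yields null here; the gateway reports that situation as no_model_for_capabilities.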
PII type system
AnonymizedPrompt and RealPrompt.
The gateway enforces PII classification at the type level. You cannot pass a RealPrompt to an intent that declares pii: 'anonymized' — the TypeScript compiler rejects it before the code runs. Both types are opaque: they can only be constructed via their from() factory, ensuring input validation runs on every prompt.
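One common way to get this behavior in TypeScript is a class with a private constructor and a private brand field. The following is a minimal sketch of that technique under assumed names (SketchAnonymizedPrompt, SketchRealPrompt, sendAnonymized) and an assumed validation rule (non-empty text); it is not the package's actual implementation.

```typescript
// Hypothetical stand-ins for the opaque prompt types; names are illustrative only.
class SketchAnonymizedPrompt {
  // The private brand makes the class nominal: structurally similar
  // objects and SketchRealPrompt instances are not assignable to it.
  private readonly brand = "anonymized";

  private constructor(public readonly text: string) {}

  // The only way to obtain an instance, so validation always runs.
  static from(input: { anonymizedText: string }): SketchAnonymizedPrompt {
    const text = input.anonymizedText.trim();
    if (text.length === 0) {
      throw new Error("anonymizedText must not be empty");
    }
    return new SketchAnonymizedPrompt(text);
  }

  piiMode(): string {
    return this.brand;
  }
}

class SketchRealPrompt {
  private readonly brand = "real";

  private constructor(public readonly text: string) {}

  static from(input: { text: string }): SketchRealPrompt {
    return new SketchRealPrompt(input.text);
  }

  piiMode(): string {
    return this.brand;
  }
}

// Stands in for an intent that declares pii: 'anonymized'.
function sendAnonymized(prompt: SketchAnonymizedPrompt): string {
  return prompt.text;
}

const okPrompt = SketchAnonymizedPrompt.from({ anonymizedText: "Patient, 42J, Rückenschmerzen." });
const sent = sendAnonymized(okPrompt);
const realPrompt = SketchRealPrompt.from({ text: "identified clinical note" });

// Both lines below fail to compile if uncommented:
// sendAnonymized(realPrompt);
// sendAnonymized({ text: "raw string" });
```

Because the brand fields are private and declared in different classes, the compiler treats the two prompt types as incompatible even though both expose a text property.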
AnonymizedPrompt
import { AnonymizedPrompt } from '@polaris/llm-gateway'
import type { AnonymizedInput } from '@polaris/llm-gateway'
// AnonymizedInput requires anonymizedText —
// no name, no DOB, no insurance number fields
const input: AnonymizedInput = {
  anonymizedText: 'Patient, 42J, Rückenschmerzen seit 3 Wochen.',
}
// Opaque prompt — carries a .text property
// but cannot be accidentally created from raw strings
const prompt = AnonymizedPrompt.from(input)
// Use with intents that declare pii: 'anonymized'
// e.g. billing-coding-suggest, scribe-stt-summarize
RealPrompt
import { RealPrompt } from '@polaris/llm-gateway'
import type { RealInput } from '@polaris/llm-gateway'
// RealInput allows identified patient data
// Only valid for intents that declare pii: 'real'
// AND models with local or DSGVO-compliant deployment
const input: RealInput = {
  text: 'Full clinical note with patient context...',
  practitionerJwt: '<bearer-token-from-keycloak>', // Required: Practitioner-JWT (ADR-020 §7)
}
const prompt = RealPrompt.from(input)
// Gateway rejects RealPrompt on anonymized intents —
// type error + runtime enforcement
PII modes
Anonymized and real PII modes.
The gateway enforces PII classification per request via the gateway.pii field. anonymized mode (the default) is required for all background agents. real mode requires a Practitioner-JWT and is only available for intents explicitly designed for identified patient context.
| Mode | Auth requirement | Description |
|---|---|---|
| anonymized | Service token (default) | PHI must be removed or tokenized before forwarding to the LLM provider. phi_references enables tokenization. PII patterns must not appear in the post-tokenization prompt. |
| real | Practitioner-JWT required | Patient-identified data is forwarded. Only valid on intents designed for identified patient context. Rejected with HTTP 401 when a service token is used. |
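The auth rule in the table can be sketched as a small decision function. This is an illustration only: the TokenKind type, the checkPiiAuth name, and the AuthDecision shape are hypothetical, and the sketch assumes that a Practitioner-JWT is also acceptable in anonymized mode, which the table does not state.

```typescript
type PiiMode = "anonymized" | "real";

// Hypothetical classification of the caller's credential.
type TokenKind = "service" | "practitioner";

interface AuthDecision {
  allowed: boolean;
  status: number;
  code?: string;
}

function checkPiiAuth(pii: PiiMode | undefined, token: TokenKind): AuthDecision {
  // anonymized is the default mode when the field is omitted
  const mode: PiiMode = pii === undefined ? "anonymized" : pii;
  if (mode === "real" && token !== "practitioner") {
    // Matches the documented 401 for real mode with a service token
    return { allowed: false, status: 401, code: "practitioner_jwt_required" };
  }
  return { allowed: true, status: 200 };
}

const backgroundAgent = checkPiiAuth(undefined, "service");
const misconfigured = checkPiiAuth("real", "service");
console.log(backgroundAgent.allowed, misconfigured.code);
```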
PHI tokenization
In anonymized mode, callers may declare PHI strings via gateway.phi_references. The gateway replaces each declared value with an opaque token (e.g. [Patient-1]) before forwarding, and restores FHIR references in the response. The token-to-PHI mapping never leaves process memory. See the Prompt-Formulation Guide for declaration contract details and worked examples.
Declaration contract
Exhaustive declaration fast-path.
When a caller can enumerate all PHI in the request, it may set gateway.declaration: "exhaustive". The gateway then cross-checks the tokenized prompt against a set of embedded PII regex patterns. If no residual PII is found, the request is forwarded without calling the external pii-detector container — lower latency and resilient to container downtime.
How the cross-check works
- The caller declares all PHI strings via phi_references and sets declaration: "exhaustive".
- The gateway tokenizes declared PHI, replacing names and identifiers with [Patient-N] tokens.
- The tokenized prompt is scanned for residual PII patterns: German DOB (DD.MM.YYYY), KVNR ([A-Z]\d{9}), and full 5-digit PLZ.
- No match: request is forwarded. Match: HTTP 422 caller_declaration_violation is returned.
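The residual-PII scan in the steps above can be sketched with three regular expressions. The exact patterns embedded in the gateway are not shown in this page, so the expressions below are assumptions derived from the three formats named above (DD.MM.YYYY, [A-Z]\d{9}, 5-digit PLZ); findResidualPii and crossCheck are hypothetical names.

```typescript
interface ResidualPattern {
  name: string;
  regex: RegExp;
}

// Assumed patterns; the gateway's embedded expressions may be stricter.
const residualPatterns: ResidualPattern[] = [
  { name: "dob", regex: /\b\d{2}\.\d{2}\.\d{4}\b/ },
  { name: "kvnr", regex: /\b[A-Z]\d{9}\b/ },
  { name: "plz", regex: /\b\d{5}\b/ },
];

// Returns the names of all patterns that still match after tokenization.
function findResidualPii(tokenizedPrompt: string): string[] {
  return residualPatterns
    .filter((p) => p.regex.test(tokenizedPrompt))
    .map((p) => p.name);
}

// Maps the scan result to the outcome described in the last step.
function crossCheck(tokenizedPrompt: string): { status: number; error?: string } {
  const hits = findResidualPii(tokenizedPrompt);
  if (hits.length > 0) {
    return { status: 422, error: "caller_declaration_violation" };
  }
  return { status: 200 };
}

const clean = crossCheck("[Patient-1], Altersgruppe 51-65, Hauptdiagnose E11.x");
const leaky = crossCheck("[Patient-1], geboren am 03.07.1961");
console.log(clean.status, leaky.status);
```

Generalized quasi-identifiers such as an age bucket or an ICD category pass the scan, while an exact date of birth that the caller forgot to declare is caught.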
422 error responses
Re-ID preflight blocked
HTTP 422
{
  "error": "Re-ID risk preflight: request blocked — quasi-ID combination uniquely identifies patient",
  "doc_url": "https://polaris-fhir.de/docs/sdk/llm-gateway/prompt-formulation/"
}
Declaration violation
HTTP 422
{
  "error": "caller_declaration_violation: undeclared PII found after tokenization",
  "doc_url": "https://polaris-fhir.de/docs/sdk/llm-gateway/prompt-formulation/"
}
Prompt-formulation guide
The Prompt-Formulation Guide documents the full declaration contract: quasi-ID generalization conventions (age buckets, ICD categories, PLZ prefix, relative dates), PHI declaration rules, PASS and FAIL examples, cross-check failure modes, and migration from the legacy combination_count API.
Tokenization fast-path
PHI tokenization pipeline.
When pii: 'anonymized' and phi_references are provided, the gateway runs the tokenization pipeline: PHI strings are replaced with opaque tokens before forwarding, and tokens in the LLM response are restored to FHIR references before returning to the caller. The token-to-PHI mapping lives in process memory only — it is never logged, never audited as values, and is dropped after each request.
Caller request
messages: [{
  content: "Erika Müller, Altersgruppe 51-65, Hauptdiagnose E11.x"
}]
gateway: {
  phi_references: [
    { resourceType: "Patient", id: "pvs-patient-12345", values: ["Erika Müller", "Müller"] },
    { resourceType: "Practitioner", id: "pvs-pract-42", values: ["Dr. Schmidt", "Schmidt"] }
  ],
  declaration: "exhaustive"
}
Gateway pipeline
1. tokenizeMessages(messages, phi_references)
   → replaces "Erika Müller" with [Patient-1]
   → replaces "Dr. Schmidt" with [Practitioner-1]
2. (if declaration: "exhaustive")
   → cross-check tokenized content for residual PII (DOB / KVNR / PLZ patterns)
   → 422 caller_declaration_violation if match found
3. Forward tokenized messages to LLM provider
   (phi_references + reid_preflight stripped)
4. LLM response contains [Patient-1], [Practitioner-1]
5. detokenizeText(response, tokenCtx)
   → restores tokens to FHIR references "Patient/pvs-patient-12345", "Practitioner/pvs-pract-42"
6. dropContext(tokenCtx): map cleared from memory
Security guarantee
The token-to-PHI mapping never touches disk, logs, or the audit trail. Only token counts and resource types are recorded. The mapping is cleared immediately after the response is detokenized.
Token format
[Patient-N], [Practitioner-N], [Encounter-N], [Condition-N]
Token numbers are randomized per request for cross-request unlinkability (ADR-027 §6).
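A round trip through tokenization and detokenization might look roughly like the sketch below. It is not the gateway's tokenizeMessages/detokenizeText code: the sketchTokenize, sketchDetokenize, and sketchDropContext names are hypothetical, the 1 to 9999 number range is an assumption, and Math.random stands in for whatever randomness source the gateway actually uses.

```typescript
interface PhiReference {
  resourceType: string;
  id: string;
  values: string[];
}

interface TokenEntry {
  token: string;
  fhirReference: string;
}

// Per-request token context; held in memory only.
interface TokenContext {
  entries: TokenEntry[];
}

function sketchTokenize(text: string, refs: PhiReference[]): { text: string; ctx: TokenContext } {
  const ctx: TokenContext = { entries: [] };
  const usedNumbers: number[] = [];
  let result = text;
  for (const ref of refs) {
    // Draw a random token number that is not yet used in this request.
    let n = 0;
    do {
      n = 1 + Math.floor(Math.random() * 9999);
    } while (usedNumbers.indexOf(n) !== -1);
    usedNumbers.push(n);
    const token = "[" + ref.resourceType + "-" + n + "]";
    ctx.entries.push({ token: token, fhirReference: ref.resourceType + "/" + ref.id });
    // Replace longer values first so "Erika Müller" is handled before "Müller".
    const values = ref.values.slice().sort((a, b) => b.length - a.length);
    for (const value of values) {
      if (value.length === 0) continue;
      result = result.split(value).join(token);
    }
  }
  return { text: result, ctx: ctx };
}

function sketchDetokenize(text: string, ctx: TokenContext): string {
  let result = text;
  for (const entry of ctx.entries) {
    result = result.split(entry.token).join(entry.fhirReference);
  }
  return result;
}

function sketchDropContext(ctx: TokenContext): void {
  ctx.entries.length = 0;
}

const phiRefs: PhiReference[] = [
  { resourceType: "Patient", id: "pvs-patient-12345", values: ["Erika Müller", "Müller"] },
  { resourceType: "Practitioner", id: "pvs-pract-42", values: ["Dr. Schmidt", "Schmidt"] },
];
const tokenized = sketchTokenize("Erika Müller wurde von Dr. Schmidt überwiesen.", phiRefs);
// Here the tokenized text stands in for an LLM response that echoes the tokens.
const restored = sketchDetokenize(tokenized.text, tokenized.ctx);
sketchDropContext(tokenized.ctx);
```

Sorting values by length before replacement avoids leaving a partial name behind when one declared value is a substring of another.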
Audit entry
The audit entry records tokenization_audit.token_count and resource_types only — never PHI values or the token mapping.
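As an illustration of how such a summary can be derived without touching PHI, the sketch below reduces a list of token entries to a count and a list of resource types. The token_count and resource_types field names come from the text above; the AuditTokenEntry shape and the summarizeForAudit name are hypothetical.

```typescript
// Hypothetical view of the token context: no PHI values are needed here.
interface AuditTokenEntry {
  token: string;
  resourceType: string;
}

interface TokenizationAudit {
  token_count: number;
  resource_types: string[];
}

// Keeps only the number of tokens and the distinct resource types.
function summarizeForAudit(entries: AuditTokenEntry[]): TokenizationAudit {
  const resourceTypes: string[] = [];
  for (const entry of entries) {
    if (resourceTypes.indexOf(entry.resourceType) === -1) {
      resourceTypes.push(entry.resourceType);
    }
  }
  resourceTypes.sort();
  return { token_count: entries.length, resource_types: resourceTypes };
}

const auditSummary = summarizeForAudit([
  { token: "[Patient-17]", resourceType: "Patient" },
  { token: "[Practitioner-4]", resourceType: "Practitioner" },
  { token: "[Patient-82]", resourceType: "Patient" },
]);
console.log(JSON.stringify(auditSummary));
```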
TuningParams reference
Controlling model behavior.
TuningParams are optional hints attached to a ModelRequest. The gateway translates these into provider-specific parameters (temperature, max_tokens, response_format) before dispatch. Intents in the catalog declare their expected tuning — the gateway validates that caller tuning is compatible.
| Parameter | Type | Description |
|---|---|---|
| effort | 'low' \| 'medium' \| 'high' | Reasoning depth and cost trade-off. Use high for coding/billing suggestions, low for classification tasks. |
| creativity | 'deterministic' \| 'balanced' \| 'creative' | Controls temperature. Clinical and billing intents must use deterministic. |
| responseFormat | 'text' \| 'json' \| 'markdown' | Output format hint. json activates jsonMode capability requirement automatically. |
| streaming | boolean | Enable streaming token delivery. Activates streaming capability requirement. |
| maxTokens | number | Maximum output token count. Omit to use model default. |
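The translation step can be pictured as a pure mapping function. The sketch below is an assumption-heavy illustration: the temperature values are invented, the OpenAI-style response_format of type json_object is assumed as the target, effort is left unmapped because its provider equivalent is not described here, and SketchTuning, toProviderParams, and impliedCapabilities are hypothetical names.

```typescript
interface SketchTuning {
  effort?: "low" | "medium" | "high";
  creativity?: "deterministic" | "balanced" | "creative";
  responseFormat?: "text" | "json" | "markdown";
  streaming?: boolean;
  maxTokens?: number;
}

// OpenAI-compatible request parameters.
interface ProviderParams {
  temperature?: number;
  max_tokens?: number;
  stream?: boolean;
  response_format?: { type: "json_object" };
}

// Assumed temperature values; the gateway's real mapping is not documented here.
const temperatureByCreativity = { deterministic: 0, balanced: 0.7, creative: 1 };

function toProviderParams(tuning: SketchTuning): ProviderParams {
  const params: ProviderParams = {};
  if (tuning.creativity !== undefined) {
    params.temperature = temperatureByCreativity[tuning.creativity];
  }
  if (tuning.maxTokens !== undefined) params.max_tokens = tuning.maxTokens;
  if (tuning.streaming === true) params.stream = true;
  if (tuning.responseFormat === "json") params.response_format = { type: "json_object" };
  return params;
}

// Capability requirements that the table says are activated automatically.
function impliedCapabilities(tuning: SketchTuning): string[] {
  const caps: string[] = [];
  if (tuning.responseFormat === "json") caps.push("jsonMode");
  if (tuning.streaming === true) caps.push("streaming");
  return caps;
}

const billingParams = toProviderParams({
  effort: "high",
  creativity: "deterministic",
  responseFormat: "json",
  maxTokens: 512,
});
const billingCaps = impliedCapabilities({ responseFormat: "json", streaming: true });
```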
Intent catalog
10 intents — 3 full, 7 stub.
The intent catalog is the runtime contract between callers and the gateway. Full intents have complete capability declarations, hard constraints, and tuning. Stub intents are reserved IDs with risk annotations; their implementation is still in progress. Specifying an unknown intent ID at runtime raises a validation error.
The catalog is loaded from packages/llm-gateway/intent-catalog.yaml via loadCatalogFromFile() or embedded inline via loadCatalogFromYaml(). Use validateIntent() to check an intent ID against the loaded catalog at startup.
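A startup check of this kind can be sketched without the package, using a plain map as a stand-in for the loaded catalog. The SketchCatalog shape, the function names, and the example-stub-intent ID are hypothetical; the real loadCatalogFromFile() and validateIntent() signatures are not shown on this page and may differ.

```typescript
type IntentStatus = "full" | "stub";

// Hypothetical in-memory view of a loaded catalog: intent ID to status.
interface SketchCatalog {
  [intentId: string]: IntentStatus;
}

const sketchCatalog: SketchCatalog = {
  "billing-coding-suggest": "full",
  "scribe-stt-summarize": "full",
  "anamnese-translate": "full",
  "example-stub-intent": "stub", // placeholder ID, not a real catalog entry
};

// Returns every intent ID the application uses that the catalog does not know.
function findUnknownIntents(catalog: SketchCatalog, used: string[]): string[] {
  return used.filter((id) => !Object.prototype.hasOwnProperty.call(catalog, id));
}

// Fails fast at startup instead of surfacing unknown_intent at request time.
function assertKnownIntents(catalog: SketchCatalog, used: string[]): void {
  const unknown = findUnknownIntents(catalog, used);
  if (unknown.length > 0) {
    throw new Error("unknown intent IDs: " + unknown.join(", "));
  }
}

assertKnownIntents(sketchCatalog, ["billing-coding-suggest", "anamnese-translate"]);
const unknownIds = findUnknownIntents(sketchCatalog, ["billing-coding-suggest", "no-such-intent"]);
console.log(unknownIds);
```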
Full intents
billing-coding-suggest
Suggest ICD-10/EBM/GOZ billing codes from anonymized encounter documentation
Medical device risk: none — coding suggestion, physician decides
Requires
Prefers
Tuning
Hard constraints
- input must be AnonymizedPrompt — no PII permitted
- output is suggestion only — physician must confirm before submission
- must log to audit trail with capabilities_matched and model_used
scribe-stt-summarize
Transcribe and summarize spoken consultation notes into structured FHIR-compatible documentation
Medical device risk: none — summary goes to approval queue before chart entry
Requires
Prefers
Tuning
Hard constraints
- audio stream must be processed locally or via DSGVO-compliant processor
- output must go to approval queue before any chart write
- no direct FHIR write without physician approval
anamnese-translate
Translate patient anamnesis from non-German language to German, verbatim without clinical interpretation
Medical device risk: none — verbatim translation, no clinical interpretation
Requires
Prefers
Tuning
Hard constraints
- verbatim translation only — no summarization, no clinical interpretation
- no triage decisions — translator role only
- source language must be identified and logged
Stub intents: reserved, work in progress
Classify incoming documents/tasks as administrative vs clinical
Hard constraints
- output restricted to allowlist of administrative categories — no free-text clinical output
From consultation transcript, suggest structured FHIR resources
Hard constraints
- output must go to approval queue — no direct FHIR write
- physician must review and approve each suggested resource
Detect potential billing gaps
MDR: 5a Undercoding — detection only
Hard constraints
- detection only — no automatic billing submission
- must present evidence (documentation snippet) for each detected gap
Flag documentation quality gaps
MDR: 5b QM-Handbuch
Hard constraints
- output is quality flag only — no treatment recommendation
- reference to QM handbook section mandatory per finding
Identify jobs-to-be-done from transcripts
MDR: 5c JTBD
Hard constraints
- anonymized input only
- output is operational insight — not clinical decision support
Flag items in daily administrative protocols
Hard constraints
- output restricted to allowlist of flag types
- no clinical diagnosis in output
Automate PVS UI interactions
MDR: yellow-operational (navigation) / green-medical (data entry only)
Hard constraints
- BLOCKED: dicom-letter-from-text use case is explicitly excluded from this intent
- scope limited to data entry and navigation — no autonomous clinical decision
- requires explicit operator opt-in per session
Code snippets
Three complete examples.
All snippets use real types from @polaris/llm-gateway. Each snippet shows how to set up the gateway with createApp and how to call it. Capability requirements, intent, and PII mode are passed inside the body.gateway field — not as headers.
Example 1 — billing-coding-suggest
Suggest ICD-10/EBM/GOZ billing codes. Input must be anonymized — the AnonymizedPrompt type enforces this at compile time. Output goes to an approval queue before reaching the billing system.
import { createApp, AnonymizedPrompt } from '@polaris/llm-gateway'
import type { AnonymizedInput, ModelConfig, LlmProvider, GatewayConfig } from '@polaris/llm-gateway'
// 1. Configure models (provided by ops/platform team)
const models: ModelConfig[] = [
  {
    id: 'anthropic/claude-sonnet-4-5',
    provider: 'anthropic',
    modelName: 'claude-sonnet-4-5',
    capabilities: ['text', 'reasoning', 'medicalCoding', 'germanLanguage', 'medicalGermanLanguage', 'jsonMode'],
  },
]
// 2. Wire in your LLM provider (LiteLLM, Anthropic SDK, etc.)
const llmProvider: LlmProvider = async (model, request) => {
  // Forward to your actual provider implementation
  throw new Error('Replace with real provider implementation')
}
// 3. Create the gateway app
const config: GatewayConfig = {
  models,
  intentCatalogPath: './packages/llm-gateway/intent-catalog.yaml',
}
const app = createApp({ config, llmProvider })
// 4. Build an anonymized prompt (type-safe — no raw strings accepted)
const anonymizedInput: AnonymizedInput = {
  anonymizedText: 'Patient klagt über anhaltende Rückenschmerzen seit 3 Wochen, keine Ausstrahlung.',
}
const prompt = AnonymizedPrompt.from(anonymizedInput)
// 5. POST to the chat completions endpoint
const response = await app.fetch(
  new Request('http://localhost/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [{ role: 'user', content: prompt.text }],
      gateway: {
        requires: ['text', 'medicalCoding', 'germanLanguage'],
        prefers: ['reasoning', 'medicalGermanLanguage'],
        intent: 'billing-coding-suggest',
        pii: 'anonymized',
      },
    }),
  }),
)
const result = await response.json()
// result.choices[0].message.content — JSON billing code suggestions
// Must be routed to physician approval before billing submission
Example 2 — scribe-stt-summarize
Transcribe and summarize spoken consultation notes. Audio must be processed locally or via a DSGVO-compliant processor. Output goes to an approval queue before any chart write.
// scribe-stt-summarize: audio transcription + summarization
// Note: audioIn capability requires the llmProvider to handle audio content.
// The prompt text here represents the transcribed audio content.
import { createApp, AnonymizedPrompt } from '@polaris/llm-gateway'
// llmProvider: reuse the LlmProvider defined in Example 1
const app = createApp({
  config: {
    models: [
      {
        id: 'anthropic/claude-opus-4-5',
        provider: 'anthropic',
        modelName: 'claude-opus-4-5',
        capabilities: ['text', 'audioIn', 'germanLanguage', 'medicalGermanLanguage', 'streaming'],
      },
    ],
    intentCatalogPath: './packages/llm-gateway/intent-catalog.yaml',
  },
  llmProvider,
})
const transcribedNote = AnonymizedPrompt.from({
  anonymizedText: 'Anamnese: Keine Vorerkrankungen. Aktuell Husten seit 5 Tagen, leichtes Fieber 37.8°C.',
})
const response = await app.fetch(
  new Request('http://localhost/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [
        { role: 'system', content: 'Erstelle eine strukturierte Dokumentation aus den folgenden Konsultationsnotizen.' },
        { role: 'user', content: transcribedNote.text },
      ],
      gateway: {
        requires: ['text', 'audioIn', 'germanLanguage'],
        prefers: ['medicalGermanLanguage'],
        intent: 'scribe-stt-summarize',
        pii: 'anonymized',
      },
    }),
  }),
)
// response includes X-Approval-Required: true header — must go to approval queue
const approvalRequired = response.headers.get('X-Approval-Required') === 'true'
Example 3 — anamnese-translate
Translate patient anamnesis from a non-German source language to German. Verbatim only — no summarization, no clinical interpretation, no triage decisions. Source language must be identified and logged.
// anamnese-translate: translate non-German patient anamnesis to German
import { createApp, AnonymizedPrompt } from '@polaris/llm-gateway'
// llmProvider: reuse the LlmProvider defined in Example 1
const app = createApp({
  config: {
    models: [
      {
        id: 'anthropic/claude-sonnet-4-5',
        provider: 'anthropic',
        modelName: 'claude-sonnet-4-5',
        capabilities: ['text', 'multilingual', 'simplifiedLanguage'],
      },
    ],
    intentCatalogPath: './packages/llm-gateway/intent-catalog.yaml',
  },
  llmProvider,
})
const foreignLanguageText = AnonymizedPrompt.from({
  anonymizedText: 'The patient reports chest pain that started two days ago, especially when breathing deeply.',
})
const response = await app.fetch(
  new Request('http://localhost/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [
        {
          role: 'system',
          content: 'Übersetze den folgenden Text wörtlich ins Deutsche. Keine Zusammenfassung, keine klinische Interpretation.',
        },
        { role: 'user', content: foreignLanguageText.text },
      ],
      gateway: {
        requires: ['text', 'multilingual'],
        prefers: ['simplifiedLanguage'],
        intent: 'anamnese-translate',
        pii: 'anonymized',
        // No approval queue — pure translation, no clinical decision
      },
    }),
  }),
)
// approvalQueue: false for this intent — result can be used directly
// Gateway audit entry records source language + model used
Discovery endpoint
Querying gateway capabilities at runtime.
The gateway exposes a capability discovery endpoint. Call it to determine which capabilities the currently configured model set covers, which model is the default, and what the coverage map looks like for routing decisions.
Request
GET /api/llm/capabilities
Authorization: Bearer <api-key>
Response shape
{
  "capabilities": [
    "text", "reasoning", "germanLanguage",
    "medicalGermanLanguage"
    // ... all capabilities any configured model covers
  ],
  "coverage": {
    "text": ["claude-sonnet-4-5", "claude-opus-4-5"],
    "reasoning": ["claude-opus-4-5"],
    "germanLanguage": ["claude-sonnet-4-5", "claude-opus-4-5"],
    "medicalGermanLanguage": ["claude-opus-4-5"],
    "local": []
  },
  "default": "text"
}
Use at startup
Call the discovery endpoint during application initialization to verify that all capabilities required by your intents are covered by the configured model set. Log a warning (or fail hard) if required capabilities are missing — better to detect at startup than at request time.
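The startup check can be sketched as a pure function over the discovery response. Fetching GET /api/llm/capabilities is left out so the example stays self-contained; the CapabilitiesResponse type is inferred from the response shape shown above, and missingCapabilities is a hypothetical helper name.

```typescript
// Shape inferred from the documented response; treat it as an assumption.
interface CapabilitiesResponse {
  capabilities: string[];
  coverage: { [capability: string]: string[] | undefined };
  default: string;
}

// Returns the required capabilities that no configured model covers.
function missingCapabilities(resp: CapabilitiesResponse, required: string[]): string[] {
  return required.filter((cap) => {
    const models = resp.coverage[cap];
    return models === undefined || models.length === 0;
  });
}

// Sample data copied from the response shape above.
const discovery: CapabilitiesResponse = {
  capabilities: ["text", "reasoning", "germanLanguage", "medicalGermanLanguage"],
  coverage: {
    text: ["claude-sonnet-4-5", "claude-opus-4-5"],
    reasoning: ["claude-opus-4-5"],
    germanLanguage: ["claude-sonnet-4-5", "claude-opus-4-5"],
    medicalGermanLanguage: ["claude-opus-4-5"],
    local: [],
  },
  default: "text",
};

const missing = missingCapabilities(discovery, ["text", "medicalGermanLanguage", "local", "audioIn"]);
if (missing.length > 0) {
  console.warn("capabilities without a covering model: " + missing.join(", "));
}
```

A capability that is listed with an empty model array (local in the sample) counts as missing, just like one that is absent from the coverage map.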
Public exports
Quick-start exports.
The most-used exports are listed below. See the package's TypeScript types for the full export surface.
Types
import type {
  KnownCapability, // 19-member union
  Capability, // string (open for custom caps)
  TuningParams, // effort / creativity / responseFormat / streaming / maxTokens
  ModelRequest, // requires + prefers + tuning + intent + pii
  AnonymizedInput, // { anonymizedText: string }
  RealInput, // { text: string; practitionerJwt: string }
  ModelConfig, // per-model gateway configuration
  GatewayConfig, // top-level gateway config
  AuditEntry, // immutable audit log record
  AuditStore, // storage interface for audit entries
} from '@polaris/llm-gateway'
Values
import {
  // Prompt constructors
  AnonymizedPrompt, // .from(AnonymizedInput): opaque prompt
  RealPrompt, // .from(RealInput): opaque prompt
  // Audit stores
  NoopAuditStore, // discards all entries (testing)
  CapturingAuditStore, // in-memory capture (testing)
  // Gateway core
  routeRequest, // low-level routing function
  scanForPii, // PII detection utility
  // Intent catalog
  loadCatalogFromFile, // load YAML catalog from disk
  loadCatalogFromYaml, // load from YAML string
  validateIntent, // validate intent ID against catalog
  // App factory
  createApp, // creates the gateway Hono app
} from '@polaris/llm-gateway'
Error Codes
All error responses from POST /v1/chat/completions use a structured envelope:
{ error: { code, errorClass, message, doc_url, details? } }
| Code | HTTP | Error Class | Description |
|---|---|---|---|
| invalid_json | 400 | RequestParseError | Request body is not valid JSON |
| practitioner_jwt_required | 401 | AuthenticationError | pii: real requires a Practitioner-JWT, not a service token |
| unknown_intent | 400 | IntentValidationError | Intent ID is empty, non-string, or not found in the catalog |
| red_risk_intent | 501 | IntentValidationError | Intent is a known prohibited use case (red-risk denylist) |
| intent_catalog_unavailable | 503 | IntentCatalogError | Intent provided but catalog not loaded — cannot validate |
| no_model_for_capabilities | 503 | CapabilityRoutingError | No configured model satisfies the required capabilities |
| reid_preflight_invalid_input | 400 | ReidPreflightError | Missing or invalid quasi_ids / combination_count fields |
| reid_preflight_blocked | 422 | ReidPreflightError | Quasi-ID combination is too unique — blocked to prevent patient re-identification |
| caller_declaration_violation | 422 | PiiDeclarationError | Undeclared PII detected post-tokenization in declaration: "exhaustive" mode |
| llm_provider_error | 502 | LlmProviderError | Upstream LLM provider returned an error or timed out |
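On the caller side, the envelope can be checked with a small parser before branching on the code. This is a sketch under assumptions: the field types (all strings, details left opaque) are inferred from the envelope line above, and GatewayError and parseGatewayError are hypothetical names rather than package exports.

```typescript
// Field types are inferred from the documented envelope; details is kept opaque.
interface GatewayError {
  code: string;
  errorClass: string;
  message: string;
  doc_url: string;
  details?: unknown;
}

// Returns the structured error, or null if the body does not match the envelope.
function parseGatewayError(body: unknown): GatewayError | null {
  if (typeof body !== "object" || body === null) return null;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return null;
  const fields = err as { [key: string]: unknown };
  const code = fields.code;
  const errorClass = fields.errorClass;
  const message = fields.message;
  const docUrl = fields.doc_url;
  if (
    typeof code !== "string" ||
    typeof errorClass !== "string" ||
    typeof message !== "string" ||
    typeof docUrl !== "string"
  ) {
    return null;
  }
  return { code: code, errorClass: errorClass, message: message, doc_url: docUrl, details: fields.details };
}

const parsedError = parseGatewayError({
  error: {
    code: "caller_declaration_violation",
    errorClass: "PiiDeclarationError",
    message: "undeclared PII found after tokenization",
    doc_url: "https://polaris-fhir.de/docs/sdk/llm-gateway/prompt-formulation/",
  },
});
if (parsedError !== null && parsedError.code === "caller_declaration_violation") {
  console.log("fix phi_references, see " + parsedError.doc_url);
}
```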
Canonical reference
ADR-020 v2 is the canonical source for all decisions documented here — capability axis taxonomy, intent catalog schema, PII enforcement rules, audit trail requirements, and the discovery endpoint contract. When this page and ADR-020 conflict, the ADR wins.