sdk · llm gateway

LLM gateway: capability-based model routing.

@polaris/llm-gateway routes LLM requests to the best available model by matching declared capability requirements against a configured model registry — with PII type enforcement, intent catalog validation, and mandatory audit logging.


Overview

What the gateway does.

Callers declare what they need (capabilities) and what the data contains (PII classification). The gateway resolves which configured model satisfies those requirements, enforces hard constraints defined in the intent catalog, and writes an immutable audit entry for every dispatch. No model names leak into application code.

Application Code
      |
      |  ModelRequest { requires, prefers, tuning, intent, pii }
      v
┌──────────────────┐
│  llm-gateway     │   1. validate intent against catalog
│                  │   2. enforce PII constraints
│  routeRequest()  │   3. score models by capability match
│                  │   4. write audit entry
└──────────────────┘
      |
      |  selected ModelConfig + AuditEntry
      v
Upstream LLM API (OpenAI-compatible)

Canonical reference

All architectural decisions documented here are derived from ADR-020 v2. ADR-020 is the authoritative source — when this page and the ADR conflict, the ADR wins.

Capability vocabulary

19 capabilities across 4 axes.

Every capability is a member of the KnownCapability union type exported from @polaris/llm-gateway. Declare capabilities as requires (must have) or prefers (scored bonus) in a ModelRequest.

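// KnownCapability as exported by @polaris/llm-gateway, reproduced for reference.
// In application code, import it instead of redeclaring it:
//   import type { KnownCapability } from '@polaris/llm-gateway'
// It is a union of all 19 capability strings:
type KnownCapability =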
  // Modality
  | 'text' | 'audioIn' | 'audioOut' | 'vision' | 'computerUse' | 'streaming'
  // Reasoning
  | 'reasoning' | 'longContext' | 'jsonMode' | 'toolUse' | 'agentic'
  // Domain
  | 'germanLanguage' | 'medicalGermanLanguage' | 'medicalCoding' | 'multilingual' | 'simplifiedLanguage'
  // Deployment
  | 'local' | 'lowLatency' | 'batch'

Modality axis

Capability	Description
text	Basic text generation; foundation capability required by all text-based intents
audioIn	Audio transcription/speech-to-text input processing
audioOut	Text-to-speech audio output generation
vision	Image and document visual understanding
computerUse	Autonomous GUI/computer interaction (PVS navigation)
streaming	Streaming token-by-token response delivery

Reasoning axis

Capability	Description
reasoning	Extended chain-of-thought reasoning for complex clinical decisions
longContext	Large context window (100k+ tokens) for long documents
jsonMode	Guaranteed JSON-structured output
toolUse	Function/tool calling capability
agentic	Autonomous multi-step task execution

Domain axis

Capability	Description
germanLanguage	German language fluency for administrative content
medicalGermanLanguage	Medical German: clinical terminology, Arztbriefe, Anamnesedokumentation
medicalCoding	ICD-10, EBM, GOÄ, GOZ coding knowledge
multilingual	Multi-language support beyond German
simplifiedLanguage	Simplified/plain language output (patient-facing)

Deployment axis

Capability	Description
local	On-premises/self-hosted deployment (no data leaves the network)
lowLatency	Optimized for real-time interactive use (<500ms first token)
batch	High-throughput batch processing mode
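The requires/prefers semantics described above can be pictured as a hard filter followed by a soft score. The sketch below is a minimal illustration under stated assumptions, not the gateway's routeRequest() implementation: SketchModel, SketchRequest and selectModel are hypothetical names, scoring is assumed to be one point per matched preferred capability, and ties are assumed to go to the model configured first.

```typescript
// Hypothetical sketch of capability routing; not the routeRequest() source.
// Assumed scoring: +1 per preferred capability; ties keep the earlier-configured model.

interface SketchModel {
  id: string
  capabilities: string[]
}

interface SketchRequest {
  requires: string[]
  prefers?: string[]
}

function selectModel(models: SketchModel[], req: SketchRequest): SketchModel | undefined {
  let best: SketchModel | undefined
  let bestScore = -1
  for (const model of models) {
    const caps = new Set(model.capabilities)
    // Hard filter: a model missing any required capability is never eligible.
    if (!req.requires.every((c) => caps.has(c))) continue
    // Soft score: number of preferred capabilities the model also covers.
    const score = (req.prefers ?? []).filter((c) => caps.has(c)).length
    // Strict ">" means an equal score never displaces an earlier model.
    if (score > bestScore) {
      best = model
      bestScore = score
    }
  }
  // undefined means no model qualifies (no_model_for_capabilities in the error table)
  return best
}

const models: SketchModel[] = [
  { id: 'model-a', capabilities: ['text', 'germanLanguage'] },
  { id: 'model-b', capabilities: ['text', 'germanLanguage', 'medicalCoding', 'reasoning'] },
]
const chosen = selectModel(models, { requires: ['text', 'medicalCoding'], prefers: ['reasoning'] })
// chosen?.id → 'model-b' (model-a lacks the required medicalCoding)
```

Keeping requires as a strict filter means a preferred capability can never compensate for a missing required one, which matches the "must have" versus "scored bonus" split in the vocabulary.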

PII type system

AnonymizedPrompt and RealPrompt.

The gateway enforces PII classification at the type level. You cannot pass a RealPrompt to an intent that declares pii: 'anonymized' — the TypeScript compiler rejects it before the code runs. Both types are opaque: they can only be constructed via their from() factory, ensuring input validation runs on every prompt.

AnonymizedPrompt

import { AnonymizedPrompt } from '@polaris/llm-gateway'
import type { AnonymizedInput } from '@polaris/llm-gateway'

// AnonymizedInput requires anonymizedText —
// no name, no DOB, no insurance number fields
const input: AnonymizedInput = {
  anonymizedText: 'Patient, 42J, Rückenschmerzen seit 3 Wochen.',
}

// Opaque prompt — carries a .text property
// but cannot be accidentally created from raw strings
const prompt = AnonymizedPrompt.from(input)

// Use with intents that declare pii: 'anonymized'
// e.g. billing-coding-suggest, scribe-stt-summarize

RealPrompt

import { RealPrompt } from '@polaris/llm-gateway'
import type { RealInput } from '@polaris/llm-gateway'

// RealInput allows identified patient data
// Only valid for intents that declare pii: 'real'
// AND models with local or DSGVO-compliant deployment
const input: RealInput = {
  text: 'Full clinical note with patient context...',
  practitionerJwt: '<bearer-token-from-keycloak>',  // Required: Practitioner-JWT (ADR-020 §7)
}

const prompt = RealPrompt.from(input)

// Gateway rejects RealPrompt on anonymized intents —
// type error + runtime enforcement
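The compile-time guarantee can be approximated in plain TypeScript: a class with a private member is only compatible with types whose private member comes from the same declaration, and a private constructor forces every instance through from(). This is a hypothetical sketch of that pattern, not the @polaris/llm-gateway source; the Sketch* names and the specific checks in from() are assumptions standing in for whatever validation the real factories perform.

```typescript
// Hypothetical sketch of opaque prompt types; NOT the @polaris/llm-gateway source.

interface SketchAnonymizedInput { anonymizedText: string }
interface SketchRealInput { text: string; practitionerJwt: string }

class SketchAnonymizedPrompt {
  // Private member: makes the class nominal, so SketchRealPrompt is not assignable.
  private readonly kind = 'anonymized' as const
  // Private constructor: from() is the only way to obtain an instance.
  private constructor(readonly text: string) {}

  static from(input: SketchAnonymizedInput): SketchAnonymizedPrompt {
    // Assumed validation: reject empty prompts.
    const text = input.anonymizedText.trim()
    if (text === '') throw new Error('anonymizedText must not be empty')
    return new SketchAnonymizedPrompt(text)
  }
}

class SketchRealPrompt {
  private readonly kind = 'real' as const
  private constructor(readonly text: string, readonly practitionerJwt: string) {}

  static from(input: SketchRealInput): SketchRealPrompt {
    // Assumed validation: a Practitioner-JWT must be present.
    if (input.practitionerJwt === '') throw new Error('practitionerJwt is required')
    return new SketchRealPrompt(input.text, input.practitionerJwt)
  }
}

// Accepts anonymized prompts only.
function sendAnonymized(prompt: SketchAnonymizedPrompt): string {
  return prompt.text
}

const ok = SketchAnonymizedPrompt.from({ anonymizedText: 'Patient, 42J, Rückenschmerzen seit 3 Wochen.' })
const forwarded = sendAnonymized(ok)
// sendAnonymized(SketchRealPrompt.from({ text: '...', practitionerJwt: '...' }))
//   compile error: the two classes declare separate private 'kind' members
// sendAnonymized({ text: 'raw string' })
//   compile error: an object literal cannot supply the private member
```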

PII modes

Anonymized and real PII modes.

The gateway enforces PII classification per request via the gateway.pii field. anonymized mode (the default) is required for all background agents. real mode requires a Practitioner-JWT and is only available for intents explicitly designed for identified patient context.

Mode	Auth requirement	Description
anonymized	Service token (default)	PHI must be removed or tokenized before forwarding to the LLM provider. phi_references enables tokenization. PII patterns must not appear in the post-tokenization prompt.
real	Practitioner-JWT required	Patient-identified data is forwarded. Only valid on intents designed for identified patient context. Rejected with HTTP 401 when a service token is used.
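The two rules in this table (anonymized is the default; real needs a Practitioner-JWT and is otherwise rejected with 401) can be sketched as a small guard. The TokenKind classification and the checkPiiMode function are hypothetical; this page does not specify how the gateway tells a Practitioner-JWT from a service token, so the sketch assumes that decision was made upstream.

```typescript
// Hypothetical guard for the PII-mode rules; names are illustrative only.

type PiiMode = 'anonymized' | 'real'
type TokenKind = 'service' | 'practitioner' // assumed to be derived from the bearer token upstream

type PiiModeCheck =
  | { ok: true; mode: PiiMode }
  | { ok: false; status: 401; code: 'practitioner_jwt_required' }

function checkPiiMode(requested: PiiMode | undefined, token: TokenKind): PiiModeCheck {
  // anonymized is the default when gateway.pii is omitted
  const mode: PiiMode = requested ?? 'anonymized'
  if (mode === 'real' && token !== 'practitioner') {
    return { ok: false, status: 401, code: 'practitioner_jwt_required' }
  }
  return { ok: true, mode }
}

const background = checkPiiMode(undefined, 'service') // → { ok: true, mode: 'anonymized' }
const rejected = checkPiiMode('real', 'service')      // → { ok: false, status: 401, code: 'practitioner_jwt_required' }
```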

PHI tokenization

In anonymized mode, callers may declare PHI strings via gateway.phi_references. The gateway replaces each declared value with an opaque token (e.g. [Patient-1]) before forwarding, and restores FHIR references in the response. The token-to-PHI mapping never leaves process memory. See the Prompt-Formulation Guide for declaration contract details and worked examples.

Declaration contract

Exhaustive declaration fast-path.

When a caller can enumerate all PHI in the request, it may set gateway.declaration: "exhaustive". The gateway then cross-checks the tokenized prompt against a set of embedded PII regex patterns. If no residual PII is found, the request is forwarded without calling the external pii-detector container — lower latency and resilient to container downtime.

How the cross-check works

1. The caller declares all PHI strings via phi_references and sets declaration: "exhaustive".
2. The gateway tokenizes the declared PHI, replacing names and identifiers with opaque tokens such as [Patient-N].
3. The tokenized prompt is scanned for residual PII patterns: German DOB (DD.MM.YYYY), KVNR ([A-Z]\d{9}), and full 5-digit PLZ.
4. If nothing matches, the request is forwarded; if any pattern matches, the gateway returns HTTP 422 caller_declaration_violation.
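The residual-PII scan described above can be sketched with three regular expressions. The patterns are assumptions approximating the three named categories, not the gateway's embedded patterns, and crossCheck is a hypothetical name. Note that a bare 5-digit PLZ pattern also flags any other standalone 5-digit number, which errs on the side of blocking.

```typescript
// Hypothetical residual-PII cross-check for declaration: "exhaustive".
// Patterns approximate the listed categories; they are not the gateway's own.

const RESIDUAL_PII_PATTERNS: Record<string, RegExp> = {
  dob: /\b\d{2}\.\d{2}\.\d{4}\b/, // German date of birth, DD.MM.YYYY
  kvnr: /\b[A-Z]\d{9}\b/,         // Krankenversichertennummer: one letter + 9 digits
  plz: /\b\d{5}\b/,               // full 5-digit postal code
}

type CrossCheckResult =
  | { forward: true }
  | { forward: false; status: 422; code: 'caller_declaration_violation'; matched: string[] }

function crossCheck(tokenizedPrompt: string): CrossCheckResult {
  // No 'g' flag, so RegExp.test() keeps no lastIndex state between calls.
  const matched = Object.entries(RESIDUAL_PII_PATTERNS)
    .filter(([, pattern]) => pattern.test(tokenizedPrompt))
    .map(([name]) => name)
  if (matched.length === 0) return { forward: true }
  return { forward: false, status: 422, code: 'caller_declaration_violation', matched }
}

const clean = crossCheck('[Patient-1], Altersgruppe 51-65, Hauptdiagnose E11.x')
// clean → { forward: true }
const leaky = crossCheck('[Patient-1], geb. 12.03.1961, KVNR A123456789')
// leaky → 422 with matched ['dob', 'kvnr']
```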

422 error responses

Re-ID preflight blocked

HTTP 422
{
  "error": "Re-ID risk preflight: request blocked — quasi-ID combination uniquely identifies patient",
  "doc_url": "https://polaris-fhir.de/docs/sdk/llm-gateway/prompt-formulation/"
}

Declaration violation

HTTP 422
{
  "error": "caller_declaration_violation: undeclared PII found after tokenization",
  "doc_url": "https://polaris-fhir.de/docs/sdk/llm-gateway/prompt-formulation/"
}

Prompt-formulation guide

The Prompt-Formulation Guide documents the full declaration contract: quasi-ID generalization conventions (age buckets, ICD categories, PLZ prefix, relative dates), PHI declaration rules, PASS and FAIL examples, cross-check failure modes, and migration from the legacy combination_count API.

Read the Prompt-Formulation Guide →

Tokenization fast-path

PHI tokenization pipeline.

When pii: 'anonymized' and phi_references are provided, the gateway runs the tokenization pipeline: PHI strings are replaced with opaque tokens before forwarding, and tokens in the LLM response are restored to FHIR references before returning to the caller. The token-to-PHI mapping lives in process memory only — it is never logged, never audited as values, and is dropped after each request.

Caller request                  Gateway pipeline
────────────────                ─────────────────────────────────────────────
messages: [{                   1. tokenizeMessages(messages, phi_references)
  content:                          → replaces "Erika Müller" with [Patient-1]
    "Erika Müller,                   → replaces "Dr. Schmidt" with [Practitioner-1]
     Altersgruppe 51-65,
     Hauptdiagnose E11.x"        2. (if declaration: "exhaustive")
}]                                 → cross-check tokenized content for residual PII
                                     DOB / KVNR / PLZ patterns
gateway: {                         → 422 caller_declaration_violation if match found
  phi_references: [
    { resourceType: "Patient",  3. Forward tokenized messages to LLM provider
      id: "pvs-patient-12345",       phi_references + reid_preflight stripped
      values: ["Erika Müller",
               "Müller"] },     4. LLM response contains [Patient-1], [Practitioner-1]
    { resourceType:
      "Practitioner",            5. detokenizeText(response, tokenCtx)
      id: "pvs-pract-42",            → restores tokens to FHIR references
      values: ["Dr. Schmidt",        "Patient/pvs-patient-12345"
               "Schmidt"] }         "Practitioner/pvs-pract-42"
  ],
  declaration: "exhaustive"     6. dropContext(tokenCtx) — map cleared from memory
}
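Steps 1, 5 and 6 of this pipeline can be sketched as a per-request map from token to FHIR reference. This is a minimal illustration, not the gateway's tokenizeMessages(), detokenizeText() or dropContext(): tokenize and detokenize are hypothetical names, the sketch works on a single string rather than a messages array, and it numbers tokens sequentially per resource type for readability, whereas the gateway randomizes token numbers per request (ADR-027 §6). Longer values are replaced first so that "Erika Müller" is consumed before "Müller".

```typescript
// Hypothetical sketch of PHI tokenization; not the gateway's tokenizeMessages()/detokenizeText().

interface PhiReference {
  resourceType: string // e.g. 'Patient', 'Practitioner'
  id: string
  values: string[] // every spelling of the PHI that may appear in the prompt
}

// Per-request, in-memory only: token -> FHIR reference
type TokenContext = Map<string, string>

function tokenize(text: string, refs: PhiReference[]): { text: string; ctx: TokenContext } {
  const ctx: TokenContext = new Map()
  const counters = new Map<string, number>()
  const replacements: Array<{ value: string; token: string }> = []

  for (const ref of refs) {
    // Sequential numbers for readability; the gateway randomizes them per request.
    const n = (counters.get(ref.resourceType) ?? 0) + 1
    counters.set(ref.resourceType, n)
    const token = `[${ref.resourceType}-${n}]`
    ctx.set(token, `${ref.resourceType}/${ref.id}`)
    for (const value of ref.values) if (value !== '') replacements.push({ value, token })
  }

  // Replace longer values first so "Erika Müller" is consumed before "Müller".
  replacements.sort((a, b) => b.value.length - a.value.length)
  let out = text
  for (const { value, token } of replacements) out = out.split(value).join(token)
  return { text: out, ctx }
}

// Restore tokens in the LLM response to FHIR references such as "Patient/pvs-patient-12345".
function detokenize(text: string, ctx: TokenContext): string {
  let out = text
  ctx.forEach((ref, token) => {
    out = out.split(token).join(ref)
  })
  return out
}

const refs: PhiReference[] = [
  { resourceType: 'Patient', id: 'pvs-patient-12345', values: ['Erika Müller', 'Müller'] },
  { resourceType: 'Practitioner', id: 'pvs-pract-42', values: ['Dr. Schmidt', 'Schmidt'] },
]

const { text: tokenized, ctx } = tokenize(
  'Erika Müller, Altersgruppe 51-65, Hauptdiagnose E11.x. Frau Müller wird von Dr. Schmidt betreut.',
  refs,
)
// tokenized → '[Patient-1], Altersgruppe 51-65, Hauptdiagnose E11.x. Frau [Patient-1] wird von [Practitioner-1] betreut.'

const restored = detokenize('[Patient-1] wird von [Practitioner-1] betreut.', ctx)
// restored → 'Patient/pvs-patient-12345 wird von Practitioner/pvs-pract-42 betreut.'

ctx.clear() // drop the mapping once the response has been restored
```

The bracketed token format also keeps detokenization unambiguous: "[Patient-1]" can never match inside "[Patient-10]" because the closing bracket is part of the token.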

Security guarantee

The token-to-PHI mapping never touches disk, logs, or the audit trail. Only token counts and resource types are recorded. The mapping is cleared immediately after the response is detokenized.

Token format

[Patient-N], [Practitioner-N], [Encounter-N], [Condition-N]

Token numbers are randomized per request for cross-request unlinkability (ADR-027 §6).

Audit entry

The audit entry records tokenization_audit.token_count and resource_types only — never PHI values or the token mapping.

TuningParams reference

Controlling model behavior.

TuningParams are optional hints attached to a ModelRequest. The gateway translates these into provider-specific parameters (temperature, max_tokens, response_format) before dispatch. Intents in the catalog declare their expected tuning — the gateway validates that caller tuning is compatible.

Parameter	Type	Description
effort	'low' | 'medium' | 'high'	Reasoning depth and cost trade-off. Use high for coding/billing suggestions, low for classification tasks.
creativity	'deterministic' | 'balanced' | 'creative'	Controls temperature. Clinical and billing intents must use deterministic.
responseFormat	'text' | 'json' | 'markdown'	Output format hint. json activates jsonMode capability requirement automatically.
streaming	boolean	Enable streaming token delivery. Activates streaming capability requirement.
maxTokens	number	Maximum output token count. Omit to use model default.
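The two automatic capability requirements in this table (responseFormat: 'json' adds jsonMode; streaming: true adds streaming) can be sketched as a merge step before routing. SketchTuningParams mirrors the table; effectiveRequires is a hypothetical helper, and the translation into provider parameters such as temperature is deliberately left out because this page does not specify the concrete values.

```typescript
// Shape mirrors the TuningParams table above (redeclared so the sketch is self-contained).
interface SketchTuningParams {
  effort?: 'low' | 'medium' | 'high'
  creativity?: 'deterministic' | 'balanced' | 'creative'
  responseFormat?: 'text' | 'json' | 'markdown'
  streaming?: boolean
  maxTokens?: number
}

// Hypothetical helper: adds capabilities implied by tuning to the caller's requires list.
function effectiveRequires(requires: string[], tuning: SketchTuningParams = {}): string[] {
  const result = new Set(requires) // Set keeps insertion order and drops duplicates
  if (tuning.responseFormat === 'json') result.add('jsonMode')
  if (tuning.streaming === true) result.add('streaming')
  return Array.from(result)
}

// billing-coding-suggest tuning from the intent catalog
const billingRequires = effectiveRequires(['text', 'medicalCoding', 'germanLanguage'], {
  effort: 'high',
  creativity: 'deterministic',
  responseFormat: 'json',
})
// billingRequires → ['text', 'medicalCoding', 'germanLanguage', 'jsonMode']
```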

Intent catalog

10 intents — 3 full, 7 stub.

The intent catalog is the runtime contract between callers and the gateway. Full intents have complete capability declarations, hard constraints, and tuning. Stub intents are reserved IDs with risk annotations; their implementation is still in progress. Specifying an unknown intent ID at runtime raises a validation error (unknown_intent, HTTP 400).

The catalog is loaded from packages/llm-gateway/intent-catalog.yaml via loadCatalogFromFile() or embedded inline via loadCatalogFromYaml(). Use validateIntent() to check an intent ID against the loaded catalog at startup.
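Conceptually, the catalog is a lookup table keyed by intent ID, and startup validation fails fast on IDs it does not know. The sketch below models three entries from the cards on this page as an in-memory object instead of YAML. The SketchIntent shape and the validateIntentId function are assumptions for illustration; they are not the schema of intent-catalog.yaml or the signature of validateIntent().

```typescript
// Hypothetical in-memory model of catalog entries; not the intent-catalog.yaml schema.
type SketchIntent =
  | { status: 'full'; risk: 'green' | 'yellow'; approvalQueue: boolean; requires: string[]; prefers: string[] }
  | { status: 'stub'; risk: 'green' | 'yellow' }

// Three entries transcribed from the intent cards on this page.
const catalog: Record<string, SketchIntent> = {
  'billing-coding-suggest': {
    status: 'full',
    risk: 'green',
    approvalQueue: true,
    requires: ['text', 'medicalCoding', 'germanLanguage'],
    prefers: ['reasoning', 'medicalGermanLanguage'],
  },
  'anamnese-translate': {
    status: 'full',
    risk: 'green',
    approvalQueue: false,
    requires: ['text', 'multilingual'],
    prefers: ['simplifiedLanguage'],
  },
  'pvs-computer-use': { status: 'stub', risk: 'yellow' },
}

// Mirrors the unknown_intent rule: empty, non-string, or not in the catalog.
function validateIntentId(id: unknown): SketchIntent {
  if (typeof id !== 'string' || id === '') {
    throw new Error('unknown_intent: intent ID must be a non-empty string')
  }
  // hasOwnProperty keeps inherited keys such as 'toString' from passing as intents
  if (!Object.prototype.hasOwnProperty.call(catalog, id)) {
    throw new Error(`unknown_intent: '${id}' is not in the catalog`)
  }
  return catalog[id]
}

// Fail fast at startup for every intent this service uses.
for (const id of ['billing-coding-suggest', 'anamnese-translate']) validateIntentId(id)
```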

Full intents

billing-coding-suggest · full · risk: green · approval queue

Suggest ICD-10/EBM/GOZ billing codes from anonymized encounter documentation

Medical device risk: none — coding suggestion, physician decides

Requires

text, medicalCoding, germanLanguage

Prefers

reasoning, medicalGermanLanguage

Tuning

effort: high, creativity: deterministic, responseFormat: json

Hard constraints

input must be AnonymizedPrompt — no PII permitted
output is suggestion only — physician must confirm before submission
must log to audit trail with capabilities_matched and model_used

scribe-stt-summarize · full · risk: green · approval queue

Transcribe and summarize spoken consultation notes into structured FHIR-compatible documentation

Medical device risk: none — summary goes to approval queue before chart entry

Requires

audioIn, text, germanLanguage

Prefers

medicalGermanLanguage, streaming

Tuning

effort: high, creativity: deterministic, responseFormat: markdown

Hard constraints

audio stream must be processed locally or via DSGVO-compliant processor
output must go to approval queue before any chart write
no direct FHIR write without physician approval

anamnese-translate · full · risk: green

Translate patient anamnesis from non-German language to German, verbatim without clinical interpretation

Medical device risk: none — verbatim translation, no clinical interpretation

Requires

text, multilingual

Prefers

simplifiedLanguage

Tuning

effort: medium, creativity: deterministic, responseFormat: text

Hard constraints

verbatim translation only — no summarization, no clinical interpretation
no triage decisions — translator role only
source language must be identified and logged

Stub intents (reserved, in progress)

rule-detect-administrative · stub · risk: green

Classify incoming documents/tasks as administrative vs clinical

Hard constraints

output restricted to allowlist of administrative categories — no free-text clinical output

scribe-fhir-resource-suggest · stub · risk: green

From consultation transcript, suggest structured FHIR resources

Hard constraints

output must go to approval queue — no direct FHIR write
physician must review and approve each suggested resource

voice-doc-gap-billing · stub · risk: green

Detect potential billing gaps

MDR: 5a Undercoding — detection only

Hard constraints

detection only — no automatic billing submission
must present evidence (documentation snippet) for each detected gap

voice-doc-gap-quality · stub · risk: green

Flag documentation quality gaps

MDR: 5b QM-Handbuch

Hard constraints

output is quality flag only — no treatment recommendation
reference to QM handbook section mandatory per finding

voice-doc-gap-jtbd · stub · risk: green

Identify jobs-to-be-done from transcripts

MDR: 5c JTBD

Hard constraints

anonymized input only
output is operational insight — not clinical decision support

admin-day-protocol-flag · stub · risk: green

Flag items in daily administrative protocols

Hard constraints

output restricted to allowlist of flag types
no clinical diagnosis in output

pvs-computer-use · stub · risk: yellow

Automate PVS UI interactions

MDR: yellow-operational (navigation) / green-medical (data entry only)

Hard constraints

BLOCKED: dicom-letter-from-text use case is explicitly excluded from this intent
scope limited to data entry and navigation — no autonomous clinical decision
requires explicit operator opt-in per session

Code snippets

Three complete examples.

All snippets use real types from @polaris/llm-gateway. Each snippet shows how to set up the gateway with createApp and how to call it. Capability requirements, intent, and PII mode are passed inside the body.gateway field — not as headers.

Example 1 — billing-coding-suggest

Suggest ICD-10/EBM/GOZ billing codes. Input must be anonymized — the AnonymizedPrompt type enforces this at compile time. Output goes to an approval queue before reaching the billing system.

import { createApp, AnonymizedPrompt } from '@polaris/llm-gateway'
import type { AnonymizedInput, ModelConfig, LlmProvider, GatewayConfig } from '@polaris/llm-gateway'

// 1. Configure models (provided by ops/platform team)
const models: ModelConfig[] = [
  {
    id: 'anthropic/claude-sonnet-4-5',
    provider: 'anthropic',
    modelName: 'claude-sonnet-4-5',
    capabilities: ['text', 'reasoning', 'medicalCoding', 'germanLanguage', 'medicalGermanLanguage', 'jsonMode'],
  },
]

// 2. Wire in your LLM provider (LiteLLM, Anthropic SDK, etc.)
const llmProvider: LlmProvider = async (model, request) => {
  // Forward to your actual provider implementation
  throw new Error('Replace with real provider implementation')
}

// 3. Create the gateway app
const config: GatewayConfig = {
  models,
  intentCatalogPath: './packages/llm-gateway/intent-catalog.yaml',
}
const app = createApp({ config, llmProvider })

// 4. Build an anonymized prompt (type-safe — no raw strings accepted)
const anonymizedInput: AnonymizedInput = {
  anonymizedText: 'Patient klagt über anhaltende Rückenschmerzen seit 3 Wochen, keine Ausstrahlung.',
}
const prompt = AnonymizedPrompt.from(anonymizedInput)

// 5. POST to the chat completions endpoint
const response = await app.fetch(
  new Request('http://localhost/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [{ role: 'user', content: prompt.text }],
      gateway: {
        requires: ['text', 'medicalCoding', 'germanLanguage'],
        prefers: ['reasoning', 'medicalGermanLanguage'],
        intent: 'billing-coding-suggest',
        pii: 'anonymized',
      },
    }),
  }),
)
const result = await response.json()
// result.choices[0].message.content — JSON billing code suggestions
// Must be routed to physician approval before billing submission

Example 2 — scribe-stt-summarize

Transcribe and summarize spoken consultation notes. Audio must be processed locally or via a DSGVO-compliant processor. Output goes to an approval queue before any chart write.

// scribe-stt-summarize: audio transcription + summarization
// Continues Example 1: reuses its imports (createApp, AnonymizedPrompt) and llmProvider.
// Note: the audioIn capability requires the llmProvider to handle audio content.
// The prompt text here represents the transcribed audio content.

const app = createApp({
  config: {
    models: [
      {
        id: 'anthropic/claude-opus-4-5',
        provider: 'anthropic',
        modelName: 'claude-opus-4-5',
        capabilities: ['text', 'audioIn', 'germanLanguage', 'medicalGermanLanguage', 'streaming'],
      },
    ],
    intentCatalogPath: './packages/llm-gateway/intent-catalog.yaml',
  },
  llmProvider,
})

const transcribedNote = AnonymizedPrompt.from({
  anonymizedText: 'Anamnese: Keine Vorerkrankungen. Aktuell Husten seit 5 Tagen, leichtes Fieber 37.8°C.',
})

const response = await app.fetch(
  new Request('http://localhost/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [
        { role: 'system', content: 'Erstelle eine strukturierte Dokumentation aus den folgenden Konsultationsnotizen.' },
        { role: 'user', content: transcribedNote.text },
      ],
      gateway: {
        requires: ['text', 'audioIn', 'germanLanguage'],
        prefers: ['medicalGermanLanguage'],
        intent: 'scribe-stt-summarize',
        pii: 'anonymized',
      },
    }),
  }),
)
// response includes X-Approval-Required: true header — must go to approval queue
const approvalRequired = response.headers.get('X-Approval-Required') === 'true'

Example 3 — anamnese-translate

Translate patient anamnesis from a non-German source language to German. Verbatim only — no summarization, no clinical interpretation, no triage decisions. Source language must be identified and logged.

// anamnese-translate: translate non-German patient anamnesis to German
// Continues Example 1: reuses its imports (createApp, AnonymizedPrompt) and llmProvider.
const app = createApp({
  config: {
    models: [
      {
        id: 'anthropic/claude-sonnet-4-5',
        provider: 'anthropic',
        modelName: 'claude-sonnet-4-5',
        capabilities: ['text', 'multilingual', 'simplifiedLanguage'],
      },
    ],
    intentCatalogPath: './packages/llm-gateway/intent-catalog.yaml',
  },
  llmProvider,
})

const foreignLanguageText = AnonymizedPrompt.from({
  anonymizedText: 'The patient reports chest pain that started two days ago, especially when breathing deeply.',
})

const response = await app.fetch(
  new Request('http://localhost/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [
        {
          role: 'system',
          content: 'Übersetze den folgenden Text wörtlich ins Deutsche. Keine Zusammenfassung, keine klinische Interpretation.',
        },
        { role: 'user', content: foreignLanguageText.text },
      ],
      gateway: {
        requires: ['text', 'multilingual'],
        prefers: ['simplifiedLanguage'],
        intent: 'anamnese-translate',
        pii: 'anonymized',
        // No approval queue — pure translation, no clinical decision
      },
    }),
  }),
)
// approvalQueue: false for this intent — result can be used directly
// Gateway audit entry records source language + model used

Discovery endpoint

Querying gateway capabilities at runtime.

The gateway exposes a capability discovery endpoint. Call it to determine which capabilities the currently configured model set covers, which model is the default, and what the coverage map looks like for routing decisions.

Request

GET /api/llm/capabilities
Authorization: Bearer <api-key>

Response shape

{
  "capabilities": [
    "text", "reasoning", "germanLanguage",
    "medicalGermanLanguage"
    // ... all capabilities any configured model covers
  ],
  "coverage": {
    "text": ["claude-sonnet-4-5", "claude-opus-4-5"],
    "reasoning": ["claude-opus-4-5"],
    "germanLanguage": ["claude-sonnet-4-5", "claude-opus-4-5"],
    "medicalGermanLanguage": ["claude-opus-4-5"],
    "local": []
  },
  "default": "text"
}

Use at startup

Call the discovery endpoint during application initialization to verify that all capabilities required by your intents are covered by the configured model set. Log a warning (or fail hard) if required capabilities are missing — better to detect at startup than at request time.
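A startup check along these lines compares the discovery response with the capabilities your intents require. CapabilitiesResponse mirrors the response shape shown above; missingCapabilities, assertCapabilityCoverage, the baseUrl/apiKey parameters, and the choice to throw rather than warn are assumptions. The fetch call relies on the global fetch available in Node.js 18+ and browsers.

```typescript
// Mirrors the documented response of GET /api/llm/capabilities.
interface CapabilitiesResponse {
  capabilities: string[]             // every capability any configured model covers
  coverage: Record<string, string[]> // capability -> models covering it
  default: string
}

// Required capabilities that no configured model covers.
function missingCapabilities(discovery: CapabilitiesResponse, required: string[]): string[] {
  const covered = new Set(discovery.capabilities)
  return required.filter((cap) => !covered.has(cap))
}

// Hypothetical startup hook; baseUrl and apiKey come from your own configuration.
async function assertCapabilityCoverage(baseUrl: string, apiKey: string, required: string[]): Promise<void> {
  const res = await fetch(`${baseUrl}/api/llm/capabilities`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  })
  if (!res.ok) throw new Error(`capability discovery failed: HTTP ${res.status}`)
  const discovery = (await res.json()) as CapabilitiesResponse
  const missing = missingCapabilities(discovery, required)
  // Fails hard; swap the throw for a logged warning if partial coverage is acceptable.
  if (missing.length > 0) throw new Error(`missing capabilities: ${missing.join(', ')}`)
}

// Using an abbreviated version of the sample response above:
const sample: CapabilitiesResponse = {
  capabilities: ['text', 'reasoning', 'germanLanguage', 'medicalGermanLanguage'],
  coverage: { text: ['claude-sonnet-4-5', 'claude-opus-4-5'], local: [] },
  default: 'text',
}
const gaps = missingCapabilities(sample, ['text', 'medicalCoding', 'germanLanguage'])
// gaps → ['medicalCoding']
```

Keeping the comparison in a pure function makes it testable without a running gateway; only the thin fetch wrapper touches the network.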

Public exports

Quick-start exports.

The most-used exports are listed below. See the package's TypeScript types for the full export surface.

Types

import type {
  KnownCapability,    // 19-member union
  Capability,         // string (open for custom caps)
  TuningParams,       // effort / creativity / responseFormat / streaming / maxTokens
  ModelRequest,       // requires + prefers + tuning + intent + pii
  AnonymizedInput,    // { anonymizedText: string }
  RealInput,          // { text: string; practitionerJwt: string }
  ModelConfig,        // per-model gateway configuration
  GatewayConfig,      // top-level gateway config
  AuditEntry,         // immutable audit log record
  AuditStore,         // storage interface for audit entries
} from '@polaris/llm-gateway'

Values

import {
  // Prompt constructors
  AnonymizedPrompt,       // .from(AnonymizedInput): opaque prompt
  RealPrompt,             // .from(RealInput): opaque prompt

  // Audit stores
  NoopAuditStore,         // discards all entries (testing)
  CapturingAuditStore,    // in-memory capture (testing)

  // Gateway core
  routeRequest,           // low-level routing function
  scanForPii,             // PII detection utility

  // Intent catalog
  loadCatalogFromFile,    // load YAML catalog from disk
  loadCatalogFromYaml,    // load from YAML string
  validateIntent,         // validate intent ID against catalog

  // App factory
  createApp,              // creates the gateway Hono app
} from '@polaris/llm-gateway'

Error Codes

All error responses from POST /v1/chat/completions use a structured envelope:
{ error: { code, errorClass, message, doc_url, details? } }

Code	HTTP	Error Class	Description
invalid_json	400	RequestParseError	Request body is not valid JSON
practitioner_jwt_required	401	AuthenticationError	`pii: real` requires a Practitioner-JWT, not a service token
unknown_intent	400	IntentValidationError	Intent ID is empty, non-string, or not found in the catalog
red_risk_intent	501	IntentValidationError	Intent is a known prohibited use case (red-risk denylist)
intent_catalog_unavailable	503	IntentCatalogError	Intent provided but catalog not loaded — cannot validate
no_model_for_capabilities	503	CapabilityRoutingError	No configured model satisfies the required capabilities
reid_preflight_invalid_input	400	ReidPreflightError	Missing or invalid `quasi_ids` / `combination_count` fields
reid_preflight_blocked	422	ReidPreflightError	Quasi-ID combination is too unique — blocked to prevent patient re-identification
caller_declaration_violation	422	PiiDeclarationError	Undeclared PII detected post-tokenization in `declaration: "exhaustive"` mode
llm_provider_error	502	LlmProviderError	Upstream LLM provider returned an error or timed out
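Clients can narrow a response body to this envelope with a type guard before branching on code. The code list is taken from the table above; GatewayErrorEnvelope, isGatewayError and describeFailure are hypothetical client-side helpers, not exports of @polaris/llm-gateway, and the sample body is illustrative. Bodies in any other shape, such as a bare error string, are reported as unrecognized.

```typescript
// Error codes from the table above.
const GATEWAY_ERROR_CODES = [
  'invalid_json', 'practitioner_jwt_required', 'unknown_intent', 'red_risk_intent',
  'intent_catalog_unavailable', 'no_model_for_capabilities', 'reid_preflight_invalid_input',
  'reid_preflight_blocked', 'caller_declaration_violation', 'llm_provider_error',
] as const
type GatewayErrorCode = (typeof GATEWAY_ERROR_CODES)[number]

interface GatewayErrorEnvelope {
  error: { code: GatewayErrorCode; errorClass: string; message: string; doc_url: string; details?: unknown }
}

// Hypothetical client-side type guard for the structured envelope.
function isGatewayError(body: unknown): body is GatewayErrorEnvelope {
  if (typeof body !== 'object' || body === null) return false
  const err = (body as { error?: unknown }).error
  if (typeof err !== 'object' || err === null) return false
  const e = err as Record<string, unknown>
  return (
    typeof e.code === 'string' &&
    (GATEWAY_ERROR_CODES as readonly string[]).includes(e.code) &&
    typeof e.errorClass === 'string' &&
    typeof e.message === 'string' &&
    typeof e.doc_url === 'string'
  )
}

function describeFailure(status: number, body: unknown): string {
  if (!isGatewayError(body)) return `HTTP ${status}: unrecognized error body`
  const { code, errorClass, message, doc_url } = body.error
  return `HTTP ${status} ${code} (${errorClass}): ${message} (docs: ${doc_url})`
}

// Illustrative body for a declaration violation
const body: unknown = {
  error: {
    code: 'caller_declaration_violation',
    errorClass: 'PiiDeclarationError',
    message: 'undeclared PII found after tokenization',
    doc_url: 'https://polaris-fhir.de/docs/sdk/llm-gateway/prompt-formulation/',
  },
}
const summary = describeFailure(422, body)
```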

Canonical reference

ADR-020 v2 is the canonical source for all decisions documented here — capability axis taxonomy, intent catalog schema, PII enforcement rules, audit trail requirements, and the discovery endpoint contract. When this page and ADR-020 conflict, the ADR wins.

Read ADR-020 v2 →