Prompt-formulation guide.
How to format prompts for the LLM gateway declaration contract. Covers PII modes, quasi-ID conventions, the phi_references declaration contract, worked PASS and FAIL examples, and migration from the legacy combination_count API.
Implementation Status
The declaration: "exhaustive" fast-path is rolling out in an upcoming release. The PHI declaration contract described here reflects the target API; draft your callers against this spec now. The reid_preflight endpoint is fully live.
Trust model
What the gateway guarantees — and what callers must do.
The gateway enforces a Zone-A → Zone-C anonymization boundary (ADR-027 §6). Callers are internal Polaris services acting in good faith; adversarial misuse is addressed at the Keycloak / service-token access-control layer.
Gateway guarantees
- PHI declared via phi_references is tokenized before leaving the gateway — the mapping lives in process memory only, never on disk or in logs.
- Quasi-identifiers (birthDate, ICD code, PLZ) are generalized before forwarding: birthDate → age group, ICD → 3-char category, PLZ → 2-digit prefix.
- Re-identification risk preflight blocks requests where the quasi-ID combination is too unique to be safely anonymized.
- Every call is audited — caller, model, capabilities, PII mode, tokenization counts, preflight result. Metadata only; never PHI values.
Caller responsibilities
- When using declaration: "exhaustive", declare ALL PHI in the prompt via phi_references. The gateway cross-checks the tokenized text, and any leftover PII that matches an embedded pattern returns HTTP 422.
- Write prompts using generalized quasi-ID forms (see Section 3) — the gateway does not re-parse free German text to discover quasi-IDs.
- Do not put generalized quasi-IDs (e.g. "51-65", "E11.x") in phi_references.values — those are left verbatim in the generalized text and not tokenized.
Declaration modes
Declared vs undeclared mode.
The gateway supports two PHI-handling modes depending on whether the caller can enumerate all PHI in the prompt.
Declared mode — declaration: "exhaustive"
- 1. Tokenize phi_references values, replacing PHI strings with tokens.
- 2. Apply quasi-ID generalizations in the tokenized text.
- 3. Run the embedded-regex cross-check on the tokenized prompt.
- 4. No match: forward the request. Works even when the pii-detector container is down.
- 5. Match: return HTTP 422 caller_declaration_violation.
When to use: Background agents with full structural access to the FHIR resource (billing agents, coding suggestion agents). Lower latency, resilient to pii-detector downtime.
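The tokenization in step 1 can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the function name `tokenize`, the longest-value-first matching, and the sequential token numbering are all assumptions (the gateway itself randomizes token numbers per request).

```python
import re

def tokenize(text, phi_references):
    """Replace declared PHI strings with per-resource tokens.

    Hypothetical sketch: sequential numbering and longest-first
    matching are assumptions, not documented gateway behavior.
    """
    counters = {}  # resourceType -> last number handed out
    pairs = []     # (surface form, token)
    for ref in phi_references:
        rtype = ref["resourceType"]
        counters[rtype] = counters.get(rtype, 0) + 1
        token = f"[{rtype}-{counters[rtype]}]"
        for value in ref["values"]:
            if value:  # skip empty strings, which would match everywhere
                pairs.append((value, token))
    if not pairs:
        return text
    # Longest surface form first, so "Erika Müller" wins over "Müller".
    pairs.sort(key=lambda p: len(p[0]), reverse=True)
    lookup = dict(pairs)
    pattern = re.compile("|".join(re.escape(value) for value, _ in pairs))
    # Single pass: already inserted tokens are never re-scanned.
    return pattern.sub(lambda m: lookup[m.group(0)], text)
```

Matching the longest surface form first matters whenever one declared value is a substring of another; otherwise "Erika Müller" would be forwarded as "Erika [Patient-1]".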
Undeclared mode — no declaration field (target behavior)
- 1. Call the external pii-detector container (Presidio + GERNERMED) for entity inventory.
- 2. Map detected entities to quasi-ID fields, generalize them in the prompt.
- 3. Run the public-population Re-ID preflight against the detected quasi-ID combination.
- 4. If pii-detector is unavailable: HTTP 503 fail-closed. No fallback, no silent forwarding.
Current status: pii-detector container integration is pending. In the current production build, undeclared requests are forwarded with output filtering only — input scanning is not yet active. This mode will fail-closed once the pii-detector container lands.
When to use: Unstructured free-text pipelines (scanned Arztbrief, free-text entries). Higher latency, fail-closed on pii-detector downtime, no PHI enumeration required.
| Dimension | declaration: "exhaustive" | No declaration |
|---|---|---|
| PHI discovery | Caller enumerates | Presidio container |
| Detection engine | Embedded regex | Presidio + GERNERMED |
| Presidio down | Request proceeds | HTTP 503 blocked |
| Latency | Low | Higher (Presidio round-trip) |
| Caller effort | High (must declare all PHI) | Low |
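The mode selection summarized in the table can be sketched as a small decision function. It models the target behavior only (not the current production build), and the name `plan_request` and its return shape are hypothetical.

```python
def plan_request(gateway, detector_available):
    """Return (mode, http_status); http_status is None when processing continues.

    Hypothetical sketch of the target behavior described in the table.
    """
    if gateway.get("declaration") == "exhaustive":
        # Declared mode uses the embedded regex and does not need the detector.
        return ("declared", None)
    if not detector_available:
        # Undeclared mode fails closed when the pii-detector is down.
        return ("undeclared", 503)
    return ("undeclared", None)
```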
Quasi-ID conventions
Generalized quasi-identifiers.
Quasi-identifiers are indirect attributes that can re-identify a patient when combined. The gateway generalizes them before forwarding. Callers must write prompts using the generalized form, not the raw value.
Age groups
Use: Altersgruppe <range> or Patientin in Altersgruppe <range>
| Age | Correct form |
|---|---|
| 0 – 17 | Altersgruppe 0-17 |
| 18 – 30 | Altersgruppe 18-30 |
| 31 – 50 | Altersgruppe 31-50 |
| 51 – 65 | Altersgruppe 51-65 |
| 66 – 80 | Altersgruppe 66-80 |
| 81+ | Altersgruppe 81+ |
Never include the exact age: "die 64-jährige Patientin" pins the birth year to within one year. Use instead: "Patientin in Altersgruppe 51-65".
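A caller that holds the exact age can derive the bucket before building the prompt. The helper below is a sketch; the name `age_group` is hypothetical, and the bucket boundaries are taken from the table above.

```python
# Bucket boundaries from the age-group table; 81+ is handled as the fallback.
AGE_GROUPS = [(0, 17), (18, 30), (31, 50), (51, 65), (66, 80)]

def age_group(age):
    """Map an exact age in years to its generalized prompt form."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"Altersgruppe {low}-{high}"
    return "Altersgruppe 81+"
```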
ICD codes
Use the 3-character ICD category with .x suffix:
| Raw code | Generalized form |
|---|---|
| E11.65 | Hauptdiagnose E11.x |
| F32.1 | Diagnose F32.x |
| I10 | Diagnose I10.x |
| J06.9 | Hauptdiagnose J06.x |
Use Hauptdiagnose for primary, Diagnose for secondary. Never include the full description alongside the code.
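The ICD convention can be applied mechanically on the caller side. This is a sketch under the assumption that raw codes look like one letter, two digits, and an optional dotted subcode; `generalize_icd` is a hypothetical helper name.

```python
import re

# Assumed raw shape: letter + two digits, optionally ".<subcode>".
_ICD = re.compile(r"([A-Z][0-9]{2})(?:\.[0-9A-Z]+)?")

def generalize_icd(code, primary=False):
    """Reduce a raw ICD code to its 3-character category with .x suffix."""
    match = _ICD.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized ICD code: {code!r}")
    label = "Hauptdiagnose" if primary else "Diagnose"
    return f"{label} {match.group(1)}.x"
```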
Dates
Use relative encoding where a reference event is known:
Encounter +3 Tage nach Anamnese
Diagnose gestellt +7 Tage nach Erstvorstellung
Kontrolle geplant +14 Tage
Never include absolute dates: "15.04.2026" uniquely identifies a visit timeline. If relative encoding is not meaningful, use month+year only: "April 2026".
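Both date encodings can be produced from absolute dates before the prompt is assembled. The helpers below are a sketch; `relative_days` and `month_year` are hypothetical names, and the singular "Tag" for a one-day offset is an assumption about the expected wording.

```python
from datetime import date

GERMAN_MONTHS = [
    "Januar", "Februar", "März", "April", "Mai", "Juni",
    "Juli", "August", "September", "Oktober", "November", "Dezember",
]

def relative_days(event_label, event, reference_label, reference):
    """Encode an event as a day offset from a known reference event."""
    delta = (event - reference).days
    if delta < 0:
        raise ValueError("event precedes the reference event")
    unit = "Tag" if delta == 1 else "Tage"
    return f"{event_label} +{delta} {unit} nach {reference_label}"

def month_year(d):
    """Fallback encoding when no reference event is meaningful."""
    return f"{GERMAN_MONTHS[d.month - 1]} {d.year}"
```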
PLZ (Postleitzahl)
Use the first 2 digits (PLZ-2-prefix):
| Raw PLZ | Generalized form |
|---|---|
| 90402 | PLZ-Region 90 |
| 80331 | PLZ-Region 80 |
| 10115 | PLZ-Region 10 |
| 20095 | PLZ-Region 20 |
Never include the full PLZ, Ortsname, or Stadtteil name — those can be cross-correlated with age + diagnosis to narrow the population to re-identifiable size.
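The PLZ generalization is a simple prefix cut. A minimal sketch, with `plz_region` as a hypothetical helper name; the PLZ is handled as a string so that leading zeros survive.

```python
import re

def plz_region(plz):
    """Reduce a full 5-digit PLZ to its generalized 2-digit region form."""
    plz = plz.strip()
    if re.fullmatch(r"[0-9]{5}", plz) is None:
        raise ValueError(f"expected a 5-digit PLZ, got {plz!r}")
    return f"PLZ-Region {plz[:2]}"
```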
PHI declaration
Declaring PHI via phi_references.
The phi_references array maps FHIR resource IDs to the PHI strings that appear in the prompt. Every string in values that occurs in any message will be replaced by the token for that resource.
phi_references structure
"gateway": {
"phi_references": [
{
"resourceType": "Patient",
"id": "pvs-patient-12345",
"values": [
"Müller",
"Erika Müller",
"Frau Müller",
"A123456789" // KVNR, if present
]
},
{
"resourceType": "Practitioner",
"id": "pvs-practitioner-42",
"values": [
"Dr. Schmidt",
"Schmidt",
"123456789" // LANR
]
}
],
"declaration": "exhaustive"
}
Rules
- Include all surface forms the name may appear as (full name, surname only, title + surname).
- Include identifier strings (KVNR, LANR, BSNR) if they appear in the prompt.
- Do not include generalized quasi-IDs ("51-65", "E11.x", "PLZ-Region 90") in values — those are left verbatim.
- Do not pre-substitute tokens ([Patient-1]) into your prompt text — token numbers are randomized per request.
- Token format: [Patient-N], [Practitioner-N], [Encounter-N], [Condition-N]. Numbers are scoped to the request.
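Two of the rules above lend themselves to a caller-side lint before the request is sent. The sketch below is not part of the gateway; `lint_request` and both patterns are assumptions derived from the generalized forms and the token format listed in this guide.

```python
import re

# Token shapes from the documented token format.
_TOKEN = re.compile(r"\[(?:Patient|Practitioner|Encounter|Condition)-\d+\]")
# Generalized quasi-ID forms that must not be declared as PHI values.
_GENERALIZED = re.compile(r"\d{1,2}-\d{1,2}|81\+|[A-Z]\d{2}\.x|PLZ-Region \d{2}")

def lint_request(messages, phi_references):
    """Return a list of human-readable problems; empty means no findings."""
    problems = []
    for ref in phi_references:
        for value in ref.get("values", []):
            if _GENERALIZED.fullmatch(value):
                problems.append(f"generalized quasi-ID in values: {value!r}")
    for message in messages:
        if _TOKEN.search(message.get("content", "")):
            problems.append("pre-substituted token in message text")
    return problems
```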
Correct pattern
Write the prompt with real PHI values in the message text. List all PHI strings in phi_references.values. The gateway tokenizes on your behalf — callers never see or control the generated tokens.
What the caller sends
{
"role": "user",
"content": "Prüfe den Fall für Frau Müller, Altersgruppe 51-65, Hauptdiagnose E11.x."
}
What the gateway forwards
{
"role": "user",
"content": "Prüfe den Fall für [Patient-1], Altersgruppe 51-65, Hauptdiagnose E11.x."
}
PASS and FAIL examples
Worked examples.
PASS examples are shown as the caller sends them (with real PHI values, declared in phi_references), plus the tokenized form the gateway forwards. FAIL examples show the exact pattern that triggers caller_declaration_violation.
Example 1 — Billing coding suggestion
What the caller sends
{
"messages": [
{
"role": "user",
"content": "Encounter für Erika Müller, Altersgruppe 51-65. Hauptdiagnose E11.x. Behandelnder Arzt: Dr. Schmidt. Abgerechnete Ziffern: EBM 03220. Prüfe weitere EBM-Ziffern."
}
],
"gateway": {
"requires": ["text", "medicalCoding"],
"pii": "anonymized",
"phi_references": [
{ "resourceType": "Patient",
"id": "pvs-patient-12345",
"values": ["Müller",
"Erika Müller"] },
{ "resourceType": "Practitioner",
"id": "pvs-practitioner-42",
"values": ["Dr. Schmidt"] }
],
"declaration": "exhaustive"
}
}
What the gateway forwards to the LLM
"Encounter für [Patient-1], Altersgruppe 51-65. Hauptdiagnose E11.x. Behandelnder Arzt: [Practitioner-1]. Abgerechnete Ziffern: EBM 03220. Prüfe weitere EBM-Ziffern."
Why it passes: All PHI is declared in phi_references and replaced by the gateway. Quasi-IDs (51-65, E11.x) are already in generalized form. No exact dates or full PLZ.
Example 2 — Diagnosis documentation
What the caller sends
{
"messages": [
{
"role": "user",
"content": "Formuliere eine SOAP-Notiz für Tobias Weber (Altersgruppe 18-30) mit Hauptdiagnose F32.x. Encounter +3 Tage nach Erstvorstellung."
}
],
"gateway": {
"pii": "anonymized",
"phi_references": [
{ "resourceType": "Patient",
"id": "pvs-patient-99001",
"values": ["Weber",
"Tobias Weber"] }
],
"declaration": "exhaustive"
}
}
Gateway forwards
"Formuliere eine SOAP-Notiz für [Patient-1] (Altersgruppe 18-30) mit Hauptdiagnose F32.x. Encounter +3 Tage nach Erstvorstellung."
Why it passes: Patient name declared and tokenized. Relative date used. ICD in 3-char+.x form. No PLZ or city mentioned.
Example 3 — Exact birthdate in free text
{
"messages": [
{
"role": "user",
"content": "Patient geboren am 15.04.1962, Diagnose E11.65. Schlage EBM-Ziffern vor."
}
],
"gateway": {
"phi_references": [],
"declaration": "exhaustive"
}
}
Why it fails
"15.04.1962" is a German DOB format — matched by the embedded regex \b\d{2}\.\d{2}\.\d{4}\b. The cross-check fires caller_declaration_violation because declaration: "exhaustive" was set but an undeclared quasi-identifier remains.
Fix:
Replace with "Altersgruppe 51-65". Remove the exact birthdate entirely.
Example 4 — KVNR embedded in prompt text
{
"messages": [
{
"role": "user",
"content": "Versicherter A123456789 hat Diagnose E11.65. Schlage EBM-Ziffern vor."
}
],
"gateway": {
"phi_references": [],
"declaration": "exhaustive"
}
}
Why it fails
"A123456789" matches the KVNR pattern \b[A-Z]\d{9}\b. The cross-check fires caller_declaration_violation because declaration: "exhaustive" was set but an undeclared identifier remains.
Fix:
Declare the patient in phi_references so tokenization replaces the KVNR before the cross-check runs:
{ "resourceType": "Patient",
"id": "pvs-patient-<id>",
"values": ["A123456789"] }
After tokenization: "Versicherter [Patient-1] hat Diagnose E11.65."
Note: The embedded cross-check detects DOB, KVNR, and PLZ (with keyword prefix) via regex only — not surnames or free-text names. Name-based leakage (e.g. "Herr Bauer" without a matching phi_references entry) is caught by the tokenizer when the name is declared, but is silently forwarded if undeclared. Use undeclared mode with the pii-detector container if name detection is required.
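Callers can approximate the cross-check locally before submitting. The sketch below reuses the DOB and KVNR patterns quoted in Examples 3 and 4; the PLZ pattern is an assumed form of the "keyword prefix" rule, since the actual pattern is not given on this page, and `find_leftover_pii` is a hypothetical name.

```python
import re

PATTERNS = {
    "dob": re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"),   # from Example 3
    "kvnr": re.compile(r"\b[A-Z]\d{9}\b"),           # from Example 4
    # Assumption: PLZ is only flagged when preceded by the keyword "PLZ".
    "plz": re.compile(r"\bPLZ:?\s*\d{5}\b"),
}

def find_leftover_pii(text):
    """Return (pattern name, matched string) pairs found in the text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

Run it on the prompt after substituting your declared values; like the embedded cross-check, it will not notice undeclared names.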
Cross-check failure modes
Debugging caller_declaration_violation.
When the embedded-regex cross-check finds a leftover PII pattern in the tokenized prompt, the gateway returns HTTP 422 with caller_declaration_violation and a doc_url pointing to this page.
| Cause | What you see | Fix |
|---|---|---|
| Patient name not in phi_references | caller_declaration_violation | Add all name surface forms to phi_references.values |
| Exact date in message text | caller_declaration_violation | Replace with relative date or month+year |
| Full PLZ in message text | caller_declaration_violation | Replace with PLZ-2-prefix (PLZ-Region XX) |
| KVNR/LANR/BSNR in free text, not declared | caller_declaration_violation | Add identifier string to phi_references.values |
Debugging workflow
- 1. Check the audit entry in polaris.llm_audit_events for pii_scan_triggered = true.
- 2. Review your prompt text for embedded-regex target patterns: German DOB (DD.MM.YYYY), KVNR (single uppercase letter + 9 digits), full 5-digit PLZ.
- 3. Ensure every name that appears in the prompt is listed in phi_references.values.
- 4. Ensure every identifier (KVNR, LANR) that appears in the prompt is listed in phi_references.values.
- 5. Replace any absolute date with a relative encoding or month+year.
- 6. Replace any full PLZ or city name with the PLZ-2-prefix form.
- 7. Resubmit. Multi-turn conversations require re-declaring PHI on each turn (ADR-027 §6: no implicit cross-turn mapping retention).
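Because no token mapping is retained across turns, a multi-turn caller has to attach the full phi_references list to every request. A minimal sketch, assuming the request shape used in the worked examples; `build_turn` is a hypothetical helper.

```python
import copy

def build_turn(history, user_text, phi_references, pii="anonymized"):
    """Build the request body for the next turn, re-declaring all PHI."""
    messages = list(history) + [{"role": "user", "content": user_text}]
    return {
        "messages": messages,
        "gateway": {
            "pii": pii,
            # Re-declared on every turn; copied so callers can reuse the list.
            "phi_references": copy.deepcopy(phi_references),
            "declaration": "exhaustive",
        },
    }
```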
Migration guide
Migration from the legacy combination_count API.
The legacy API required a caller-supplied combination_count and practice_size inside gateway.reid_preflight. The current API still accepts reid_preflight.quasi_ids and reid_preflight.combination_count, but the preferred path is to use declaration: "exhaustive" with phi_references instead, which removes the need to supply a caller-computed count.
Before: legacy m1gc API
{
"messages": [{
"role": "user",
"content": "Patient Müller, 64 Jahre, E11.65 ..."
}],
"gateway": {
"pii": "anonymized",
"reid_preflight": {
"quasi_ids": {
"age": "64",
"icd": "E11.65"
},
"combination_count": 12,
"practice_size": 1800
}
}
}
After: declaration contract
{
"messages": [{
"role": "user",
"content": "Müller, Altersgruppe 51-65, Hauptdiagnose E11.x ..."
}],
"gateway": {
"pii": "anonymized",
"phi_references": [
{ "resourceType": "Patient",
"id": "pvs-patient-...",
"values": ["Müller",
"Patient Müller"] }
],
"declaration": "exhaustive"
}
}
Migration steps
- 1. Remove gateway.reid_preflight.combination_count (future): the gateway currently still requires this field. A future release will make it optional when declaration: "exhaustive" is used instead.
- 2. Remove gateway.reid_preflight.practice_size: the field is optional and safe to remove; the gateway uses its own default.
- 3. Generalize quasi-IDs inline in the prompt: replace exact age/PLZ/ICD values with buckets directly in the prompt text (e.g. "Altersgruppe 51-65", "Hauptdiagnose E11.x"). No gateway.quasi_ids field exists; generalization is done by the caller.
- 4. Add gateway.declaration: "exhaustive" if your caller can enumerate all PHI. Otherwise omit it and the gateway will call Presidio for entity discovery.
- 5. List every PHI string that appears in the prompt text in phi_references.values. The values stay in the message text; tokenization replaces them before forwarding.
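The mechanical parts of these steps can be sketched as a request transformer. It is illustrative only: `migrate_request` is a hypothetical helper, it keeps combination_count because the gateway still requires it today, and it does not rewrite the prompt text (step 3 remains manual).

```python
import copy

def migrate_request(legacy, phi_references):
    """Convert a legacy request body to the declaration contract (sketch)."""
    request = copy.deepcopy(legacy)  # never mutate the caller's object
    gateway = request.setdefault("gateway", {})
    preflight = gateway.get("reid_preflight")
    if isinstance(preflight, dict):
        # Step 2: practice_size is optional and safe to drop.
        preflight.pop("practice_size", None)
        # Step 1 is deferred: combination_count is still required for now.
    gateway["phi_references"] = copy.deepcopy(phi_references)
    gateway["declaration"] = "exhaustive"  # step 4
    return request
```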
Canonical references
ADR-027 §6 is the normative specification for the tokenization mechanism and quasi-ID generalization rules. ADR-020 v2 is the authoritative source for the gateway API, capability vocabulary, authentication, and audit trail schema.