POLARIS · FHIR

Prompt-formulation guide.

How to format prompts for the LLM gateway declaration contract. Covers PII modes, quasi-ID conventions, the phi_references declaration contract, worked PASS and FAIL examples, and migration from the legacy combination_count API.

Implementation status

The declaration: "exhaustive" fast-path is rolling out in an upcoming release. The PHI declaration contract described here reflects the target API; draft your callers against this spec now. The reid_preflight endpoint is fully live.

Trust model

What the gateway guarantees — and what callers must do.

The gateway enforces a Zone-A → Zone-C anonymization boundary (ADR-027 §6). Callers are internal Polaris services acting in good faith; adversarial misuse is addressed at the Keycloak / service-token access-control layer.

Gateway guarantees

  • PHI declared via phi_references is tokenized before leaving the gateway — the mapping lives in process memory only, never on disk or in logs.
  • Quasi-identifiers (birthDate, ICD code, PLZ) are generalized before forwarding: birthDate → age group, ICD → 3-char category, PLZ → 2-digit prefix.
  • Re-identification risk preflight blocks requests whose quasi-ID combination matches so few people that it cannot be safely anonymized.
  • Every call is audited — caller, model, capabilities, PII mode, tokenization counts, preflight result. Metadata only; never PHI values.

Caller responsibilities

  • When using declaration: "exhaustive", declare ALL PHI in the prompt via phi_references. The gateway cross-checks the tokenized text, and any leftover PII its regex patterns detect returns HTTP 422.
  • Write prompts using generalized quasi-ID forms (see Quasi-ID conventions). The gateway does not re-parse free German text to discover quasi-IDs.
  • Do not put generalized quasi-IDs (e.g. "51-65", "E11.x") in phi_references.values. They stay verbatim in the forwarded text and are not tokenized.

Declaration modes

Declared vs undeclared mode.

The gateway supports two PHI-handling modes depending on whether the caller can enumerate all PHI in the prompt.

Declared mode — declaration: "exhaustive"

  1. Tokenize phi_references values, replacing PHI strings with tokens.
  2. Apply quasi-ID generalizations in the tokenized text.
  3. Run the embedded-regex cross-check on the tokenized prompt.
  4. No match: forward the request. This works even when the pii-detector container is down.
  5. Match: return HTTP 422 caller_declaration_violation.

When to use: Background agents with full structural access to the FHIR resource (billing agents, coding suggestion agents). Lower latency, resilient to pii-detector downtime.
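The declared-mode steps can be modeled in a short Python sketch. It is illustrative only. The function name, the token-to-values input and the omission of step 2 are assumptions of this sketch. Only the two regexes quoted later in this guide (DOB and KVNR) are included.

```python
import re
from dataclasses import dataclass, field

# Illustrative model of declared mode; not the gateway's actual code.
# Only the DOB and KVNR regexes quoted in this guide are included.
CROSS_CHECK = {
    "dob": re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"),
    "kvnr": re.compile(r"\b[A-Z]\d{9}\b"),
}


@dataclass
class DeclaredResult:
    status: int                      # 200 = forwarded, 422 = violation
    text: str                        # text after tokenization
    violations: list = field(default_factory=list)


def declared_mode(text: str, tokens: dict) -> DeclaredResult:
    """`tokens` maps a token such as "[Patient-1]" to its declared values."""
    # Step 1: tokenize, longest value first so "Erika Müller" beats "Müller".
    pairs = [(value, token) for token, values in tokens.items() for value in values]
    for value, token in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(value, token)
    # Step 2 (quasi-ID generalization) is omitted from this sketch.
    # Step 3: run the regex cross-check on the tokenized text.
    hits = [name for name, rx in CROSS_CHECK.items() if rx.search(text)]
    # Steps 4 and 5: forward on no match, otherwise reject with HTTP 422.
    return DeclaredResult(422 if hits else 200, text, hits)
```

The cross-check runs after tokenization, so a declared KVNR never reaches the regex. An undeclared KVNR does reach it and triggers the 422.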

Undeclared mode — no declaration field (target behavior)

  1. Call the external pii-detector container (Presidio + GERNERMED) for an entity inventory.
  2. Map detected entities to quasi-ID fields and generalize them in the prompt.
  3. Run the public-population Re-ID preflight against the detected quasi-ID combination.
  4. If pii-detector is unavailable: HTTP 503 fail-closed. No fallback, no silent forwarding.

Current status: pii-detector container integration is pending. In the current production build, undeclared requests are forwarded with output filtering only — input scanning is not yet active. This mode will fail-closed once the pii-detector container lands.

When to use: Unstructured free-text pipelines (scanned Arztbrief, free-text entries). Higher latency, fail-closed on pii-detector downtime, no PHI enumeration required.
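The fail-closed rule in step 4 is the main operational difference from declared mode. Below is a minimal sketch of that gate. It assumes a detector client that raises an exception when the container is unreachable. `DetectorUnavailable`, `detect_entities` and `offline_detector` are hypothetical names. The sketch models the target behavior, not the current production build.

```python
from typing import Callable


class DetectorUnavailable(Exception):
    """Hypothetical error raised when the pii-detector container cannot be reached."""


def undeclared_mode(text: str, detect_entities: Callable[[str], list]) -> int:
    """Return the HTTP status the gateway would produce (target behavior)."""
    try:
        detect_entities(text)  # result would feed steps 2 and 3 (not modeled)
    except DetectorUnavailable:
        return 503  # fail closed: unscanned text is never forwarded
    return 200


def offline_detector(text: str) -> list:
    """Stand-in for a detector whose container is down."""
    raise DetectorUnavailable("pii-detector unreachable")
```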

Dimension          declaration: "exhaustive"     No declaration (target behavior)
PHI discovery      Caller enumerates             pii-detector container
Detection engine   Embedded regex                Presidio + GERNERMED
pii-detector down  Request proceeds              HTTP 503 (blocked)
Latency            Low                           Higher (pii-detector round-trip)
Caller effort      High (must declare all PHI)   Low

Quasi-ID conventions

Generalized quasi-identifiers.

Quasi-identifiers are indirect attributes that can re-identify a patient when combined. The gateway generalizes them before forwarding. Callers must write prompts using the generalized form, not the raw value.

Age groups

Use: Altersgruppe <range> or Patientin in Altersgruppe <range>

Age (years)   Correct form
0–17          Altersgruppe 0-17
18–30         Altersgruppe 18-30
31–50         Altersgruppe 31-50
51–65         Altersgruppe 51-65
66–80         Altersgruppe 66-80
81+           Altersgruppe 81+

Never include the exact age: "die 64-jährige Patientin" effectively reveals the birth year. Use instead: "Patientin in Altersgruppe 51-65".
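Here is a caller-side sketch of the age bucketing. It assumes the caller holds the FHIR birthDate as a date and computes age on the request date. Both function names are illustrative.

```python
from datetime import date

# (low, high) bounds of each closed age group from the table above; 81+ is open-ended.
_GROUPS = ((0, 17), (18, 30), (31, 50), (51, 65), (66, 80))


def age_on(birth: date, on: date) -> int:
    """Completed years at `on` (subtract 1 if the birthday has not happened yet)."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def altersgruppe(age: int) -> str:
    """Map an exact age to the generalized "Altersgruppe <range>" form."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high in _GROUPS:
        if low <= age <= high:
            return f"Altersgruppe {low}-{high}"
    return "Altersgruppe 81+"
```

For example, altersgruppe(age_on(date(1962, 4, 15), date(2026, 4, 14))) yields "Altersgruppe 51-65". Only this string belongs in the prompt, never the intermediate age.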

ICD codes

Use the 3-character ICD category with .x suffix:

Raw code   Generalized form
E11.65     Hauptdiagnose E11.x
F32.1      Diagnose F32.x
I10        Diagnose I10.x
J06.9      Hauptdiagnose J06.x

Use Hauptdiagnose for the primary diagnosis and Diagnose for secondary diagnoses. Never include the full-text description alongside the code, because the description can reveal the full code.
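The category rule can be sketched with a regular expression. The sketch assumes ICD-10-GM codes shaped as one letter, two digits and an optional dot suffix. Markers such as †, * or ! and diagnosis-certainty suffixes are not handled. `icd_category` is an illustrative name.

```python
import re

# Letter + two digits, optionally followed by ".<subcode>" (assumed ICD-10-GM shape).
_ICD = re.compile(r"([A-Z]\d{2})(?:\.[0-9A-Z-]+)?")


def icd_category(code: str, primary: bool = False) -> str:
    """Generalize e.g. "E11.65" to "Hauptdiagnose E11.x" or "Diagnose E11.x"."""
    match = _ICD.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not an ICD-10 code: {code!r}")
    label = "Hauptdiagnose" if primary else "Diagnose"
    return f"{label} {match.group(1)}.x"
```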

Dates

Use relative encoding where a reference event is known:

Encounter +3 Tage nach Anamnese
Diagnose gestellt +7 Tage nach Erstvorstellung
Kontrolle geplant +14 Tage

Never include absolute dates: "15.04.2026" uniquely identifies a visit timeline. If relative encoding is not meaningful, use month+year only: "April 2026".
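Below is a sketch of both date encodings. It assumes the event falls on or after the reference event, since this guide only shows "+N Tage nach" forms. German month names are hard-coded to avoid locale-dependent strftime output. The function names are illustrative.

```python
from datetime import date

_MONATE = ("Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
           "August", "September", "Oktober", "November", "Dezember")


def relative_date(event: date, reference: date, reference_label: str) -> str:
    """Encode `event` as "+N Tage nach <reference_label>"."""
    days = (event - reference).days
    if days < 0:
        raise ValueError("event precedes the reference event")
    return f"+{days} Tage nach {reference_label}"


def month_year(day: date) -> str:
    """Fallback when no reference event is meaningful: month + year only."""
    return f"{_MONATE[day.month - 1]} {day.year}"
```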

PLZ (Postleitzahl)

Use the first 2 digits (PLZ-2-prefix):

Raw PLZ   Generalized form
90402     PLZ-Region 90
80331     PLZ-Region 80
10115     PLZ-Region 10
20095     PLZ-Region 20

Never include the full PLZ, Ortsname, or Stadtteil name — those can be cross-correlated with age + diagnosis to narrow the population to re-identifiable size.
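Here is a minimal sketch of the PLZ-2-prefix rule. `plz_region` is an illustrative name. The PLZ is handled as a string because an integer would drop the leading zero of codes such as 01067.

```python
import re


def plz_region(plz: str) -> str:
    """Generalize a 5-digit German PLZ to its 2-digit prefix."""
    plz = plz.strip()
    if re.fullmatch(r"[0-9]{5}", plz) is None:
        raise ValueError(f"expected a 5-digit PLZ, got {plz!r}")
    return f"PLZ-Region {plz[:2]}"
```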

PHI declaration

Declaring PHI via phi_references.

The phi_references array maps FHIR resource IDs to the PHI strings that appear in the prompt. Every string in values that occurs in any message will be replaced by the token for that resource.

phi_references structure

"gateway": {
  "phi_references": [
    {
      "resourceType": "Patient",
      "id": "pvs-patient-12345",
      "values": [
        "Müller",
        "Erika Müller",
        "Frau Müller",
        "A123456789"   // KVNR, if present
      ]
    },
    {
      "resourceType": "Practitioner",
      "id": "pvs-practitioner-42",
      "values": [
        "Dr. Schmidt",
        "Schmidt",
        "123456789"    // LANR
      ]
    }
  ],
  "declaration": "exhaustive"
}

Rules

  • Include all surface forms the name may appear as (full name, surname only, title + surname).
  • Include identifier strings (KVNR, LANR, BSNR) if they appear in the prompt.
  • Do not include generalized quasi-IDs ("51-65", "E11.x", "PLZ-Region 90") in values — those are left verbatim.
  • Do not pre-substitute tokens ([Patient-1]) into your prompt text — token numbers are randomized per request.
  • Token format: [Patient-N], [Practitioner-N], [Encounter-N], [Condition-N]. Numbers are scoped to the request.
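Collecting "all surface forms" by hand is error-prone, so a caller might derive them from the FHIR resource it already holds. The sketch below assumes FHIR R4 Patient JSON (name[].family, name[].given, name[].prefix, gender, identifier[].value). The Frau/Herr mapping from gender and the choice to include every identifier value are assumptions. Under the replacement rule above, a value that never occurs in the prompt should simply go unmatched.

```python
def surface_forms(patient: dict) -> list[str]:
    """Collect strings to list in phi_references.values for one FHIR Patient."""
    forms: set[str] = set()
    # Hypothetical salutation mapping from Patient.gender.
    salutation = {"female": "Frau", "male": "Herr"}.get(patient.get("gender", ""))
    for name in patient.get("name", []):
        family = name.get("family")
        if not family:
            continue
        given = name.get("given", [])
        forms.add(family)                               # "Müller"
        if given:
            forms.add(f"{given[0]} {family}")           # "Erika Müller"
            forms.add(" ".join([*given, family]))       # all given names
        if salutation:
            forms.add(f"{salutation} {family}")         # "Frau Müller"
        for prefix in name.get("prefix", []):
            forms.add(f"{prefix} {family}")             # "Dr. Müller"
    for identifier in patient.get("identifier", []):
        if identifier.get("value"):
            forms.add(identifier["value"])              # e.g. the KVNR
    return sorted(forms, key=lambda s: (-len(s), s))    # longest first
```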

Correct pattern

Write the prompt with real PHI values in the message text. List all PHI strings in phi_references.values. The gateway tokenizes on your behalf — callers never see or control the generated tokens.

What the caller sends

{
  "role": "user",
  "content": "Prüfe den Fall für
    Frau Müller, Altersgruppe
    51-65, Hauptdiagnose E11.x."
}

What the gateway forwards

{
  "role": "user",
  "content": "Prüfe den Fall für
    [Patient-1], Altersgruppe
    51-65, Hauptdiagnose E11.x."
}
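The caller-to-gateway transformation can be reproduced locally with a sketch like the one below, for example to preview what the LLM will see. `tokenize` is a hypothetical helper. It numbers tokens sequentially per resourceType in declaration order, whereas the gateway's own numbering is request-scoped and randomized. Callers must not depend on either.

```python
import re


def tokenize(messages: list[dict], phi_references: list[dict]) -> list[dict]:
    """Replace declared PHI with [ResourceType-N] tokens in every message."""
    counters: dict[str, int] = {}
    lookup: dict[str, str] = {}
    for ref in phi_references:
        rtype = ref["resourceType"]
        counters[rtype] = counters.get(rtype, 0) + 1
        for value in ref.get("values", []):
            if value:  # an empty string would match everywhere
                lookup[value] = f"[{rtype}-{counters[rtype]}]"
    if not lookup:
        return [dict(m) for m in messages]
    # One alternation, longest value first: "Erika Müller" matches before
    # "Müller", and inserted tokens are never re-scanned.
    pattern = re.compile("|".join(
        re.escape(v) for v in sorted(lookup, key=len, reverse=True)))
    return [{**m, "content": pattern.sub(lambda hit: lookup[hit.group(0)], m["content"])}
            for m in messages]
```

A single-pass regex replaces values one by one in a fixed order. This avoids a later, shorter value matching inside an already-inserted token.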

PASS and FAIL examples

Worked examples.

PASS examples are shown as the caller sends them (with real PHI values, declared in phi_references), plus the tokenized form the gateway forwards. FAIL examples show the exact pattern that triggers caller_declaration_violation.

pass

Example 1 — Billing coding suggestion

What the caller sends

{
  "messages": [
    {
      "role": "user",
      "content": "Encounter für Erika
        Müller, Altersgruppe 51-65.
        Hauptdiagnose E11.x.
        Behandelnder Arzt: Dr. Schmidt.
        Abgerechnete Ziffern: EBM 03220.
        Prüfe weitere EBM-Ziffern."
    }
  ],
  "gateway": {
    "requires": ["text", "medicalCoding"],
    "pii": "anonymized",
    "phi_references": [
      { "resourceType": "Patient",
        "id": "pvs-patient-12345",
        "values": ["Müller",
                   "Erika Müller"] },
      { "resourceType": "Practitioner",
        "id": "pvs-practitioner-42",
        "values": ["Dr. Schmidt"] }
    ],
    "declaration": "exhaustive"
  }
}

What the gateway forwards to the LLM

"Encounter für [Patient-1],
 Altersgruppe 51-65.
 Hauptdiagnose E11.x.
 Behandelnder Arzt: [Practitioner-1].
 Abgerechnete Ziffern: EBM 03220.
 Prüfe weitere EBM-Ziffern."

Why it passes: All PHI is declared in phi_references and replaced by the gateway. Quasi-IDs (51-65, E11.x) are already in generalized form. No exact dates or full PLZ.

pass

Example 2 — Diagnosis documentation

What the caller sends

{
  "messages": [
    {
      "role": "user",
      "content": "Formuliere eine SOAP-Notiz
        für Tobias Weber (Altersgruppe
        18-30) mit Hauptdiagnose F32.x.
        Encounter +3 Tage nach
        Erstvorstellung."
    }
  ],
  "gateway": {
    "pii": "anonymized",
    "phi_references": [
      { "resourceType": "Patient",
        "id": "pvs-patient-99001",
        "values": ["Weber",
                   "Tobias Weber"] }
    ],
    "declaration": "exhaustive"
  }
}

What the gateway forwards to the LLM

"Formuliere eine SOAP-Notiz für
 [Patient-1] (Altersgruppe 18-30)
 mit Hauptdiagnose F32.x.
 Encounter +3 Tage nach
 Erstvorstellung."

Why it passes: The patient name is declared and tokenized. A relative date is used. The ICD code is in 3-character category + .x form. No PLZ or city is mentioned.

fail

Example 3 — Exact birthdate in free text

{
  "messages": [
    {
      "role": "user",
      "content": "Patient geboren am
        15.04.1962, Diagnose E11.65.
        Schlage EBM-Ziffern vor."
    }
  ],
  "gateway": {
    "phi_references": [],
    "declaration": "exhaustive"
  }
}

Why it fails

"15.04.1962" is a German DOB format — matched by the embedded regex \b\d{2}\.\d{2}\.\d{4}\b. The cross-check fires caller_declaration_violation because declaration: "exhaustive" was set but an undeclared quasi-identifier remains.

Fix:

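Remove the exact birthdate entirely and write "Altersgruppe 51-65" instead. Also generalize E11.65 to "Diagnose E11.x": the regex does not flag full ICD codes, but the ICD convention still applies.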

fail

Example 4 — KVNR embedded in prompt text

{
  "messages": [
    {
      "role": "user",
      "content": "Versicherter A123456789
        hat Diagnose E11.65.
        Schlage EBM-Ziffern vor."
    }
  ],
  "gateway": {
    "phi_references": [],
    "declaration": "exhaustive"
  }
}

Why it fails

"A123456789" matches the KVNR pattern \b[A-Z]\d{9}\b. The cross-check fires caller_declaration_violation because declaration: "exhaustive" was set but an undeclared identifier remains.

Fix:

Declare the patient in phi_references so tokenization replaces the KVNR before the cross-check runs:

{ "resourceType": "Patient",
  "id": "pvs-patient-<id>",
  "values": ["A123456789"] }

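After tokenization: "Versicherter [Patient-1] hat Diagnose E11.65." This passes the cross-check, but E11.65 should still be written as "Diagnose E11.x" per the ICD convention.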

Note: The embedded cross-check uses regex to detect only DOB, KVNR and PLZ (with a keyword prefix). It does not detect surnames or other free-text names. A declared name is replaced by the tokenizer. An undeclared name (e.g. "Herr Bauer" with no matching phi_references entry) is forwarded silently. If you need name detection, use undeclared mode with the pii-detector container once that integration is live (see Current status under Declaration modes).
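Callers can run a similar check locally before sending, to catch violations without a round-trip. The DOB and KVNR regexes below are the ones quoted in this guide. The PLZ pattern ("PLZ" followed by five digits) is an assumption, because the gateway's keyword list is not documented here. `leftover_pii` is a hypothetical helper.

```python
import re

# DOB and KVNR patterns as quoted in this guide; the PLZ keyword pattern is assumed.
PATTERNS = {
    "dob": re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"),
    "kvnr": re.compile(r"\b[A-Z]\d{9}\b"),
    "plz": re.compile(r"\bPLZ:?\s*\d{5}\b"),
}


def leftover_pii(tokenized_text: str) -> dict[str, list[str]]:
    """Return matches per pattern; an empty dict means the regexes found nothing.

    Names are NOT checked: an undeclared surname passes silently.
    """
    found = {name: rx.findall(tokenized_text) for name, rx in PATTERNS.items()}
    return {name: hits for name, hits in found.items() if hits}
```

An empty result only means these patterns found nothing. It does not show that all names were declared.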

Cross-check failure modes

Debugging caller_declaration_violation.

When the embedded-regex cross-check finds a leftover PII pattern in the tokenized prompt, the gateway returns HTTP 422 with caller_declaration_violation and a doc_url pointing to this page.

Cause                                  What you see                    Fix
Patient name not in phi_references     No error; name sent in clear    Add all name surface forms to phi_references.values
Exact date (DD.MM.YYYY) in text        caller_declaration_violation    Replace with relative date or month+year
Full PLZ (keyword-prefixed) in text    caller_declaration_violation    Replace with PLZ-2-prefix (PLZ-Region XX)
KVNR in free text, not declared        caller_declaration_violation    Add identifier string to phi_references.values
LANR/BSNR in free text, not declared   No error; sent in clear         Add identifier string to phi_references.values

Debugging workflow

  1. Check the audit entry in polaris.llm_audit_events for pii_scan_triggered = true.
  2. Review your prompt text for the embedded-regex target patterns: German DOB (DD.MM.YYYY), KVNR (one uppercase letter + 9 digits), and a full 5-digit PLZ preceded by a keyword.
  3. Ensure every name that appears in the prompt is listed in phi_references.values. The cross-check cannot flag a missing name, so check this even if the 422 points elsewhere.
  4. Ensure every identifier (KVNR, LANR, BSNR) that appears in the prompt is listed in phi_references.values.
  5. Replace any absolute date with a relative encoding or month+year.
  6. Replace any full PLZ or city name with the PLZ-2-prefix form.
  7. Resubmit. Multi-turn conversations require re-declaring PHI on each turn (ADR-027 §6: no implicit cross-turn mapping retention).
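Steps 3 and 4 can be automated when the caller already knows the patient's PHI strings. `missing_declarations` is a hypothetical helper, and its `known_phi` input is caller-side data, not a gateway field. It scans every message in the request, because PHI from earlier turns must be re-declared each turn.

```python
def missing_declarations(messages: list[dict], known_phi: list[str],
                         phi_references: list[dict]) -> list[str]:
    """Known PHI strings that occur in any message but are not declared.

    `known_phi` is hypothetical caller-side data (names, KVNR, LANR, BSNR
    the caller already holds); it is not a gateway field.
    """
    declared = {v for ref in phi_references for v in ref.get("values", [])}
    text = "\n".join(m["content"] for m in messages)
    return sorted(v for v in set(known_phi) if v in text and v not in declared)
```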

Migration guide

Migration from the legacy combination_count API.

The legacy API required a caller-supplied combination_count and practice_size inside gateway.reid_preflight. The current API still accepts reid_preflight.quasi_ids and reid_preflight.combination_count, and currently still requires combination_count. The preferred path is declaration: "exhaustive" with phi_references. This path is intended to remove the need for a caller-computed count once combination_count becomes optional.

Before — legacy m1gc API

{
  "messages": [{
    "role": "user",
    "content": "Patient Müller,
      64 Jahre, E11.65 ..."
  }],
  "gateway": {
    "pii": "anonymized",
    "reid_preflight": {
      "quasi_ids": {
        "age": "64",
        "icd": "E11.65"
      },
      "combination_count": 12,
      "practice_size": 1800
    }
  }
}

After — declaration contract

{
  "messages": [{
    "role": "user",
    "content": "Müller, Altersgruppe
      51-65, Hauptdiagnose E11.x ..."
  }],
  "gateway": {
    "pii": "anonymized",
    "phi_references": [
      { "resourceType": "Patient",
        "id": "pvs-patient-...",
        "values": ["Müller",
                   "Patient Müller"] }
    ],
    "declaration": "exhaustive"
  }
}

Migration steps

  1. Remove gateway.reid_preflight.combination_count (future release). Do not remove it yet: the gateway currently still requires it. A future release will make it optional when declaration: "exhaustive" is used.
  2. Remove gateway.reid_preflight.practice_size. The field is optional and safe to remove; the gateway falls back to its own default.
  3. Generalize quasi-IDs inline in the prompt. Replace exact age, date, PLZ and ICD values with their generalized forms directly in the prompt text (e.g. "Altersgruppe 51-65", "Hauptdiagnose E11.x"). There is no gateway.quasi_ids field; the caller does the generalization.
  4. Add gateway.declaration: "exhaustive" if your caller can enumerate all PHI. Otherwise omit it; the gateway will then use the pii-detector container (Presidio + GERNERMED) for entity discovery once that integration is live.
  5. Declare PHI in phi_references. Leave the real values in the prompt text and list every surface form in phi_references.values; the gateway tokenizes them before forwarding.
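The mechanical parts of steps 1, 2, 4 and 5 can be sketched as a dict transform. `migrate_request` is a hypothetical helper, and the phi_references list must still be built by the caller. Step 3 is left out because it rewrites prompt text, and only the caller knows which quasi-IDs its prompts contain.

```python
import copy


def migrate_request(legacy: dict, phi_references: list[dict]) -> dict:
    """Apply the mechanical migration steps to a legacy request (sketch).

    Step 1: combination_count is kept, since the gateway still requires it.
    Step 2: practice_size is dropped.
    Step 3 (generalizing the prompt text) is NOT automated here.
    Steps 4 and 5: declaration and phi_references are added.
    """
    request = copy.deepcopy(legacy)  # never mutate the caller's dict
    gateway = request.setdefault("gateway", {})
    gateway.get("reid_preflight", {}).pop("practice_size", None)
    gateway["declaration"] = "exhaustive"
    gateway["phi_references"] = copy.deepcopy(phi_references)
    return request
```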

Canonical references

ADR-027 §6 is the normative specification for the tokenization mechanism and quasi-ID generalization rules. ADR-020 v2 is the authoritative source for the gateway API, capability vocabulary, authentication, and audit trail schema.