De-identification IG catalogue.
The three Library manifests from io.cognovis.de-identification.de@0.11.0 that drive AnonymizingTransport — PII fields, free-text scrubbing patterns, and quasi-identifier k-floor thresholds. All tables are machine-generated from the IG source at build time.
Overview
Three manifests, three de-identification concerns.
The de-identification IG ships three Library resources, each covering a distinct concern. The generated TypeScript files in packages/fhir-de/src/client/generated/ are the single source of truth: they are machine-generated from this IG and must not be edited by hand.
Library 1
Library-pii-fields-manifest
Structured PII fields per FHIR resource type. These fields are deleted from responses by AnonymizingTransport (transformation 2).
generated/pii-fields.ts
Library 2
Library-scrub-patterns
Free-text field paths and regex patterns for scrubbing PII tokens from clinical narrative text (transformation 5).
generated/free-text-patterns.ts
Library 3
Library-quasi-id-k-floors-manifest
Minimum k-anonymity floor thresholds for quasi-identifier fields. Used by analytics pipelines to enforce k-anonymity before aggregation.
generated/quasi-id-k-floors.ts
Library-pii-fields-manifest
PII_FIELDS — structured direct identifiers.
Direct identifiers: fields that unambiguously identify an individual without aggregation. These fields are deleted from FHIR resources by filterPiiFromResource(), which is called by AnonymizingTransport as transformation 2. For array sub-paths such as dosageInstruction[].text, only the named sub-field is removed; the rest of each array element is preserved.
// Source file: packages/fhir-de/src/client/generated/pii-fields.ts
// Source IG: io.cognovis.de-identification.de@0.11.0
import { PII_FIELDS, filterPiiFromResource } from '@polaris/sdk/fhir'

| Resource type | Removed field paths | Fields |
|---|---|---|
| Account | name, owner | 2 |
| CarePlan | author | 1 |
| ChargeItem | enterer, performingOrganization | 2 |
| Coverage | identifier, payor | 2 |
| DocumentReference | content[].attachment.url | 1 |
| ExplanationOfBenefit | disposition | 1 |
| Location | address, description, name, position | 4 |
| MedicationDispense | dosageInstruction[].text | 1 |
| MedicationRequest | dosageInstruction[].text | 1 |
| Patient | address, identifier, name, telecom | 4 |
| Practitioner | address, birthDate, name, telecom | 4 |
| Provenance | agent | 1 |
| Specimen | processing[].description | 1 |
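
The array sub-path behaviour described for filterPiiFromResource() can be sketched as follows. This is a minimal illustration, not the SDK's implementation: removePath is a hypothetical helper, and it assumes paths consist only of dot-separated segments with an optional [] suffix, as in the table above.

```typescript
// Hypothetical helper for illustration; not part of @polaris/sdk/fhir.
// Deletes the field named by `path` from `node`, mutating it in place.
// A segment ending in "[]" means "apply the rest of the path to each array element".
function removePath(node: unknown, path: string): void {
  if (node === null || typeof node !== 'object' || Array.isArray(node)) return
  const obj = node as Record<string, unknown>
  const segments = path.split('.')
  const head = segments[0] ?? ''
  const remainder = segments.slice(1).join('.')
  const isArraySegment = head.slice(-2) === '[]'
  const key = isArraySegment ? head.slice(0, -2) : head

  if (!isArraySegment && remainder === '') {
    delete obj[key] // last segment: drop the field itself
    return
  }
  const child = obj[key]
  if (isArraySegment) {
    // Visit every element; siblings of the removed sub-field are left untouched.
    if (Array.isArray(child)) {
      for (const element of child) removePath(element, remainder)
    }
  } else {
    removePath(child, remainder)
  }
}

const request = {
  resourceType: 'MedicationRequest',
  dosageInstruction: [
    { text: 'private dosage note', route: { text: 'oral' } },
    { text: 'another note' },
  ],
}
removePath(request, 'dosageInstruction[].text')
console.log(JSON.stringify(request.dosageInstruction))
// → [{"route":{"text":"oral"}},{}]
```

Mutating in place keeps the sketch short; a production filter would more likely copy the resource first so the caller's object is not changed.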
Using filterPiiFromResource directly
import { filterPiiFromResource, PII_FIELDS } from '@polaris/sdk/fhir'
// Filter a single resource without AnonymizingTransport
const cleaned = filterPiiFromResource(
{ resourceType: 'Patient', id: 'p-1', name: [{ family: 'Mustermann' }], gender: 'male' }, // sample data
'Patient'
)
// cleaned.name === undefined
// cleaned.gender === 'male' (preserved)
// Inspect the catalogue
console.log(PII_FIELDS.Patient)
// ['address', 'identifier', 'name', 'telecom']
Array sub-path semantics
// 'dosageInstruction[].text' means:
// for each element in dosageInstruction[], delete .text
// but preserve all other sub-fields
// Before:
dosageInstruction: [{
text: 'private dosage note', // ← removed
route: { coding: [...] }, // ← preserved
timing: { repeat: {...} }, // ← preserved
doseAndRate: [...], // ← preserved
}]
// After:
dosageInstruction: [{
route: { coding: [...] },
timing: { repeat: {...} },
doseAndRate: [...],
}]
Library-scrub-patterns
FREE_TEXT_FIELDS + FREE_TEXT_PATTERNS — narrative scrubbing.
Two exports from a single generated file: field paths that contain clinical narrative text, and the regex patterns used to scrub PII tokens from those fields. AnonymizingTransport applies all four patterns to every field in the FREE_TEXT_FIELDS catalogue (transformation 5).
// Source file: packages/fhir-de/src/client/generated/free-text-patterns.ts
// Source IG: io.cognovis.de-identification.de@0.11.0
import { FREE_TEXT_FIELDS, FREE_TEXT_PATTERNS } from '@polaris/sdk/fhir'
FREE_TEXT_FIELDS
| Resource type | Scrubbed field path |
|---|---|
| AllergyIntolerance | note[].text |
| CarePlan | note[].text |
| ChargeItem | note[].text |
| DiagnosticReport | conclusion |
| DocumentReference | description |
| MedicationAdministration | dosage.text |
| Observation | valueString |
FREE_TEXT_PATTERNS
| ID | Replacement | Example match |
|---|---|---|
| de-titled-name | [NAME] | "Dr. Mustermann", "Hr. Schmidt", "Fr. Weber" |
| de-date | [DATE] | "15.03.1985", "07.04.2025" |
| de-german-phone | [TEL] | "+49 30 1234567", "030 9876543" |
| de-kvnr | [KV-NR] | "A123456789" |
Scope limitation
Bare name bigrams (Firstname Lastname without a title) are excluded to avoid false positives on clinical terms like "Diabetes Mellitus" or "Akute Otitis Media". Only titled names (Dr., Hr., Fr.) are caught by de-titled-name.
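
To make the scope limitation concrete, here is a small sketch of why a title-anchored pattern leaves clinical terms alone. The regular expressions below are illustrative stand-ins written for this example; the real patterns live in generated/free-text-patterns.ts and may differ. ScrubPattern, examplePatterns and scrubExample are hypothetical names.

```typescript
// Illustrative stand-ins only; NOT the patterns shipped in the generated file.
interface ScrubPattern {
  id: string
  pattern: RegExp
  replacement: string
}

const examplePatterns: ScrubPattern[] = [
  // A title (Dr., Hr., Fr.) followed by one capitalised word.
  { id: 'de-titled-name', pattern: /\b(?:Dr|Hr|Fr)\.\s+[A-ZÄÖÜ][a-zäöüß]+/g, replacement: '[NAME]' },
  // German day.month.year dates such as 15.03.1985.
  { id: 'de-date', pattern: /\b\d{2}\.\d{2}\.\d{4}\b/g, replacement: '[DATE]' },
]

// Apply every pattern in order to the input string.
function scrubExample(text: string): string {
  return examplePatterns.reduce(
    (acc, { pattern, replacement }) => acc.replace(pattern, replacement),
    text,
  )
}

console.log(scrubExample('Dr. Mustermann sah den Patienten am 15.03.1985'))
// → '[NAME] sah den Patienten am [DATE]'
console.log(scrubExample('Diagnose: Diabetes Mellitus'))
// → 'Diagnose: Diabetes Mellitus' (no title, so nothing is replaced)
```

The stand-in patterns carry the g flag because String.prototype.replace with a non-global RegExp replaces only the first match.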
Using FREE_TEXT_PATTERNS directly
import { FREE_TEXT_PATTERNS } from '@polaris/sdk/fhir'
// Each pattern is a { id, pattern, replacement } object
for (const { id, pattern, replacement } of FREE_TEXT_PATTERNS) {
console.log(id, pattern.toString())
}
// Apply all patterns to a string:
function scrub(text: string): string {
let result = text
for (const { pattern, replacement } of FREE_TEXT_PATTERNS) {
result = result.replace(pattern, replacement)
}
return result
}
scrub('Patient Dr. Mustermann, born 15.03.1985, KV: A123456789')
// → 'Patient [NAME], born [DATE], KV: [KV-NR]'
Library-quasi-id-k-floors-manifest
QUASI_ID_K_FLOORS — k-anonymity thresholds.
Quasi-identifiers are fields that cannot identify an individual in isolation but can do so in combination (birthDate, postal code, occupation). The k-floor is the minimum group size required before such a field can be released in an analytics context. AnonymizingTransport does not enforce k-anonymity directly — these thresholds are used by analytics pipelines (aggregation, reporting) that consume anonymized FHIR data.
// Source file: packages/fhir-de/src/client/generated/quasi-id-k-floors.ts
// Source IG: io.cognovis.de-identification.de@0.11.0
import {
QUASI_ID_K_FLOORS,
DEFAULT_QUASI_ID_K_FLOOR,
getQuasiIdKFloor,
} from '@polaris/fhir-de' // re-exported from generated file

| Resource type | Field | k-floor | Notes |
|---|---|---|---|
| Condition | icd-3char | 5 | ICD-10 code at 3-character level (e.g. J00) — requires at least 5 patients per group |
| Condition | rare-icd-categories | 11 | Rare disease ICD categories — higher floor due to higher re-identification risk |
| Patient | birthDate | 11 | Full birth date — high re-identification risk, stricter floor |
| Patient | occupation | 5 | Patient occupation code |
| Patient | plz | 5 | German postal code (Postleitzahl) |
| DEFAULT | all other fields | 5 | DEFAULT_QUASI_ID_K_FLOOR — fallback for fields not explicitly listed |
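
As a rough sketch of how an analytics pipeline might use these floors, the example below counts rows per distinct value of one field and releases only the groups that reach the floor. It is an assumption about pipeline behaviour, not code from the SDK: floors, floorFor and releasableCounts are hypothetical local stand-ins that mirror the table above, and real code would call getQuasiIdKFloor instead. It also checks a single field, whereas full k-anonymity considers combinations of quasi-identifiers.

```typescript
// Local stand-in for the documented table (hypothetical; real code would use getQuasiIdKFloor).
const DEFAULT_FLOOR = 5
const floors: Record<string, Record<string, number>> = {
  Condition: { 'icd-3char': 5, 'rare-icd-categories': 11 },
  Patient: { birthDate: 11, occupation: 5, plz: 5 },
}

function floorFor(resourceType: string, field: string): number {
  return floors[resourceType]?.[field] ?? DEFAULT_FLOOR
}

// Count rows per distinct value and keep only groups that reach the k-floor.
function releasableCounts(
  resourceType: string,
  field: string,
  values: string[],
): Record<string, number> {
  // Null-prototype objects avoid clashes with inherited keys such as "constructor".
  const counts: Record<string, number> = Object.create(null)
  for (const value of values) counts[value] = (counts[value] ?? 0) + 1

  const k = floorFor(resourceType, field)
  const released: Record<string, number> = Object.create(null)
  for (const value of Object.keys(counts)) {
    const count = counts[value] ?? 0
    if (count >= k) released[value] = count // groups below the floor are suppressed
  }
  return released
}

const plzValues = ['10115', '10115', '10115', '10115', '10115', '80331', '80331']
console.log(JSON.stringify(releasableCounts('Patient', 'plz', plzValues)))
// → {"10115":5}   (80331 appears only twice and is suppressed)
```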
Using getQuasiIdKFloor
import {
getQuasiIdKFloor,
DEFAULT_QUASI_ID_K_FLOOR,
QUASI_ID_K_FLOORS,
} from '@polaris/fhir-de' // note: from @polaris/fhir-de, not @polaris/sdk/fhir
// Get the k-floor for a specific field:
getQuasiIdKFloor('Patient', 'birthDate') // → 11
getQuasiIdKFloor('Patient', 'plz') // → 5
getQuasiIdKFloor('Condition', 'icd-3char') // → 5
getQuasiIdKFloor('Unknown', 'someField') // → 5 (DEFAULT_QUASI_ID_K_FLOOR)
// Check if a cohort is large enough before releasing a field:
function isKAnonymous(
resourceType: string,
fieldPath: string,
cohortSize: number
): boolean {
return cohortSize >= getQuasiIdKFloor(resourceType, fieldPath)
}
isKAnonymous('Patient', 'birthDate', 10) // → false (k=11 required)
isKAnonymous('Patient', 'birthDate', 11) // → true
IG version
Version tracing and regeneration.
All three generated files carry a machine-generated header with the source IG version. The TypeScript constants DE_IDENTIFICATION_IG_VERSION and DE_IDENTIFICATION_IG_PACKAGE in pii-fields.ts are the canonical source of truth for the bundled version.
Current bundled version
- Package: io.cognovis.de-identification.de
- Version: 0.11.0
Regenerating the catalogue
# Regenerate all three generated files from the IG:
bun run --cwd packages/fhir-de generate:deidentification
# The generated files carry a DO NOT EDIT header:
# Source: io.cognovis.de-identification.de@0.11.0
# Regenerate: bun run --cwd packages/fhir-de generate:deidentification
IG version pinning with requireIgVersion
Import DE_IDENTIFICATION_IG_VERSION and pass it as requireIgVersion to AnonymizingTransport if you want startup-time validation that the bundled IG version matches what your code expects. See the AnonymizingTransport docs for details.
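
The kind of startup-time check this enables can be sketched as below. Everything here is a hypothetical stand-in: the actual AnonymizingTransport options shape, comparison rule and error type are not described on this page, so the sketch assumes an exact string match and a plain Error. BUNDLED_IG_VERSION stands in for DE_IDENTIFICATION_IG_VERSION.

```typescript
// Hypothetical stand-in for DE_IDENTIFICATION_IG_VERSION from the generated file.
const BUNDLED_IG_VERSION = '0.11.0'

// Assumed options shape; only the requireIgVersion name comes from the docs.
interface TransportOptions {
  requireIgVersion?: string
}

// Throw if the caller pinned a version that differs from the bundled one.
// Assumes exact string equality; the real check may be more lenient or stricter.
function checkIgVersion(options: TransportOptions, bundled: string = BUNDLED_IG_VERSION): void {
  const required = options.requireIgVersion
  if (required !== undefined && required !== bundled) {
    throw new Error(
      `De-identification IG version mismatch: bundled ${bundled}, required ${required}`,
    )
  }
}

checkIgVersion({ requireIgVersion: '0.11.0' }) // passes: versions match
checkIgVersion({}) // passes: no pin requested
try {
  checkIgVersion({ requireIgVersion: '0.10.0' })
} catch (err) {
  console.log((err as Error).message)
  // → De-identification IG version mismatch: bundled 0.11.0, required 0.10.0
}
```

Failing at construction time rather than on the first request surfaces a stale catalogue before any data flows through the transport.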
See also
- anonymizing-transport: Five transformations. ID hashing, PII removal, display replacement, reference rewriting, free-text scrubbing.
- fhir sdk: Full FHIR client reference. createFhirDeClient, FhirTransport SPI, ResourceClient, error classes.
- llm gateway: Zone-A to Zone-C boundary. Anonymize with AnonymizingTransport before crossing the boundary to the LLM gateway.