Use Aidbox's per-column de-identification to create analytics-ready tables without exposing patient identifiers. This example uses the pre-built io.health-samurai.de-identification.r4 package, which provides Safe Harbor ViewDefinitions for 17 FHIR R4 resource types.
Read the full walkthrough: HIPAA Safe Harbor De-Identification in Aidbox.
- Docker
- Cloned repository
git clone https://github.com/Aidbox/examples.git
cd examples/aidbox-features/de-identification
docker compose up

The init bundle registers an AidboxMigration that installs the de-identification package from the artifact registry on startup.
Open the Aidbox console and log in with password admin.
On the Aidbox home page, click Import Data → Import synthetic dataset. This imports 100 Synthea patients (and related Encounters, Observations, Conditions, etc.).
See the sample data guide for details or larger datasets.
Navigate to Resource browser in the sidebar and open ViewDefinitions.
Select hipaa-patient.
The ViewDefinition Builder tab shows each column with its de-identification method — shield icons indicate which columns are protected.
The ViewDefinition ships with blank keys ("").
Click the shield icon on any protected column (e.g. id with cryptoHash) and enter a key value. Use the same key across all columns that need to be joinable — for example, set cryptoHashKey to my-research-key on every cryptoHash column.
Do the same for dateShiftKey on date-shifted columns.
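To illustrate why sharing one key makes columns joinable, here is a minimal Python sketch of keyed hashing with HMAC-SHA256. It is not Aidbox's implementation: the `crypto_hash` helper, the sample patient id, and the hex output encoding are all assumptions made for illustration, so the digests are not expected to match what Aidbox stores.

```python
import hashlib
import hmac


def crypto_hash(value: str, key: str) -> str:
    """Hex HMAC-SHA256 of value under key (illustrative helper, not Aidbox's API)."""
    return hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()


KEY = "my-research-key"
patient_id = "example-patient-1"  # hypothetical id, not taken from the Synthea dataset

# The same id hashed with the same key in two different tables yields the same value,
# so the hashed columns can still be joined.
hashed_in_patient_table = crypto_hash(patient_id, KEY)
hashed_in_encounter_table = crypto_hash(patient_id, KEY)

# A different key produces an unrelated digest, which breaks the join.
hashed_with_other_key = crypto_hash(patient_id, "another-key")

print(hashed_in_patient_table == hashed_in_encounter_table)  # True
print(hashed_in_patient_table == hashed_with_other_key)      # False
```

The practical consequence is that a key typo on a single column silently produces values that never match the other tables, so it is worth checking every cryptoHash column before materializing.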
Click Run in the builder to preview the de-identified output. You'll see:
- Patient IDs replaced with HMAC-SHA256 hashes
- Birth dates shifted (or redacted for patients over 89 via birthDateSafeHarbor)
- Names redacted to NULL
- Gender and other non-identifying fields passed through
Verify the output looks correct before materializing.
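When checking the birth date column, it can help to have a mental model of the rule being applied. The sketch below is one possible reading of the behavior described above: redact when the patient is older than 89, otherwise shift by a per-patient offset derived from the key. The function names, the 50-day shift window, and the way the offset is derived are all hypothetical; Aidbox's actual algorithm may differ.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import Optional

MAX_SHIFT_DAYS = 50  # hypothetical window; not taken from Aidbox


def age_on(birth: date, today: date) -> int:
    """Age in completed years on the given day."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def shift_days(patient_id: str, key: str) -> int:
    """Deterministic offset in [-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS] derived from id and key."""
    digest = hmac.new(key.encode("utf-8"), patient_id.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    return int.from_bytes(digest[:4], "big") % span - MAX_SHIFT_DAYS


def birth_date_safe_harbor(birth: date, patient_id: str, key: str, today: date) -> Optional[date]:
    """Return None (redacted) when age exceeds 89, otherwise the shifted date."""
    if age_on(birth, today) > 89:
        return None
    return birth + timedelta(days=shift_days(patient_id, key))


today = date(2024, 6, 1)
print(birth_date_safe_harbor(date(1930, 1, 1), "p1", "my-research-key", today))  # None
shifted = birth_date_safe_harbor(date(1980, 5, 17), "p2", "my-research-key", today)
print(abs((shifted - date(1980, 5, 17)).days) <= MAX_SHIFT_DAYS)  # True
```

Deriving the offset from the patient id and key, rather than drawing it at random, keeps the shift stable across runs, so re-materializing the table does not move a patient's dates.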
Click Materialize and select Table as the type. This creates sof.hipaa_patient in the database.
Note: ViewDefinitions with de-identification extensions can only be materialized as table, not view or materialized-view. Views expose cryptographic keys in PostgreSQL system catalogs.
Open the SQL console and query the de-identified data:
SELECT * FROM sof.hipaa_patient;

All patient identifiers are transformed: hashed IDs, shifted dates, redacted names. The clinical data (gender, marital status, etc.) is preserved.
Count patients by gender:
SELECT gender, count(*) FROM sof.hipaa_patient GROUP BY 1;

Check that birth dates are shifted (compare with raw FHIR data):
SELECT p.id, p.resource#>>'{birthDate}' AS real_birthdate, h.birth_date AS shifted_birthdate
FROM patient p
JOIN sof.hipaa_patient h ON public.aidbox_deident_crypto_hash(p.id::text, 'my-research-key') = h.id;

- Pre-built ViewDefinitions apply de-identification methods per column: cryptoHash on identifiers, dateshift on dates, redact on names
- birthDateSafeHarbor on Patient.birthDate automatically redacts when age > 89 (HIPAA Safe Harbor requirement)
- Materialized tables contain only transformed data; cryptographic keys stay inside the ViewDefinition resource, protected by access control
- The same cryptoHashKey across ViewDefinitions means hashed IDs are joinable: de-identify multiple resource types and still link them by patient