Show HN: Synthea Fhir Data in BigQuery

We generated ~1,100 synthetic patients with Synthea, processed the FHIR R4 output through our normalization engine (Forge), and published it as a free public dataset on BigQuery Analytics Hub.

8 resource types: Patient, Encounter, Observation, Condition, Procedure, Immunization, MedicationRequest, DiagnosticReport.

The raw Synthea output has 459 nested fields per resource, urn:uuid: references, and no column descriptions. We flatten it to clean views with ~15 columns each, pre-extracted IDs, and descriptions sourced from the FHIR R4 OpenAPI spec. Example:

-- Raw FHIR: SELECT id, code.text FROM diagnostic_report WHERE subject.reference = CONCAT("urn:uuid:", patient_id) -- Forge view: SELECT report_name, patient_id FROM v_diagnostic_report Data scanned per query drops ~90x (450 MB → 5 MB).

Free to subscribe: https://console.cloud.google.com/bigquery/analytics-hub/exch...

Updated weekly. Useful if you're building anything against FHIR data and want a realistic test dataset without standing up your own Synthea pipeline.

Happy to answer questions about the normalization approach or FHIR data modeling tradeoffs.

2 points | by brady_bastian 2 hours ago

0 comments