8 resource types: Patient, Encounter, Observation, Condition, Procedure, Immunization, MedicationRequest, DiagnosticReport.
The raw Synthea output has 459 nested fields per resource, urn:uuid: references, and no column descriptions. We flatten it to clean views with ~15 columns each, pre-extracted IDs, and descriptions sourced from the FHIR R4 OpenAPI spec. Example:
  -- Raw FHIR:
  SELECT id, code.text FROM diagnostic_report WHERE subject.reference = CONCAT("urn:uuid:", patient_id)

  -- Forge view:
  SELECT report_name, patient_id FROM v_diagnostic_report

Data scanned per query drops ~90x (450 MB → 5 MB).
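For anyone curious what the flattening amounts to, here is a rough Python sketch of the idea for a single DiagnosticReport. It is not the actual pipeline code: the function names and the `report_id` column are made up for illustration, and only `report_name` and `patient_id` correspond to columns shown in the view query above. It assumes the resource arrives as a plain dict shaped like Synthea's FHIR JSON.

```python
from typing import Any, Dict, Optional

URN_PREFIX = "urn:uuid:"


def extract_id(reference: Optional[str]) -> Optional[str]:
    """Strip the 'urn:uuid:' prefix from a FHIR reference string.

    References in any other form are returned unchanged.
    """
    if reference is None:
        return None
    if reference.startswith(URN_PREFIX):
        return reference[len(URN_PREFIX):]
    return reference


def flatten_diagnostic_report(resource: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Project a nested DiagnosticReport dict onto a few flat columns.

    The column names are illustrative; 'report_id' in particular is an
    assumption and may not exist in the real view.
    """
    code = resource.get("code") or {}
    subject = resource.get("subject") or {}
    return {
        "report_id": resource.get("id"),
        "report_name": code.get("text"),
        "patient_id": extract_id(subject.get("reference")),
    }


if __name__ == "__main__":
    raw = {
        "resourceType": "DiagnosticReport",
        "id": "r1",
        "code": {"text": "Lipid panel"},
        "subject": {"reference": "urn:uuid:abc-123"},
    }
    print(flatten_diagnostic_report(raw))
```

Doing the prefix stripping once at load time is what lets the view join on a plain `patient_id` instead of rebuilding the reference string with CONCAT in every query.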
Free to subscribe: https://console.cloud.google.com/bigquery/analytics-hub/exch...
Updated weekly. Useful if you're building anything against FHIR data and want a realistic test dataset without standing up your own Synthea pipeline.
Happy to answer questions about the normalization approach or FHIR data modeling tradeoffs.