Rectangularized Schema in Maintenance Mode

Prior to the relational schema, data mart users had to extract JSON from key data elements to answer their analytical questions. Now, a data mart user can easily query the data directly without needing to deal with complex JSON structures. This makes the data more accessible and easier to use. No more LATERAL FLATTEN or UNNEST functions!

The new data mart schema features include:

  • Foundational resource types (Medication, Location, Practitioner, Organization) from third-party data sources
  • The addition of new FHIR resources types: DeviceUseStatement, RelatedPerson
  • The separation of Zus-summarized data (Lenses) into their own tables. Check out the lens_rxnorm_medication_statement, lens_snomed_condition, and lens_transition_of_care tables
  • The full JSON of the FHIR resource for zero lossiness. See the resource_json column in the base resource tables.
  • The top-leveling of codes. For example, in the observation table, there are code_loinc and code_snomed columns
  • The consistency of naming conventions for foreign keys
  • The removal of columns that were always NULL

However, if you are still on the non-relational schema, the following can support you in querying.

Child tables for complex elements in FHIR resources

For elements in a resource that are complex, rectangularization offers child tables that express the values in that complex element of the parent as scalars for analytic convenience. For example, the Patient resource has an address element of data type Address that includes the street, city, state, and postal code.

The structure of a Patient FHIR resource

The structure of a Patient FHIR resource

The structure of an Address FHIR data type

The structure of an Address FHIR data type

The data mart includes a Patient table and PatientAddress child table that can be joined to the Patient table by the patient_id column.

All child tables have columns that contain the ID field of the parent resource.

Parent ResourceChild TablesJoin Keys
MedicationDispenseMedicationDispenseAuthorizingPrescription- MedicationDispense on medication_dispense_id
- MedicationRequest on medication_request_id
MedicationDispenseMedicationDispenseDosageInstructionMedicationDispense on id
MedicationRequestMedicationRequestDosageInstructionMedicationRequest on id
ObservationObservationPerformer- Observation on id
- Practitioner on reference_id
OrganizationOrganizationAddress
OrganizationContact
OrganizationIdentifier
OrganizationTelecom
Organization on id
PatientPatientAddress
PatientCommunication
PatientContact
PatientGeneralPractitioner
Patient on patient_id
PractitionerPractitionerAddress
PractitionerQualification
PractitionerTelecom
Practitioner on id

Extracting JSON data in arrays and structures of complex elements

FHIR resources are complex and can include elements with cardinalities greater than one and with embedded data types. For example, a Patient resource can have any number of identifiers through the Identifier data type.

The Identifier FHIR data type includes use, type, system, value, period, and assigner elements.

The Identifier FHIR data type includes use, type, system, value, period, and assigner elements.

Suppose you want to extract all patients with your system’s identifier. Querying for the identifier column in the Patient table would return an array of Identifier structs.

select identifier from patient

[
  {
    "assigner": {},
    "period": {},
    "type": {
      "coding": []
    },
    "use": "usual",
    "value": "MH00010315"
  },
  {
    "assigner": {},
    "period": {},
    "type": {
      "coding": []
    },
    "use": "usual",
    "value": "MH1514"
  },
  {
    "assigner": {},
    "period": {},
    "type": {
      "coding": []
    },
    "use": "usual",
    "value": "a332ea7c-6625-432d-80a0-dcc2afd2dbbd"
  },
  {
    "assigner": {},
    "period": {},
    "system": "https://zusapi.com/fhir/identifier/universal-id",
    "type": {
      "coding": []
    },
    "value": "4901b5b1-6abc-469c-b88a-abfda5b58460"
  }
]

Therefore, you would need to explode the array of identifiers in the Identifier column in the Patient table into multiple rows and produce an inline view that references Patient columns. In Snowflake, this can be achieved with the lateral keyword and the flatten table function.

The flatten table function outputs a fixed set of columns:
| SEQ | KEY | PATH | INDEX | VALUE | THIS |

Column NameColumn Description
SEQA unique sequence number associated with the input record. The sequence is not guaranteed to be gap-free or ordered in any particular way.
KEYFor maps or objects, this column contains the key to the exploded value.
PATHThe path to the element within a data structure which needs to be flattened.
INDEXThe index of the element, if it is an array; otherwise NULL.
VALUEThe value of the element of the flattened array/object.
THISThe element being flattened (useful in recursive flattening).

Flattening the Identifier column outputs a single JSON identifier from the Identifier column array in the VALUE column. To traverse a path in a JSON object in Snowflake, the dot notation is used.

<column>:<level1_element>.<level2_element>.<level3_element>

📘

For more information on querying semi-structured data in Snowflake, including the dot notation and bracket notation, please refer to Snowflake's user guide.

The lateral keyword and flatten function can be combined as follows:

select 
    patient.id,
    identifiers.VALUE:use as identifier_use,
    identifiers.VALUE:type as identifier_type,
    identifiers.VALUE:system as identifier_system,
    identifiers.VALUE:value as identifier_value,
    identifiers.VALUE:period as identifier_period,
    identifiers.VALUE:assigner as identifier_assigner
from patient,
lateral flatten(input => identifier) as identifiers
where identifier_system = 'https://zusapi.com/fhir/identifier/universal-id'

Analyzing prevalence of patients’ zipcodes

The query below counts the number of distinct patients by zip code in descending order. A patient may be counted in more than one zip code.

select 
    left(pa.postal_code, 5) as zip_code, 
    count(distinct p.universal_patient_identifier) as distinct_upid_count
from patient p
left join patientaddress pa
on p.id = pa.patient_id
group by zip_code
order by distinct_upid_count desc

Analyzing prevalence of Conditions

Zus helps Builders clean up real-world data with enrichment and aggregation engines to make data more usable and easier to query. Conditions and medication-related FHIR resources written to the Zus platform, including data from third party partner networks, are enriched.

First, confirm the 95%+ coverage of SNOMED codes, including Zus-enriched SNOMED codes, with the query below.

select
    contains(code:coding, 'http://snomed.info/sct') as snomed_coded,
    count(distinct id)
from condition
group by 1

The query below counts the number of distinct patients by SNOMED code in Condition resources in descending order.

with condition_snomed_codes as (
    select
        condition.*,
        codes.VALUE:code as code_code,
        codes.VALUE:display as code_display,
        codes.VALUE:system as code_system 
    from condition,
    lateral flatten(input=>code:coding) as codes
    where code_system = 'http://snomed.info/sct')
select 
    code_code, 
    code_display, 
    count(distinct universal_patient_identifier) as distinct_upid_count
from condition_snomed_codes
group by code_code, code_display
order by distinct_upid_count desc

Analyzing prevalence of Medications summarized by Zus

For third-party FHIR resources, their data source is available in the DATA_SOURCE column, which is otherwise NULL. To identify Zus-summarized and de-duplicated Lens data, you can leverage the TAG suffix table. The query below counts the number of distinct patients by RxNorm code in summary MedicationStatement Lens resources in descending order.

with summary_medication_rxnorm_codes as (
    select
        medicationstatement.*,
        codes.VALUE:code as code_code,
        codes.VALUE:display as code_display,
        codes.VALUE:system as code_system 
    from medicationstatement
    join medicationstatementtag
    on medicationstatement.id = medicationstatementtag.fhir_id
    and medicationstatementtag.system = 'https://zusapi.com/summary',
    lateral flatten(input=>medication_codeable_concept:coding) as codes
    where code_system = 'http://www.nlm.nih.gov/research/umls/rxnorm')
select 
    code_code, 
    code_display, 
    count(distinct universal_patient_identifier) as distinct_upid_count
from summary_medication_rxnorm_codes
group by code_code, code_display
order by distinct_upid_count desc

Identifying patients with at least one A1C Observation

The A1C test measures average blood sugar levels over the past 3 months and is commonly used to diagnose prediabetes and diabetes. To identify patients with at least one A1C test, first characterize the types of systems and codes that are used to signify the A1C test from most to least common.

select distinct
    codes.VALUE:code as code_code,
    codes.VALUE:system as code_system,
    codes.VALUE:display,
    count(distinct id)
from observation,
lateral flatten(input=>code:coding) as codes 
where lower(codes.VALUE:display) like '%a1c%'
group by 1,2,3
order by 4 desc ;

Mapping from the Zus Universal Patient Identifier to your internal identifier

The query returns a 1:1 mapping between the Zus Universal Patient Identifier and an internal identifier labeled with system, 'https://organizationName.com/ID'.

select
    patient.universal_patient_identifier as zus_upid,
    identifiers.VALUE:value::string as organization_patient_id
from patient,
lateral flatten (input => identifier) as identifiers
where identifiers.VALUE:system = 'https://organizationName.com/ID' ;