Data Marts

To support population health management, Zus offers data marts that extract and transform FHIR data into a tabular format — a process called rectangularization — suitable for SQL-based exploration and analysis. If you would like to access your Zus FHIR data in this tabular format, please submit a support ticket.

As a Builder, you endeavor to bring greater value to patient care by improving health outcomes and reducing cost. To prove your differentiation, you report your clinical and operational metrics to your customers. You characterize your patient population, identifying trends and patterns of disease prevalence and health behaviors, to implement interventions.

As a standard for electronically exchanging healthcare information, FHIR is designed to be interoperable through its flexibility and extensibility. It is optimized for real-time transactional interaction on a single resource or bundle of resources. However, FHIR is not designed for analytics use cases. FHIR’s heavily nested and embedded structure is challenging for the majority of existing analytics infrastructure that relies on “flat” storage. Zus offers a data mart that extracts and transforms your FHIR data into a tabular format suitable for SQL-based exploration and analysis. Then, you can layer on a business intelligence and visualization tool such as Tableau or Looker to develop dashboards and reports.

Accessing and refreshing your data mart

You can access your data mart through:

  • Snowflake secure data sharing. If your organization already has a Snowflake account, you can provide your account identifier to Zus. Otherwise, Zus can provision and manage read-only reader accounts for your users that enable them to access and query the data mart.

Data is continuously streamed to Snowflake data marts. The latest state of FHIR resources should be reflected in Snowflake data marts within one hour of the change in the Zus FHIR Store.

👍

If you would like access to your data mart, please submit a support ticket to us here.

Querying FHIR resources

The following resource types are exported as data mart tables: AllergyIntolerance, Appointment, Consent, CarePlan, CareTeam, Condition, Coverage, Device, DeviceUseStatement, DiagnosticReport, DocumentReference, Encounter, FamilyMemberHistory, Goal, Immunization, Location, MedicationAdministration, MedicationDispense, MedicationRequest, MedicationStatement, Observation, Organization, Patient, Practitioner, Procedure, Provenance, and QuestionnaireResponse.

Child tables for complex elements in FHIR resources

For elements in a resource that are complex, rectangularization offers child tables that express the values in that complex element of the parent as scalars for analytic convenience. For example, the Patient resource has an address element of data type Address that includes the street, city, state, and postal code.

The structure of a Patient FHIR resource

The structure of a Patient FHIR resource

The structure of an Address FHIR data type

The structure of an Address FHIR data type

The data mart includes a Patient table and PatientAddress child table that can be joined to the Patient table by the patient_id column.

All child tables have columns that contain the ID field of the parent resource.

Parent ResourceChild TablesJoin Keys
MedicationDispenseMedicationDispenseAuthorizingPrescription- MedicationDispense on medication_dispense_id
- MedicationRequest on medication_request_id
MedicationDispenseMedicationDispenseDosageInstructionMedicationDispense on id
MedicationRequestMedicationRequestDosageInstructionMedicationRequest on id
ObservationObservationPerformer- Observation on id
- Practitioner on reference_id
OrganizationOrganizationAddress
OrganizationContact
OrganizationIdentifier
OrganizationTelecom
Organization on id
PatientPatientAddress
PatientCommunication
PatientContact
PatientGeneralPractitioner
Patient on patient_id
PractitionerPractitionerAddress
PractitionerQualification
PractitionerTelecom
Practitioner on id

Extracting JSON data in arrays and structures of complex elements

FHIR resources are complex and can include elements with cardinalities greater than one and with embedded data types. For example, a Patient resource can have any number of identifiers through the Identifier data type.

The Identifier FHIR data type includes use, type, system, value, period, and assigner elements.

The Identifier FHIR data type includes use, type, system, value, period, and assigner elements.

Suppose you want to extract all patients with your system’s identifier. Querying for the identifier column in the Patient table would return an array of Identifier structs.

select identifier from patient

[
  {
    "assigner": {},
    "period": {},
    "type": {
      "coding": []
    },
    "use": "usual",
    "value": "MH00010315"
  },
  {
    "assigner": {},
    "period": {},
    "type": {
      "coding": []
    },
    "use": "usual",
    "value": "MH1514"
  },
  {
    "assigner": {},
    "period": {},
    "type": {
      "coding": []
    },
    "use": "usual",
    "value": "a332ea7c-6625-432d-80a0-dcc2afd2dbbd"
  },
  {
    "assigner": {},
    "period": {},
    "system": "https://zusapi.com/fhir/identifier/universal-id",
    "type": {
      "coding": []
    },
    "value": "4901b5b1-6abc-469c-b88a-abfda5b58460"
  }
]

Therefore, you would need to explode the array of identifiers in the Identifier column in the Patient table into multiple rows and produce an inline view that references Patient columns. In Snowflake, this can be achieved with the lateral keyword and the flatten table function.

The flatten table function outputs a fixed set of columns:
| SEQ | KEY | PATH | INDEX | VALUE | THIS |

Column NameColumn Description
SEQA unique sequence number associated with the input record. The sequence is not guaranteed to be gap-free or ordered in any particular way.
KEYFor maps or objects, this column contains the key to the exploded value.
PATHThe path to the element within a data structure which needs to be flattened.
INDEXThe index of the element, if it is an array; otherwise NULL.
VALUEThe value of the element of the flattened array/object.
THISThe element being flattened (useful in recursive flattening).

Flattening the Identifier column outputs a single JSON identifier from the Identifier column array in the VALUE column. To traverse a path in a JSON object in Snowflake, the dot notation is used.

<column>:<level1_element>.<level2_element>.<level3_element>

📘

For more information on querying semi-structured data in Snowflake, including the dot notation and bracket notation, please refer to Snowflake's user guide.

The lateral keyword and flatten function can be combined as follows:

select 
    patient.id,
    identifiers.VALUE:use as identifier_use,
    identifiers.VALUE:type as identifier_type,
    identifiers.VALUE:system as identifier_system,
    identifiers.VALUE:value as identifier_value,
    identifiers.VALUE:period as identifier_period,
    identifiers.VALUE:assigner as identifier_assigner
from patient,
lateral flatten(input => identifier) as identifiers
where identifier_system = 'https://zusapi.com/fhir/identifier/universal-id'

Leveraging standardized vocabularies

Data marts come with OHDSI’s (Observational Health Data Sciences and Informatics) standardized vocabularies. The standardized vocabularies are an integral part of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), an open community data guide designed to standardize the structure and content of health data to enable efficient analyses and reproducible evidence.

Vocabulary concepts are stored in the CONCEPT table, which is meant to be comprehensive enough to cover any event relevant to the patient’s healthcare experience (conditions, procedures, etc.) as well as administrative information of the healthcare system (visits, care sites).

The OHDSI standardized vocabulary tables are:

CONCEPTSemantic notion of each event
CONCEPT_ANCESTORWithin a domain, concepts organized in a hierarchical structure. Built automatically from CONCEPT_RELATIONSHIP, it traverses all possible concepts connected through hierarchical relationships.
CONCEPT_CLASSSome vocabularies classify their codes or concepts. For example, SNOMED has 33 concept classes known as semantic tags: clinical finding, social context, body structure, etc.
CONCEPT_RELATIONSHIPAny two concepts, regardless of whether they belong to the same domain or vocabulary, can have a defined relationship.
CONCEPT_SYNONYMIf the source vocabulary has more than one name for a concept.
DOMAINExamples of domains are Condition, Drug, Procedure, Visit, Device, Specimen, etc.
DRUG_STRENGTHAn additional table defining the strength of each ingredient in the Drug Concept Classes except Ingredient, Clinical Drug Form and Branded Drug From, Dose Form, Brand Name and Supplier.
RELATIONSHIPExamples of mapping relationships include “maps to,” “mapped from,” “maps to value,” “value mapped from,” thereby crosswalking equivalent concepts. Examples of hierarchical relationships are “Is a” and “Subsumes.”
VOCABULARYExamples of vocabularies are SNOMED, LOINC, ICD10CM, etc.

📘

For more information on the OHDSI standardized vocabularies, please refer to the Book of OHDSI.

Examples

Zus Aggregated Profile

The Zus Aggregated Profile (or ZAP) is an up-to-date, comprehensive record of a given patient’s health history, assembled and curated by Zus from various external data networks. The ZAP is structured to include first-party data written and managed by your Builder organization, Zus-summarized and de-duplicated (Lens) data, and third-party data from partner networks like Commonwell, CareQuality, and Surescripts via the Patient History API.

Universal Patient Index

Zus offers its own identity mapping service, the Universal Patient Index, to link disparate and duplicative patient records. UNIVERSAL_PATIENT_IDENTIFIER is available as a column for the AllergyIntolerance, CarePlan, CareTeam, Condition, Coverage, DiagnosticReport, DocumentReference, Encounter, Immunization, MedicationAdministration, MedicationDispense, MedicationRequest, MedicationStatement, Observation, Patient, Procedure, and QuestionnaireResponse resource types.

Analyzing prevalence of patients’ zipcodes

The query below counts the number of distinct patients by zip code in descending order. A patient may be counted in more than one zip code.

select 
    left(pa.postal_code, 5) as zip_code, 
    count(distinct p.universal_patient_identifier) as distinct_upid_count
from patient p
left join patientaddress pa
on p.id = pa.patient_id
group by zip_code
order by distinct_upid_count desc

Analyzing prevalence of Conditions

Zus helps Builders clean up real-world data with enrichment and aggregation engines to make data more usable and easier to query. Conditions and medication-related FHIR resources written to the Zus platform, including data from third party partner networks, are enriched.

First, confirm the 95%+ coverage of SNOMED codes, including Zus-enriched SNOMED codes, with the query below.

select
    contains(code:coding, 'http://snomed.info/sct') as snomed_coded,
    count(distinct id)
from condition
group by 1

The query below counts the number of distinct patients by SNOMED code in Condition resources in descending order.

with condition_snomed_codes as (
    select
        condition.*,
        codes.VALUE:code as code_code,
        codes.VALUE:display as code_display,
        codes.VALUE:system as code_system 
    from condition,
    lateral flatten(input=>code:coding) as codes
    where code_system = 'http://snomed.info/sct')
select 
    code_code, 
    code_display, 
    count(distinct universal_patient_identifier) as distinct_upid_count
from condition_snomed_codes
group by code_code, code_display
order by distinct_upid_count desc

Analyzing prevalence of Medications summarized by Zus

For third-party FHIR resources, their data source is available in the DATA_SOURCE column, which is otherwise NULL. To identify Zus-summarized and de-duplicated Lens data, you can leverage the TAG suffix table. The query below counts the number of distinct patients by RxNorm code in summary MedicationStatement Lens resources in descending order.

with summary_medication_rxnorm_codes as (
    select
        medicationstatement.*,
        codes.VALUE:code as code_code,
        codes.VALUE:display as code_display,
        codes.VALUE:system as code_system 
    from medicationstatement
    join medicationstatementtag
    on medicationstatement.id = medicationstatementtag.fhir_id
    and medicationstatementtag.system = 'https://zusapi.com/summary',
    lateral flatten(input=>medication_codeable_concept:coding) as codes
    where code_system = 'http://www.nlm.nih.gov/research/umls/rxnorm')
select 
    code_code, 
    code_display, 
    count(distinct universal_patient_identifier) as distinct_upid_count
from summary_medication_rxnorm_codes
group by code_code, code_display
order by distinct_upid_count desc

Identifying patients with at least one A1C Observation

The A1C test measures average blood sugar levels over the past 3 months and is commonly used to diagnose prediabetes and diabetes. To identify patients with at least one A1C test, first characterize the types of systems and codes that are used to signify the A1C test from most to least common.

select distinct
    codes.VALUE:code as code_code,
    codes.VALUE:system as code_system,
    codes.VALUE:display,
    count(distinct id)
from observation,
lateral flatten(input=>code:coding) as codes 
where lower(codes.VALUE:display) like '%a1c%'
group by 1,2,3
order by 4 desc ;

The query returns several LOINC codes, including 41995-2, 4548-4, 4549-2, 4637-5, 17855-8, 17856-6, and 27353-2, and non-LOINC codes. To be exhaustive, the query below returns all A1C test Observations.

select
    observation.id,
    observation.created_at,
    observation.identifier,
    observation.subject_type,
    observation.subject_id,
    observation.universal_patient_identifier,
    observation.effective_date_time,
    observation.effective_period,
    ifnull(observation.value_string, observation.value_quantity:value) as value,
    observation.interpretation,
    observation.note,
    observation.reference_range,
    observation.last_updated,
    code:text as code_text,
    codes.VALUE:code as code_code,
    codes.VALUE:display as code_display,
    codes.VALUE:system as code_system
from observation,
lateral flatten(input=>code:coding) as codes 
where lower(code_display) like '%a1c%' ;