Example Queries

Analyzing prevalence of patients’ zip codes

The query below counts the number of distinct patients by zip code in descending order. A patient may be counted in more than one zip code.

select 
    left(pa.postal_code, 5) as zip_code, 
    count(distinct p.upid) as distinct_upid_count
from patient p
left join patient_address pa
on p.id = pa.patient_id
group by zip_code
order by distinct_upid_count desc

Analyzing prevalence of Conditions

Zus helps Builders clean up real-world data with enrichment and aggregation engines to make data more usable and easier to query. Conditions and medication-related FHIR resources are enriched.

First, confirm the 95%+ coverage of SNOMED codes, including Zus-enriched SNOMED codes, on Condition resources with the query below:

select 
    sum(case when code_snomed is not null then 1 else 0 end) as snomed_populated_count,
    count(*) as total_count,
    non_null_snomed_count / total_count as snomed_populated_percentage
from condition

Then, count the number of distinct patients by SNOMED code in Condition resources in descending order:

select 
    code_snomed,
    code_display,
    count(distinct upid) as distinct_upid_count
from condition
group by code_snomed, code_display
order by distinct_upid_count desc ;

Analyzing prevalence of Medications summarized by Zus

Zus helps Builders extract signal quickly from raw data with structural normalization, de-duplication, summarization, and tagging. A Zus Lens is a set of FHIR resources derived from a patient’s raw resources, and Lenses are separated into their own tables for easy identification and access.

The query below counts the number of distinct patients by RxNorm code in summary MedicationStatement Lens resources in descending order.

select 
    code_rxnorm, 
    code_display, 
    count(distinct upid) as distinct_upid_count
from lens_rxnorm_medication_statement
group by code_rxnorm, code_display
order by distinct_upid_count desc ;

Retrieving A1C results across your population

The A1C test measures average blood sugar levels over the past 3 months and is commonly used to diagnose prediabetes and diabetes. To identify patients with at least one A1C test, first characterize the type LOINC codes that are used to signify the A1C test from most to least common.

select distinct
    code_loinc,
    code_display,
    count(distinct id)
from observation,
where lower(code_display) like '%a1c%'
group by 1,2
order by 3 desc ;

This shows that some of the most common LOINC codes for representing an A1C test are:

  • 4548-4 - Hemoglobin A1c/Hemoglobin.total in Blood
  • 4549-2 - Hemoglobin A1c/Hemoglobin.total in Blood by HPLC
  • 17855-8 - Hemoglobin A1c/Hemoglobin. total in Blood by calculation
  • 17856-6 - Hemoglobin A1c/Hemoglobin.total in Blood by Electrophoresis
  • 41995-2 - Hemoglobin A1c/Hemoglobin.total in Blood by Immunoassay
  • 96595-4 - Hemoglobin A1c/Hemoglobin.total in DBS

Then, list the A1C tests per patient from most recent to least recent, filtered down to the codes identified above:

select
    upid,
    effective_start,
    effective_end,
    value_string,
    value_quantity_value,
    value_quantity_unit,
from observation
where code_loinc in ('4548-4', '4549-2', '17855-8', '17856-6', '41995-2', '96595-4') 
order by upid, effective_start desc;

Given that A1C result values can be represented across a variety of formats, you can coalesce across those types and strip non-numbers to produce a single result value column.

select distinct
    upid,
    encounter_id,
    effective_start,
    ROUND(COALESCE(value_quantity_value, TRY_TO_DECIMAL(REGEXP_SUBSTR(value_string, '[0-9]+\\.+[0-9]'), 10, 1), value_integer, COALESCE(value_range_low, value_range_high), REGEXP_SUBSTR(observation_value_concept.display, '[0-9]+\\.+[0-9]')), 1) AS result,
    COALESCE(value_quantity_unit, REGEXP_SUBSTR(value_string, '\\%'), REGEXP_SUBSTR(observation_value_concept.display, '\\%')) AS unit,
    case
        when value_quantity_value is not null then 'QUANTITY'
        when value_string is not null then 'STRING'
        when (value_range_low is not NULL or value_range_high is not NULL) then 'RANGE'
        when observation.observation_value_concept_id is not NULL then 'CODEABLE_CONCEPT'
        when value_integer is not null then 'INTENGER'
    end as result_type,
    coalesce(observation_interpretation.display, observation_interpretation_coding.code) as interpretation,
    reference_range_low,
    reference_range_high,
    reference_range_unit,
    reference_range_display
from observation
left join observation_interpretation
on observation.observation_interpretation_id = observation_interpretation.id
left join observation_interpretation_coding
on observation_interpretation.id = observation_interpretation_coding.observation_interpretation_id
left join observation_value_concept
on observation.observation_value_concept_id = observation_value_concept.id
left join observation_value_concept_coding
on observation_value_concept.id = observation_value_concept_coding.observation_value_concept_id
where (code_loinc in ('4548-4', '4549-2', '17855-8', '17856-6', '41995-2', '96595-4')
or code_snomed in ('43396009', '313835008')
or code_cpt in ('83036', '83037') 
or lower(code_display) like '%a1c%')
ORDER BY 1, 3 DESC;

Retrieving blood pressure readings across your population

select
    upid,
    encounter_id,
    effective_start,
    coalesce(value_quantity_value, value_integer, try_to_number(value_string)) as result,
    value_quantity_unit as unit,  
    case 
        when (code_loinc in ('8480-6', '3074F', '3075F', '3077F', '75997-7', '8459-0', '8460-8', '8461-6', '8508-4', '8546-4', '8547-2', '89268-7')
            or code_snomed = '271649006')
        then 'SYSTOLIC'
        else 'DIASTOLIC'
    end as bp_type,
    case
        when value_quantity_value is not null then 'QUANTITY'
        when value_string is not null then 'STRING'
        when (value_range_low is not NULL or value_range_high is not NULL) then 'RANGE'
        when observation_value_concept_id is not NULL then 'CODEABLE_CONCEPT'
        when value_integer is not null then 'INTENGER'
    end as result_type,
    coalesce(observation_interpretation.display, observation_interpretation_coding.code) as interpretation,
    reference_range_low,
    reference_range_high,
    reference_range_unit,
    reference_range_display,
    text_markdown as note
from observation
left join observation_interpretation
on observation.observation_interpretation_id = observation_interpretation.id
left join observation_interpretation_coding
on observation_interpretation.id = observation_interpretation_coding.observation_interpretation_id
left join observation_note
on observation.id = observation_note.observation_id
WHERE (code_loinc in
-- systolic
('8462-4', '3078F', '3079F', '3080F', '75995-1', '8453-3', '8454-1', '8455-8', '8496-2', '8514-2', '8515-9', '89267-9',
-- diastolic
'8480-6', '3074F', '3075F', '3077F', '75997-7', '8459-0', '8460-8', '8461-6', '8508-4', '8546-4', '8547-2', '89268-7')
OR code_snomed in ('271650006', '271649006'));

Mapping from the Zus Universal Patient Identifier to your internal identifier

The query returns a 1:1 mapping between the Zus Universal Patient Identifier and an internal identifier labeled with system, https://organizationName.com/ID

select
    patient.upid as zus_upid,
    identifier.value as organization_patient_id
from patient
join patient_identifier
on patient.id = patient_identifier.patient_id
join identifier
on identifier.id = patient_identifier.identifier_id
where identifier.system = 'https://organizationName.com/ID' ;