Table 2 Performance of our approach, BioBERT, and BioBERT + CRF in extracting the most frequently occurring SDOH terms (occurrence > 90%) from our test set, reported as mean F1 (± SD) across multiple iterations

From: Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation

 

| SDOH factor | Our method, F1 (± SD) | BioBERT, F1 (± SD) | BioBERT + CRF, F1 (± SD) |
|---|---|---|---|
| Demographics: Gender | 0.828 ± 0.021 | 0.818 ± 0.021 | 0.808 ± 0.021 |
| Demographics: Race/Ethnicity | 0.813 ± 0.024 | 0.803 ± 0.024 | 0.793 ± 0.024 |
| Biometric Factors: BMI | 0.897 ± 0.021 | 0.887 ± 0.021 | 0.877 ± 0.021 |
| Temporal Factors: Date, Duration, Time | 0.825 ± 0.026 | 0.815 ± 0.026 | 0.805 ± 0.026 |
| Lifestyle Factors: Smoking | 0.787 ± 0.022 | 0.777 ± 0.022 | 0.767 ± 0.022 |
| Socioeconomic Factors: Employment | 0.801 ± 0.023 | 0.791 ± 0.023 | 0.781 ± 0.023 |
| Healthcare: Admission/Discharge | 0.814 ± 0.022 | 0.804 ± 0.022 | 0.794 ± 0.022 |
| Disease: Diabetes | 0.906 ± 0.021 | 0.896 ± 0.021 | 0.886 ± 0.021 |
| Other: Psychological Condition | 0.817 ± 0.025 | 0.807 ± 0.025 | 0.797 ± 0.025 |
| Other: Relationship Status | 0.790 ± 0.026 | 0.780 ± 0.026 | 0.770 ± 0.026 |
| Other: Death Entity | 0.870 ± 0.023 | 0.860 ± 0.023 | 0.850 ± 0.023 |
| Macro-average | 0.832 ± 0.023 | 0.822 ± 0.023 | 0.812 ± 0.023 |
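
The macro-average row can be checked against the per-category values. The sketch below assumes the macro-average is the unweighted mean of the 11 per-category F1 scores listed in Table 2; the table itself does not state the formula, and the names `f1_scores`, `methods`, and `macro_average` are illustrative only.

```python
from statistics import mean

# Per-category mean F1 values copied from Table 2.
# Tuple order: (our method, BioBERT, BioBERT + CRF).
f1_scores = {
    "Demographics: Gender": (0.828, 0.818, 0.808),
    "Demographics: Race/Ethnicity": (0.813, 0.803, 0.793),
    "Biometric Factors: BMI": (0.897, 0.887, 0.877),
    "Temporal Factors: Date, Duration, Time": (0.825, 0.815, 0.805),
    "Lifestyle Factors: Smoking": (0.787, 0.777, 0.767),
    "Socioeconomic Factors: Employment": (0.801, 0.791, 0.781),
    "Healthcare: Admission/Discharge": (0.814, 0.804, 0.794),
    "Disease: Diabetes": (0.906, 0.896, 0.886),
    "Other: Psychological Condition": (0.817, 0.807, 0.797),
    "Other: Relationship Status": (0.790, 0.780, 0.770),
    "Other: Death Entity": (0.870, 0.860, 0.850),
}

methods = ("Our method", "BioBERT", "BioBERT + CRF")


def macro_average(scores, column):
    """Unweighted mean of one method's per-category F1 values (assumed definition)."""
    return round(mean(row[column] for row in scores.values()), 3)


macro = {name: macro_average(f1_scores, i) for i, name in enumerate(methods)}

for name, value in macro.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption the computed values are 0.832, 0.822, and 0.812, which match the macro-average row as reported.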