Table 3 Comparison of the labels predicted by the best-performing baselines (BioBERT and BioBERT + CRF) and by our NLP model on running examples. The comparison highlights the challenges associated with entity identification, including correctly identifying an entity, failing to identify an entity type, and misclassifying a non-entity

From: Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation

| Sentence with ground truth label | Entity Category | Our Approach | BioBERT | BioBERT + CRF |
| --- | --- | --- | --- | --- |
| The patient has diabetes and is a smoker | Disease, Smoking | diabetes, smoker | diabetes, smoker | diabetes, smoker |
| There were more unemployed men than women | Employment, Gender | unemployed, men, women | men, women | men |
| The individual's relationship status is single and currently employed | Relationship Status, Employment | single, employed | - | - |
| The patient has hypertension and engages in regular physical activity | Disease, Lifestyle Factors | hypertension, physical activity | hypertension | hypertension |
| The individual's completed a higher level of education and has a specific disease | Socioeconomic Factors, Disease | education | disease | disease |
| The event occurred on a specific date and involved healthcare interaction | Temporal Factors, Healthcare System Interaction | date, healthcare interaction | date | date |
| The patient has a psychological condition | Psychological Condition | psychological condition | psychological condition | psychological condition |
| The patient’s diet affects their blood pressure | Lifestyle Factors | diet | blood | blood |
| The individual’s income level is associated with a specific disease | Socioeconomic Factors, Disease | income level, disease | disease | disease |
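
The three outcomes named in the caption (an entity identified correctly, an entity missed, and a non-entity labelled as an entity) can be illustrated with a minimal Python sketch. The function name `compare_entities` and the gold spans used below are assumptions made for illustration only: the table lists entity categories rather than gold spans, and this is not the evaluation code used in the article.

```python
def compare_entities(gold, predicted):
    """Sort predictions into correct, missed, and spurious spans.

    Matching is exact and case-insensitive on the span text only;
    entity categories are not compared in this simplified sketch.
    """
    gold_set = {span.lower() for span in gold}
    pred_set = {span.lower() for span in predicted}
    return {
        "correct": sorted(gold_set & pred_set),   # entity identified
        "missed": sorted(gold_set - pred_set),    # entity not identified
        "spurious": sorted(pred_set - gold_set),  # non-entity labelled
    }


# Assumed gold span for "The patient's diet affects their blood pressure"
# (hypothetical: inferred from the single category "Lifestyle Factors").
assumed_gold = ["diet"]

# Prediction listed in the table for BioBERT on that sentence.
print(compare_entities(assumed_gold, ["blood"]))
```

Under the same assumption, passing the prediction listed for our approach (`["diet"]`) would place the span under `correct` and leave the other two lists empty.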