
Table 1 Comparative performance of various NER methods across different biomedical datasets. The F1-scores for each dataset are given, along with the mean performance and standard deviation (SD) (Mean ± SD column) for each method across all datasets

From: Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation

 

| Method | NCBI | BC5CDR | tmVar | BC4CHEMD | BC2GM | i2b2-clinical | Our test set | Mean ± SD |
|---|---|---|---|---|---|---|---|---|
| BiLSTM-CRF [31] | 85.81 | 86.18 | 86.28 | 89.48 | 81.08 | 85.66 | 87.24 | 85.74 ± 2.46* |
| BILSTM-CNN-Char [30] | 88.19 | 87.58 | 87.20 | 90.06 | 83.29 | 84.08 | 89.25 | 87.43 ± 2.09* |
| BiLSTM-CRF-MTL [32] | 88.85 | 84.93 | 83.35 | 89.42 | 82.12 | 83.25 | 86.78 | 85.95 ± 2.71* |
| Doc-Att-BiLSTM-CRF [33] | 88.61 | 87.33 | 83.31 | 88.2 | 81.80 | 85.18 | 86.94 | 86.89 ± 2.62* |
| CollaboNet [34] | 84.08 | 87.12 | 81.75 | 87.12 | 79.73 | 85.61 | 87.13 | 85.21 ± 2.98** |
| BLUE-BERT [35] | 88.37 | 87.62 | 87.24 | 90.19 | 82.93 | 86.09 | 88.10 | 87.26 ± 2.24* |
| ClinicalBERT [17] | 87.01 | 84.19 | 79.10 | 80.13 | 78.13 | 84.10 | 84.93 | 82.51 ± 2.51** |
| BioBERT [15] | 90.01 | 89.30 | 88.70 | 91.28 | 88.52 | 88.33 | 91.94 | 89.58 ± 2.05* |
| BioBERT + CRF [36] | 89.71 | 88.39 | 88.58 | 90.28 | 88.01 | 87.33 | 90.94 | 89.03 ± 2.01* |
| BioBERT + MLP [37] | 89.10 | 88.37 | 88.10 | 90.08 | 87.72 | 86.73 | 90.34 | 88.63 ± 2.10* |
| Our approach | 90.08 | 89.98 | 89.13 | 91.58 | 89.15 | 89.17 | 92.98 | 90.31 ± 1.96 |

  1. * = p-value < 0.005; ** = p-value < 0.001. A single asterisk marks a statistically significant difference in mean performance (p-value less than 0.005); two asterisks mark a higher level of statistical significance (p-value less than 0.001).