A framework for multi-faceted content analysis of social media chatter regarding non-medical use of prescription medications

Table 1 Performance on the NER task, measured by precision (P), recall (R), and F1-score (F1). Bold indicates the best score. All baselines are tuned to their optimal settings, and the best result for each method is reported. Train-test splits for the benchmark datasets follow the original papers. For our own data, a standard 70–15–15 split is employed. Fivefold cross-validation is conducted, and each measure is reported as mean ± standard deviation (SD)

Model	Metric	NCBI [67]	i2b2-clinical [68]	i2b2-2012 [69]	Our test set
BiLSTM-CRF [71]	P	88.3 ± 1.9	89.2 ± 1.6	87.4 ± 2.2	90.1 ± 2.1
	R	90.1 ± 1.7	91.1 ± 1.8	89.1 ± 2.0	92.1 ± 1.9
	F1	89.2 ± 1.8	90.1 ± 1.7	88.2 ± 2.1	91.1 ± 2.0
Att-BiLSTM-CRF [72]	P	85.5 ± 1.5	88.6 ± 1.7	89.8 ± 1.9	88.1 ± 1.6
	R	86.2 ± 1.4	90.0 ± 1.8	92.0 ± 1.7	92.2 ± 1.5
	F1	85.9 ± 1.4	89.3 ± 1.7	90.9 ± 1.8	90.1 ± 1.5
CollabNet [73]	P	82.1 ± 1.2	84.2 ± 1.3	83.1 ± 1.4	86.7 ± 1.5
	R	84.1 ± 1.3	85.4 ± 1.2	84.3 ± 1.3	88.1 ± 1.4
	F1	83.1 ± 1.2	84.8 ± 1.2	83.7 ± 1.3	87.4 ± 1.4
BLUE [74]	P	90.4 ± 1.7	92.1 ± 1.5	91.5 ± 1.6	91.0 ± 1.4
	R	92.9 ± 1.6	93.2 ± 1.4	92.1 ± 1.5	93.0 ± 1.3
	F1	91.6 ± 1.6	92.6 ± 1.4	91.8 ± 1.5	91.9 ± 1.3
BioBERT [45]	P	91.2 ± 1.8	91.0 ± 1.7	91.4 ± 1.9	92.3 ± 1.6
	R	92.3 ± 1.7	91.1 ± 1.8	93.1 ± 1.7	93.1 ± 1.5
	F1	91.7 ± 1.7	91.0 ± 1.7	92.3 ± 1.8	92.7 ± 1.5
Our approach	P	91.9 ± 1.6	92.1 ± 1.4	92.2 ± 1.5	94.8 ± 1.3
	R	93.4 ± 1.5	93.3 ± 1.3	92.1 ± 1.4	95.1 ± 1.2
	F1	92.6 ± 1.5	92.7 ± 1.3	92.2 ± 1.4	94.9 ± 1.2
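
The "mean ± SD" entries in Table 1 can be illustrated with a minimal sketch of how per-fold precision, recall, and F1 might be aggregated over five folds. This is not the authors' evaluation code. The fold counts below are hypothetical, the function names `prf`, `summarize`, and `fmt` are illustrative, and the sample standard deviation is an assumption, since the caption does not say whether the sample or population SD is used.

```python
from statistics import mean, stdev


def prf(tp, fp, fn):
    """Return precision, recall, and F1 as percentages from entity-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return 100 * precision, 100 * recall, 100 * f1


def summarize(fold_counts):
    """Aggregate per-fold scores into {"P": (mean, sd), "R": ..., "F1": ...}."""
    scores = [prf(*counts) for counts in fold_counts]
    summary = {}
    # zip(*scores) regroups the per-fold tuples into one sequence per metric.
    for name, values in zip(("P", "R", "F1"), zip(*scores)):
        # Assumption: sample SD (n - 1 denominator); the source does not specify.
        summary[name] = (mean(values), stdev(values))
    return summary


def fmt(mean_sd):
    """Format a (mean, sd) pair in the table's 'Mean ± SD' style."""
    return f"{mean_sd[0]:.1f} ± {mean_sd[1]:.1f}"


# Hypothetical (true positive, false positive, false negative) counts
# for five folds; these are NOT taken from the paper.
folds = [(90, 10, 8), (88, 12, 9), (92, 8, 7), (91, 9, 10), (89, 11, 6)]

summary = summarize(folds)
for name in ("P", "R", "F1"):
    print(name, fmt(summary[name]))
```

Under this scheme F1 is computed per fold and then averaged, so the reported mean F1 need not equal the harmonic mean of the mean P and mean R exactly.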

ISSN: 2731-684X