BMC Digital Health

Table 1 Accuracy summary

From: Evaluating GPT-4-based ChatGPT's clinical potential on the NEJM quiz

	Accuracy without answer choices	Accuracy with answer choices	P-value
Total	87%(54/62)	97%(60/62)	0.01
Types of quiz
Diagnosis	89%(49/55)	98%(54/55)	0.11
Finding	0%(0/1)	100%(1/1)	> 0.99
Treatment	100%(2/2)	100%(2/2)	> 0.99
Cause	50%(1/2)	50%(1/2)	> 0.99
Other	100%(2/2)	100%(2/2)	> 0.99
Specialty of quiz
Dermatology	83%(24/29)	93%(27/29)	0.02
Emergency medicine	92%(11/12)	92%(11/12)	0.08
Infectious disease	92%(12/13)	100%(13/13)	> 0.99
Radiology	88%(7/8)	100%(8/8)	> 0.99
Ophthalmology	80%(8/10)	100%(10/10)	> 0.99
Pediatrics	100%(6/6)	100%(6/6)	> 0.99
Hematology/Oncology	80%(8/10)	90%(9/10)	0.22
Gastroenterology	100%(7/7)	100%(7/7)	> 0.99
Neurology/Neurosurgery	100%(7/7)	100%(7/7)	> 0.99
Pulmonary/Critical Care	100%(3/3)	100%(3/3)	> 0.99
Surgery	100%(13/13)	100%(13/13)	> 0.99
Obstetrics/Gynecology	80%(4/5)	100%(5/5)	> 0.99
Otolaryngology	50%(1/2)	100%(2/2)	> 0.99
Nephrology	100%(4/4)	100%(4/4)	> 0.99
Genetics	67%(2/3)	67%(2/3)	0.33
Cardiology	100%(2/2)	100%(2/2)	> 0.99
Allergy/Immunology	50%(1/2)	100%(2/2)	> 0.99
Rheumatology	67%(2/3)	100%(3/3)	> 0.99
Urology/Prostate disease	100%(3/3)	100%(3/3)	> 0.99
Endocrinology	100%(3/3)	100%(3/3)	> 0.99
Toxicology	100%(2/2)	100%(2/2)	> 0.99
Orthopedics	100%(2/2)	100%(2/2)	> 0.99
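
Each accuracy cell in the table is a rounded percentage followed by its underlying count (correct/total). As a minimal sketch of that arithmetic, the Python below recomputes the percentages for a few rows. The counts are copied from the table; the names `rows`, `accuracy_percent`, and `summarize` are hypothetical helpers introduced for illustration, and half-up rounding to a whole percent is an assumption that happens to reproduce the printed values. The P-values are not recomputed, because the table does not state which statistical test produced them.

```python
# Counts copied from the table:
# (correct without choices, correct with choices, total quizzes).
rows = {
    "Total": (54, 60, 62),
    "Diagnosis": (49, 54, 55),
    "Dermatology": (24, 27, 29),
    "Radiology": (7, 8, 8),
    "Genetics": (2, 2, 3),
}


def accuracy_percent(correct: int, total: int) -> int:
    """Return correct/total as a whole-number percentage, rounding half up.

    Integer arithmetic avoids floating-point surprises at exact .5 cases
    such as 7/8 = 87.5%. Half-up rounding is an assumption, not something
    the table states.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (200 * correct + total) // (2 * total)


def summarize(counts):
    """Map each row name to (percent without choices, percent with choices)."""
    return {
        name: (accuracy_percent(without, n), accuracy_percent(with_, n))
        for name, (without, with_, n) in counts.items()
    }


summary = summarize(rows)
for name, (pct_without, pct_with) in summary.items():
    print(f"{name}: {pct_without}% without choices, {pct_with}% with choices")
```

Running the same helper over any other row gives a quick consistency check between a printed percentage and its count.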


ISSN: 2731-684X
