Table 1 Accuracy summary

From: Evaluating GPT-4-based ChatGPT's clinical potential on the NEJM quiz

 

| | Accuracy without choice | Accuracy with choice | P-value |
|---|---|---|---|
| Total | 87% (54/62) | 97% (60/62) | 0.01 |
| Types of quiz | | | |
| Diagnosis | 89% (49/55) | 98% (54/55) | 0.11 |
| Finding | 0% (0/1) | 100% (1/1) | > 0.99 |
| Treatment | 100% (2/2) | 100% (2/2) | > 0.99 |
| Cause | 50% (1/2) | 50% (1/2) | > 0.99 |
| Other | 100% (2/2) | 100% (2/2) | > 0.99 |
| Specialty of quiz | | | |
| Dermatology | 83% (24/29) | 93% (27/29) | 0.02 |
| Emergency medicine | 92% (11/12) | 92% (11/12) | 0.08 |
| Infectious disease | 92% (12/13) | 100% (13/13) | > 0.99 |
| Radiology | 88% (7/8) | 100% (8/8) | > 0.99 |
| Ophthalmology | 80% (8/10) | 100% (10/10) | > 0.99 |
| Pediatrics | 100% (6/6) | 100% (6/6) | > 0.99 |
| Hematology/Oncology | 80% (8/10) | 90% (9/10) | 0.22 |
| Gastroenterology | 100% (7/7) | 100% (7/7) | > 0.99 |
| Neurology/Neurosurgery | 100% (7/7) | 100% (7/7) | > 0.99 |
| Pulmonary/Critical Care | 100% (3/3) | 100% (3/3) | > 0.99 |
| Surgery | 100% (13/13) | 100% (13/13) | > 0.99 |
| Obstetrics/Gynecology | 80% (4/5) | 100% (5/5) | > 0.99 |
| Otolaryngology | 50% (1/2) | 100% (2/2) | > 0.99 |
| Nephrology | 100% (4/4) | 100% (4/4) | > 0.99 |
| Genetics | 67% (2/3) | 67% (2/3) | 0.33 |
| Cardiology | 100% (2/2) | 100% (2/2) | > 0.99 |
| Allergy/Immunology | 50% (1/2) | 100% (2/2) | > 0.99 |
| Rheumatology | 67% (2/3) | 100% (3/3) | > 0.99 |
| Urology/Prostate disease | 100% (3/3) | 100% (3/3) | > 0.99 |
| Endocrinology | 100% (3/3) | 100% (3/3) | > 0.99 |
| Toxicology | 100% (2/2) | 100% (2/2) | > 0.99 |
| Orthopedics | 100% (2/2) | 100% (2/2) | > 0.99 |
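
As a quick consistency check on the figures above, the "Types of quiz" counts can be pooled and compared with the Total row. The following is a minimal Python sketch; the dictionary layout and the helper names `pct` and `pooled` are illustrative assumptions, and only the counts come from the table. The specialty rows are left out because their denominators add up to more than 62, presumably because a single quiz can be listed under several specialties.

```python
# Counts (correct, total) copied from the "Types of quiz" rows of Table 1.
# The data layout and helper names are hypothetical, chosen for illustration.
by_type = {
    "Diagnosis": {"without": (49, 55), "with": (54, 55)},
    "Finding": {"without": (0, 1), "with": (1, 1)},
    "Treatment": {"without": (2, 2), "with": (2, 2)},
    "Cause": {"without": (1, 2), "with": (1, 2)},
    "Other": {"without": (2, 2), "with": (2, 2)},
}

# Totals as reported in the "Total" row of the table.
reported_total = {"without": (54, 62), "with": (60, 62)}


def pct(correct, total):
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)


def pooled(condition):
    """Sum correct answers and quiz counts over all quiz types."""
    correct = sum(row[condition][0] for row in by_type.values())
    total = sum(row[condition][1] for row in by_type.values())
    return correct, total


for condition in ("without", "with"):
    correct, total = pooled(condition)
    matches = (correct, total) == reported_total[condition]
    print(f"{condition} choice: {pct(correct, total)}% ({correct}/{total}), "
          f"matches Total row: {matches}")
```

The sketch only recomputes the accuracy percentages. It does not reproduce the P-values, since the table does not state which statistical test was used.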