From: Evaluating GPT-4-based ChatGPT's clinical potential on the NEJM quiz
Category | Accuracy without choices | Accuracy with choices | P-value |
---|---|---|---|
Total | 87% (54/62) | 97% (60/62) | 0.01 |
Quiz type | | | |
Diagnosis | 89% (49/55) | 98% (54/55) | 0.11 |
Finding | 0% (0/1) | 100% (1/1) | > 0.99 |
Treatment | 100% (2/2) | 100% (2/2) | > 0.99 |
Cause | 50% (1/2) | 50% (1/2) | > 0.99 |
Other | 100% (2/2) | 100% (2/2) | > 0.99 |
Quiz specialty | | | |
Dermatology | 83% (24/29) | 93% (27/29) | 0.02 |
Emergency medicine | 92% (11/12) | 92% (11/12) | 0.08 |
Infectious disease | 92% (12/13) | 100% (13/13) | > 0.99 |
Radiology | 88% (7/8) | 100% (8/8) | > 0.99 |
Ophthalmology | 80% (8/10) | 100% (10/10) | > 0.99 |
Pediatrics | 100% (6/6) | 100% (6/6) | > 0.99 |
Hematology/Oncology | 80% (8/10) | 90% (9/10) | 0.22 |
Gastroenterology | 100% (7/7) | 100% (7/7) | > 0.99 |
Neurology/Neurosurgery | 100% (7/7) | 100% (7/7) | > 0.99 |
Pulmonary/Critical Care | 100% (3/3) | 100% (3/3) | > 0.99 |
Surgery | 100% (13/13) | 100% (13/13) | > 0.99 |
Obstetrics/Gynecology | 80% (4/5) | 100% (5/5) | > 0.99 |
Otolaryngology | 50% (1/2) | 100% (2/2) | > 0.99 |
Nephrology | 100% (4/4) | 100% (4/4) | > 0.99 |
Genetics | 67% (2/3) | 67% (2/3) | 0.33 |
Cardiology | 100% (2/2) | 100% (2/2) | > 0.99 |
Allergy/Immunology | 50% (1/2) | 100% (2/2) | > 0.99 |
Rheumatology | 67% (2/3) | 100% (3/3) | > 0.99 |
Urology/Prostate disease | 100% (3/3) | 100% (3/3) | > 0.99 |
Endocrinology | 100% (3/3) | 100% (3/3) | > 0.99 |
Toxicology | 100% (2/2) | 100% (2/2) | > 0.99 |
Orthopedics | 100% (2/2) | 100% (2/2) | > 0.99 |
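Each accuracy in the table is the ratio of the counts shown in parentheses. The Python sketch below recomputes those percentages from counts transcribed by hand from the table. It also checks two structural properties. The five quiz types sum to the Total row (54/62 and 60/62). The specialty denominators, by contrast, sum to 149 rather than 62, which suggests that a quiz could be tagged with more than one specialty. The names `QUIZ_TYPES`, `SPECIALTIES`, `pct` and `column_sums` are hypothetical. The half-up rounding rule is an assumption that happens to reproduce every printed percentage (for example, 7/8 shown as 88%). The sketch does not recompute the P-values, because the table does not state which statistical test produced them.

```python
from fractions import Fraction

# Hypothetical names; counts transcribed by hand from the table:
# (correct without choices, correct with choices, number of quizzes)
QUIZ_TYPES = {
    "Diagnosis": (49, 54, 55),
    "Finding": (0, 1, 1),
    "Treatment": (2, 2, 2),
    "Cause": (1, 1, 2),
    "Other": (2, 2, 2),
}

SPECIALTIES = {
    "Dermatology": (24, 27, 29),
    "Emergency medicine": (11, 11, 12),
    "Infectious disease": (12, 13, 13),
    "Radiology": (7, 8, 8),
    "Ophthalmology": (8, 10, 10),
    "Pediatrics": (6, 6, 6),
    "Hematology/Oncology": (8, 9, 10),
    "Gastroenterology": (7, 7, 7),
    "Neurology/Neurosurgery": (7, 7, 7),
    "Pulmonary/Critical Care": (3, 3, 3),
    "Surgery": (13, 13, 13),
    "Obstetrics/Gynecology": (4, 5, 5),
    "Otolaryngology": (1, 2, 2),
    "Nephrology": (4, 4, 4),
    "Genetics": (2, 2, 3),
    "Cardiology": (2, 2, 2),
    "Allergy/Immunology": (1, 2, 2),
    "Rheumatology": (2, 3, 3),
    "Urology/Prostate disease": (3, 3, 3),
    "Endocrinology": (3, 3, 3),
    "Toxicology": (2, 2, 2),
    "Orthopedics": (2, 2, 2),
}


def pct(correct: int, total: int) -> int:
    """Accuracy as a whole-number percentage, rounding halves up (assumed rule)."""
    # Exact rational arithmetic avoids floating-point surprises at .5 boundaries.
    return int(Fraction(100 * correct, total) + Fraction(1, 2))


def column_sums(rows: dict[str, tuple[int, int, int]]) -> tuple[int, int, int]:
    """Add up the three count columns over a group of rows."""
    without, with_choices, n = zip(*rows.values())
    return sum(without), sum(with_choices), sum(n)


if __name__ == "__main__":
    total_without, total_with, total_n = column_sums(QUIZ_TYPES)
    # The quiz types partition the quizzes, so their sums reproduce the Total row.
    print(f"Total: {pct(total_without, total_n)}% ({total_without}/{total_n}) vs "
          f"{pct(total_with, total_n)}% ({total_with}/{total_n})")
    # Specialty denominators exceed the total, so specialties likely overlap.
    print("Sum of specialty denominators:", column_sums(SPECIALTIES)[2])
    for name, (without, with_choices, n) in SPECIALTIES.items():
        print(f"{name}: {pct(without, n)}% vs {pct(with_choices, n)}%")
```

Run as a script, the first line printed is `Total: 87% (54/62) vs 97% (60/62)`, which matches the Total row.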