A new research study shows how large language models (LLMs), including GPT-4, could play a growing role in ophthalmology. Scientists at the University of Cambridge ran an experiment in which GPT-4 and other LLMs were pitted against human ophthalmologists in a simulated clinical examination.
Examining GPT-4 Performance
GPT-4 answered 60 of 87 questions correctly - outperforming trainee doctors (average: 59.5) and junior doctors (37) but falling short of expert ophthalmologists (66.4). Other LLMs, such as PaLM 2 and GPT-3.5, performed less well.
Balancing Potential Benefits And Risks
Though these results are promising, the researchers caution against potential risks of deploying LLMs in this context. The models were tested only on exam-style questions, which raises doubts about how well the results generalize to real clinical practice. Furthermore, LLMs' tendency to produce inaccurate information (known as hallucination) could lead to misdiagnosis of serious conditions such as cataracts or cancer, and their lack of nuanced clinical understanding may compound such errors.
Future Steps and Caution
This study underscores the need for further research and development before LLMs can serve as reliable medical diagnostic tools. Given the risks inherent in medical diagnosis, substantial time and effort will be required before LLMs can be safely integrated into mainstream medical practice.