ChatGPT passes neurology exam for first time

Posted on 12/11/2023

OpenAI’s latest update of its large language model (LLM), ChatGPT 4.0, has passed a clinical neurology exam with 85% correct answers in a proof-of-concept study. The study’s authors believe that, after some fine-tuning, LLMs could have “significant applications” in clinical neurology.

The results of the experiment, conducted by a group of researchers from the University Hospital Heidelberg and the German Cancer Research Center in Heidelberg, were published on Dec. 7. The test, performed on May 31, featured two LLMs, ChatGPT 3.5 and its later version, ChatGPT 4.0.

The researchers used the question bank for a neurology exam from the American Board of Psychiatry and Neurology, supplemented with a small set of questions from the European Board for Neurology.

While the older version of ChatGPT scored 66.8%, answering 1306 out of 1956 questions correctly, the more recent model, ChatGPT 4.0, scored 85% with 1662 correct answers. The average human score was 73.8%. ChatGPT 4.0 outperformed human users in behavioral, cognitive, and psychological-related questions and effectively “passed” the neurology exam, as a score of 70% correct is generally considered a passing mark in educational institutions.
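
The reported percentages can be checked from the raw counts. The short Python sketch below is illustrative only and is not taken from the study; the names `TOTAL_QUESTIONS`, `PASS_MARK`, and `percent_correct` are assumptions introduced here, and the 70% threshold is the rule of thumb quoted in this article rather than an official cut-off.

```python
# Counts as reported in the article; names below are illustrative assumptions.
TOTAL_QUESTIONS = 1956
PASS_MARK = 70.0  # rule-of-thumb passing score cited in the article

correct_answers = {
    "ChatGPT 3.5": 1306,
    "ChatGPT 4.0": 1662,
}


def percent_correct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return the share of correct answers as a percentage, rounded to one decimal."""
    return round(100 * correct / total, 1)


# Compute each model's score from its raw count of correct answers.
scores = {model: percent_correct(n) for model, n in correct_answers.items()}

for model, score in scores.items():
    verdict = "pass" if score >= PASS_MARK else "fail"
    print(f"{model}: {score}% ({verdict})")
```

Run as written, the counts reproduce the 66.8% and 85% figures quoted above, with only the newer model clearing the 70% mark.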

However, both models performed worse on questions requiring “higher-order thinking” than on questions requiring only “lower-order thinking.”

According to the researchers who conducted the experiment, the results point to a potential role for LLMs in clinical neurology once the models have been refined:

“These findings suggest that with further refinements, large language models could have significant applications in clinical neurology.”

The researchers point out that several reservations remain. While there is a clear prospect of applying LLMs in documentation and decision-support systems, neurologists should be cautious about using them in practice, as the models are still imperfect at higher-order cognitive tasks. Speaking to Cointelegraph, one of the study’s authors, Dr. Varun Venkataramani, said:

“We see our study more as a proof of concept for the capabilities of LLMs. There is still development needed and probably even specific fine-tuning of LLMs to make them properly applicable for clinical neurology.”

AI is already being applied to some major tasks within healthcare, such as the search for cancer treatments at AstraZeneca and the fight against the overprescription of antibiotics in Hong Kong.
