ChatGPT passes radiology board-style exam

The latest version of ChatGPT, an artificial intelligence chatbot designed to interpret natural-language prompts and generate human-like responses, passed a radiology board-style exam, highlighting the potential of large language models while also revealing limitations that hinder reliability, according to two new studies published in Radiology, the journal of the Radiological Society of North America (RSNA).

ChatGPT is an artificial intelligence (AI) chatbot that uses a deep learning model to recognize patterns and relationships between words in its large training data and to generate human-like responses to prompts. However, because there is no source of truth in that training data, the tool can generate responses that are factually incorrect.
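To make that mechanism concrete, here is a minimal toy sketch (nothing like ChatGPT in scale, and purely illustrative): a bigram model that, like a large language model, learns word-to-word statistics from training text and samples fluent-sounding continuations with no notion of factual truth.

    import random
    from collections import defaultdict

    # Tiny "training data": the model only ever sees word co-occurrence patterns.
    corpus = (
        "the radiograph shows a fracture . "
        "the radiograph shows no fracture . "
        "the ct scan shows a small lesion ."
    ).split()

    # Record which words follow which (the learned "relationships between words").
    transitions = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        transitions[prev].append(nxt)

    def generate(start, length=8):
        # Sample a plausible continuation; plausibility, not truth, drives the output.
        words = [start]
        for _ in range(length):
            options = transitions.get(words[-1])
            if not options:
                break
            words.append(random.choice(options))
        return " ".join(words)

    print(generate("the"))  # e.g. "the radiograph shows no fracture ." -- fluent but unverified

Whether the generated sentence is true is never checked; the model only reproduces statistically likely word sequences, which is exactly why confident-sounding wrong answers can arise.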

“The use of large language models like ChatGPT is growing rapidly and will continue to increase,” said lead author Rajesh Bhayana, M.D., FRCPC, an abdominal radiologist and technology lead at University Medical Imaging Toronto, Toronto General Hospital in Toronto, Canada. “Our work provides insight into ChatGPT’s performance in a radiology context, highlighting the incredible potential of large language models, along with the current limitations that make it unreliable.”

Dr. Bhayana noted that ChatGPT was recently named the fastest-growing consumer application in history, and that similar chatbots are being incorporated into popular search engines like Google and Bing, which physicians and patients use to search for health information.

Dr. Bhayana and colleagues first tested ChatGPT based on GPT-3.5, currently the most widely used version, to assess its performance on radiology board exam questions and explore its strengths and limitations. The researchers used 150 multiple-choice questions designed to match the style, content, and difficulty of the Canadian Royal College and American Board of Radiology exams.

The questions did not include images and were grouped by question type to gain insight into performance: lower-order thinking (knowledge recall, basic understanding) and higher-order thinking (application, analysis, synthesis). The higher-order thinking questions were further subclassified by type (description of imaging findings, clinical management, calculation and classification, disease associations). ChatGPT’s performance was evaluated overall, as well as by question type and topic. The confidence of the language used in its responses was also assessed.
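For illustration only, the per-category tally described above might look like the following sketch; the records and field names here are hypothetical, not the authors' actual data or code.

    from collections import Counter

    # Hypothetical question records tagged with the study's categories.
    questions = [
        {"type": "lower-order", "correct": True},
        {"type": "higher-order: imaging findings", "correct": False},
        {"type": "higher-order: calculation and classification", "correct": True},
        # ... 150 questions in the actual study
    ]

    totals, right = Counter(), Counter()
    for q in questions:
        totals[q["type"]] += 1
        right[q["type"]] += q["correct"]   # True counts as 1, False as 0

    for qtype in sorted(totals):
        pct = 100 * right[qtype] / totals[qtype]
        print(f"{qtype}: {right[qtype]}/{totals[qtype]} ({pct:.0f}%)")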

The researchers found that ChatGPT based on GPT-3.5 answered 69% of questions correctly (104 of 150), near the passing grade of 70% used by the Royal College in Canada. The model performed relatively well on questions requiring lower-order thinking (84%, 51 of 61), but struggled with questions involving higher-order thinking (60%, 53 of 89). More specifically, it struggled with higher-order questions involving description of imaging findings (61%, 28 of 46), calculation and classification (25%, 2 of 8), and application of concepts (30%, 3 of 10). Its poor performance on higher-order thinking questions was not surprising given its lack of radiology-specific pretraining.

GPT-4 was released in limited form to paid users in March 2023, with claims of improved advanced reasoning capabilities over GPT-3.5.

In a follow-up study, GPT-4 answered 81% (121 of 150) of the same questions correctly, outperforming GPT-3.5 and exceeding the 70% passing threshold. GPT-4 performed especially well compared to GPT-3.5 on higher-order thinking questions (81%), including those involving description of imaging findings (85%) and application of concepts (90%).
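As a quick sanity check, the quoted percentages follow directly from the reported counts (counts taken from the article; the snippet below simply redoes the arithmetic):

    results = {
        "GPT-3.5 overall": (104, 150),      # reported as 69%
        "GPT-3.5 lower-order": (51, 61),    # reported as 84%
        "GPT-3.5 higher-order": (53, 89),   # reported as 60%
        "GPT-4 overall": (121, 150),        # reported as 81%
    }
    for label, (correct, total) in results.items():
        print(f"{label}: {correct}/{total} = {100 * correct / total:.0f}%")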

The findings suggest that GPT-4’s claimed improved advanced reasoning capabilities translate to better performance in a radiology context. They also suggest improved contextual understanding of radiology-specific terminology, including descriptions of imaging findings, which is critical for future downstream applications.

“Our work shows an impressive improvement in ChatGPT’s radiology performance over a short period, highlighting the growing potential of large language models in this context,” said Dr. Bhayana.

GPT-4 showed no improvement on lower-order thinking questions (80% vs. 84%) and answered 12 questions incorrectly that GPT-3.5 had answered correctly, raising questions about its reliability for information gathering.

“We were initially surprised by ChatGPT’s accurate and confident answers to some challenging radiology questions, but then equally surprised by some very illogical and inaccurate assertions,” Dr. Bhayana said. “Of course, given how these models work, the inaccurate responses should not be particularly surprising.”

ChatGPT’s dangerous tendency to produce incorrect responses, known as hallucinations, is less frequent in GPT-4 but still limits its usability in medical education and practice at present. Both studies found that ChatGPT consistently uses confident language, even when it is wrong. Dr. Bhayana notes that this is particularly dangerous when it is relied on as a sole source of information, especially for novices who may not recognize confident incorrect responses as inaccurate.

Source: Port Altele
