ChatGPT-4 Performs Clinical Information Retrieval Tasks Utilizing Consistently More Trustworthy Resources Than Does Google Search for Queries Concerning the Latarjet Procedure.

Title: ChatGPT-4 Performs Clinical Information Retrieval Tasks Utilizing Consistently More Trustworthy Resources Than Does Google Search for Queries Concerning the Latarjet Procedure.
Publication Type: Journal Article
Year of Publication: 2024
Authors: Oeding JF, Lu AZ, Mazzucco M, Fu MC, Taylor SA, Dines DM, Warren RF, Gulotta LV, Dines JS, Kunze KN
Journal: Arthroscopy
Date Published: 2024 Jun 25
ISSN: 1526-3231
Abstract

PURPOSE: To assess the ability of ChatGPT-4, an automated chatbot powered by artificial intelligence (AI), to answer common patient questions concerning the Latarjet procedure for patients with anterior shoulder instability and to compare its performance with that of Google Search Engine.

METHODS: Using previously validated methods, a Google search was first performed with the query "Latarjet," and the top ten frequently asked questions (FAQs) and their associated sources were extracted. ChatGPT-4 was then prompted to provide the top ten FAQs and answers concerning the procedure. This process was repeated to identify additional FAQs requiring discrete numeric answers, allowing for direct comparison between ChatGPT-4 and Google. The discrete numeric answers were then assessed for accuracy on the basis of the clinical judgment of two fellowship-trained sports medicine surgeons blinded to search platform.

RESULTS: Mean (±standard deviation) accuracy of numeric answers was 2.9±0.9 for ChatGPT-4 versus 2.5±1.4 for Google (p=0.65). ChatGPT-4 derived answers exclusively from academic sources, a significant difference from Google Search Engine (p=0.003), which drew on academic sources for only 30% of answers, with the remainder from individual surgeons' websites (50%) and larger medical practices (20%). For general FAQs, 40% were identical between ChatGPT-4 and Google Search Engine. Regarding the sources used to answer these questions, ChatGPT-4 again used 100% academic resources, whereas Google Search Engine used 60% academic resources, 20% surgeon personal websites, and 20% medical practice websites (p=0.087).

CONCLUSION: ChatGPT-4 demonstrated the ability to provide accurate and reliable information about the Latarjet procedure in response to patient queries, drawing on academic sources in all cases. This contrasted with Google Search Engine, which more frequently used single-surgeon and large medical practice websites. Despite these differences in the resources accessed to perform information retrieval tasks, the clinical relevance and accuracy of the information provided did not differ significantly between ChatGPT-4 and Google Search Engine.

DOI: 10.1016/j.arthro.2024.05.025
Alternate Journal: Arthroscopy
PubMed ID: 38936557