Researchers tested three popular AI chatbots (ChatGPT, Google Gemini, and Claude) to see how well they answer common questions about celiac disease and gluten-free diets. Six celiac disease experts rated the chatbots’ answers on how clear, accurate, and complete they were. The results were encouraging: the chatbots scored an average of 4.3 out of 5 points, with especially strong performance on practical management questions. However, the study also found differences between the chatbots and less accurate answers to questions about diagnosis, suggesting that AI tools can be helpful for learning about celiac disease but shouldn’t replace talking to your doctor.
The Quick Take
- What they studied: How well three popular AI chatbots answer common questions that people with celiac disease and their families ask
- Who participated: The study didn’t involve patients. Instead, six pediatric gastroenterologists (doctors who specialize in celiac disease) reviewed and rated the chatbots’ answers to 12 frequently asked questions about celiac disease.
- Key finding: AI chatbots gave clear and mostly accurate answers to celiac disease questions, scoring 4.3 out of 5 on average. They were especially good at answering practical questions about managing a gluten-free diet, but less reliable when answering questions about how celiac disease is diagnosed.
- What it means for you: AI chatbots like ChatGPT and Google Gemini can be helpful tools for learning about celiac disease and gluten-free living. However, they’re not perfect and shouldn’t replace conversations with your doctor, especially when it comes to diagnosis and treatment decisions.
The Research Details
Researchers selected 12 common questions that people with celiac disease frequently ask online. They submitted the same 12 questions to three different AI chatbots: ChatGPT-4, Google Gemini Flash 2.5, and Claude Sonnet 3.7. Six doctors specializing in celiac disease independently read all the chatbot responses and rated each one on four criteria: accuracy, completeness, clarity, and overall quality. Each response was rated on a scale of 1 to 5, with 5 being the best.
This type of study is called a cross-sectional evaluation, which means the researchers looked at all the information at one point in time rather than following people over weeks or months. The doctors didn’t know which chatbot gave which answer, which helps prevent bias in their ratings.
The researchers also checked whether the six doctors agreed with each other in their ratings. When multiple raters agree, it shows the ratings are reliable and trustworthy.
As more people turn to the internet for health information, it’s important to know whether AI tools give reliable answers. This study matters because it tests whether these popular chatbots can be trusted as educational resources for a serious lifelong condition like celiac disease. Understanding the strengths and weaknesses of AI in medical education helps patients and doctors decide when and how to use these tools safely.
The study had several strengths: expert doctors did the rating (not just anyone), the doctors agreed well with each other on their ratings (intraclass correlation coefficient = 0.74, which is considered good), and the evaluation was systematic and fair. However, the study only tested 12 questions, which is a relatively small sample. The study also didn’t test how actual patients would understand or use this information. Additionally, AI chatbots change and improve frequently, so results from early 2026 may not reflect how these tools perform today.
What the Results Show
Overall, the three AI chatbots performed well, with an average score of 4.3 out of 5 across all questions and all models. When broken down by category, clarity was the strongest area (4.56 out of 5), meaning the chatbots explained things in understandable ways. Accuracy came next (4.26 out of 5), showing the information was mostly correct. Completeness scored 4.17 out of 5, meaning the answers covered most important points. Overall quality was rated 4.20 out of 5.
There was an important difference between types of questions: answers about managing celiac disease and following a gluten-free diet scored significantly higher (4.4 out of 5) than answers about how celiac disease is diagnosed (4.2 out of 5). This suggests the chatbots are better at giving practical advice than at explaining medical testing and diagnosis.
When comparing the three chatbots, Google Gemini received the highest ratings overall, performing significantly better than the other two models. This suggests that different AI tools have different levels of reliability for medical information.
The study found that the six expert doctors generally agreed with each other in their ratings, which is important because it means the evaluation was consistent and fair. The spread across the rating categories (from 4.17 for completeness to 4.56 for clarity) shows that while the chatbots were generally reliable, there was still room for improvement and some inconsistency in how well they answered different types of questions.
This appears to be one of the first studies specifically testing AI chatbots’ ability to answer celiac disease questions. Previous research has shown that AI tools perform well on some medical topics but less well on others. This study adds to growing evidence that AI can be a useful educational tool but shouldn’t be the only source of medical information. The findings align with other research showing that AI is better at practical information than at complex diagnostic explanations.
The study only evaluated 12 questions, which may not represent all the questions people with celiac disease ask. The study used expert doctors to rate the answers, but didn’t test whether actual patients with celiac disease would find the answers helpful or understandable. AI chatbots are constantly being updated and improved, so these results may change over time. The study was published in early 2026, and newer versions of these chatbots may perform differently. Additionally, the study didn’t evaluate whether people would actually follow the advice given by the chatbots or whether following that advice would improve their health.
The Bottom Line
AI chatbots like ChatGPT, Google Gemini, and Claude can be helpful tools for learning about celiac disease and gluten-free living, particularly for practical questions about diet management (moderate confidence). However, they should not replace conversations with your doctor, especially for questions about diagnosis, testing, or treatment decisions (high confidence). If you use AI chatbots for health information, it’s a good idea to verify important information with your healthcare provider (high confidence).
People with celiac disease and their families may find AI chatbots helpful for quick answers to common questions about gluten-free living. Healthcare providers should be aware that patients may be using these tools and should be prepared to discuss and verify the information. People newly diagnosed with celiac disease should use these tools as supplementary resources, not replacements for professional medical advice. People with complicated celiac disease cases or other health conditions should rely more heavily on their doctors than on AI tools.
You could start using AI chatbots immediately for general information about celiac disease and gluten-free diets. However, don’t expect them to replace your doctor’s personalized advice. If you’re using AI tools to help manage your celiac disease, check in with your healthcare provider regularly (at least at your scheduled appointments) to make sure the information you’re getting is working well for your specific situation.
Want to Apply This Research?
- Track which AI chatbot questions you ask and which answers you verify with your doctor. Create a simple log noting: (1) the question asked, (2) which chatbot you used, (3) whether you verified the answer with your doctor, and (4) whether the information was helpful. This helps you learn which tools and questions are most reliable for your needs.
- Use AI chatbots as a first step for learning about celiac disease management, but always flag important health decisions to discuss with your doctor. For example, if an AI tool suggests a dietary change, write it down and bring it up at your next appointment. This creates a helpful bridge between quick AI answers and professional medical guidance.
- Over time, track which types of questions the AI tools answer well for you and which ones need doctor verification. Create categories like ‘label reading,’ ‘restaurant dining,’ ‘cross-contamination prevention,’ and ‘symptom management.’ Rate how helpful each AI answer was after you’ve tried the advice. This personalized tracking helps you understand which AI tools work best for your specific needs and builds your confidence in knowing when to trust AI versus when to ask your doctor.
This study evaluated how well AI chatbots answer questions about celiac disease, but it does not provide medical advice. AI chatbots should not be used as a substitute for professional medical diagnosis, treatment, or advice from a qualified healthcare provider. If you have celiac disease or suspect you might have it, please consult with a doctor or gastroenterologist. Always verify important health information with your healthcare provider before making decisions about your diet or treatment. The accuracy and capabilities of AI chatbots change frequently, and this study reflects performance at a specific point in time. Individual responses from AI tools may vary, and some answers may contain errors or incomplete information.
