Responses of Different Artificial Intelligence Systems to Questions Related with Short Stature as Assessed by Pediatric Endocrinologists

Kamber Kaşali; Özgür Fırat Özpolat; Merve Ülkü; Ayşe Sena Dönmez; Serap Kılıç Kaya; Esra Dişçi; Serkan Bilge Koca; Ufuk Özkaya; Hüseyin Demirbilek; Atilla Çayır

doi:10.4274/jcrpe.galenos.2025.2025-6-14

Original Article

E-PUB

2 September 2025

J Clin Res Pediatr Endocrinol. Published online 2 September 2025.

Kamber Kaşali ¹

Özgür Fırat Özpolat ²

1. Department of Biostatistics, Atatürk University Faculty of Medicine, Erzurum, Türkiye

2. Data Management Office, Atatürk University, Erzurum, Türkiye

3. Erzurum City Hospital, Clinic of Pediatric Endocrinology, Erzurum, Türkiye

4. Erzurum City Hospital, Clinic of Pediatrics, Erzurum, Türkiye

5. Department of Pediatrics, Division of Pediatric Endocrinology, Kayseri City Training and Research Hospital, Kayseri, Türkiye

6. Hacettepe University Faculty of Medicine, Department of Pediatric Endocrinology, Ankara, Türkiye

7. Atatürk University Faculty of Medicine, Department of Pediatric Endocrinology, Erzurum, Türkiye

Received Date: 11.06.2025

Accepted Date: 14.08.2025

E-Pub Date: 02.09.2025

Abstract

Objective

Artificial intelligence (AI) is increasingly utilized in medicine, including pediatric endocrinology. AI models have the potential to support clinical decision-making, patient education, and guidance. However, their accuracy, reliability, and effectiveness in providing medical information and recommendations remain unclear. This study aims to evaluate and compare the performance of four AI models—ChatGPT, Bard, Microsoft Copilot, and Pi—in answering frequently asked questions related to pediatric endocrinology.

Methods

Nine questions commonly asked by parents about short stature in pediatric endocrinology were selected based on literature reviews and expert opinions. These questions were posed to four AI models in both Turkish and English. The AI-generated responses were evaluated by 10 pediatric endocrinologists using a 12-item Likert-scale questionnaire assessing medical accuracy, completeness, guidance, and informativeness. Statistical analyses, including Kruskal-Wallis and post-hoc tests, were conducted to determine significant differences between AI models.
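As an illustration of the omnibus test named above, the following is a minimal sketch of the tie-corrected Kruskal-Wallis H statistic in plain Python. It is not the authors' analysis code: the function names, the group labels (`model_a` to `model_d`), and the Likert scores are hypothetical and synthetic, chosen only to show the mechanics. Tie correction matters here because Likert ratings take few distinct values.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied-value counts t.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Synthetic 1-5 Likert ratings for four hypothetical models (NOT study data).
scores = {
    "model_a": [4, 5, 4, 3, 5],
    "model_b": [3, 4, 4, 4, 3],
    "model_c": [5, 5, 4, 5, 4],
    "model_d": [2, 3, 2, 3, 3],
}
h_stat = kruskal_wallis_h(list(scores.values()))
print(f"H = {h_stat:.3f} with {len(scores) - 1} degrees of freedom")
```

Under the null hypothesis, H is approximately chi-square distributed with k - 1 degrees of freedom for k groups. In practice a library routine such as `scipy.stats.kruskal` returns both the statistic and the p-value. The abstract does not say which post-hoc procedure was used, so none is sketched here.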

Results

Bard outperformed other models in guidance and recommendation categories, excelling in directing users to medical consultation. Microsoft Copilot demonstrated strong medical accuracy but lacked guidance capacity. ChatGPT showed consistent performance in knowledge dissemination, making it effective for patient education. Pi scored the lowest in guidance and recommendations, indicating limited applicability in clinical settings. Significant differences were observed among AI models (p < 0.05), particularly in completeness and guidance-related categories.

Conclusion

The study highlights the varying strengths and weaknesses of AI models in pediatric endocrinology. Bard is effective in guidance, Microsoft Copilot excels in accuracy, and ChatGPT is informative. Future AI improvements should focus on balancing accuracy and guidance to enhance clinical decision support and patient education. Tailored AI applications may optimize AI’s role in specialized medical fields.

Keywords:

Pediatric Endocrinology, Artificial Intelligence (AI), Clinical Decision Support, Medical Informatics