Poster - 8
Generative Artificial Intelligence Accuracy in Interpreting Forest Plots in Pediatric Surgery Meta-analyses: A Perspective From Pediatric Surgery Meta-analysis Study Group (PESMA)
Mustafa Azizoğlu 1, Maria Escolino 2, Tahsin Onat Kamçı 3, Sergey Klyuev 4, Sonia Perez Bertolez 5, Toni Risteksi 6, Ismael Elhaleby 7, Nitinkumar Borkar 8, Ciro Esposito 9, Mehmet Hanifi Okur 10, Martin Lacher 11, Annika Mutanen 12, Sameh Shehata 13, Fabio Chiarenza 14, Mark Davenport 15
1 Esenyurt Necmi Kadioglu State Hospital, Department of Pediatric Surgery; Istinye University, Department of Stem Cell and Tissue Engineering & 3D Bioprinting, Istanbul, Turkey
2 Federico II University, Department of Pediatric Surgery, Naples, Italy
3 Tatvan State Hospital, Department of Pediatric Surgery, Bitlis, Turkey
4 AO GK MEDSI, Department of Pediatric Surgery, Moscow, Russia
5 Pediatric Urology Unit, Department of Pediatric Surgery, Hospital Sant Joan de Déu, Universitat de Barcelona, Barcelona, Spain
6 Ss. Cyril and Methodius University of Skopje, Faculty of Medicine, Department of Pediatric Surgery, Skopje, Macedonia
7 Department of Pediatric Surgery, Faculty of Medicine, Tanta University, Tanta, Egypt
8 Department of Paediatric Surgery, AIIMS, Raipur, Chhattisgarh, India
9 Pediatric Surgery Unit, Federico II University Naples, Naples, Italy
10 Department of Pediatric Surgery, Faculty of Medicine, Balıkesir University, Balıkesir, Turkey
11 Department of Pediatric Surgery, University Hospital Leipzig, Leipzig, Germany
12 Department of Pediatric Surgery, The New Children's Hospital, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
13 Department of Pediatric Surgery, Faculty of Medicine, Alexandria University, Alexandria, Egypt
14 Department of Pediatric Surgery, Pediatric Minimally Invasive Surgery and New Technologies, San Bortolo Hospital, Vicenza, Italy
15 Department of Paediatric Surgery, King's College Hospital, London, UK
Aim: Pediatric surgery trainees and newly specialized surgeons often find it difficult to accurately analyze forest plots. Interpreting elements such as study weights, risk ratio, odds ratio, random effects, fixed effects, and heterogeneity, as well as understanding the implications of the graphical distribution of results, can be challenging. Therefore, in this study, we aimed to evaluate ChatGPT-4’s accuracy in interpreting forest plots from published pediatric surgery meta-analyses.
Methods: We reviewed published meta-analyses in pediatric surgery and examined the forest plots they contained. We randomly selected 20 forest plots from each of seven pediatric surgery subspecialties (140 forest plots in total) and asked ChatGPT-4 to interpret them. Two pediatric surgeons and two statisticians first reviewed the forest plots taken from the published articles to confirm the accuracy of the published data. The same four reviewers then evaluated the AI-generated interpretations and discussed the accuracy of each in detail.
Results: Accuracy was high across all subspecialties. In general pediatric surgery, pediatric oncological surgery, pediatric thoracic surgery, and pediatric gastrointestinal surgery, ChatGPT interpreted every forest plot correctly (100%). In pediatric hepatobiliary surgery and pediatric traumatology, accuracy was slightly lower, with 95% correct interpretations and one incorrect response (5%) in each category. Pediatric urology had the lowest accuracy, with 90% correct interpretations and two incorrect responses (10%). Overall, ChatGPT correctly interpreted 136 of 140 forest plots, an overall accuracy of 97%.
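The arithmetic behind these figures can be checked with a short tally. This is a minimal sketch rather than the authors' analysis: the per-subspecialty counts are reconstructed from the reported percentages under the stated design of 20 forest plots per subspecialty, and the names `correct_counts` and `accuracy_percent` are illustrative.

```python
# Correct interpretations per subspecialty, reconstructed from the reported
# percentages (assumption: exactly 20 forest plots in each subspecialty).
PLOTS_PER_SUBSPECIALTY = 20

correct_counts = {
    "general pediatric surgery": 20,
    "pediatric oncological surgery": 20,
    "pediatric thoracic surgery": 20,
    "pediatric gastrointestinal surgery": 20,
    "pediatric hepatobiliary surgery": 19,
    "pediatric traumatology": 19,
    "pediatric urology": 18,
}


def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correct interpretations as a percentage."""
    return 100.0 * correct / total


total_plots = PLOTS_PER_SUBSPECIALTY * len(correct_counts)
total_correct = sum(correct_counts.values())
overall_accuracy = accuracy_percent(total_correct, total_plots)

for name, correct in correct_counts.items():
    pct = accuracy_percent(correct, PLOTS_PER_SUBSPECIALTY)
    print(f"{name}: {correct}/{PLOTS_PER_SUBSPECIALTY} = {pct:.0f}%")

print(f"overall: {total_correct}/{total_plots} = {overall_accuracy:.1f}%")
```

The four incorrect responses (one hepatobiliary, one traumatology, two urology) account for the difference between 140 and 136, and 136/140 rounds to the reported 97%.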
Conclusions: The analysis highlights the significant potential of AI in meta-analysis interpretation; however, there remains a lack of standardized guidelines for its practical integration into research and clinical practice. Establishing clear protocols is essential to ensure accuracy, consistency, and appropriate human oversight in AI-assisted statistical analysis. Developing best practices for AI-human collaboration and incorporating AI training into pediatric surgical education would enhance the reliability and applicability of these tools, ultimately strengthening evidence-based decision-making in the field.
Accuracy of Generative Artificial Intelligence in Interpreting Forest Plots in Pediatric Surgery Meta-analyses: The Perspective of the Pediatric Surgery Meta-analysis Study Group (PESMA)
Aim: Pediatric surgery residents and newly specialized surgeons often have difficulty analyzing and interpreting forest plots. Interpreting elements such as study weights, risk ratio, odds ratio, random effects, fixed effects, and heterogeneity, and grasping what the graphical distribution of results means, can be challenging. In this study, we therefore aimed to evaluate the accuracy of ChatGPT-4 in interpreting forest plots taken from published pediatric surgery meta-analyses.
Methods: To address this problem, we reviewed published meta-analyses in pediatric surgery and analyzed the forest plots in these studies. We randomly selected 20 forest plots from each pediatric surgery subspecialty and interpreted them with ChatGPT-4 (140 forest plots in total). The forest plots taken from published articles were first reviewed by two pediatric surgeons and two statisticians to confirm the accuracy of the data. The responses generated by the AI were then re-evaluated by the same group, and the accuracy of the AI-generated interpretations was discussed in detail.
Results: The results showed a high level of accuracy across all subspecialties. In general pediatric surgery, pediatric oncological surgery, pediatric thoracic surgery, and pediatric gastrointestinal surgery, ChatGPT provided correct interpretations in all cases (100%). In pediatric hepatobiliary surgery and pediatric traumatology, accuracy was slightly lower; in each category, 95% of interpretations were correct and 5% were incorrect. Pediatric urology had the lowest accuracy, with 90% correct interpretations and 10% incorrect responses. Overall, ChatGPT correctly interpreted 136 of 140 forest plots, reaching an overall accuracy of 97%.
Conclusions: The analysis highlights the significant potential of AI in meta-analysis interpretation; however, standardized guidelines for its practical integration into research and clinical practice are still lacking. Establishing clear protocols is essential to ensure accuracy, consistency, and appropriate human oversight. Developing best practices for AI-human collaboration and integrating AI training into pediatric surgical education would increase the reliability and applicability of these tools and strengthen evidence-based decision-making in the field.