Artificial intelligence in congenital heart surgery: a scoping review and primer for surgeons

Jeevan Francis; Asmita Singhania; Sarah Dawson; Massimo Caputo; Jelena Savovic; Serban Stoica

doi:10.21037/tp-2026-0345

Review Article

Jeevan Francis¹,², Asmita Singhania³, Sarah Dawson⁴,⁵, Massimo Caputo³, Jelena Savovic⁴,⁵, Serban Stoica⁶

¹Department of Cardiac Surgery, St Thomas’ Hospital, London, UK; ²Faculty of Medicine, King’s College London, London, UK; ³Bristol Heart Institute, University Hospitals Bristol NHS Foundation Trust, Bristol, UK; ⁴Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK; ⁵NIHR Applied Research Collaboration West (ARC West), University Hospitals Bristol and Weston NHS Foundation Trust, Bristol, UK; ⁶Department of Paediatric Cardiac Surgery, Bristol Royal Children’s Hospital, Bristol, UK

Contributions: (I) Conception and design: All authors; (II) Administrative support: None; (III) Provision of study materials or patients: NA; (IV) Collection and assembly of data: J Francis, A Singhania; (V) Data analysis and interpretation: J Francis, A Singhania, S Dawson, J Savovic, S Stoica; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Jeevan Francis, MBChB, MPH, BSc (Hons), PGCert. Department of Cardiac Surgery, St Thomas’ Hospital, Westminster Bridge Road, London, SE1 7EH, UK; Faculty of Medicine, King’s College London, London, UK. Email: Jeevanfrancis15@gmail.com.

Background: Congenital heart surgery (CHS) encompasses a wide spectrum of complex cardiac defects, many of which demand specialised perioperative management and tailored surgical planning. Artificial intelligence (AI), including machine learning (ML), is gaining prominence as a tool to optimise clinical decision-making and achieve better outcomes. This scoping review aims to map and summarise the existing applications of AI modalities in CHS.

Methods: A comprehensive search of MEDLINE, Embase, and Web of Science was performed, combining terms for AI with terms for congenital heart disease and surgery.

Results: A total of 2,871 articles were retrieved from the search, of which 93 studies were included. The majority of studies focused on outcome prediction and imaging-based applications. Smaller proportions addressed decision-making and data augmentation, omics integration, benchmarking and quality improvement. The majority of studies examined heterogeneous congenital heart disease (CHD) populations, with tetralogy of Fallot (TOF) and single ventricle physiology most frequently represented.

Conclusions: AI applications in CHS are rapidly expanding across diverse domains, with early studies showing encouraging potential to support diagnostics, guide surgical decision-making, and improve perioperative outcomes. However, most models remain in the preliminary stage with limited external validation. Emerging advances in AI may further accelerate progress, but careful evaluation and integration are essential to translate this promise into tangible clinical benefits.

Keywords: Artificial intelligence (AI); congenital heart surgery (CHS); machine learning (ML); deep learning (DL)

Submitted Apr 06, 2026. Accepted for publication May 26, 2026. Published online Jun 25, 2026.

Highlight box

Key findings

• Machine learning and deep learning models showed promising performance in predicting mortality, complications, and resource utilisation.

• Generative artificial intelligence (AI) and large language models remain early in development but may support data augmentation and clinical decision-making.

What is known and what is new?

• AI has increasingly been explored in cardiovascular medicine, particularly for imaging and predictive analytics. Recent literature in this field has focused on cardiology applications rather than on applications specific to congenital cardiac surgery.

• This review provides a focused overview and practical primer on AI applications directly relevant to congenital heart surgeons, including perioperative risk stratification, imaging-guided decision support, and emerging generative AI tools.

What is the implication and what should change now?

• AI may enhance personalised perioperative care, improve surgical decision-making, and increase efficiency within congenital cardiac programmes.

• Future work should prioritise multicentre collaboration, external validation, explainable AI frameworks, and integration into clinical workflows.

Introduction

The management of congenital heart diseases (CHDs) has seen significant improvements over the past few decades; however, congenital heart surgery (CHS) remains a speciality with considerable complexity and risk. Surgical decision-making in CHD must account for the heterogeneous anatomies, the patient’s growth and development, and dynamic perioperative physiology (1). In this context, artificial intelligence (AI) can transform large, complex datasets into actionable insights, supporting more precise surgical strategies.

From a technical perspective, AI is not a single technology but a hierarchy of computational approaches, each with distinct capabilities and limitations. Table 1 provides an overview of AI, the different subtypes, and their potential applications in CHS. These approaches also vary in their limitations; for example, machine learning (ML) is limited by the quality of clinical coding, deep learning (DL) demands large and curated datasets, large language models (LLMs) require safeguards against erroneous outputs, and generative models risk producing artificial data that may not accurately reflect clinical reality.

Table 1

AI methods and brief glossary

Term	Description	Potential applications
AI	AI is an umbrella term for computer systems designed to replicate aspects of human reasoning, allowing them to tackle complex problems that typically require intelligence and judgment	AI-driven software can interpret medical images, predict patient outcomes, or optimise treatment plans by “learning” from large clinical datasets
ML	ML is an umbrella term for algorithms that learn patterns from data rather than following explicit instructions. ML algorithms encompass supervised, unsupervised and reinforcement approaches, whereby they learn from labelled data to predict outcomes, from unlabelled data to uncover hidden structure, or from trial-and-error feedback to optimise decisions, which allows them to detect patterns in complex datasets and generate predictive outputs (2)	ML models are used to predict risk by automatically analysing combinations of patient variables. Possible applications also include identifying novel risk factors for poor post-operative outcomes within heterogeneous CHD populations
DL	DL is a subset of ML that employs multilayered neural networks to automatically extract relevant features from data, enabling the recognition of complex patterns without the need for manual feature engineering (3)	DL algorithms (e.g., convolutional neural networks) can analyse scans or pathology slides to detect subtle abnormalities (tumours, lesions, etc.) with high accuracy using pattern recognition
Generative AI	A family of computational methods that can create new content (text, images, data) resembling the patterns in their training data. Generative models include GANs, foundational models, and LLMs. They can synthesise realistic data while preserving key statistical properties. Foundational models are built by training on large and diverse datasets, giving them a general understanding that can be reused and adapted for many different purposes. LLMs, such as OpenAI’s ChatGPT, are trained on large collections of text and can be used for natural language processing and evidence synthesis following a prompt (4). A prompt is the input given to a system that guides the output it produces. Unlike conventional ML models, LLMs require prompts, including when they are used to develop training data, i.e., the information used to teach an AI system how to recognise patterns and make predictions. Inference is the process whereby a trained AI system uses what it has learned to give an answer or generate an output from new information	Generative AI is used to produce synthetic medical data for research and training, e.g., generating images to augment radiology datasets, or creating synthetic health records to protect patient privacy while enabling data analysis for clinical trials

AI, artificial intelligence; CHD, congenital heart disease; DL, deep learning; GAN, generative adversarial network; LLM, large language model; ML, machine learning.

Early applications of AI in surgery have shown promise across domains, from outperforming clinicians in repetitive pattern-recognition tasks such as image analysis to incorporating far more variables than traditional methods in risk prediction models (5). In cardiothoracic surgery, recent reviews have highlighted that AI techniques may enhance diagnosis and risk assessment, aid intraoperative guidance, and enable personalised treatment planning (1,6). More recently, LLMs have expanded this scope by leveraging unstructured clinical text to improve perioperative risk prediction and decision support, for example, by using generative AI to predict acute kidney injury following paediatric cardiopulmonary bypass (7).

However, this excitement is tempered by key challenges ahead, including the need for high-quality data, transparent algorithms, rigorous validation, and seamless integration into clinical workflows. A central concern is model interpretability, as purely “black box” predictions may limit clinical trust—surgeons will rightly want to understand the reasoning behind a risk score before relying on it. Emerging foundation models address part of this challenge by incorporating elements of reasoning and explainability, which makes them more acceptable to clinicians who retain ultimate responsibility for decisions (8). In the high-stakes environment of CHS, this reinforces the need for rigorous validation, ethical oversight, and robust governance frameworks before AI tools can be safely embedded.

Importantly, the application of AI in CHS cannot be viewed in isolation from CHD care or paediatric cardiology more broadly. In contemporary practice, surgical decision-making is intrinsically multidisciplinary, spanning diagnostic imaging, catheter-based intervention, and operative management (9). Many AI tools influencing surgical strategy originate within cardiology or imaging domains, e.g., models predicting CHD morphology and suitability for transcatheter closure. This overlap reflects how CHS functions not as a standalone discipline, but as an integrated component of congenital cardiac care. Accordingly, although this review focuses on studies with direct relevance to surgical decision-making, it inevitably captures work developed at the interface between cardiology, imaging, and surgery.

Published studies have called attention to the promise of AI in CHD, including theoretical opportunities and obstacles (9). What is less clear is the current state of the evidence: which specific clinical problems these methods have been applied to, which techniques have been most commonly used, and what insights or improvements they have yielded. To address these questions, we undertook a scoping review of studies applying AI-driven techniques in CHS, deliberately positioning AI as both a methodological primer for surgeons and an emerging clinical innovation. Our dual aim was to inform cardiac surgeons and researchers about the current capabilities of AI and related tools in CHS, and about the limitations that must be overcome for these innovations to translate into improved patient outcomes. We present this article in accordance with the PRISMA-ScR reporting checklist (available at https://tp.amegroups.com/article/view/10.21037/tp-2026-0345/rc).

Methods

We followed a scoping review framework to identify and map research on AI in CHS. The search strategy was developed in consultation with a medical librarian and designed to be highly sensitive to studies involving AI computational techniques applied in CHD surgical care. We searched MEDLINE, Embase, and Web of Science from 1946 to October 1, 2025. The search consisted of three main concept blocks: (I) AI and computational techniques; (II) congenital heart disease; and (III) cardiac surgery.

The first search block utilised the MeSH term “exp Artificial Intelligence/” in combination with an extensive range of keywords encompassing ML and related methodologies. These included terms for general approaches (e.g., “machine learning”, “deep learning”, “neural network*”, “random forest”, “support vector machine”, “pattern recognition”, “clustering algorithm”, “autoencoder”, “transformer”) as well as specific algorithm names and commonly used platforms (e.g., “XGBoost”, “ChatGPT”, “large language model*”). A broad spectrum of synonyms and algorithm classes was incorporated to account for the rapidly evolving nomenclature in this domain.

The second block targeted congenital cardiac conditions, combining the MeSH term “exp Heart Defects, Congenital/” with title and keyword searches for specific lesions and syndromes, including generic formulations such as “(congenital adj4 heart adj4 defect)*” and individual conditions (e.g., “tetralogy of Fallot”, “hypoplastic left heart syndrome”, “transposition of the great arteries”).

The third block addressed surgical or interventional contexts, using MeSH terms (e.g., “exp Cardiovascular Surgical Procedures/”) and keywords including “surgery”, “operative”, and “transplantation”.

These three blocks were combined with the AND operator. No language restrictions were applied. In addition, references of included articles were reviewed to identify any potentially missed studies. Figure 1 outlines the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.
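For readers unfamiliar with block-based searching, the logic of the three concept blocks can be sketched in a few lines of Python. This is a minimal illustration only: it uses a small subset of the terms quoted above, the function names `or_block` and `build_query` are hypothetical, and the output is a generic Boolean string rather than the exact database syntax or the full strategy used in this review.

```python
# Illustrative only: a subset of the terms quoted in the Methods, joined with
# generic Boolean operators. Not the full search strategy and not exact
# database syntax.

def or_block(terms):
    """Join synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def build_query(blocks):
    """Require at least one term from every concept block by joining with AND."""
    return " AND ".join(or_block(terms) for terms in blocks)

ai_terms = ["machine learning", "deep learning", "neural network*", "XGBoost"]
chd_terms = ["tetralogy of Fallot", "hypoplastic left heart syndrome"]
surgery_terms = ["surgery", "operative", "transplantation"]

query = build_query([ai_terms, chd_terms, surgery_terms])
print(query)
```

Within a block, OR broadens the search to any synonym; between blocks, AND restricts the results to records that touch all three concepts.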

Figure 1 PRISMA flowchart.

All records were imported into Rayyan (10), and duplicates were removed. A two-stage screening process was undertaken. Studies focusing exclusively on adult cardiac surgery without relevance to CHDs were excluded. Screening was conducted by two independent reviewers (J.F. & A.S.), with disagreements resolved through discussion or by consultation with a third reviewer (S.S.).

The studies were collated into descriptive thematic categories according to their primary focus: imaging and diagnostics, risk prediction and prognostic modelling, surgical planning and intraoperative support, and postoperative outcome evaluation and quality improvement. Some studies spanned more than one theme and were therefore classified by their dominant application.

Results

The search identified 2,871 studies before de-duplication. After screening titles and abstracts, 139 articles were assessed for full-text review; 93 met the eligibility criteria and were included in the final analysis. A detailed summary of all included studies, including their full references, is provided in Table 1 and in Table S1.

The majority of studies utilised ML algorithms, including random forest, gradient boosting, and tree analysis. Supervised ML algorithms were commonly applied to clinical datasets to predict outcomes, including mortality, morbidity, and resource utilisation. Over half of the included studies used “traditional” ML algorithms, often comparing the performance of several models with each other or with conventional logistic regression as a baseline. A smaller subset of studies explored DL methods [e.g., neural networks and convolutional neural networks (CNNs)], primarily in imaging-focused applications where large datasets of images were available for model training. Only a minority of studies applied unsupervised learning techniques. This reflects the fact that most research to date has been geared toward prediction or classification tasks with known outcomes, whereas unsupervised approaches, which may be used to discover novel patient subgroups or phenotypes, have been less common. A few studies combined supervised and unsupervised techniques, for instance, using clustering to pre-group patients before building a predictive model, though such hybrid approaches were rare. Figure 2 illustrates the distribution of AI algorithm types used across the studies.

Figure 2 AI algorithms utilised in the included studies. AI, artificial intelligence.

The CHD lesions and populations studied were unevenly distributed across studies. For example, tetralogy of Fallot (TOF) was most commonly studied in imaging analysis, whereas single ventricle pathology was evaluated most often in risk modelling. Over 50% of the included studies used classic ML algorithms (decision trees, random forests, gradient boosting machines, support vector machines, etc.), often comparing several such algorithms with conventional logistic regression; DL techniques were used mainly in imaging-based studies. Figure 3 shows the distribution of CHD types across the included studies.

Figure 3 Distribution of CHDs in the included studies. ASD, atrial septal defect; CHD, congenital heart disease; MV, mitral valve; SV, single ventricle; TGA, transposition of the great arteries; TOF, tetralogy of Fallot; VSD, ventricular septal defect.

Analysis of the included studies also revealed recurring themes in the application of AI to CHS. The largest proportion of studies centred on outcome prediction, reflecting efforts to use computational models to anticipate survival, complications, or long-term prognosis. A further 10% focused on risk stratification tools, where, unlike outcome prediction, the models classified or clustered patients into categories of relative risk, providing actionable groupings rather than individualised probabilities. Imaging-based applications, including echocardiography, computed tomography (CT), cardiac magnetic resonance (CMR), and electrocardiography (ECG), accounted for 15% of the literature. More specialised applications included integration of omics data (6.7%) and benchmarking institutional performance (3.3%). This distribution highlights the breadth of methodological exploration but also illustrates that most AI work to date has concentrated on outcomes and imaging, with fewer studies addressing surgical guidance or perioperative decision support.

Discussion

This scoping review confirms that interest in AI/ML applications in CHS has surged in recent years, with most included studies being published in the last 5 years. However, it also reveals that the field is at an early stage: many studies are proof-of-concept or retrospective analyses, with relatively few prospective validations or real-world clinical implementations.

Risk stratification

Risk stratification is one of the most intuitive and potentially impactful domains for AI in CHS. Surgeons are familiar with conventional scoring systems such as Aristotle or RACHS (11), which simplify complex cases into linear equations based on a small set of predefined variables. While these tools offer transparency, they often mask the nonlinear, dynamic, and interacting nature of real perioperative physiology. In contrast, AI methods can simultaneously process thousands of perioperative data points, capture nonlinear interactions, and generate predictions that update dynamically in response to new information from the operating room or intensive care unit (ICU) (12). This creates the possibility of replacing static, population-based scores with continuously learning systems that deliver personalised, time-sensitive risk profiles. Crucially, many models now incorporate interpretable features, such as highlighting which variables most influenced the result or showing rule-based explanations, so that they reveal not only the prediction but also the reasoning behind it.

Most risk stratification studies employ supervised ML approaches that typically use clinical datasets to predict perioperative or long-term outcomes. For example, a large multicentre study from Australia and New Zealand trained an XGBoost model on the ANZ Congenital Outcomes Registry to predict 30-day mortality (13). The model achieved an area under the receiver operating characteristic curve (AUC) of 0.90 for high-risk patient identification, outperforming the logistic European Association score (AUC 0.82). While improvements of 0.05–0.10 in AUC may have limited clinical impact, ML’s capacity to capture complex nonlinear interactions offers value in uncovering previously unrecognised high-risk factor combinations.
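Because the AUC recurs throughout this section, a short sketch may help readers interpret it. The AUC equals the probability that a randomly chosen patient who had the event is assigned a higher predicted risk than a randomly chosen patient who did not. The example below computes it directly from that definition in plain Python; the labels and the two sets of predicted risks (`model_a`, `model_b`) are hypothetical toy values and are not taken from any included study.

```python
def auroc(labels, scores):
    """AUC as the fraction of (event, non-event) pairs in which the event
    case receives the higher score; tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = died within 30 days, 0 = survived.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
model_a = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.40, 0.70, 0.80, 0.90]
model_b = [0.10, 0.30, 0.50, 0.20, 0.60, 0.70, 0.40, 0.55, 0.80, 0.65]

print(f"Model A AUC: {auroc(labels, model_a):.3f}")
print(f"Model B AUC: {auroc(labels, model_b):.3f}")
```

In this toy example, model A ranks 23 of the 24 possible event/non-event pairs correctly and model B ranks 18 of 24 correctly. The AUC describes ranking only; it says nothing about whether the predicted probabilities are well calibrated.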

Beyond mortality, several ML models have focused on predicting specific complications. For example, Li et al. (14) employed an XGBoost model to predict prolonged mechanical ventilation in infants following cardiac surgery. By integrating haemodynamic data from continuous arterial pressure monitoring at multiple ICU time points, the model demonstrated excellent discrimination in identifying infants at risk of delayed extubation. The single strongest predictor was the maximum rate of ventricular pressure rise (dP/dt max) measured 8 hours after ICU admission, which outperformed even complex multivariable ML-derived combinations. It is important to note that the apparent gain in predictive accuracy partly reflects the fact that the model was using data very close in time to the outcome, just as a very unwell patient is easier to recognise as being at high risk. Nevertheless, this finding highlights the ability of ML not only to enhance predictive accuracy but also to validate and quantify the prognostic importance of clinically intuitive factors.

Overall, risk prediction studies demonstrate that advanced models equal or modestly exceed conventional scores, but their real value may lie in granularity: they capture subgroup variation and temporal shifts that one-size-fits-all tools miss. Several studies achieved high discrimination (AUC 0.85–0.95) for their specific endpoints, especially when focusing on narrower outcomes (e.g., single complications) or when incorporating intraoperative and ICU data in addition to preoperative factors. However, challenges remain in translating these models to practice, because a model trained on one dataset often loses accuracy when externally validated, owing to differences in case-mix or practice patterns.

Imaging and diagnostic applications

Imaging is one of the most advanced areas of AI in CHD (9). In this review, we only included studies directly linked to surgical decision-making. This explains why relatively few imaging papers were captured compared with the wider field. We aimed to highlight representative examples with perioperative relevance, and we refer readers to recent reviews for a broader overview (15,16).

DL, particularly convolutional neural networks, can analyse images to automatically detect subtle features with a precision and reproducibility often beyond manual assessment (17,18). Several included studies applied ML to extract quantitative information from images or to derive novel imaging biomarkers that correlate with patient outcomes.

A notable example is the application of DL to CMR segmentation and quantification (19). The authors developed a fully automated 3D convolutional neural network to segment both the left ventricle and the right ventricle (RV) on CMR images of patients with repaired TOF. This is a non-trivial task because repaired TOF hearts often have patch material, scar and very enlarged RVs that confound generic segmentation models. By training on a combination of general CMR datasets and specific TOF cases, the authors achieved much higher accuracy in RV volume measurement than widely used software. This is important because these measurements help CHD teams to time pulmonary valve replacement in such patients. AI-driven image analysis tools could also reduce interobserver variability and allow clinicians to focus on interpreting results rather than laboriously drawing contours.
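To make the link between segmentation and volume measurement concrete, the sketch below shows two calculations that commonly sit behind such tools: an overlap score between an automated and a manual contour, and the conversion of a segmented mask into a volume. The Dice coefficient is used here as an assumption, since it is a widely used overlap metric but the metric reported by the cited study is not stated in this review; the masks and voxel dimensions are hypothetical toy values.

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat lists of 0/1.
    1.0 means identical masks, 0.0 means no overlap."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 1.0 if total == 0 else 2.0 * intersection / total

def volume_ml(mask, voxel_mm3):
    """Volume in millilitres: voxel count x voxel volume (1 mL = 1000 mm^3)."""
    return sum(mask) * voxel_mm3 / 1000.0

# Hypothetical 8-voxel strip: 1 = voxel labelled as right ventricle.
manual = [0, 1, 1, 1, 1, 0, 0, 0]
auto = [0, 0, 1, 1, 1, 1, 0, 0]

# Hypothetical voxel size: 1.5 mm x 1.5 mm in-plane, 8 mm slice thickness.
voxel_mm3 = 1.5 * 1.5 * 8.0

print(f"Dice overlap: {dice(manual, auto):.2f}")
print(f"Manual volume: {volume_ml(manual, voxel_mm3):.3f} mL")
print(f"Automated volume: {volume_ml(auto, voxel_mm3):.3f} mL")
```

The toy masks have identical volumes but only partial overlap, which illustrates why overlap and volume agreement are usually assessed separately when a segmentation model is validated.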

Several studies have gone beyond simply measuring lesions. For example, Wang et al. (20) described an interpretable deep key point stadiometry approach that not only measures the size of an atrial septal defect but also uses these measurements to suggest whether the lesion is suitable for transcatheter closure and what occluder device size to use. Importantly, the model was designed to be interpretable: rather than a black box returning “yes” or “no”, it provides the localised images and measurements that a clinician can verify. In their results, the automated tool’s classification accuracy for recommending transcatheter vs. surgical closure was about 94%, significantly higher than that of a comparison “black box” convolutional neural network that directly predicted an outcome.

Collectively, these approaches illustrate the breadth of AI techniques being applied to imaging in CHS. In diagnostic imaging, adult cardiology has seen the FDA approval of AI tools for echocardiographic flow measurement or ECG-derived diagnoses (21).

Data augmentation and decision support

Generative AI techniques have only recently been applied in the context of CHS, comprising a very small fraction of the literature. Despite this, the approaches represent a novel way to potentially overcome data scarcity and provide new forms of decision support. Two distinct generative AI strategies emerged: one focused on creating synthetic medical data to augment training datasets, and another exploring LLMs for clinical decision guidance.

One study utilised generative adversarial networks (GANs) to address the challenge of limited imaging data in CHD. A GAN works by training two models against each other: one creates new images while the other judges how realistic they are, gradually improving until the generated images look almost identical to real ones. Diller et al. (22) trained a progressive GAN on cardiac MRI scans from 303 patients with TOF, enabling the generation of 100,000 synthetic MR images that were anatomically plausible upon expert review. A model trained exclusively on this GAN-generated dataset achieved segmentation accuracy nearly equivalent to a model trained on real patient images. This finding demonstrates that high-fidelity synthetic images can effectively supplement or substitute real data for training models, with the added benefit of avoiding patient privacy issues and allowing free data sharing across institutions.
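For readers who want the formal statement, the two-model competition described above is commonly written as the minimax objective of the original GAN formulation. The notation below is standard textbook notation and is not taken from the cited study: G is the generator, D is the discriminator, x is a real image, and z is random noise fed to the generator. Progressive GANs and other variants may use modified loss functions, so this should be read as a sketch of the general idea rather than the exact training objective used by Diller et al.

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x \sim p_{\text{data}}}\!\left[ \log D(x) \right]
+ \mathbb{E}_{z \sim p_{z}}\!\left[ \log\!\left( 1 - D\!\left( G(z) \right) \right) \right]
```

The discriminator is rewarded for assigning high probability to real images and low probability to generated ones, while the generator is rewarded for producing images that the discriminator scores as real.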

Another study investigated the use of LLMs for clinical decision-making (23). The authors deployed the generative AI model ChatGPT as a virtual participant in a paediatric cardiac surgery case conference, comparing its suggested diagnoses and surgical plans with expert consensus. Across 37 CHS cases of varying complexity submitted to the model, ChatGPT’s recommended treatment plans agreed with the expert team’s decisions about 94% of the time for straightforward cases; however, concordance dropped sharply to 25% for the most complex cases. The authors highlighted that while generative AI could serve as a supportive tool in medical decision-making, its performance in complex scenarios was markedly less reliable, and they cautioned that over-reliance on such models without expert oversight could introduce misinformation or inaccuracies in care planning. Although generative AI shows promise, the evidence to date is limited to isolated pilot studies, which highlights the need for careful validation.

Role of data in shaping performance

A recurring theme in the literature, reflected in our findings, is that the input features matter more than the specific algorithm when it comes to improving predictive performance; i.e., adding novel data (biomarkers, ICU trends, etc.) can enhance model accuracy more than switching from logistic regression to XGBoost on the same variables. The models that stood out often incorporated new types of data: continuous ICU monitoring, detailed imaging metrics, or genetic information. This aligns with broader trends in which adult studies have begun including frailty or genomics in their ML risk models to capture dimensions that traditional scores miss. In CHS, leveraging multicentre registries and linking them with non-traditional data (imaging archives, -omics repositories) will likely yield bigger advances in predictive accuracy than modifying existing algorithms.

LLMs also offer several practical opportunities in CHS. Their ability to rapidly synthesise large volumes of unstructured clinical notes, imaging, ECGs, haemodynamic monitoring, and guidelines could help streamline perioperative workflows, for example, by generating structured operation notes or supporting multidisciplinary team discussions with explainable evidence backing the decisions (23,24). Unlike traditional ML, which requires curated datasets and structured variables, LLMs are uniquely positioned to work with fragmented, text-heavy records.

Across medicine, there is also recognition that for AI to be adopted, clinicians need to trust its outputs. This is especially true in high-risk fields, including CHS. Our findings show that many CHS studies incorporated interpretable elements (e.g., feature importance, visual overlays for imaging). This mirrors a broader push for “explainable AI” in healthcare, because black-box algorithms, no matter how accurate, might face resistance or even produce unsafe recommendations if their reasoning is inscrutable (25). The ideal is a balance where AI provides answers and explanations, which a clinician can then accept or contest. Clinician involvement throughout algorithm development and validation is essential to ensure clinical relevance and ethical integrity (26).

Hype versus reality, and the path forward

Finally, we must address the hype vs. reality of AI in CHS. It is tempting to view AI as a revolutionary solution that will automatically yield better outcomes. While our review shows substantial promise, it also highlights that AI is an augmentative technology, not a solution per se. In practice, successful AI adoption often requires redesigning processes and continuous monitoring of the AI’s performance. CHS is no exception; we should approach AI implementation with the same rigour as a new surgical technique or drug: test it, refine it, monitor outcomes, and iterate.

Many studies had relatively small sample sizes, which raises concern about overfitting ML models. It is telling that the most robust models were those tapping large databases. Additionally, few studies engaged with the problem of causality versus correlation. ML excels at pattern recognition, but patterns are not necessarily causal. For example, a variable such as ‘prolonged bypass time’ may emerge as a strong predictor, but whether the model distinguishes cause from consequence is unclear. Emerging technical frameworks such as causal inference modelling or counterfactual prediction could push AI from simply describing risk associations to suggesting which factors, if modified, might actually change outcomes. This step is critical if AI is to move from prognostication into guiding operative or perioperative choices.

AI in CHS remains at an early stage of development, in contrast to fields such as adult cardiology, where LLMs and other AI tools have been evaluated in aggregate and are already being implemented in the clinical setting. This reflects the unique challenges of CHS: smaller patient volumes, heterogeneous anatomies, and fragmented data all inevitably slow progress, but they also highlight the importance of collaborative registries and shared validation pipelines.

Conclusions

AI in CHS is no longer a speculative concept but an emerging field with tangible, if early, applications. Our review shows that most studies remain small-scale and exploratory, yet the breadth of approaches, from risk prediction and imaging to generative models and LLMs, demonstrates a rapidly diversifying landscape. The key challenge now is to move beyond proof-of-concept work towards robust, multicentre, prospective validation, ensuring that algorithms trained on limited or fragmented data can generalise across diverse patient populations. At the same time, technical advances such as causal inference modelling, multimodal integration, and explainable frameworks provide opportunities for AI to transition from descriptive analytics to actionable decision support. The next phase will not simply be about building more accurate models, but about embedding these systems into surgical workflows in a way that augments expertise, enhances safety, and ultimately improves outcomes for patients with CHD.

Acknowledgments

None.

Footnote

Provenance and Peer Review: This article was commissioned by the editorial office, Translational Pediatrics for the series “Role of Artificial Intelligence in Pediatric Cardiology and Cardiac Surgery”. The article has undergone external peer review.

Reporting Checklist: The authors have completed the PRISMA-ScR reporting checklist. Available at https://tp.amegroups.com/article/view/10.21037/tp-2026-0345/rc

Peer Review File: Available at https://tp.amegroups.com/article/view/10.21037/tp-2026-0345/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tp.amegroups.com/article/view/10.21037/tp-2026-0345/coif). The series “Role of Artificial Intelligence in Pediatric Cardiology and Cardiac Surgery” was commissioned by the editorial office without any funding or sponsorship. J.F. served as an unpaid Guest Editor of the series. S.S. received an NIHR grant on an unrelated subject. The authors have no other conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Francis J, George J, Peng E, et al. The application of artificial intelligence in tissue repair and regenerative medicine related to pediatric and congenital heart surgery: a narrative review. Regenerative Medicine Reports 2024;1:131-6.
Sarker IH. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput Sci 2021;2:160.
Mall PK, Singh PK, Srivastav S, et al. A comprehensive review of deep neural networks for medical image processing: Recent developments and future opportunities. Healthcare Analytics 2023;4:100216.
Timilsina M, Buosi S, Razzaq MA, et al. Harmonizing foundation models in healthcare: A comprehensive survey of their roles, relationships, and impact in artificial intelligence's advancing terrain. Comput Biol Med 2025;189:109925.
Hashimoto DA, Rosman G, Rus D, et al. Artificial Intelligence in Surgery: Promises and Perils. Ann Surg 2018;268:70-6.
Leivaditis V, Beltsios E, Papatriantafyllou A, et al. Artificial Intelligence in Cardiac Surgery: Transforming Outcomes and Shaping the Future. Clin Pract 2025;15:17.
Sharabiani M, Mahani A, Bottle A, et al. GenAI exceeds clinical experts in predicting acute kidney injury following paediatric cardiopulmonary bypass. Sci Rep 2025;15:20847.
Moor M, Banerjee O, Abad ZSH, et al. Foundation models for generalist medical artificial intelligence. Nature 2023;616:259-65.
Jone PN, Gearhart A, Lei H, et al. Artificial Intelligence in Congenital Heart Disease: Current State and Prospects. JACC Adv 2022;1:100153.
Ouzzani M, Hammady H, Fedorowicz Z, et al. Rayyan-a web and mobile app for systematic reviews. Syst Rev 2016;5:210.
Bojan M, Gerelli S, Gioanni S, et al. Comparative study of the Aristotle Comprehensive Complexity and the Risk Adjustment in Congenital Heart Surgery scores. Ann Thorac Surg 2011;92:949-56.
Chang J Junior, Caneo LF, Turquetto ALR, et al. Predictors of in-ICU length of stay among congenital heart defect patients using artificial intelligence model: A pilot study. Heliyon 2024;10:e25406.
Betts KS, Marathe SP, Chai K, et al. A machine learning approach to predicting 30-day mortality following paediatric cardiac surgery: findings from the Australia New Zealand Congenital Outcomes Registry for Surgery (ANZCORS). Eur J Cardiothorac Surg 2023;64:ezad160.
Li M, Wang S, Zhang H, et al. The predictive value of pressure recording analytical method for the duration of mechanical ventilation in children undergoing cardiac surgery with an XGBoost-based machine learning model. Front Cardiovasc Med 2022;9:1036340.
Dey D, Slomka PJ, Leeson P, et al. Artificial Intelligence in Cardiovascular Imaging: JACC State-of-the-Art Review. J Am Coll Cardiol 2019;73:1317-35.
Holt DB, El-Bokl A, Stromberg D, et al. Role of Artificial Intelligence in Congenital Heart Disease and Interventions. J Soc Cardiovasc Angiogr Interv 2025;4:102567.
Valente J, António J, Mora C, et al. Developments in Image Processing Using Deep Learning and Reinforcement Learning. J Imaging 2023;9:207.
Vinodkumar PK, Karabulut D, Avots E, et al. Deep Learning for 3D Reconstruction, Augmentation, and Registration: A Review Paper. Entropy (Basel) 2024;26:235.
Tilborghs S, Liang T, Raptis S, et al. Automated biventricular quantification in patients with repaired tetralogy of Fallot using a three-dimensional deep learning segmentation model. J Cardiovasc Magn Reson 2024;26:101092.
Wang J, Xie W, Cheng M, et al. Assessment of Transcatheter or Surgical Closure of Atrial Septal Defect using Interpretable Deep Keypoint Stadiometry. Research (Wash D C) 2022;2022:9790653.
Krishna H, Desai K, Slostad B, et al. Fully Automated Artificial Intelligence Assessment of Aortic Stenosis by Echocardiography. J Am Soc Echocardiogr 2023;36:769-77.
Diller GP, Vahle J, Radke R, et al. Utility of deep learning networks for the generation of artificial cardiac magnetic resonance images in congenital heart disease. BMC Med Imaging 2020;20:113.
Mehta R, Reitz JG, Venna A, et al. Navigating the future of pediatric cardiovascular surgery: Insights and innovation powered by Chat Generative Pre-Trained Transformer (ChatGPT). J Thorac Cardiovasc Surg 2025;170:353-8.
Ferreira Santos J, Ladeiras-Lopes R, Leite F, et al. Applications of large language models in cardiovascular disease: a systematic review. Eur Heart J Digit Health 2025;6:540-53.
Feuerriegel S, Frauen D, Melnychuk V, et al. Causal machine learning for predicting treatment outcomes. Nat Med 2024;30:958-68.
Rashidi P, Kilic A, Kline A, et al. Artificial intelligence and machine learning in cardiothoracic surgery: Future prospects and ethical issues. J Thorac Cardiovasc Surg 2025;170:1859-1866.e1.

Cite this article as: Francis J, Singhania A, Dawson S, Caputo M, Savovic J, Stoica S. Artificial intelligence in congenital heart surgery: a scoping review and primer for surgeons. Transl Pediatr 2026;15(6):244. doi: 10.21037/tp-2026-0345
