Medical AI

Medical Abstract Cardiovascular

Serial changes in artificial intelligence enabled electrocardiogram probability scores as predictors of ejection fraction improvement in heart failure with reduced ejection fraction

Artificial intelligence–enabled electrocardiography (AI-ECG) accurately detects left ventricular systolic dysfunction (LVSD). Prior studies suggest higher AI-ECG LVSD scores predict adverse outcomes in heart failure with reduced ejection fraction (HFrEF). Whether serial changes in these scores predict recovery to heart failure with improved ejection fraction (HFiEF) is unclear. If validated, AI-ECG could offer a non-invasive alternative to frequent transthoracic echocardiography (ECHO) for monitoring ejection fraction (EF). We investigated whether sequential 12-lead AI-ECG LVSD scores are associated with EF improvement in HFrEF, potentially enabling clinicians to detect HFiEF without relying solely on serial ECHO. This single-center, retrospective cohort study included all adults (≥19 years) with at least one ECHO-confirmed LVEF ≤40% (2017–2025). Exclusion criteria included left ventricular assist device, heart transplantation, ECMO, or IABP. Each ECG was analyzed by an AI-ECG model, yielding an LVSD probability (0.0–100.0). ECHOs within 14 days of an ECG formed ECG-ECHO pairs. Baseline HFrEF was defined by the first LVEF ≤40%. HFiEF required a ≥10% absolute increase in LVEF from baseline after ≥90 days. We defined the “Delta score” as (baseline AI-ECG probability − follow-up AI-ECG probability). Associations were assessed using Cox proportional hazards regression, adjusted for age, sex, obesity, hypertension, diabetes, and ischemic heart disease (IHD). Kaplan-Meier analysis was conducted to compare the probability of achieving HFiEF among the three distinct Delta score groups (1st, 2nd, and 3rd tertile), with statistical significance assessed using the log-rank test. Among 832 patients (mean age 64.0±14.0 years; 66.6% male), 426 (51.2%) achieved HFiEF. They were younger (62.1±13.7 vs. 66.1±14.0 years, p<0.001) and had lower IHD prevalence (38.3% vs. 58.1%, p<0.001). Baseline LVEF was lower in the HFiEF group (29.15% vs. 32.17%, p<0.001), with a significant rise at follow-up (49.89% vs. 33.26%, p<0.001). Baseline AI-ECG scores were similar (57.56 vs. 53.63, p=0.069) but dropped substantially in the HFiEF group at follow-up (19.52 vs. 48.45, p<0.001). Each 1-point higher baseline AI-ECG score predicted a 0.9% lower chance of HFiEF (aHR 0.991, p<0.001), while each 1-point increase in Delta score predicted a 3.9% higher HFiEF likelihood (aHR 1.039, p<0.001), with a significant interaction (p=0.004). Kaplan-Meier analysis demonstrated significant differences among the three distinct Delta score groups in predicting recovery to HFiEF (p < 0.0001). Baseline AI-ECG LVSD scores and their serial decreases both predict EF recovery in HFrEF. Incorporating AI-ECG into routine care could offer a simple, non-invasive strategy to track LV function improvement, complementing or reducing the need for repeated ECHO.
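The Delta score and the per-point hazard ratio interpretation reported above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the study's analysis code: the function names, the example score pairs, and the use of sample quantiles as tertile cut points are assumptions made for illustration.

```python
from statistics import quantiles


def delta_score(baseline: float, follow_up: float) -> float:
    """Delta score as defined in the abstract: baseline minus follow-up probability."""
    return baseline - follow_up


def tertile_groups(deltas: list[float]) -> list[int]:
    """Assign each Delta score to tertile 1, 2, or 3.

    Assumption: cut points are the sample's own tertile boundaries.
    """
    low_cut, high_cut = quantiles(deltas, n=3)
    return [1 if d <= low_cut else 2 if d <= high_cut else 3 for d in deltas]


def percent_change_per_point(hazard_ratio: float) -> float:
    """Percent change in hazard for a 1-point increase in the covariate."""
    return (hazard_ratio - 1.0) * 100.0


# Hypothetical (baseline, follow-up) AI-ECG scores, for illustration only.
pairs = [(60.0, 20.0), (55.0, 50.0), (70.0, 65.0),
         (40.0, 10.0), (80.0, 30.0), (50.0, 48.0)]
deltas = [delta_score(b, f) for b, f in pairs]
groups = tertile_groups(deltas)

print(deltas)
print(groups)
print(round(percent_change_per_point(1.039), 1))  # reported aHR for Delta score
print(round(percent_change_per_point(0.991), 1))  # reported aHR for baseline score
```

Under these assumptions, an aHR of 1.039 corresponds to a 3.9% higher hazard per point and an aHR of 0.991 to a 0.9% lower hazard, which matches the figures quoted in the abstract.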

ESC 2025

August, 2025


Tech Conference Non-cardiovascular

CoFE: A Framework Generating Counterfactual ECG for Explainable Cardiac AI-Diagnostics

Recognizing the need for explainable AI (XAI) approaches to enable the successful integration of AI-based ECG prediction models (AI-ECG) into clinical practice, we introduce a framework for generating CounterFactual ECGs (CoFE) to illustrate how specific features, such as amplitudes and intervals, influence the model’s predictive decisions. To demonstrate the applicability of the CoFE, we present two case studies: atrial fibrillation classification and potassium level regression models. The CoFE reveals feature changes in ECG signals that align with the established clinical knowledge. By clarifying both where valid features appear in the ECG and how they influence the model’s predictions, we anticipate that our framework will enhance the interpretability of AI-ECG models and support more effective clinical decision-making. Our demonstration video is available at: https://www.youtube.com/watch?v=YoW0bNBPglQ.

arXiv

August, 2025


Medical Abstract Cardiovascular

Pilot study on AI-enhanced smartwatch ECG for prospective monitoring for detecting left ventricular systolic dysfunction in real-world settings

Artificial intelligence-enhanced electrocardiogram (AI-ECG) using single-lead ECG can detect left ventricular systolic dysfunction (LVSD). However, most models are trained on Lead I from 12-lead ECGs, which differs fundamentally from smartwatch ECGs. In real-world settings, smartwatch ECGs not only differ structurally and physically from standard ECGs but are also affected by noise and user-related artifacts. Furthermore, validation remains limited due to the scarcity of data from self-recorded smartwatch ECGs. This study investigates whether self-recorded smartwatch ECGs can reliably monitor LVSD in real-world settings by comparing AI-ECG scores with ejection fraction from echocardiography. From July to October 2024, we enrolled participants who had recently undergone or were scheduled for echocardiography. Eligible individuals were instructed to record ECGs using their smartwatches (Samsung Galaxy or Apple Watch) at least twice daily for over a week, with a paired echocardiogram performed within 14 days. We adapted our previously developed convolutional neural network-based AI-ECG model to analyze smartwatch ECGs. The model was fine-tuned using smartwatch ECG data, leveraging the foundation model architecture with an integrated preprocessing module to manage signal noise inherent to smartwatch-derived data. It outputs a score between 0 and 100, with higher scores indicating a greater likelihood of LVSD. We evaluated model performance using two approaches: (1) Approach 1: all available ECGs were analyzed individually to generate scores and assess overall performance; (2) Approach 2: three ECGs per day were randomly selected for each participant, and their median score was used as the representative value for performance evaluation. All ECGs were processed without explicit adjustment for signal noise. A total of 27 participants were included, with 77.4% using Samsung Galaxy Watches and 22.6% using Apple Watches. Echocardiography, performed at a median interval of 6 days, identified 7 participants (36.5%) with LVSD. The median AI-ECG score was 55.0 in the LVSD group and 6.5 in the non-LVSD group. Overall, 1,497 ECGs were collected, including 866 from the LVSD group. When analyzing all available ECGs, the area under the receiver operating characteristic curve (AUROC) was 0.915 (95% confidence interval: 0.900–0.927). When analyzing only three randomly selected ECGs per day, the AUROC was 0.864. Our study demonstrates that an AI-based single-lead ECG approach can reliably monitor LVSD when applied to self-recorded smartwatch data in real-world settings. These findings provide important evidence supporting the extension of our AI-ECG model to analyze smartwatch ECGs.
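Approach 2 above (random selection of three ECGs per day, then a median) can be sketched as follows. This is an illustrative reading, not the authors' pipeline: the record layout, the function name, the fixed random seed, and the choice to pool the sampled scores into one median per participant are all assumptions, since the abstract does not say whether the median is taken per day or per participant.

```python
import random
from collections import defaultdict
from statistics import median


def representative_scores(records, per_day=3, seed=0):
    """Return {participant_id: median of randomly sampled AI-ECG scores}.

    records: iterable of (participant_id, day, score) tuples (hypothetical layout).
    Up to `per_day` scores are drawn without replacement for each participant-day.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_day = defaultdict(list)
    for participant_id, day, score in records:
        by_day[(participant_id, day)].append(score)

    sampled = defaultdict(list)
    for (participant_id, _day), scores in by_day.items():
        k = min(per_day, len(scores))  # days with fewer recordings use all of them
        sampled[participant_id].extend(rng.sample(scores, k))

    return {pid: median(scores) for pid, scores in sampled.items()}


# Hypothetical recordings, for illustration only.
records = [
    ("p1", 1, 10.0), ("p1", 1, 20.0), ("p1", 1, 60.0),
    ("p2", 1, 5.0),
]
result = representative_scores(records)
print(result)
```

Taking a median over several recordings is one simple way to damp the effect of a single noisy self-recorded ECG on the participant-level score.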

ESC 2025

August, 2025


Medical Journal Cardiovascular

Artificial Intelligence–Enabled ECG Screening for LVSD in LBBB: Evaluating Model Development and Transfer Learning Approaches

Left bundle branch block (LBBB) is a common electrocardiogram (ECG) abnormality associated with left ventricular systolic dysfunction (LVSD). Although artificial intelligence (AI)–driven ECG analysis shows promise for LVSD screening, it remains unclear if a general AI-ECG model or one tailored for LBBB patients yields better performance. This study evaluates 4 AI-ECG models for detecting LVSD in LBBB patients and examines the impact of training cohort definitions. We developed 4 models using 364,845 ECGs from 4 hospitals: 1) a general AI-ECG model; 2) a model trained on automatically extracted LBBB cases; 3) a model trained on a well-curated single-center LBBB data set with expert review; and 4) a hybrid model employing transfer learning by fine-tuning the general model with single-center LBBB data. LVSD was defined as an ejection fraction ≤40%. All models were externally validated on 1,334 ECGs from another hospital, with performance assessed by area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and predictive values. In external validation, the transfer learning model achieved the highest AUROC (0.903; 95% CI: 0.887-0.918), closely followed by the general model (0.899; 95% CI: 0.883-0.915); the difference was not significant. Models using automated or expert-based LBBB extraction had lower AUROCs (0.879 and 0.841, respectively). The general model demonstrated high sensitivity, whereas the transfer learning model exhibited superior specificity. Our findings indicate that a broad AI-ECG model reliably detects LVSD in LBBB patients, and transfer learning offers modest improvements without requiring curated LBBB data sets. Evaluating algorithms in representative clinical populations is essential.
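The performance measures named above (AUROC, sensitivity, specificity, and predictive values) can be computed from model scores and reference labels as in the sketch below. The scores, labels, and threshold are hypothetical and the functions are a plain-Python illustration, not the evaluation code used in the study.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive case outscores a negative case.

    Ties count as one half (equivalent to the Mann-Whitney U statistic).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def threshold_metrics(scores, labels, threshold):
    """Sensitivity, specificity, PPV, and NPV when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical AI-ECG scores and LVSD labels (1 = ejection fraction <= 40%).
scores = [90, 80, 70, 40, 30, 20, 10, 60]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

area = auroc(scores, labels)
metrics = threshold_metrics(scores, labels, threshold=50)
print(area, metrics)
```

AUROC is threshold-free, while the other four measures depend on the chosen cutoff, which is why two models with similar AUROCs can still differ in sensitivity and specificity, as the abstract reports for the general and transfer learning models.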

JACC: Advances

September, 2025


Tech Conference Non-cardiovascular

ALFRED: Ask a Large-language model For Reliable ECG Diagnosis

Leveraging Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) for analyzing medical data, particularly electrocardiograms (ECGs), offers high accuracy and convenience. However, generating reliable, evidence-based results in specialized fields like healthcare remains a challenge, as RAG alone may not suffice. We propose a zero-shot ECG diagnosis framework based on RAG that incorporates expert-curated knowledge to enhance diagnostic accuracy and explainability. Evaluation on the PTB-XL dataset demonstrates the framework’s effectiveness, highlighting the value of structured domain expertise in automated ECG interpretation. Our framework is designed to support comprehensive ECG analysis, addressing diverse diagnostic needs with potential applications beyond the tested dataset.

arXiv

April 30, 2025


Tech Journal Cardiovascular

Transparent and Robust Artificial Intelligence-Driven Electrocardiogram Model for Left Ventricular Systolic Dysfunction

Heart failure (HF) is a growing global health burden, yet early detection remains challenging due to the limitations of traditional diagnostic tools such as electrocardiograms (ECGs). Recent advances in deep learning offer new opportunities to identify left ventricular systolic dysfunction (LVSD), a key indicator of HF, from ECG data. This study validates AiTiALVSD, our previously developed artificial intelligence (AI)-enabled ECG Software as a Medical Device, for its accuracy, transparency, and robustness in detecting LVSD. Methods: This retrospective single-center cohort study involved patients suspected of LVSD. The AiTiALVSD model, based on a deep learning algorithm, was evaluated against echocardiographic ejection fraction values. To enhance model transparency, the study employed Testing with Concept Activation Vectors (TCAV), clustering analysis, and robustness testing against ECG noise and lead reversals. Results: The study involved 688 participants and found AiTiALVSD to have a high diagnostic performance, with an AUROC of 0.919. There was a significant correlation between AiTiALVSD scores and left ventricular ejection fraction values, confirming the model’s predictive accuracy. TCAV analysis showed the model’s alignment with medical knowledge, establishing its clinical plausibility. Despite its robustness to ECG artifacts, there was a noted decrease in specificity in the presence of ECG noise. Conclusions: AiTiALVSD’s high diagnostic accuracy, transparency, and resilience to common ECG discrepancies underscore its potential for early LVSD detection in clinical settings. This study highlights the importance of transparency and robustness in AI-ECG, setting a new benchmark in cardiac care.
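Robustness testing against lead reversals, as mentioned above, requires simulated reversed recordings. The abstract does not state which reversals were tested or how they were generated, so the sketch below is only one possibility: it simulates the common right-arm/left-arm electrode swap on a 12-lead ECG using the standard lead relationships. The function name and the dictionary layout are assumptions.

```python
def swap_arm_electrodes(ecg):
    """Simulate a right-arm/left-arm electrode reversal.

    ecg: dict mapping lead name to a list of samples (hypothetical layout).
    With the arm electrodes swapped, lead I is inverted, leads II and III
    exchange places, and aVR and aVL exchange places. aVF and the precordial
    leads are unchanged.
    """
    reversed_ecg = {lead: list(samples) for lead, samples in ecg.items()}
    reversed_ecg["I"] = [-x for x in ecg["I"]]
    reversed_ecg["II"], reversed_ecg["III"] = list(ecg["III"]), list(ecg["II"])
    reversed_ecg["aVR"], reversed_ecg["aVL"] = list(ecg["aVL"]), list(ecg["aVR"])
    return reversed_ecg


# Two-sample toy recording (limb leads only), consistent with Einthoven's law.
ecg = {
    "I": [1.0, 2.0], "II": [3.0, 4.0], "III": [2.0, 2.0],
    "aVR": [-2.0, -3.0], "aVL": [-0.5, 0.0], "aVF": [2.5, 3.0],
}
swapped = swap_arm_electrodes(ecg)
print(swapped)
```

A model's scores on the original and swapped recordings could then be compared to see how much the prediction depends on correct electrode placement.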

Diagnostics

July 22, 2025


Tech Journal Non-cardiovascular

A novel XAI framework for explainable AI-ECG using generative counterfactual XAI (GCX)

Generative Counterfactual Explainable Artificial Intelligence (XAI) offers a novel approach to understanding how AI models interpret electrocardiograms (ECGs). Traditional explanation methods focus on highlighting important ECG segments but often fail to clarify why these segments matter or how their alteration affects model predictions. In contrast, the proposed framework explores “what-if” scenarios, generating counterfactual ECGs that increase or decrease a model’s predictive values. This approach shows how specific changes, such as increased T wave amplitude or PR interval prolongation, influence the model’s decisions. Through a series of validation experiments, the framework demonstrates its ability to produce counterfactual ECGs that closely align with established clinical knowledge, including characteristic alterations associated with potassium imbalances and atrial fibrillation. By clearly visualizing how incremental modifications in ECG morphology and rhythm affect artificial intelligence-applied ECG (AI-ECG) predictions, this generative counterfactual method moves beyond static attribution maps and has the potential to increase clinicians’ trust in AI-ECG systems. As a result, this approach offers a promising path toward enhancing the explainability and clinical reliability of AI-based tools for cardiovascular diagnostics.

Scientific Reports

July 02, 2025


Medical Abstract Cardiovascular

Artificial Intelligence–Enabled Electrocardiography for Detecting Risk of Rehospitalization in patients with Heart Failure

We hypothesized that AI-enabled ECG scores would show distinct temporal patterns after hospital discharge in patients with HF, and that these patterns would differ between patients who experienced rehospitalization and those who did not. This single-center retrospective study analyzed ECG data from patients hospitalized for HF between March 2017 and January 2025 in South Korea. Post-discharge ECGs were processed using AI-ECG models for left ventricular systolic dysfunction (LVSD), diastolic dysfunction (LVDD), and myocardial infarction (MI). We compared AI-ECG patterns in patients readmitted within six months with those in patients who were not (Figure 1). Temporal trends in AI-ECG scores were assessed using a mixed-effects linear regression model with group and time as fixed effects, and patient as a random effect. Among 1,007 patients, 1,539 hospitalization events were identified. A total of 1,674 ECGs from 269 rehospitalized and 4,066 ECGs from 917 non-rehospitalized patients were collected from 180 days before to 60 days after the index readmission or follow-up end. The mean age was 65.2 years, and 63.1% were male. Diabetes mellitus and chronic kidney disease were significantly more prevalent in the rehospitalization group, whereas other comorbidities were comparable. Significant differences in ECG intervals and axes were also observed, with no notable difference in heart rate. In the LVSD model, rehospitalized patients showed higher scores overall (β = 7.96, 95% CI: 3.18–12.75, p = 0.001) (Figure 2). Time since discharge was associated with decreasing scores (β = –0.096/day, 95% CI: –0.104 to –0.087, p<0.001), but this decline was attenuated in the rehospitalization group (interaction β = 0.092, 95% CI: 0.069–0.115, p<0.001). The LVDD model demonstrated a similar trend, while the MI model exhibited no statistically significant differences in scores (Figure 3).
AI-ECG models show potential as dynamic biomarkers for detecting early physiological deterioration and predicting readmission risk in HF patients. These findings support their use in future patient monitoring strategies.
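The reported fixed-effect coefficients for the LVSD model can be combined to see what "attenuated decline" means numerically. The sketch below assumes the conventional coding (group = 1 for rehospitalized patients, 0 otherwise; time in days since discharge) and omits the intercept and the patient-level random effect, neither of which is reported in the abstract. The function names are illustrative.

```python
# Fixed-effect estimates for the LVSD model, as reported in the abstract.
BETA_GROUP = 7.96         # rehospitalized vs. non-rehospitalized
BETA_TIME = -0.096        # score change per day since discharge
BETA_INTERACTION = 0.092  # additional per-day change in the rehospitalized group


def daily_slope(rehospitalized: bool) -> float:
    """Expected change in AI-ECG score per day for the given group."""
    return BETA_TIME + (BETA_INTERACTION if rehospitalized else 0.0)


def fixed_effect_offset(days: float, rehospitalized: bool) -> float:
    """Expected score relative to a non-rehospitalized patient at discharge.

    Assumption: the unreported intercept and random effects are left out,
    so only differences between groups and time points are meaningful.
    """
    group_term = BETA_GROUP if rehospitalized else 0.0
    return group_term + daily_slope(rehospitalized) * days


print(round(daily_slope(False), 3))
print(round(daily_slope(True), 3))
print(round(fixed_effect_offset(90, False), 2))
print(round(fixed_effect_offset(90, True), 2))
```

Under this coding, scores fall by about 0.096 points per day in patients who were not rehospitalized but by only about 0.004 points per day in those who were, so the gap between the groups widens over time.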

AHA

June 05, 2025


Medical Abstract Cardiovascular

EARLY ACUTE MYOCARDIAL INFARCTION RISK STRATIFICATION IN THE EMERGENCY DEPARTMENT: AI-ENHANCED ELECTROCARDIOGRAM AND THE 10-MINUTE RULE

Our team previously developed an AI-ECG method for diagnosing ST-segment elevation myocardial infarction (STEMI) and non-ST-segment elevation myocardial infarction (NSTEMI) using 12-lead electrocardiograms (ECGs), demonstrating superior performance compared to cardiologists (Sci Rep 10, 20495 [2020]). In 2023, this approach was approved as an innovative technology in South Korea (AiTAMI v1.00.00). External validation was conducted across 18 emergency centers (ROMIAE study). Building on these findings, we introduce the “10-minute rule” for early risk assessment of acute myocardial infarction (AMI). We trained AiTAMI v2.00.00 using a foundation model and ECG data from the ROMIAE cohort collected across 14 hospitals. The model was validated at four additional centers, comprising 1,480 patients (Non-AMI = 1,150; NSTEMI = 198; STEMI = 132). Model performance and risk stratification were evaluated using AUROC, clinical endpoints, and decision rule performance. The updated model improved AUROC from 0.887 to 0.906 and AUPRC from 0.760 to 0.795. The 10-minute rule-out strategy identified 23.2% of patients with a negative predictive value (NPV) of 99.7%, while the rule-in strategy identified 24.4% of patients with a positive predictive value (PPV) of 68.5%. AI-ECG utilizing the 10-minute rule can classify 47.6% of chest pain patients early in emergency settings, indicating a potential paradigm shift in the management of AMI.
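The rule-out and rule-in figures above can be illustrated with a two-cutoff triage sketch. The abstract does not describe how the 10-minute rule assigns patients, so treating it as a pair of score cutoffs is an assumption, and the cutoff values, scores, and labels below are hypothetical.

```python
def summarize_triage(scores, labels, rule_out_below, rule_in_above):
    """Fractions of patients ruled out / ruled in, with NPV and PPV.

    labels: 1 = acute myocardial infarction, 0 = no AMI.
    Patients between the two cutoffs are left unclassified.
    """
    ruled_out = [y for s, y in zip(scores, labels) if s < rule_out_below]
    ruled_in = [y for s, y in zip(scores, labels) if s >= rule_in_above]
    total = len(scores)
    return {
        "rule_out_fraction": len(ruled_out) / total,
        "npv": ruled_out.count(0) / len(ruled_out) if ruled_out else None,
        "rule_in_fraction": len(ruled_in) / total,
        "ppv": ruled_in.count(1) / len(ruled_in) if ruled_in else None,
    }


# Hypothetical scores and diagnoses, for illustration only.
scores = [2, 5, 8, 30, 45, 60, 85, 90, 95, 97]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
summary = summarize_triage(scores, labels, rule_out_below=10, rule_in_above=80)
print(summary)

# The abstract's 47.6% is the sum of its rule-out and rule-in fractions.
early_classified = round(23.2 + 24.4, 1)
print(early_classified)
```

The last two lines only check the abstract's arithmetic: 23.2% ruled out plus 24.4% ruled in gives the 47.6% of patients classified early.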

JACC

April 01, 2025


Tech Conference Non-cardiovascular

Benchmarking ECG Delineation using Deep Neural Network-based Semantic Segmentation Models

Accurate electrocardiogram (ECG) delineation is essential for automated cardiac diagnosis, enabling the precise identification of key waveforms such as the P wave, QRS complex, and T wave. This study presents the first comprehensive benchmarking of neural network-based semantic segmentation models for ECG delineation, evaluating their accuracy, resource efficiency, and robustness across both public and private datasets. Our results demonstrate that convolutional neural network (CNN)-based approaches consistently achieve superior accuracy compared to Transformer-based approaches. Additionally, we observed the presence of fragmented segments in the delineation results. To address this issue, we explored post-processing techniques to consolidate or eliminate fragmented segments using an optimal configuration, leading to performance improvements. Furthermore, by analyzing performance variations across different waveform labels, we provide critical insights into key considerations for ECG segmentation tasks. Notably, our findings also reveal that larger model sizes do not necessarily correlate with better performance. Based on our findings, we propose a set of practical guidelines for leveraging segmentation models in ECG delineation, offering valuable direction for future research and clinical applications.
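The post-processing step described above, which consolidates or eliminates fragmented segments, can be sketched on a per-sample label sequence. The abstract does not specify the rule or its optimal configuration, so the version below is one hypothetical variant: any run shorter than a minimum length is absorbed into the run before it. The label coding (0 = background, 1 = P wave, 2 = QRS complex, 3 = T wave) and the function name are assumptions.

```python
from itertools import groupby


def remove_short_runs(labels, min_len):
    """Absorb runs shorter than min_len into the preceding run.

    A short run at the very start is absorbed into the run that follows it.
    The output has the same length as the input.
    """
    runs = [[label, len(list(group))] for label, group in groupby(labels)]
    cleaned = []
    for label, length in runs:
        if cleaned and (length < min_len or cleaned[-1][0] == label):
            # Fragment, or a run that now touches an identical neighbor: extend.
            cleaned[-1][1] += length
        else:
            cleaned.append([label, length])
    if len(cleaned) > 1 and cleaned[0][1] < min_len:
        # Leading fragment has no predecessor, so merge it forward.
        cleaned[1][1] += cleaned[0][1]
        cleaned.pop(0)
    return [label for label, length in cleaned for _ in range(length)]


# Hypothetical model output with two one-sample fragments.
predicted = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 2, 2, 2, 3, 2, 2, 0, 0, 0]
smoothed = remove_short_runs(predicted, min_len=2)
print(smoothed)
```

In this toy sequence the isolated background sample inside the P wave and the isolated T-wave sample inside the QRS complex are removed, leaving one contiguous segment per waveform.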

Proceedings of the Conference on Health, Inference, and Learning

June 25, 2025

