Analysis and comparative study on the influencing factors of bronchopulmonary dysplasia in premature infants with gestational age ≤32 weeks based on logistic regression and decision tree models
Original Article

Pu Zhao1#, Lijin Zhao1#, You Zhang2, Min Peng1

1Neonatology Department, Shanghai Key Laboratory of Maternal Fetal Medicine, Shanghai Institute of Maternal-Fetal Medicine and Gynecologic Oncology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China; 2Information Center, Shanghai Key Laboratory of Maternal Fetal Medicine, Shanghai Institute of Maternal-Fetal Medicine and Gynecologic Oncology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China

Contributions: (I) Conception and design: Y Zhang, P Zhao; (II) Administrative support: M Peng, L Zhao; (III) Provision of study materials or patients: P Zhao, M Peng; (IV) Collection and assembly of data: Y Zhang, L Zhao; (V) Data analysis and interpretation: Y Zhang, P Zhao; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: You Zhang, MSc. Engineer, Information Center, Shanghai Key Laboratory of Maternal Fetal Medicine, Shanghai Institute of Maternal-Fetal Medicine and Gynecologic Oncology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, 2699 Gaoke West Road, Pudong New Area, Shanghai 200092, China. Email: 122710487@qq.com; Min Peng, MSc. Attending Physician, Neonatology Department, Shanghai Key Laboratory of Maternal Fetal Medicine, Shanghai Institute of Maternal-Fetal Medicine and Gynecologic Oncology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, 2699 Gaoke West Road, Pudong New Area, Shanghai 200092, China. Email: pengmin2014@51mch.com.

Background: Bronchopulmonary dysplasia (BPD) is a common and serious complication in premature infants. It carries a high mortality rate, and the prognosis of most affected infants is poor. Early prediction and intervention can improve the prognosis to a certain extent. This study aimed to explore the risk factors for BPD in premature infants with gestational age ≤32 weeks and to provide a scientific basis for the prevention and control of BPD.

Methods: We retrospectively collected clinical data from 478 premature infants with gestational age ≤32 weeks admitted to the Neonatology Department of Tongji University Affiliated Obstetrics and Gynecology Hospital from January 2023 to December 2024. The data were divided into a training set and a validation set in an 8:2 ratio, and logistic regression and classification decision tree models were used to study the influencing factors of BPD in these infants. Receiver operating characteristic (ROC) curves, the area under the curve (AUC), and calibration curves were used to evaluate the performance of the two prediction models.

Results: A total of 75 premature infants (15.69%) with gestational age ≤32 weeks developed BPD. Both models showed that ventilator-assisted breathing, birth weight, and neonatal respiratory distress syndrome (NRDS) were risk factors for developing BPD in premature infants with gestational age ≤32 weeks. In the logistic regression model, neonatal asphyxia was also included as an influencing factor. Ventilator-assisted breathing was the main influencing factor for the occurrence of BPD in premature infants with gestational age ≤32 weeks. In the training set, the AUC of the binary logistic regression model was 0.901 [95% confidence interval (CI): 0.868–0.945], with an accuracy of 90.48%, sensitivity of 62.26%, specificity of 95.76%, and F1 score of 0.67. The AUC of the classification decision tree model based on the Chi-squared Automatic Interaction Detection (CHAID) algorithm was 0.809 (95% CI: 0.870–0.910), with an accuracy of 88.10%, sensitivity of 43.39%, specificity of 96.47%, and F1 score of 0.53. The area under the ROC curve of the logistic regression model was larger than that of the decision tree model (Z=9.568, P<0.001). In the validation set, the AUC of the logistic regression model was 0.912, and the AUC of the decision tree model was 0.871.

Conclusions: Both logistic regression and decision tree models have value for classification and prediction, and the logistic regression model has better predictive ability than the decision tree model. Birth weight, total duration of mechanical ventilation, asphyxia, and NRDS are independent risk factors for BPD in premature infants with gestational age ≤32 weeks. Clinical medical staff can develop management plans based on the predicted results, conduct individualized risk assessments for infants in the early stages, achieve early warning and precise intervention, and improve the overall quality of life of premature infants with gestational age ≤32 weeks.

Keywords: Bronchopulmonary dysplasia (BPD); premature infants with gestational age ≤32 weeks; logistic regression; decision tree model; prediction model


Submitted Jan 02, 2026. Accepted for publication Mar 31, 2026. Published online Apr 29, 2026.

doi: 10.21037/tp-2026-1-0003


Highlight box

Key findings

• Birth weight, total duration of mechanical ventilation, neonatal asphyxia, and neonatal respiratory distress syndrome (NRDS) were independent risk factors for bronchopulmonary dysplasia (BPD).

• The logistic regression model showed better predictive performance than the Chi-squared Automatic Interaction Detection decision tree model (area under the curve: 0.901 vs. 0.809 in the training set; P<0.001).

What is known and what is new?

• Low birth weight, prolonged mechanical ventilation, and NRDS are well‑recognized risk factors for BPD in preterm infants.

• This study used least absolute shrinkage and selection operator regression to reduce collinearity and compared logistic regression and decision tree models in the same cohort. The logistic regression model was superior in discrimination and clinical utility, and a nomogram was developed for individualized risk assessment in infants ≤32 weeks.

What is the implication, and what should change now?

• Clinicians can use the nomogram based on four easily obtainable variables (birth weight, mechanical ventilation duration, asphyxia, NRDS) to quantify BPD risk early after birth.


Introduction

Bronchopulmonary dysplasia (BPD) is one of the most common chronic lung diseases in premature infants. Its incidence decreases gradually with increasing gestational age but remains high (1). BPD predisposes infants to respiratory diseases such as bronchitis and pneumonia (2), prolongs mechanical ventilation, and makes weaning from oxygen difficult (3), which lengthens hospital stay and increases the burden on families. As they grow older, children with BPD are prone to respiratory, neurological, and cardiovascular complications and may face growth and developmental problems (4). The pathogenesis of BPD has not been fully elucidated. Existing studies show that perinatal infection, inflammatory response, mechanical ventilation, abnormal repair after lung tissue damage, and other factors together promote the occurrence and development of BPD (5). Premature infants with a gestational age of ≤32 weeks are early preterm infants, and their lungs are less mature than those of late preterm infants (6). At present, there is no effective prevention plan for BPD in premature infants. Although some prediction models exist, their predictive performance is low and they lack external validation. Therefore, this study analyzed the influencing factors of BPD occurring within 28 days in premature infants with a gestational age of ≤32 weeks, used least absolute shrinkage and selection operator (LASSO) regression to address collinearity between predictors, and constructed and validated a prediction model, in order to provide a theoretical basis for early clinical intervention to prevent BPD. We present this article in accordance with the TRIPOD reporting checklist (available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/rc).


Methods

Research object

The study included 478 premature infants admitted to the Neonatal Department of the Tongji University Affiliated Obstetrics and Gynecology Hospital from January 2023 to December 2024. According to whether BPD occurred, they were divided into the BPD group (n=75) and the non-BPD group (n=403). Inclusion criteria: (I) gestational age ≤32 weeks; (II) born and treated in this center; (III) clear birth history and complete case information; (IV) meeting the BPD diagnostic criteria [i.e., requiring oxygen support after birth, with an oxygen concentration >21% for more than 28 days (7)]. Exclusion criteria: (I) those who died within 28 days after birth, were transferred to another hospital, or whose parents gave up treatment (missing BPD outcomes); (II) those with complex congenital heart diseases, genetic metabolic diseases, or congenital respiratory system malformations that affect BPD diagnosis; (III) those with incomplete clinical data. Cases were screened strictly according to the inclusion and exclusion criteria, and the study was approved by the Ethics Committee of Tongji University Affiliated Obstetrics and Gynecology Hospital (ethical No. KS25352). Individual consent for this retrospective analysis was waived by the ethics committee. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Research methods

General clinical data and laboratory indicators for the infant and mother were collected retrospectively. (I) Maternal clinical data: age, placental abnormalities, umbilical cord abnormalities, amniotic fluid contamination, premature rupture of membranes, pregnancy-induced hypertension (PIH), gestational diabetes mellitus (GDM), pregnancy-associated anemia, and pregnancy-associated thyroid disease; (II) infant clinical data: hospital stay (days), birth weight, gender, parity, gravidity, twin birth, delivery method, conception method, Apgar score at 1 and 5 minutes, infection, neonatal respiratory distress syndrome (NRDS), respiratory failure, neonatal pneumonia (NP), neonatal sepsis, neonatal asphyxia, neonatal necrotizing enterocolitis (NEC), neonatal hypoglycemia, pneumothorax, and occurrence of BPD; (III) infant laboratory data (results of the first examination upon admission): white blood cell (WBC) count, neutrophils (NE), platelets (PLT), hemoglobin (Hb), C-reactive protein (CRP), pH, and arterial partial pressure of carbon dioxide (PCO2); (IV) infant treatment details: total duration of mechanical ventilation (days), duration of assist/control (A/C) mode ventilation (days), duration of Biphasic mode ventilation (days), duration of continuous positive airway pressure (CPAP) mode ventilation (days), duration of high-frequency oscillation (HFO) mode ventilation (days), duration of oxygen use with head cover (days), duration of intravenous hyperalimentation (days), age at start of feeding (days), cumulative nasogastric feeding time (days), and use of pulmonary surfactant (PS).

Statistical analysis

SPSS 26.0 and R 4.2.1 were used for statistical analysis. The Kolmogorov-Smirnov test was used to assess the normality of the data. Measurement data conforming to a normal distribution are expressed as mean ± standard deviation (x̄±s), and data with a non-normal distribution are presented as median (interquartile range). For comparisons between groups, the two independent samples t-test was used. Categorical data are presented as count (percentage), and the χ2 test or Fisher's exact test was applied for comparisons between groups. After univariate analysis, variables with statistically significant differences were further screened using LASSO regression with 10-fold cross-validation, and variable combinations were fitted according to the lambda.min selection criterion. Binary logistic regression and classification decision tree models based on Chi-squared Automatic Interaction Detection (CHAID) were established, with the occurrence of BPD as the dependent variable and the variables selected by LASSO regression as independent variables. The collected data were divided into training and validation sets in an 8:2 ratio. The CHAID classification decision tree uses the results of the Chi-squared test or likelihood ratio Chi-squared test to determine the optimal grouping variables and split points, ultimately forming a classification tree (8). To prevent overfitting, pre-pruning was applied to limit the growth of the decision tree: the maximum tree depth was 3, the minimum sample size for parent nodes was 100, and the minimum sample size for child nodes was 50. Ten-fold cross-validation was also implemented, and calibration curves were drawn from 500 Bootstrap resamples to internally validate the model.
Model performance was evaluated using the receiver operating characteristic (ROC) curve, the area under the curve (AUC), the calibration curve, and decision curve analysis. A P value of less than 0.05 was considered statistically significant.
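
The 8:2 division described above can be sketched as follows. This is a minimal illustration only: the source does not state whether the split was stratified by outcome or which random seed was used, so the stratification, the `stratified_split` helper, and the seed are all assumptions.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=42):
    """Return (train_idx, valid_idx), keeping the outcome ratio in both sets.

    Stratification and the seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for outcome in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == outcome]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Group sizes reported in the study: 75 BPD (1) and 403 non-BPD (0) infants.
labels = [1] * 75 + [0] * 403
train_idx, valid_idx = stratified_split(labels)
n_train_bpd = sum(labels[i] for i in train_idx)
n_valid_bpd = sum(labels[i] for i in valid_idx)
print(len(train_idx), len(valid_idx), n_train_bpd, n_valid_bpd)
```

Splitting within each outcome group keeps the proportion of BPD cases similar in both sets, which matters when the outcome is relatively uncommon.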


Results

Comparison of clinical data between BPD and non-BPD groups

Univariate analysis showed statistically significant differences between the two groups in maternal age, placental abnormalities, and amniotic fluid contamination among the maternal clinical data (P<0.05). Among the neonatal clinical data, there were significant differences in birth weight, parity, twin birth, Apgar scores at 1 and 5 minutes, NRDS, and neonatal asphyxia (P<0.05). Among the treatment variables, there were significant differences in PS use, total duration of mechanical ventilation, duration of ventilation in A/C, Biphasic, and CPAP modes, duration of oxygen use with head cover, duration of intravenous hyperalimentation, and cumulative nasogastric feeding time (P<0.05). For details, see Table 1.

Table 1

Comparison of clinical data between the BPD and non-BPD groups

Item BPD group (n=75) Non-BPD group (n=403) χ2/Z P
Clinical data of the mother
   Placental abnormality 5.880 0.02
    Yes 20 (26.67) 64 (15.88)
    No 55 (73.33) 339 (84.12)
   Umbilical cord abnormality 0.704 0.40
    Yes 13 (17.33) 55 (13.65)
    No 62 (82.67) 348 (86.35)
   Amniotic fluid contamination 8.125 0.004
    Yes 15 (20.0) 36 (8.94)
    No 60 (80.0) 367 (91.06)
   Premature rupture of fetal membranes 3.510 0.06
    Yes 17 (22.67) 57 (14.14)
    No 58 (77.33) 346 (85.86)
   PIH 0.016 0.90
    Yes 6 (8.0) 34 (8.44)
    No 69 (92.0) 369 (91.56)
   GDM 0.238 0.63
    Yes 61 (81.33) 337 (83.62)
    No 14 (18.67) 66 (16.38)
   Anemia complicating pregnancy 0.296 0.59
    Yes 2 (2.67) 16 (3.97)
    No 73 (97.33) 387 (96.03)
   Pregnancy complicated by thyroid diseases 0.053 0.82
    Yes 4 (5.33) 19 (4.71)
    No 71 (94.67) 384 (95.29)
Clinical data of pediatric patients
   Length of hospital stay (d) 36.00 (25.00, 59.00) 38.00 (26.00, 52.00) −0.047 0.96
   Birth weight (g) 1,255.00 (950.00, 1,520.00) 1,407.50 (1,141.25, 1,655.00) −3.486 <0.001
   Gender 0.038 0.85
    Boy 40 (53.33) 210 (52.11)
    Girl 35 (46.67) 193 (47.89)
   Gravidity (times) 2 (1, 3) 2 (1, 3) −1.810 0.07
   Parity (times) 1 (1, 2) 1 (1, 2) −2.136 0.03
   Twins 4.784 0.03
    Yes 35 (46.67) 135 (33.50)
    No 40 (53.33) 268 (66.50)
   Delivery pattern 1.407 0.24
    Eutocia 11 (14.67) 83 (20.60)
    Cesarean 64 (85.33) 320 (79.40)
   Mode of conception 2.214 0.14
    Natural conception 59 (78.67) 283 (70.22)
    Conception through assisted reproductive technology 16 (21.33) 120 (29.78)
   Apgar 1 minute score 8 (8, 9) 9 (8, 9) −5.252 <0.001
   Apgar 5 minutes score 8 (8, 9) 9 (8, 9) −3.634 <0.001
   Infection 0.505 0.48
    Yes 5 (6.67) 19 (4.71)
    No 70 (93.33) 384 (95.29)
   NRDS 22.658 <0.001
    Yes 55 (73.33) 175 (43.42)
    No 20 (26.67) 228 (56.58)
   Respiratory failure 1.974 0.16
    Yes 6 (8.0) 17 (4.22)
    No 69 (92.0) 386 (95.78)
   NP 3.667 0.055
    Yes 27 (36.0) 102 (25.31)
    No 48 (64.0) 301 (74.69)
   Septicemia of newborn 0.187 0.67
    Yes 4 (5.33) 17 (4.22)
    No 71 (94.67) 386 (95.78)
   Neonatal asphyxia 6.879 0.009
    Yes 10 (13.33) 21 (5.21)
    No 65 (86.67) 382 (94.79)
   NEC 0.009 0.93
    Yes 2 (2.67) 10 (2.48)
    No 73 (97.33) 393 (97.52)
   Neonatal hypoglycemia 0.759 0.38
    Yes 2 (2.67) 20 (4.96)
    No 73 (97.33) 383 (95.04)
   Pneumothorax 0.511 0.48
    Yes 2 (2.67) 18 (4.47)
    No 73 (97.33) 385 (95.53)
Laboratory data of pediatric patients
   WBC (×10⁹/L) 11.58 (8.97, 16.50) 12.90 (9.83, 16.48) −1.068 0.29
   NE (%) 46.60 (35.00, 59.30) 48.30 (38.30, 59.80) −0.701 0.48
   PLT (×10⁹/L) 245.00 (192.00, 288.00) 250.50 (198.00, 291.00) −0.053 0.96
   Hb (g/L) 183.00 (165.00, 221.00) 186.00 (166.00, 216.00) −0.341 0.73
   CRP (mg/L) 0.74 (0.50, 1.49) 0.78 (0.50, 1.46) −0.076 0.94
   pH 7.33 (7.29, 7.37) 7.34 (7.29, 7.38) −0.605 0.55
   PCO2 (mmHg) 45.90 (37.50, 53.50) 45.10 (38.90, 52.10) −0.252 0.80
Treatment status of pediatric patients
   PS use 9.586 0.002
    Yes 18 (24.0) 44 (10.92)
    No 57 (76.0) 359 (89.08)
   A/C mode ventilation duration (d) 0.00 (0.00, 3.40) 0.00 (0.00, 1.00) −5.189 <0.001
   Biphasic mode ventilation duration (d) 2.00 (0.00, 8.00) 0.00 (0.00, 2.00) −7.201 <0.001
   CPAP mode ventilation duration (d) 8.00 (0.00, 19.00) 0.00 (0.00, 2.00) −7.468 <0.001
   HFO mode ventilation duration (d) 0.00 (0.00, 0.00) 0.00 (0.00, 0.00) −0.926 0.35
   Total duration of ventilator assisted breathing (d) 17.00 (3.00, 37.00) 1.00 (0.00, 4.00) −8.510 <0.001
   Duration of oxygen use with a headgear (d) 3.00 (0.00, 8.00) 2.00 (0.00, 4.00) −2.314 0.02
   Time of intravenous high nutrition treatment (d) 9.00 (0.00, 17.00) 0.00 (0.00, 5.00) −6.360 <0.001
   Cumulated nasogastric feeding time (d) 24.00 (5.00, 40.00) 2.00 (0.00, 8.00) −8.135 <0.001
   The age at the start of feeding (d) 2.00 (0.00, 14.00) 3.00 (0.00, 14.00) −0.600 0.55

Data are presented as n (%) or median (P25, P75). The χ2/Z column gives the Chi-squared value for categorical variables and the Z value for continuous variables. A/C, assist/control; BPD, bronchopulmonary dysplasia; CPAP, continuous positive airway pressure; CRP, C-reactive protein; GDM, gestational diabetes mellitus; Hb, hemoglobin; HFO, high-frequency oscillation ventilation; NE, neutrophils; NEC, necrotizing enterocolitis; NP, neonatal pneumonia; NRDS, neonatal respiratory distress syndrome; PCO2, partial pressure of carbon dioxide; PIH, pregnancy-induced hypertension; PLT, platelets; PS, pulmonary surfactant; WBC, white blood cell.
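
As a worked check of the Chi-squared column, the statistic for a 2×2 row of Table 1 can be recomputed from its counts. The sketch below assumes the Pearson Chi-squared test without continuity correction (the source does not state which variant was used); the function name `chi2_2x2` is illustrative.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson Chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# NRDS row of Table 1: BPD group 55 yes / 20 no, non-BPD group 175 yes / 228 no.
stat, p_value = chi2_2x2(55, 20, 175, 228)
print(round(stat, 3), p_value < 0.001)
```

Under this assumption the NRDS counts give a statistic that agrees with the tabulated value of 22.658 to three decimals.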

Feature variable screening of BPD prediction model

Variables with statistical significance in the univariate analysis were entered into LASSO regression for feature screening. Through 10-fold cross-validation, the lambda with the minimum mean square error (lambda.min =0.0287) was selected as the optimal value. LASSO regression reduced the 15 influencing factors to 7 potential predictors: PS use, duration of ventilation in CPAP mode, cumulative nasogastric feeding time, NRDS, neonatal asphyxia, birth weight, and total duration of ventilator-assisted breathing. This selection balances model complexity against generalization ability. See Figures 1 and 2.
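
For reference, LASSO-penalized logistic regression is commonly written as the optimization problem below. This is the textbook form and is given here as an assumption about the fitting procedure; the source does not state the exact loss function or software routine. Here $y_i$ is the BPD indicator for infant $i$, $x_i$ is the vector of candidate predictors, $p$ is the number of candidates, and $\lambda$ is the penalty strength chosen by cross-validation.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

A larger $\lambda$ shrinks more coefficients exactly to zero, which is how the candidate set is reduced to a smaller group of predictors.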

Figure 1 Screening of risk factors based on LASSO regression. LASSO, least absolute shrinkage and selection operator.
Figure 2 Coefficient distribution. CPAP, continuous positive airway pressure; NRDS, neonatal respiratory distress syndrome.

Logistic regression analysis of BPD occurrence in preterm infants with a gestational age ≤32 weeks

With the occurrence of BPD as the dependent variable, the seven variables selected by LASSO regression were entered into multivariate logistic regression. The variable assignments and results are shown in Table 2. The results indicate that birth weight, total duration of mechanical ventilation, asphyxia, and NRDS were the four independent risk factors for BPD in premature infants born at ≤32 weeks (P<0.05). Using R 4.2.1, the four predictors were included in a logistic regression model to construct a nomogram for the occurrence of BPD in premature infants ≤32 weeks' gestation. The total score ranges from 0 to 140 points, and the bottom line of the nomogram gives the corresponding probability of BPD. A higher total score indicates a higher risk. See Figure 3.

Table 2

Multivariate logistic regression analysis of BPD in premature infants with gestational age ≤32 weeks

Variable B SE Wald χ2 P OR 95% CI
Constant −1.957 0.724 7.294 0.007 0.141
PS (using No as reference) 0.504 0.453 1.238 0.266 1.655 0.681–4.023
CPAP mode ventilation duration (original value) 0.052 0.036 2.094 0.148 1.054 0.982–1.131
Accumulated nasogastric feeding time (original value) 0.006 0.025 0.060 0.806 1.006 0.957–1.058
NRDS (using No as reference) 1.555 0.385 16.343 <0.001 4.735 2.228–10.063
Neonatal asphyxia (using No as reference) 1.750 0.486 12.956 <0.001 5.753 2.219–14.917
Birth weight (original value) −0.001 0.001 8.667 0.003 0.999 0.998–1.000
Ventilator assisted breathing (original value) 0.079 0.031 6.634 0.010 1.082 1.019–1.149

BPD, bronchopulmonary dysplasia; CPAP, continuous positive airway pressure; CI, confidence interval; NRDS, neonatal respiratory distress syndrome; OR, odds ratio; PS, pulmonary surfactant; SE, standard error.
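
The OR and CI columns of Table 2 follow from B and SE. The sketch below assumes a Wald-type 95% interval, exp(B ± 1.96·SE), which the source does not state explicitly; because the tabulated B and SE are rounded to three decimals, the recomputed values agree with the table only approximately. The function name is illustrative.

```python
import math


def odds_ratio_summary(b, se, z=1.96):
    """Return Wald Chi-squared, OR, and the Wald 95% CI bounds for one coefficient."""
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    return wald, odds_ratio, lower, upper


# NRDS row of Table 2: B = 1.555, SE = 0.385.
wald, odds_ratio, lower, upper = odds_ratio_summary(1.555, 0.385)
print(f"Wald={wald:.2f} OR={odds_ratio:.3f} CI={lower:.2f}-{upper:.2f}")
```

For a continuous predictor such as ventilation duration, the OR applies per one-unit (one-day) increase, which is why an OR close to 1 can still correspond to a large effect over many days.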

Figure 3 Nomogram model for the occurrence of BPD in preterm infants with a gestational age ≤32 weeks. 0 represents no illness; 1 represents illness. BPD, bronchopulmonary dysplasia; NRDS, neonatal respiratory distress syndrome.
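
A nomogram is a graphical form of the logistic equation, so the same risk can be approximated numerically. The sketch below plugs the rounded coefficients of Table 2 (the seven-variable model) into the logistic function; the published nomogram is based on four predictors and unrounded coefficients, so the output is a rough illustration rather than the nomogram's value. The infant profile and all names in the code are hypothetical.

```python
import math

# Coefficients copied from Table 2 (rounded to three decimals in the source).
COEF = {
    "constant": -1.957,
    "ps": 0.504,
    "cpap_days": 0.052,
    "nasogastric_days": 0.006,
    "nrds": 1.555,
    "asphyxia": 1.750,
    "birth_weight_g": -0.001,
    "ventilation_days": 0.079,
}


def predicted_bpd_probability(infant):
    """Apply the logistic function to the linear predictor."""
    logit = COEF["constant"] + sum(COEF[k] * v for k, v in infant.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical infant (not from the study data): 1,000 g, NRDS, 10 ventilation days.
infant = {"ps": 0, "cpap_days": 0, "nasogastric_days": 0, "nrds": 1,
          "asphyxia": 0, "birth_weight_g": 1000, "ventilation_days": 10}
probability = predicted_bpd_probability(infant)
print(round(probability, 3))
```

The birth weight coefficient is rounded to −0.001 per gram in the table, so small rounding differences are multiplied by the weight in grams and the result should be read as approximate.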

Classification decision tree analysis of CHAID algorithm for influencing factors of BPD in premature infants with gestational age ≤32 weeks

A classification decision tree model was established according to the growth and pruning rules described above. It comprised 3 layers and 7 nodes, of which 4 were terminal nodes, as shown in Figure 4. The model shows that ventilator-assisted breathing, birth weight, and NRDS are the main factors affecting the occurrence of BPD in premature infants with gestational age ≤32 weeks. The root node is the duration of ventilator-assisted breathing, indicating that it has the strongest association with the occurrence of BPD. Ventilator-assisted breathing was divided into two subgroups, ≤9 days and >9 days. In the ≤9 days subgroup, 5.6% of premature infants with gestational age ≤32 weeks developed BPD. This subgroup was further split by birth weight: the probability of BPD was 18.6% for premature infants with birth weight ≤1,060 g and only 2.4% for those with birth weight >1,060 g. This subgroup is also influenced by whether NRDS occurs.

Figure 4 CHAID classification decision diagram of influencing factors for BPD in premature infants with gestational age ≤32 weeks. BPD, bronchopulmonary dysplasia; CHAID, Chi-squared Automatic Interaction Detection; df, degree of freedom; NRDS, neonatal respiratory distress syndrome.

Comparison of binary logistic regression model and classification decision tree model analysis results of BPD in premature infants with gestational age ≤32 weeks

ROC curves were plotted separately for the two models, using the predicted probabilities from each model as the test variable. The results are shown in Figure 5, and the classification performance is shown in Table 3. The ROC curves of both models lie well above the diagonal. In the training set, the AUC of the binary logistic regression model was 0.901 [95% confidence interval (CI): 0.868–0.945], with an accuracy of 90.48%, sensitivity of 62.26%, specificity of 95.76%, and F1 score of 0.67. The AUC of the classification decision tree model based on the CHAID algorithm was 0.809 (95% CI: 0.870–0.910), with an accuracy of 88.10%, sensitivity of 43.39%, specificity of 96.47%, and F1 score of 0.53. In the validation set, the AUC of the logistic regression model was 0.912 and that of the decision tree model was 0.871, as shown in Figure 5. Comparison of the ROC curves of the two models gave Z=9.568 and P<0.001, indicating a statistically significant difference between the AUCs of the two models.

Figure 5 Comparison of the area under the ROC curve between the two models. AUC, area under the curve; DT, decision tree; LR, logistic regression; ROC, receiver operating characteristic.
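
The AUC compared in Figure 5 can be read as the probability that a randomly chosen infant with BPD receives a higher predicted probability than a randomly chosen infant without BPD. A minimal sketch of that pairwise definition follows; the scores are toy values, not study data, and the function name is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities (hypothetical, not study data).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy example 11 of the 12 positive-negative pairs are ordered correctly. A decision tree with only four terminal nodes produces at most four distinct predicted probabilities, so many pairs are tied.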

Table 3

Comparison of classification effects between logistic regression model and decision tree model

Model AUC Accuracy Sensitivity Specificity F1 score
Logistic regression (train) 0.901 0.905 0.623 0.957 0.673
Logistic regression (validation) 0.912 0.915 0.773 0.942 0.739
Decision tree (train) 0.809 0.881 0.434 0.964 0.535
Decision tree (validation) 0.871 0.873 0.409 0.958 0.500

AUC, area under the curve.
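
The columns of Table 3 other than AUC are all functions of a single confusion matrix. The sketch below shows the standard definitions. The source does not report the underlying counts, so the counts used here are hypothetical, chosen only so that the resulting metrics land close to the training-set logistic regression row.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, sensitivity, specificity, f1


# Hypothetical counts (an assumption, not reported in the source).
accuracy, sensitivity, specificity, f1 = classification_metrics(tp=33, fn=20, fp=12, tn=271)
print(f"{accuracy:.3f} {sensitivity:.3f} {specificity:.3f} {f1:.3f}")
```

Because BPD cases are the minority class, accuracy and specificity can be high even when a sizable share of BPD cases is missed, which is why sensitivity and the F1 score are reported alongside them.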


Discussion

The clinical significance of constructing a prediction model for the occurrence of BPD in preterm infants with a gestational age ≤32 weeks

BPD is a common respiratory complication in preterm infants. It not only significantly increases mortality during the neonatal period but is also closely related to long-term chronic lung disease and neurodevelopmental disorders. Because lung development is immature in preterm infants with a gestational age ≤32 weeks, their risk of developing BPD is significantly increased. Studies have reported that the incidence rates of BPD in preterm infants with gestational ages of 30–32, 28+1 to 29+6, and ≤28 weeks are 19.62%, 39.24%, and 56.12%, respectively (9). In this study, the incidence of BPD in preterm infants with a gestational age ≤32 weeks was 15.69% (75/478), slightly lower than in previous studies, which may be related to the close attention paid to extremely preterm infants in the department. Establishing a comprehensive respiratory management mechanism soon after birth, adopting a refined care model, simulating the intrauterine environment, and supporting the physiological development of the lungs of extremely preterm infants have reduced the occurrence of BPD to a certain extent. However, BPD remains a common problem in preterm infants. The mortality and complication rates of preterm infants with BPD are significantly higher than those of preterm infants in general, with prolonged hospitalization and a high incidence of adverse neurodevelopmental outcomes (10), placing a heavy burden on families and society. Reducing the occurrence of BPD in preterm infants has become one of the goals of neonatal medical staff. Although BPD risk prediction models have already been studied at home and abroad (11,12), some of their predictors are not easy to obtain, their applicability needs to be strengthened, and external validation is lacking.
Therefore, it is necessary to construct a dedicated risk prediction model for the occurrence of BPD in preterm infants with a gestational age ≤32 weeks, so that clinicians can conduct individualized risk assessments at an early stage, optimize resource allocation, achieve early warning and precise intervention, and reduce the occurrence of BPD through scientific and standardized interventions.

The logistic regression model has better predictive performance than the classification decision tree model

In this study, the area under the ROC curve in the training set was 0.901 for the logistic regression model and 0.809 for the classification decision tree model, so the logistic regression model performed better. Logistic regression, as a traditional statistical model, has been widely used in clinical research in neonatology. It can quantify the dependence of the dependent variable on each meaningful independent variable, and the results are easy to interpret. The effect of an independent variable on the dependent variable can be quantified by the OR value, which conveys more information about their relationship than a decision tree does. In addition, logistic regression is robust and not prone to overfitting (13). In Xia et al.'s study (14), a nomogram prediction model for BPD in preterm infants was constructed from partial clinical information and also demonstrated good accuracy, discriminative ability, and clinical applicability. Unlike that model, which lists delivery mode, gender, and length of hospital stay as the main predictors of BPD, the present model combines univariate analysis and LASSO regression to screen the predictor variables, which largely avoids multicollinearity among variables and makes the prediction results more accurate. This study ultimately selected four objective variables and constructed a prediction model using logistic regression. A nomogram was also drawn to help clinicians explain the risks faced by the child to parents more clearly and on the basis of evidence, and to jointly discuss individualized diagnosis and treatment plans and long-term follow-up plans.
This helps alleviate parental anxiety, enhance treatment compliance, reduce the incidence of BPD, improve the short-term and long-term respiratory prognosis of the child, improve quality of life, and optimize the overall efficiency of medical resource utilization.

Predictive risk factors for the occurrence of BPD in preterm infants with a gestational age ≤32 weeks

In this study, birth weight, total duration of mechanical ventilation, presence of asphyxia, and presence of NRDS are the 4 independent risk factors affecting the development of BPD in preterm infants ≤32 weeks (P<0.05). In recent years, with the increasing proportion of preterm and low birth weight infants, low birth weight is one of the high-risk factors for preterm infants to develop BPD, which is similar to the results of this study (15). Alveolar development goes through five stages: embryonic period, glandular period, tubular period, saccular period and alveolar period. Due to preterm birth, the lung development of low-birth-weight infants is often immature, and their lung development is still in the pseudoglandular period, where distal primitive epithelial cells cannot continue to branch to form conducting departments, and subsequently cannot enter the pulmonary alveolar differentiation period, hindering the normal differentiation of alveolar epithelial progenitor cells into type I and type II epithelial cells. The relative deficiency of primary surfactant synthesis during this period leads to alveolar atelectasis, resulting in respiratory distress and an increased risk of BPD (16). Previous studies have found that mechanical ventilation >7 days is an independent risk factor for BPD (17). 
Based on the immature lung development of low birth weight infants, they often need to receive long-term oxygen therapy and mechanical ventilation assistance, which also increases the chances of lung infection and the risk of causing disease; mechanical ventilation induces inflammatory cascade reactions, damaging vascular endothelial cells and alveolar epithelial cells through excessive immune activation and inflammatory cell release of a large number of pro-inflammatory cytokines and inflammatory mediators, destroying the integrity of the alveolar-capillary barrier structure, leading to interstitial pulmonary edema, extracellular matrix reconstruction, inhibiting the maturation of alveolar type II epithelial cells, thus causing delayed alveolarization and progressive lung parenchymal damage, causing significant destructive effects in the immature lung tissue of preterm infants and becoming an important pathogenic mechanism of BPD (18,19); in addition, when mechanical ventilation is prolonged, excessive tidal volume and oxygen concentration can trigger inflammatory reactions, and continuous inflammatory reactions and high oxygen exposure can damage the immature pulmonary vessels and alveoli of the children, thus causing BPD (9); prolonged mechanical ventilation also indicates that preterm infants require repeated endotracheal intubation, which is highly likely to cause lung infections and respiratory tract damage in preterm infants, exacerbating the occurrence of BPD. Classic BPD mostly occurs in RDS children, mainly due to the lack of PS, showing progressively worsening respiratory difficulties in early life. 
The need for mechanical ventilation or oxygen therapy in preterm infants whose lungs are still maturing leads to oxidative stress injury, pulmonary vascular remodeling, and pulmonary fibrosis, eventually progressing to BPD (20). RDS in preterm infants also increases the consumption of PS and, coupled with prolonged mechanical ventilation, exacerbates lung injury, making these infants more prone to developing BPD. The European consensus guidelines recommend that, for premature infants with RDS, lung-protective ventilation strategies be prioritized, the duration of oxygen therapy be shortened as much as possible, and therapeutic PS be administered early to promote lung maturation and reduce the incidence of BPD (21). Furthermore, hypoxia caused by asphyxia can exacerbate pulmonary inflammatory responses and oxidative stress (OS) damage, lead to systemic organ hypoperfusion, increase pulmonary vascular resistance, significantly decrease the efficiency of gas exchange per unit time, and cause electrolyte and metabolic disturbances such as calcium and phosphorus imbalances. These changes further damage lung tissue, thereby inducing pulmonary vascular remodeling and impaired lung development (22,23).

This study used LASSO regression for variable screening to reduce model overfitting and improve generalizability. However, this approach may exclude some variables that are traditionally considered clinically important. For example, in the univariate analysis of this study, maternal age, placental abnormalities, amniotic fluid contamination, use of PS, and duration of CPAP ventilation were significantly associated with the occurrence of BPD, but in the LASSO regression their coefficients were shrunk to zero and they did not enter the final model. This may be due to collinearity between these variables and the four strong predictors ultimately selected (such as duration of mechanical ventilation and NRDS), or to their effects being mediated by these predictors. For example, the use of PS is highly correlated with the severity of NRDS, and the duration of CPAP ventilation is often part of the total duration of mechanical ventilation. From a clinical perspective, this does not mean that these factors are unimportant; rather, it suggests that in the presence of more direct injury factors (such as prolonged mechanical ventilation), the marginal contribution of these upstream or indirect factors to the model is limited. Therefore, clinicians still need to consider the potential impact of these excluded variables when applying this model, especially for infants who do not meet the indications for mechanical ventilation.
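The collinearity effect described above can be illustrated with a minimal sketch. It is not the analysis performed in this study: the numbers are synthetic, the variable names (`vent_days`, `cpap_days`) are hypothetical, and for simplicity it fits an L1-penalized linear regression by coordinate descent rather than a penalized logistic model. The outcome is constructed to depend only on total ventilation days, and CPAP days are made to track them closely.

```python
# Illustrative only: synthetic numbers, hypothetical variable names,
# and a linear (not logistic) LASSO for simplicity.

def standardize(values):
    """Center to mean 0 and scale to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def soft_threshold(z, lam):
    """LASSO shrinkage: pull z toward 0 and return exactly 0 if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, sweeps=100):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 for standardized columns."""
    n = len(y)
    coefs = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Partial residual: remove every feature's contribution except feature j.
            partial = [
                y[i] - sum(coefs[k] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            # No extra scaling is needed because each column has unit variance.
            coefs[j] = soft_threshold(rho, lam)
    return coefs

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    sa, sb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(sa, sb)) / len(a)

# Hypothetical data: total ventilation days, and CPAP days that track them closely.
vent_days = [1, 2, 3, 5, 8, 10, 14, 20]
cpap_days = [1, 1, 3, 4, 6, 9, 10, 16]
# Synthetic outcome score driven only by total ventilation days (no noise).
outcome = standardize([2.0 * v for v in vent_days])

X = [standardize(vent_days), standardize(cpap_days)]
coefs = lasso_coordinate_descent(X, outcome, lam=0.1)

print("univariate correlation, CPAP days vs outcome:",
      round(correlation(cpap_days, outcome), 3))
print("LASSO coefficients [vent_days, cpap_days]:", coefs)
```

In this toy setting, CPAP days are strongly correlated with the outcome when examined alone, yet the penalty sets their coefficient to exactly zero once total ventilation days are in the model, because they add almost no information beyond the stronger predictor. With real, noisy data the choice between two collinear variables can be less stable, which is one more reason to interpret excluded variables cautiously.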

Limitations

Although the predictive model constructed in this study showed good predictive performance, it has some limitations. First, because of the limited study period, the sample size was relatively small and the study population was homogeneous. In addition, this study used only cross-sectional data collected during the hospitalization of preterm infants and lacked dynamic data on their subsequent development. Future work could dynamically assess the progression of BPD in preterm infants and externally validate the model in large, multicenter samples to improve its generalizability.


Conclusions

Both the logistic regression and decision tree models have classification and predictive value, and the logistic regression model showed better predictive ability than the decision tree model. Our study constructed a nomogram for predicting the occurrence of BPD in preterm infants with a gestational age of ≤32 weeks based on birth weight, duration of mechanical ventilation, asphyxia, and NRDS. These risk factors are easy for clinical staff to identify, which makes the model practical to use. The nomogram can be used to quantitatively assess the probability of BPD in preterm infants with a gestational age of ≤32 weeks, providing a scientific basis for early identification of BPD and the implementation of preventive interventions, which helps to improve the quality of life of very preterm and extremely preterm infants. This study has certain limitations, and further research is needed to expand the data and improve the generalizability of the model.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/rc

Data Sharing Statement: Available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/dss

Peer Review File: Available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/prf

Funding: The study was supported by the Shanghai First Maternity and Infant Hospital Hospital-Level Research Project (2024HL25).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study has been approved by the Ethics Committee of Tongji University Affiliated Obstetrics and Gynecology Hospital (ethical No. KS25352). Individual consent for this retrospective analysis was waived by the ethics committee. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Cao Y, Jiang S, Sun J, et al. Assessment of Neonatal Intensive Care Unit Practices, Morbidity, and Mortality Among Very Preterm Infants in China. JAMA Netw Open 2021;4:e2118904.
  2. Yin Y, Qi Y, Hong D, et al. Study on risk factors and follow-up outcome at 2 years in preterm infants with bronchopulmonary dysplasia. Chinese Journal of Evidence-Based Pediatrics 2016;11:113-7.
  3. Rutkowska M, Hożejowski R, Helwich E, et al. Severe bronchopulmonary dysplasia - incidence and predictive factors in a prospective, multicenter study in very preterm infants with respiratory distress syndrome. J Matern Fetal Neonatal Med 2019;32:1958-64.
  4. Resch B, Kurath-Koller S, Eibisberger M, et al. Prematurity and the burden of influenza and respiratory syncytial virus disease. World J Pediatr 2016;12:8-18.
  5. Shukla VV, Ambalavanan N. Recent Advances in Bronchopulmonary Dysplasia. Indian J Pediatr 2021;88:690-5.
  6. Pallás Alonso C, García González P, Jimenez Moya A, et al. Follow-up protocol for newborns of birthweight less than 1500 g or less than 32 weeks gestation. An Pediatr (Barc) 2018;88:229.e1-229.e10.
  7. Jobe AH, Bancalari E. Bronchopulmonary dysplasia. Am J Respir Crit Care Med 2001;163:1723-9.
  8. Jiao LP, Zhang XZ, Yang YY, et al. Application and comparison of decision tree and Logistic regression model in analysis of factors affecting drinking water. Chinese Journal of Health Statistics 2020;37:874-877, 882.
  9. Gupta BK, Saha AK, Mukherjee S, et al. Minimally invasive surfactant therapy versus InSurE in preterm neonates of 28 to 34 weeks with respiratory distress syndrome on non-invasive positive pressure ventilation-a randomized controlled trial. Eur J Pediatr 2020;179:1287-93.
  10. Chinese Society of Pediatrics, Neonatology Group, Editorial Board of Chinese Journal of Pediatrics. Expert consensus on clinical management of premature infants with bronchopulmonary dysplasia. Chinese Journal of Pediatrics 2020;58:358-65.
  11. Romijn M, Dhiman P, Finken MJJ, et al. Prediction Models for Bronchopulmonary Dysplasia in Preterm Infants: A Systematic Review and Meta-Analysis. J Pediatr 2023;258:113370.
  12. Xu L, Mao L, Yuan S, et al. Establishment and evaluation of a nomogram model for predicting bronchopulmonary dysplasia in preterm infants with low birth weight. Maternal & Child Health Care of China 2023;38:1834-7.
  13. Wu W, Tan X, Sun D, et al. Application of Logistic regression analysis model and decision tree analysis in early warning indicators of hypertension and diabetes comorbidity. Chinese Journal of Disease Control & Prevention 2022;26:827-33.
  14. Xia L, Lv R, Zhao J, et al. The construction and application of a nomogram prediction model for the risk of bronchopulmonary dysplasia in extremely preterm infants. Journal of Chinese Practical Diagnosis and Therapy 2025;39:249-56.
  15. Yao Q, Shen QL, Huang GY, et al. Relationship between bronchopulmonary dysplasia phenotypes with high-resolution computed tomography score in early preterm infants. Front Pediatr 2022;10:935733.
  16. Shin JE, Yoon SJ, Lim J, et al. Pulmonary Surfactant Replacement Therapy for Respiratory Distress Syndrome in Neonates: a Nationwide Epidemiological Study in Korea. J Korean Med Sci 2020;35:e253.
  17. Solevåg AL, Cheung PY, Schmölzer GM. Bi-Level Noninvasive Ventilation in Neonatal Respiratory Distress Syndrome. A Systematic Review and Meta-Analysis. Neonatology 2021;118:264-73.
  18. Cannavò L, Perrone S, Viola V, et al. Oxidative Stress and Respiratory Diseases in Preterm Newborns. Int J Mol Sci 2021;22:12504.
  19. Kalikkot Thekkeveedu R, El-Saie A, Prakash V, et al. Ventilation-Induced Lung Injury (VILI) in Neonates: Evidence-Based Concepts and Lung-Protective Strategies. J Clin Med 2022;11:557.
  20. Welch B, Rose R, Myers J, et al. Decreasing early invasive mechanical ventilation exposure in preterm infants: a quality improvement initiative. J Perinatol 2025;45:149-56.
  21. Sweet DG, Carnielli V, Greisen G, et al. European Consensus Guidelines on the Management of Respiratory Distress Syndrome - 2019 Update. Neonatology 2019;115:432-50.
  22. Cai H, Jiang L, Liu Y, et al. Development and verification of a risk prediction model for bronchopulmonary dysplasia in very low birth weight infants. Transl Pediatr 2021;10:2533-43.
  23. Valenzuela-Stutman D, Marshall G, Tapia JL, et al. Bronchopulmonary dysplasia: risk prediction models for very-low-birth-weight infants. J Perinatol 2019;39:1275-81.
Cite this article as: Zhao P, Zhao L, Zhang Y, Peng M. Analysis and comparative study on the influencing factors of bronchopulmonary dysplasia in premature infants with gestational age ≤32 weeks based on logistic regression and decision tree models. Transl Pediatr 2026;15(5):184. doi: 10.21037/tp-2026-1-0003
