Analysis and comparative study on the influencing factors of bronchopulmonary dysplasia in premature infants with gestational age ≤32 weeks based on logistic regression and decision tree models
Highlight box
Key findings
• Birth weight, total duration of mechanical ventilation, neonatal asphyxia, and neonatal respiratory distress syndrome (NRDS) were independent risk factors for bronchopulmonary dysplasia (BPD).
• The logistic regression model showed better predictive performance than the Chi-squared Automatic Interaction Detection decision tree model (area under the curve: 0.901 vs. 0.809 in the training set; P<0.001).
What is known and what is new?
• Low birth weight, prolonged mechanical ventilation, and NRDS are well‑recognized risk factors for BPD in preterm infants.
• This study used least absolute shrinkage and selection operator regression to reduce collinearity and compared logistic regression and decision tree models in the same cohort. The logistic regression model was superior in discrimination and clinical utility, and a nomogram was developed for individualized risk assessment in infants ≤32 weeks.
What is the implication, and what should change now?
• Clinicians can use the nomogram based on four easily obtainable variables (birth weight, mechanical ventilation duration, asphyxia, NRDS) to quantify BPD risk early after birth.
Introduction
Bronchopulmonary dysplasia (BPD) is one of the most common chronic lung diseases in premature infants. Its incidence decreases as gestational age increases but remains high (1). BPD predisposes infants to respiratory diseases such as bronchitis and pneumonia (2), prolongs mechanical ventilation and makes weaning from oxygen difficult (3), lengthens hospital stay, and increases the burden on families. As affected children grow older, they are prone to respiratory, neurological, and cardiovascular complications and may face growth and developmental problems (4). The pathogenesis of BPD has not been fully elucidated. Existing studies indicate that perinatal infection, inflammatory responses, mechanical ventilation, abnormal repair after lung tissue injury, and other factors jointly promote the occurrence and progression of BPD (5). Infants born at ≤32 weeks' gestation are very or extremely preterm, and their lungs are less developed than those of late preterm infants (6). At present, there is no effective strategy for preventing BPD in premature infants; although several prediction models exist, their predictive performance is limited and they lack external validation. Therefore, this study aimed to analyze the factors influencing BPD in premature infants born at ≤32 weeks' gestation, to reduce collinearity among predictors using least absolute shrinkage and selection operator (LASSO) regression, and to construct and validate a prediction model, in order to provide a theoretical basis for early clinical intervention to prevent BPD. We present this article in accordance with the TRIPOD reporting checklist (available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/rc).
Methods
Study population
This study enrolled 478 premature infants admitted to the Neonatal Department of the Tongji University Affiliated Obstetrics and Gynecology Hospital between January 2023 and December 2024. According to whether BPD occurred, the infants were divided into a BPD group (n=75) and a non-BPD group (n=403). BPD was diagnosed as dependence on supplemental oxygen (inspired oxygen concentration >21%) for more than 28 days after birth (7). Inclusion criteria: (I) gestational age ≤32 weeks; (II) born and treated in this center; (III) clear birth history and complete case information. Exclusion criteria: (I) death within 28 days after birth, transfer to another hospital, or withdrawal of treatment by the parents (BPD outcome missing); (II) complex congenital heart disease, genetic metabolic disease, or congenital respiratory malformation affecting the diagnosis of BPD; (III) incomplete clinical data. Cases were screened strictly according to these criteria. The study was approved by the Ethics Committee of Tongji University Affiliated Obstetrics and Gynecology Hospital (ethical No. KS25352), and individual consent for this retrospective analysis was waived by the ethics committee. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Research methods
Clinical data and laboratory indicators of the infants and their mothers were collected retrospectively. (I) Maternal clinical data: age, placental abnormalities, umbilical cord abnormalities, amniotic fluid contamination, premature rupture of membranes, pregnancy-induced hypertension (PIH), gestational diabetes mellitus (GDM), anemia in pregnancy, and thyroid disease in pregnancy. (II) Infant clinical data: length of hospital stay (days), birth weight, sex, gravidity, parity, twin birth, delivery mode, mode of conception, Apgar scores at 1 and 5 minutes, infection, neonatal respiratory distress syndrome (NRDS), respiratory failure, neonatal pneumonia (NP), neonatal sepsis, neonatal asphyxia, neonatal necrotizing enterocolitis (NEC), neonatal hypoglycemia, pneumothorax, and BPD. (III) Infant laboratory data (results of the first examination after admission): white blood cell count (WBC), neutrophil percentage (NE), platelet count (PLT), hemoglobin (Hb), C-reactive protein (CRP), pH, and arterial partial pressure of carbon dioxide (PCO2). (IV) Infant treatment details: total duration of mechanical ventilation (days), duration of assist/control (A/C) mode ventilation (days), duration of Biphasic mode ventilation (days), duration of continuous positive airway pressure (CPAP) ventilation (days), duration of high-frequency oscillation (HFO) ventilation (days), duration of oxygen hood use (days), duration of intravenous hyperalimentation (days), age at start of feeding (days), cumulative nasogastric feeding time (days), and use of pulmonary surfactant (PS).
Statistical analysis
Statistical analyses were performed with SPSS 26.0 and R 4.2.1. The Kolmogorov-Smirnov test was used to assess normality. Normally distributed continuous data are expressed as mean ± standard deviation and were compared between groups with the independent-samples t-test; non-normally distributed data are presented as median (interquartile range) and were compared with the Mann-Whitney U test. Categorical data are presented as count (percentage) and were compared with the χ2 test or Fisher's exact test. Variables that differed significantly in univariate analysis were then screened with LASSO regression using 10-fold cross-validation, and the variables retained at lambda.min were used to fit the models. With BPD occurrence as the dependent variable and the LASSO-selected variables as independent variables, a binary logistic regression model and a classification decision tree model based on Chi-squared Automatic Interaction Detection (CHAID) were established. The data were divided into training and validation sets at a ratio of 8:2. The CHAID algorithm uses the χ2 test or the likelihood-ratio χ2 test to select the optimal splitting variable and split points at each node, ultimately forming a classification tree (8). To prevent overfitting, pre-pruning was used to limit tree growth: maximum tree depth 3, minimum parent node size 100, and minimum child node size 50. Ten-fold cross-validation was performed, and calibration curves were drawn from 500 bootstrap resamples for internal validation of the model.
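The data split and bootstrap resampling described above can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the authors' code: the text does not say whether the 8:2 split was stratified by outcome, so the hypothetical helper `stratified_split` assumes it was, and the seed is arbitrary. `bootstrap_samples` only draws the 500 resamples; a full bootstrap internal validation would refit the model on each resample and evaluate it on the original data.

```python
import random


def stratified_split(labels, train_frac=0.8, seed=2024):
    """Split case indices into training and validation sets.

    Assumption (not stated in the study): the split is stratified by outcome,
    so both sets keep roughly the same proportion of BPD cases.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))  # cases of this class sent to training
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:])
    return sorted(train), sorted(valid)


def bootstrap_samples(n, n_boot=500, seed=2024):
    """Yield n_boot lists of n indices drawn with replacement from range(n)."""
    rng = random.Random(seed)
    for _ in range(n_boot):
        yield [rng.randrange(n) for _ in range(n)]


# Class sizes from the study: 75 BPD (coded 1) and 403 non-BPD (coded 0) infants.
labels = [1] * 75 + [0] * 403
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))  # 382 96
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state.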
Model performance was evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC) for discrimination, calibration curves for calibration, and decision curve analysis (DCA) for clinical utility. A P value <0.05 was considered statistically significant.
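The evaluation measures named above can be illustrated with small standard-library functions. In this sketch (function names are hypothetical; outcomes are assumed coded 1 for BPD and 0 otherwise), the AUC is computed through its rank interpretation as the probability that a randomly chosen BPD case receives a higher predicted risk than a randomly chosen non-BPD case. The net benefit follows the usual decision-curve formula TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt, and the calibration helper groups cases into equal-size risk groups, which is only one of several ways to draw a calibration curve.

```python
def auc_rank(scores, labels):
    """AUC as P(score of a random BPD case > score of a random non-BPD case); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit of intervening when predicted risk >= threshold (0 < threshold < 1)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def calibration_bins(probs, labels, n_bins=10):
    """Equal-size risk groups: (mean predicted risk, observed event rate) for each group."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    size = len(order) / n_bins
    bins = []
    for k in range(n_bins):
        idx = order[round(k * size):round((k + 1) * size)]
        if idx:  # skip empty groups when there are fewer cases than bins
            mean_pred = sum(probs[i] for i in idx) / len(idx)
            observed = sum(labels[i] for i in idx) / len(idx)
            bins.append((mean_pred, observed))
    return bins


# Toy data (hypothetical), not from the study.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
outcome = [1, 0, 1, 0, 0]
print(auc_rank(scores, outcome))  # 5 of the 6 BPD/non-BPD pairs are correctly ordered
```

Plotting `net_benefit` over a range of thresholds, alongside the "treat all" and "treat none" strategies, gives the decision curve; plotting the `calibration_bins` pairs against the diagonal gives a simple calibration curve.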
Results
Comparison of clinical data between BPD and non-BPD groups
Univariate analysis showed statistically significant differences between groups in maternal age, placental abnormalities, and amniotic fluid contamination (P<0.05). Among neonatal clinical data, birth weight, parity, twin birth, Apgar scores at 1 and 5 minutes, NRDS, and neonatal asphyxia differed significantly (P<0.05). Among treatment variables, PS use, total duration of mechanical ventilation, duration of A/C, Biphasic, and CPAP mode ventilation, duration of oxygen hood use, duration of intravenous hyperalimentation, and cumulative nasogastric feeding time differed significantly (P<0.05). For details, see Table 1.
Table 1
| Item | BPD group (n=75) | Non-BPD group (n=403) | χ2/Z | P |
|---|---|---|---|---|
| Maternal clinical data | | | | |
| Placental abnormality | | | 5.880† | 0.02 |
| Yes | 20 (26.67) | 64 (15.88) | | |
| No | 55 (73.33) | 339 (84.12) | | |
| Umbilical cord abnormality | | | 0.704† | 0.40 |
| Yes | 13 (17.33) | 55 (13.65) | | |
| No | 62 (82.67) | 348 (86.35) | | |
| Amniotic fluid contamination | | | 8.125† | 0.004 |
| Yes | 15 (20.0) | 36 (8.94) | | |
| No | 60 (80.0) | 367 (91.06) | | |
| Premature rupture of membranes | | | 3.510† | 0.06 |
| Yes | 17 (22.67) | 57 (14.14) | | |
| No | 58 (77.33) | 346 (85.86) | | |
| PIH | | | 0.016† | 0.90 |
| Yes | 6 (8.0) | 34 (8.44) | | |
| No | 69 (92.0) | 369 (91.56) | | |
| GDM | | | 0.238† | 0.63 |
| Yes | 61 (81.33) | 337 (83.62) | | |
| No | 14 (18.67) | 66 (16.38) | | |
| Anemia in pregnancy | | | 0.296† | 0.59 |
| Yes | 2 (2.67) | 16 (3.97) | | |
| No | 73 (97.33) | 387 (96.03) | | |
| Thyroid disease in pregnancy | | | 0.053† | 0.82 |
| Yes | 4 (5.33) | 19 (4.71) | | |
| No | 71 (94.67) | 384 (95.29) | | |
| Infant clinical data | | | | |
| Length of hospital stay (d) | 36.00 (25.00, 59.00) | 38.00 (26.00, 52.00) | −0.047‡ | 0.96 |
| Birth weight (g) | 1,255.00 (950.00, 1,520.00) | 1,407.50 (1,141.25, 1,655.00) | −3.486‡ | <0.001 |
| Sex | | | 0.038† | 0.85 |
| Male | 40 (53.33) | 210 (52.11) | | |
| Female | 35 (46.67) | 193 (47.89) | | |
| Gravidity | 2 (1, 3) | 2 (1, 3) | −1.810‡ | 0.07 |
| Parity | 1 (1, 2) | 1 (1, 2) | −2.136‡ | 0.03 |
| Twin birth | | | 4.784† | 0.03 |
| Yes | 35 (46.67) | 135 (33.50) | | |
| No | 40 (53.33) | 268 (66.50) | | |
| Delivery mode | | | 1.407† | 0.24 |
| Vaginal | 11 (14.67) | 83 (20.60) | | |
| Cesarean | 64 (85.33) | 320 (79.40) | | |
| Mode of conception | | | 2.214† | 0.14 |
| Natural | 59 (78.67) | 283 (70.22) | | |
| Assisted reproductive technology | 16 (21.33) | 120 (29.78) | | |
| 1-min Apgar score | 8 (8, 9) | 9 (8, 9) | −5.252‡ | <0.001 |
| 5-min Apgar score | 8 (8, 9) | 9 (8, 9) | −3.634‡ | <0.001 |
| Infection | | | 0.505† | 0.48 |
| Yes | 5 (6.67) | 19 (4.71) | | |
| No | 70 (93.33) | 384 (95.29) | | |
| NRDS | | | 22.658† | <0.001 |
| Yes | 55 (73.33) | 175 (43.42) | | |
| No | 20 (26.67) | 228 (56.58) | | |
| Respiratory failure | | | 1.974† | 0.16 |
| Yes | 6 (8.0) | 17 (4.22) | | |
| No | 69 (92.0) | 386 (95.78) | | |
| NP | | | 3.667† | 0.055 |
| Yes | 27 (36.0) | 102 (25.31) | | |
| No | 48 (64.0) | 301 (74.69) | | |
| Neonatal sepsis | | | 0.187† | 0.67 |
| Yes | 4 (5.33) | 17 (4.22) | | |
| No | 71 (94.67) | 386 (95.78) | | |
| Neonatal asphyxia | | | 6.879† | 0.009 |
| Yes | 10 (13.33) | 21 (5.21) | | |
| No | 65 (86.67) | 382 (94.79) | | |
| NEC | | | 0.009† | 0.93 |
| Yes | 2 (2.67) | 10 (2.48) | | |
| No | 73 (97.33) | 393 (97.52) | | |
| Neonatal hypoglycemia | | | 0.759† | 0.38 |
| Yes | 2 (2.67) | 20 (4.96) | | |
| No | 73 (97.33) | 383 (95.04) | | |
| Pneumothorax | | | 0.511† | 0.48 |
| Yes | 2 (2.67) | 18 (4.47) | | |
| No | 73 (97.33) | 385 (95.53) | | |
| Infant laboratory data | | | | |
| WBC (×10⁹/L) | 11.58 (8.97, 16.50) | 12.90 (9.83, 16.48) | −1.068‡ | 0.29 |
| NE (%) | 46.60 (35.00, 59.30) | 48.30 (38.30, 59.80) | −0.701‡ | 0.48 |
| PLT (×10⁹/L) | 245.00 (192.00, 288.00) | 250.50 (198.00, 291.00) | −0.053‡ | 0.96 |
| Hb (g/L) | 183.00 (165.00, 221.00) | 186.00 (166.00, 216.00) | −0.341‡ | 0.73 |
| CRP (mg/L) | 0.74 (0.50, 1.49) | 0.78 (0.50, 1.46) | −0.076‡ | 0.94 |
| pH | 7.33 (7.29, 7.37) | 7.34 (7.29, 7.38) | −0.605‡ | 0.55 |
| PCO2 (mmHg) | 45.90 (37.50, 53.50) | 45.10 (38.90, 52.10) | −0.252‡ | 0.80 |
| Infant treatment | | | | |
| PS use | | | 9.586† | 0.002 |
| Yes | 18 (24.0) | 44 (10.92) | | |
| No | 57 (76.0) | 359 (89.08) | | |
| A/C mode ventilation duration (d) | 0.00 (0.00, 3.40) | 0.00 (0.00, 1.00) | −5.189‡ | <0.001 |
| Biphasic mode ventilation duration (d) | 2.00 (0.00, 8.00) | 0.00 (0.00, 2.00) | −7.201‡ | <0.001 |
| CPAP mode ventilation duration (d) | 8.00 (0.00, 19.00) | 0.00 (0.00, 2.00) | −7.468‡ | <0.001 |
| HFO ventilation duration (d) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | −0.926‡ | 0.35 |
| Total duration of mechanical ventilation (d) | 17.00 (3.00, 37.00) | 1.00 (0.00, 4.00) | −8.510‡ | <0.001 |
| Duration of oxygen hood use (d) | 3.00 (0.00, 8.00) | 2.00 (0.00, 4.00) | −2.314‡ | 0.02 |
| Duration of intravenous hyperalimentation (d) | 9.00 (0.00, 17.00) | 0.00 (0.00, 5.00) | −6.360‡ | <0.001 |
| Cumulative nasogastric feeding time (d) | 24.00 (5.00, 40.00) | 2.00 (0.00, 8.00) | −8.135‡ | <0.001 |
| Age at start of feeding (d) | 2.00 (0.00, 14.00) | 3.00 (0.00, 14.00) | −0.600‡ | 0.55 |
Data are presented as n (%) or median (P25, P75). †, Chi-squared value; ‡, Z value. A/C, assist/control; BPD, bronchopulmonary dysplasia; CPAP, continuous positive airway pressure; CRP, C-reactive protein; GDM, gestational diabetes mellitus; Hb, hemoglobin; HFO, high-frequency oscillation ventilation; NE, neutrophils; NEC, necrotizing enterocolitis; NP, neonatal pneumonia; NRDS, neonatal respiratory distress syndrome; PCO2, partial pressure of carbon dioxide; PIH, pregnancy-induced hypertension; PLT, platelets; PS, pulmonary surfactant; WBC, white blood cell.
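The χ2 values for binary variables in Table 1 can be recomputed from the counts with the Pearson χ2 formula for a 2×2 table. The sketch below assumes the uncorrected statistic (no Yates continuity correction); with that assumption, the NRDS row (55 yes/20 no in the BPD group, 175 yes/228 no in the non-BPD group) gives approximately 22.658 and the neonatal asphyxia row gives approximately 6.879, matching the table. The function names are hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the 2x2 table

                    BPD   non-BPD
        factor yes   a       b
        factor no    c       d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# NRDS row of Table 1: BPD 55 yes / 20 no; non-BPD 175 yes / 228 no.
nrds = pearson_chi2_2x2(55, 175, 20, 228)
# Neonatal asphyxia row: BPD 10 yes / 65 no; non-BPD 21 yes / 382 no.
asphyxia = pearson_chi2_2x2(10, 21, 65, 382)
print(round(nrds, 3), round(asphyxia, 3))  # 22.658 6.879
```

The P value uses the identity that a 1-df χ2 variable is a squared standard normal, so its upper tail equals `erfc(sqrt(x / 2))`.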
Feature variable screening of BPD prediction model
Variables that were statistically significant in univariate analysis were entered into LASSO regression for feature selection. Using 10-fold cross-validation, the lambda that minimized the mean squared error (lambda.min=0.0287) was selected as optimal. LASSO regression reduced the 15 candidate factors to 7 potential predictors: PS use, duration of CPAP mode ventilation, cumulative nasogastric feeding time, NRDS, neonatal asphyxia, birth weight, and total duration of mechanical ventilation, balancing model complexity against generalizability. See Figures 1,2.
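How LASSO drives some coefficients exactly to zero, and how lambda.min is picked by 10-fold cross-validation, can be shown with a small coordinate-descent sketch. This is an illustration under simplifying assumptions, not the authors' analysis: the study presumably fitted a logistic (binomial) LASSO in R, whereas this pure-Python version minimizes squared error, which keeps the update rule short. All helper names and the toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; returns exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n) * sum((y - Xb)^2) + lam * sum(|b|).

    Assumes y is centred and X columns are standardized, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]  # mean square per column
    b = [0.0] * p
    resid = list(y)  # y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_ms[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual after adding back its own contribution.
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * b[j]
            new_bj = soft_threshold(z, lam) / col_ms[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def cv_mse(X, y, lambdas, k=10):
    """Mean k-fold cross-validated squared error for each lambda (fold = index mod k)."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]
    errors = []
    for lam in lambdas:
        sq_err = 0.0
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            b = lasso_cd([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
            for i in test_idx:
                pred = sum(bj * xij for bj, xij in zip(b, X[i]))
                sq_err += (y[i] - pred) ** 2
        errors.append(sq_err / n)
    return errors


def lambda_min(lambdas, errors):
    """Lambda with the smallest cross-validated error (glmnet calls this 'lambda.min')."""
    return min(zip(errors, lambdas))[1]


# Toy data (hypothetical): y depends strongly on x1 and only weakly on x2.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]] * 5
y = [2.1, 1.9, -1.9, -2.1] * 5
lambdas = [2.5, 0.5, 0.0]
errors = cv_mse(X, y, lambdas)
best = lambda_min(lambdas, errors)
coefs = lasso_cd(X, y, 0.5)  # the weak predictor x2 is shrunk to exactly zero
print(best, coefs)
```

The soft-threshold step is what separates LASSO from ridge regression: any coefficient whose correlation with the partial residual falls below lambda is set to exactly zero rather than merely shrunk, which is how the screening step removes predictors.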
Logistic regression analysis of BPD occurrence in preterm infants with a gestational age ≤32 weeks
With BPD occurrence as the dependent variable, the seven variables selected by LASSO regression were entered into multivariable logistic regression; variable coding and results are shown in Table 2. Lower birth weight, longer total duration of mechanical ventilation, neonatal asphyxia, and NRDS were independent risk factors for BPD in premature infants born at ≤32 weeks (P<0.05). Using R 4.2.1, these four predictors were included in a logistic regression model, from which a nomogram for BPD in premature infants ≤32 weeks' gestation was constructed. The total score ranges from 0 to 140 points, and the corresponding predicted probability of BPD is read from the bottom axis; a higher total score indicates a higher risk. See Figure 3.
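The mechanics of a nomogram can be sketched as follows. Each predictor's axis is scaled so that the predictor with the widest possible effect on the linear predictor spans 0–100 points; the points are summed, and the total is mapped back to a probability through the logistic function. This is a sketch of one common construction (as used in Harrell-style nomograms), not the authors' code. The coefficients are borrowed from the seven-variable model in Table 2 purely for illustration, because the refit four-variable coefficients are not reported, and the axis ranges are hypothetical; the point totals therefore do not reproduce the 0–140 scale of Figure 3.

```python
import math

# Illustration only: coefficients borrowed from the seven-variable model in Table 2.
# The refit four-variable nomogram coefficients are not reported in the text.
INTERCEPT = -1.957
COEFS = {"birth_weight_g": -0.001, "ventilation_days": 0.079,
         "asphyxia": 1.750, "nrds": 1.555}
# Hypothetical axis ranges for each predictor (not taken from the study).
RANGES = {"birth_weight_g": (500.0, 2000.0), "ventilation_days": (0.0, 60.0),
          "asphyxia": (0.0, 1.0), "nrds": (0.0, 1.0)}


def _reference(name):
    """Axis value that scores 0 points: low end for positive coefficients, high end otherwise."""
    lo, hi = RANGES[name]
    return lo if COEFS[name] >= 0 else hi


# Widest swing any single predictor can produce in the linear predictor; it spans 0-100 points.
MAX_SPAN = max(abs(COEFS[k]) * (RANGES[k][1] - RANGES[k][0]) for k in COEFS)


def points(name, value):
    """Points assigned to one predictor value."""
    return 100.0 * COEFS[name] * (value - _reference(name)) / MAX_SPAN


def total_points(infant):
    return sum(points(k, infant[k]) for k in COEFS)


def probability_from_points(total):
    """Convert a total score back to a predicted probability via the logistic function."""
    lp = INTERCEPT + sum(COEFS[k] * _reference(k) for k in COEFS) + total * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


def probability_direct(infant):
    """Same probability computed directly from the linear predictor, for comparison."""
    lp = INTERCEPT + sum(COEFS[k] * infant[k] for k in COEFS)
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical infant: 1,100 g, 12 days of ventilation, asphyxia and NRDS present.
infant = {"birth_weight_g": 1100.0, "ventilation_days": 12.0, "asphyxia": 1, "nrds": 1}
print(round(total_points(infant), 1), round(probability_from_points(total_points(infant)), 3))
```

Because points are a linear rescaling of the linear predictor, reading the probability off the total-points axis gives the same answer as evaluating the logistic model directly; the nomogram is a graphical calculator, not a different model.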
Table 2
| Variable | B | SE | Wald χ2 | P | OR | 95% CI |
|---|---|---|---|---|---|---|
| Constant | −1.957 | 0.724 | 7.294 | 0.007 | 0.141 | – |
| PS (using No as reference) | 0.504 | 0.453 | 1.238 | 0.266 | 1.655 | 0.681–4.023 |
| CPAP mode ventilation duration (d; original value) | 0.052 | 0.036 | 2.094 | 0.148 | 1.054 | 0.982–1.131 |
| Cumulative nasogastric feeding time (d; original value) | 0.006 | 0.025 | 0.060 | 0.806 | 1.006 | 0.957–1.058 |
| NRDS (using No as reference) | 1.555 | 0.385 | 16.343 | <0.001 | 4.735 | 2.228–10.063 |
| Neonatal asphyxia (using No as reference) | 1.750 | 0.486 | 12.956 | <0.001 | 5.753 | 2.219–14.917 |
| Birth weight (g; original value) | −0.001 | 0.001 | 8.667 | 0.003 | 0.999 | 0.998–1.000 |
| Total duration of mechanical ventilation (d; original value) | 0.079 | 0.031 | 6.634 | 0.010 | 1.082 | 1.019–1.149 |
BPD, bronchopulmonary dysplasia; CPAP, continuous positive airway pressure; CI, confidence interval; NRDS, neonatal respiratory distress syndrome; OR, odds ratio; PS, pulmonary surfactant; SE, standard error.
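The OR, 95% CI, and Wald χ2 columns of Table 2 follow from B and SE as OR = exp(B), CI = exp(B ± 1.96 × SE), and Wald χ2 = (B/SE)². The minimal sketch below (the function name is hypothetical) reproduces the tabulated OR and CI for the NRDS and neonatal asphyxia rows to within rounding. The recomputed Wald χ2 for NRDS (about 16.31 vs. 16.343) differs slightly because B and SE are printed to only three decimals.

```python
import math


def wald_summary(b, se, z_crit=1.96):
    """OR, 95% CI, Wald chi-squared and two-sided P for one logistic-regression coefficient."""
    wald = (b / se) ** 2
    return {
        "OR": math.exp(b),
        "CI_low": math.exp(b - z_crit * se),
        "CI_high": math.exp(b + z_crit * se),
        "Wald": wald,
        "P": math.erfc(math.sqrt(wald / 2.0)),  # upper tail of chi-squared with 1 df
    }


nrds = wald_summary(1.555, 0.385)      # NRDS row of Table 2
asphyxia = wald_summary(1.750, 0.486)  # neonatal asphyxia row of Table 2
print({k: round(v, 3) for k, v in nrds.items()})
```

This also explains why the birth weight OR appears as 0.999 per gram: a coefficient of −0.001 per gram corresponds to roughly a 10% reduction in odds per 100 g, which is easier to interpret if birth weight is rescaled to 100-g units.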
Classification decision tree analysis of CHAID algorithm for influencing factors of BPD in premature infants with gestational age ≤32 weeks
A classification decision tree was built according to the growth and pruning rules described above; it has 3 levels, 7 nodes, and 4 terminal nodes (Figure 4). Total duration of mechanical ventilation, birth weight, and NRDS were the main factors affecting BPD in premature infants born at ≤32 weeks. The root node was total duration of mechanical ventilation, indicating the strongest association with BPD; it was split into two subgroups, ≤9 days and >9 days. In the ≤9-day subgroup, 5.6% of infants developed BPD. This subgroup was further split by birth weight: the BPD rate was 18.6% among infants weighing ≤1,060 g and only 2.4% among those weighing >1,060 g. NRDS status further stratified BPD risk within this subgroup.
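The core of the splitting rule can be sketched for a single numeric predictor: try each candidate cut point, skip cuts that would create a child node smaller than the minimum size, and keep the cut with the largest χ2 statistic. This is a deliberately simplified, hypothetical sketch; full CHAID also merges categories, can make multiway splits, and ranks candidate variables by Bonferroni-adjusted P values, none of which is shown here. The toy data are invented.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for [[a, b], [c, d]]; 0.0 if a margin is empty."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if den == 0 else n * (a * d - b * c) ** 2 / den


def best_binary_split(x, y, min_child=50):
    """Best cut for 'x <= cut' vs 'x > cut' by chi-squared, honouring a minimum child size.

    Returns (cut, chi2), or (None, -1.0) when no cut satisfies min_child.
    """
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        if len(left) < min_child or len(right) < min_child:
            continue  # pre-pruning: child node would be too small
        a, b = sum(left), len(left) - sum(left)     # left child: BPD, non-BPD
        c, d = sum(right), len(right) - sum(right)  # right child: BPD, non-BPD
        stat = chi2_2x2(a, b, c, d)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy data (hypothetical): ventilation days 0-19, BPD only when ventilation exceeds 9 days.
days = list(range(20))
bpd = [1 if v > 9 else 0 for v in days]
print(best_binary_split(days, bpd, min_child=5))  # (9, 20.0)
```

With the study's settings (minimum child node 50), the same search would simply skip any cut leaving fewer than 50 infants on either side, which is why the tree stops after a few levels.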
Comparison of binary logistic regression model and classification decision tree model analysis results of BPD in premature infants with gestational age ≤32 weeks
ROC curves were plotted for both models using their predicted probabilities (Figure 5), and classification performance is summarized in Table 3. Both ROC curves lie well above the diagonal. In the training set, the binary logistic regression model had an AUC of 0.901 [95% confidence interval (CI): 0.868–0.945], accuracy of 90.48%, sensitivity of 62.26%, specificity of 95.76%, and F1 score of 0.67. The CHAID decision tree model had an AUC of 0.809 (95% CI: 0.870–0.910), accuracy of 88.10%, sensitivity of 43.39%, specificity of 96.47%, and F1 score of 0.53. In the validation set, the AUC was 0.912 for the logistic regression model and 0.871 for the decision tree model. Comparison of the two ROC curves yielded Z=9.568 (P<0.001), indicating a statistically significant difference in discrimination between the two models.
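The text does not state which test produced Z=9.568. A common choice for comparing two AUCs estimated on the same infants is DeLong's test, which accounts for the correlation between the two models' predictions. The sketch below implements it in plain Python under that assumption, with hypothetical helper names and toy scores.

```python
import math


def _psi(pos_score, neg_score):
    """Mann-Whitney kernel: 1 if the pair is ordered correctly, 0.5 for a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0


def _placements(scores, labels):
    """DeLong structural components: V10 (one per positive case) and V01 (one per negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(u, v):
    """Sample covariance with denominator len - 1."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(scores1, scores2, labels):
    """Compare two correlated AUCs on the same cases. Returns (auc1, auc2, z, two-sided p)."""
    v10_1, v01_1 = _placements(scores1, labels)
    v10_2, v01_2 = _placements(scores2, labels)
    auc1, auc2 = sum(v10_1) / len(v10_1), sum(v10_2) / len(v10_2)
    m, n = len(v10_1), len(v01_1)  # numbers of positive and negative cases
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var <= 0:
        raise ValueError("zero variance: the two sets of predictions cannot be compared")
    z = (auc1 - auc2) / math.sqrt(var)
    return auc1, auc2, z, math.erfc(abs(z) / math.sqrt(2.0))


labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # hypothetical scores, perfect separation
model_b = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # hypothetical scores, one misordered pair
auc_a, auc_b, z, p = delong_test(model_a, model_b, labels)
print(auc_a, round(auc_b, 3), round(z, 3), round(p, 3))
```

Because both models are scored on the same infants, their AUC estimates are correlated; the covariance term `_cov(v10_1, v10_2)` removes the shared variation, which an unpaired comparison of two AUCs would ignore.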
Table 3
| Model | AUC | Accuracy | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|
| Logistic regression (train) | 0.901 | 0.905 | 0.623 | 0.957 | 0.673 |
| Logistic regression (validation) | 0.912 | 0.915 | 0.773 | 0.942 | 0.739 |
| Decision tree (train) | 0.809 | 0.881 | 0.434 | 0.964 | 0.535 |
| Decision tree (validation) | 0.871 | 0.873 | 0.409 | 0.958 | 0.500 |
AUC, area under the curve.
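The accuracy, sensitivity, specificity, and F1 values in Table 3 all derive from a 2×2 confusion matrix at a chosen probability cut-off. The sketch below uses hypothetical counts, since the study does not report its confusion matrices. Because F1 is the harmonic mean of precision and sensitivity, it reflects performance on the minority (BPD) class rather than on the many correctly classified non-BPD infants.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and F1 from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # share of BPD cases flagged by the model
    specificity = tn / (tn + fp)  # share of non-BPD cases correctly cleared
    precision = tp / (tp + fp)    # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


# Hypothetical confusion matrix for 100 infants (not from the study).
m = classification_metrics(tp=8, fn=2, fp=4, tn=86)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.94, 'sensitivity': 0.8, 'specificity': 0.956, 'f1': 0.727}
```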
Discussion
The clinical significance of constructing a prediction model for the occurrence of BPD in preterm infants with a gestational age ≤32 weeks
BPD is a common respiratory complication in preterm infants. It not only significantly increases neonatal mortality but is also closely associated with long-term chronic lung disease and neurodevelopmental disorders. Because the lungs of preterm infants born at ≤32 weeks are immature, their risk of BPD is markedly increased. The incidence of BPD in preterm infants born at 30–32, 28+1–29+6, and ≤28 weeks has been reported as 19.62%, 39.24%, and 56.12%, respectively (9). In this study, the incidence of BPD among preterm infants born at ≤32 weeks was 15.69% (75/478), slightly lower than in previous studies, which may reflect the close attention our department pays to extremely preterm infants. By establishing comprehensive respiratory management early after birth, adopting a refined care model that simulates the intrauterine environment, and supporting the physiological development of the lungs of extremely preterm infants, the occurrence of BPD has been reduced to a certain extent. Nevertheless, BPD remains common in preterm infants. Preterm infants with BPD have significantly higher mortality and complication rates than other preterm infants, longer hospitalization, and a high incidence of adverse neurodevelopmental outcomes (10), placing heavy burdens on families and society. Reducing BPD has therefore become a key goal of neonatal care. Although BPD risk prediction models have been developed in China and elsewhere (11,12), some of their predictors are not easy to obtain, their applicability needs to be improved, and external validation is lacking.
Therefore, a dedicated risk prediction model for BPD in preterm infants born at ≤32 weeks is needed, so that clinicians can perform individualized risk assessment at an early stage, optimize resource allocation, achieve early warning and precise intervention, and reduce the occurrence of BPD through scientific and standardized interventions.
The logistic regression model has better predictive performance than the classification decision tree model
In this study, the AUC of the logistic regression model was 0.901 and that of the classification decision tree was 0.809, indicating that the logistic regression model performed better. Logistic regression, a traditional statistical model, has been widely used in neonatal clinical research. It quantifies the dependence of the outcome on each independent variable, and its results are easy to interpret: the effect of each independent variable is expressed as an OR, which conveys the relationship between predictor and outcome more directly than a decision tree does. In addition, logistic regression is robust and is generally less prone to overfitting than more flexible algorithms (13). Xia et al. (14) constructed a nomogram for predicting BPD in preterm infants based on partial clinical information, which also showed good accuracy, discrimination, and clinical applicability. Unlike that model, which identified delivery mode, sex, and length of hospital stay as the main predictors of BPD, the present model screened predictors by combining univariate analysis with LASSO regression, which reduces multicollinearity among variables and improves the stability of the predictions. Using these methods, four objective variables were ultimately selected to build the logistic regression model. A nomogram was also drawn to help clinicians explain the child's risk to parents clearly and on the basis of evidence, and to jointly discuss individualized diagnosis and treatment plans and long-term follow-up.
This may help alleviate parental anxiety, improve treatment adherence, reduce the incidence of BPD, improve the child's short- and long-term respiratory prognosis and quality of life, and make more efficient use of medical resources.
Predictive risk factors for the occurrence of BPD in preterm infants with a gestational age ≤32 weeks
In this study, birth weight, total duration of mechanical ventilation, neonatal asphyxia, and NRDS were the four independent factors influencing the development of BPD in preterm infants born at ≤32 weeks (P<0.05). Low birth weight has been reported as a high-risk factor for BPD in preterm infants (15), consistent with the present findings. Lung development proceeds through five stages: embryonic, pseudoglandular, canalicular, saccular, and alveolar. Because of preterm birth, the lungs of low-birth-weight infants are often immature: distal airway branching and the transition to alveolar differentiation are incomplete, hindering the normal differentiation of alveolar epithelial progenitor cells into type I and type II epithelial cells. The resulting relative deficiency of pulmonary surfactant synthesis leads to alveolar collapse (atelectasis), causing respiratory distress and increasing the risk of BPD (16). Previous studies have found that mechanical ventilation for >7 days is an independent risk factor for BPD (17).
Because their lungs are immature, low-birth-weight infants often require prolonged oxygen therapy and mechanical ventilation, which also increases the risk of pulmonary infection. Mechanical ventilation triggers an inflammatory cascade: excessive immune activation and the release of large amounts of pro-inflammatory cytokines and mediators damage vascular endothelial and alveolar epithelial cells and disrupt the alveolar-capillary barrier. This leads to interstitial pulmonary edema and extracellular matrix remodeling and inhibits the maturation of alveolar type II epithelial cells, resulting in delayed alveolarization and progressive parenchymal injury. These effects are especially destructive in the immature lungs of preterm infants, making this an important pathogenic mechanism of BPD (18,19). In addition, during prolonged mechanical ventilation, excessive tidal volume and oxygen concentration can trigger inflammation, and sustained inflammation together with hyperoxia can damage the immature pulmonary vessels and alveoli, thus causing BPD (9). Prolonged mechanical ventilation also implies repeated endotracheal intubation, which increases the risk of lung infection and airway injury and further promotes BPD. Classic BPD occurs mostly in infants with RDS, mainly because of PS deficiency, and presents as progressively worsening respiratory distress early in life.
In preterm infants with immature lungs, the need for mechanical ventilation or oxygen therapy leads to oxidative stress injury, pulmonary vascular remodeling, and pulmonary fibrosis, which eventually progress to BPD (20). RDS also increases PS consumption, and together with prolonged mechanical ventilation it aggravates lung injury, making preterm infants more prone to BPD. The European consensus guidelines recommend that preterm infants with RDS receive lung-protective ventilation, that the duration of oxygen therapy be minimized, and that therapeutic PS be given early, to reduce the incidence of BPD (21). Furthermore, hypoxia caused by asphyxia can aggravate pulmonary inflammation and oxidative stress injury, cause systemic organ hypoperfusion, increase pulmonary vascular resistance, markedly reduce gas-exchange efficiency, and disturb electrolyte metabolism, including calcium and phosphorus balance. These changes further damage lung tissue, thereby inducing pulmonary vascular remodeling and abnormal lung development (22,23).
This study used LASSO regression for variable screening to reduce overfitting and improve generalizability. However, this approach may exclude variables that are considered clinically important. In the univariate analysis, maternal age, placental abnormalities, and amniotic fluid contamination were significantly associated with BPD, but LASSO shrank their coefficients to zero, so they did not enter the final model; PS use and duration of CPAP ventilation were retained by LASSO but were not significant in the multivariable logistic regression. This may be because these variables are collinear with the four strong predictors ultimately selected (such as duration of mechanical ventilation and NRDS), or because their effects are mediated by those factors. For example, PS use is closely related to NRDS severity, and CPAP time is part of the total duration of mechanical ventilation. Clinically, this does not mean that these factors are unimportant; rather, it suggests that in the presence of more direct injury factors (such as prolonged mechanical ventilation), the marginal contribution of these upstream or indirect factors to the model is limited. Clinicians should therefore still consider the potential effects of these excluded variables when applying the model, especially for infants who do not meet the indications for mechanical ventilation.
Limitations
Although the model constructed in this study showed good predictive performance, the study has some limitations. First, because of the limited study period, the sample size was relatively small, and all infants came from a single center, so the population was homogeneous. Second, the study used only cross-sectional data from the hospitalization period and did not capture dynamic changes during the infants' development. Future studies could assess the progression of BPD dynamically and validate the model externally in large multicenter cohorts to improve its generalizability.
Conclusions
Both the logistic regression and decision tree models have classification and predictive value, with the logistic regression model performing better. Based on birth weight, duration of mechanical ventilation, asphyxia, and NRDS, we constructed a nomogram for predicting BPD in preterm infants born at ≤32 weeks' gestation. These risk factors are easy for clinical staff to obtain, making the model highly practical. The nomogram can quantify the probability of BPD in preterm infants born at ≤32 weeks, providing a scientific basis for early identification of BPD and the implementation of preventive interventions, which may help improve the quality of survival of very and extremely preterm infants. Given the limitations of this study, further research with more data is needed to improve the generalizability of the model.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/rc
Data Sharing Statement: Available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/dss
Peer Review File: Available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/prf
Funding: The study was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tp.amegroups.com/article/view/10.21037/tp-2026-1-0003/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study has been approved by the Ethics Committee of Tongji University Affiliated Obstetrics and Gynecology Hospital (ethical No. KS25352). Individual consent for this retrospective analysis was waived by the ethics committee. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Cao Y, Jiang S, Sun J, et al. Assessment of Neonatal Intensive Care Unit Practices, Morbidity, and Mortality Among Very Preterm Infants in China. JAMA Netw Open 2021;4:e2118904. [Crossref] [PubMed]
- Yin Y, Qi Y, Hong D, et al. Study on risk factors and follow-up outcome at 2 years in preterm infants with bronchopulmonary dysplasia. Chinese Journal of Evidence-Based Pediatrics 2016;11:113-7.
- Rutkowska M, Hożejowski R, Helwich E, et al. Severe bronchopulmonary dysplasia - incidence and predictive factors in a prospective, multicenter study in very preterm infants with respiratory distress syndrome. J Matern Fetal Neonatal Med 2019;32:1958-64. [Crossref] [PubMed]
- Resch B, Kurath-Koller S, Eibisberger M, et al. Prematurity and the burden of influenza and respiratory syncytial virus disease. World J Pediatr 2016;12:8-18. [Crossref] [PubMed]
- Shukla VV, Ambalavanan N. Recent Advances in Bronchopulmonary Dysplasia. Indian J Pediatr 2021;88:690-5. [Crossref] [PubMed]
- Pallás Alonso C, García González P, Jimenez Moya A, et al. Follow-up protocol for newborns of birthweight less than 1500 g or less than 32 weeks gestation. An Pediatr (Barc) 2018;88:229.e1-229.e10.
- Jobe AH, Bancalari E. Bronchopulmonary dysplasia. Am J Respir Crit Care Med 2001;163:1723-9. [Crossref] [PubMed]
- Jiao LP, Zhang XZ, Yang YY, et al. Application and comparison of decision tree and Logistic regression model in analysis of factors affecting drinking water. Chinese Journal of Health Statistics 2020;37:874-7, 882.
- Gupta BK, Saha AK, Mukherjee S, et al. Minimally invasive surfactant therapy versus InSurE in preterm neonates of 28 to 34 weeks with respiratory distress syndrome on non-invasive positive pressure ventilation-a randomized controlled trial. Eur J Pediatr 2020;179:1287-93. [Crossref] [PubMed]
- Chinese Society of Pediatrics, Neonatology Group, Editorial Board of Chinese Journal of Pediatrics. Expert consensus on clinical management of premature infants with bronchopulmonary dysplasia. Chinese Journal of Pediatrics 2020;58:358-65.
- Romijn M, Dhiman P, Finken MJJ, et al. Prediction Models for Bronchopulmonary Dysplasia in Preterm Infants: A Systematic Review and Meta-Analysis. J Pediatr 2023;258:113370. [Crossref] [PubMed]
- Xu L, Mao L, Yuan S, et al. Establishment and evaluation of a nomogram model for predicting bronchopulmonary dysplasia in preterm infants with low birth weight. Maternal & Child Health Care of China 2023;38:1834-7.
- Wu W, Tan X, Sun D, et al. Application of Logistic regression analysis model and decision tree analysis in early warning indicators of hypertension and diabetes comorbidity. Chinese Journal of Disease Control & Prevention 2022;26:827-33.
- Xia L, Lv R, Zhao J, et al. The construction and application of a nomogram prediction model for the risk of bronchopulmonary dysplasia in extremely preterm infants. Journal of Chinese Practical Diagnosis and Therapy 2025;39:249-56.
- Yao Q, Shen QL, Huang GY, et al. Relationship between bronchopulmonary dysplasia phenotypes with high-resolution computed tomography score in early preterm infants. Front Pediatr 2022;10:935733. [Crossref] [PubMed]
- Shin JE, Yoon SJ, Lim J, et al. Pulmonary Surfactant Replacement Therapy for Respiratory Distress Syndrome in Neonates: a Nationwide Epidemiological Study in Korea. J Korean Med Sci 2020;35:e253. [Crossref] [PubMed]
- Solevåg AL, Cheung PY, Schmölzer GM. Bi-Level Noninvasive Ventilation in Neonatal Respiratory Distress Syndrome. A Systematic Review and Meta-Analysis. Neonatology 2021;118:264-73.
- Cannavò L, Perrone S, Viola V, et al. Oxidative Stress and Respiratory Diseases in Preterm Newborns. Int J Mol Sci 2021;22:12504. [Crossref] [PubMed]
- Kalikkot Thekkeveedu R, El-Saie A, Prakash V, et al. Ventilation-Induced Lung Injury (VILI) in Neonates: Evidence-Based Concepts and Lung-Protective Strategies. J Clin Med 2022;11:557. [Crossref] [PubMed]
- Welch B, Rose R, Myers J, et al. Decreasing early invasive mechanical ventilation exposure in preterm infants: a quality improvement initiative. J Perinatol 2025;45:149-56. [Crossref] [PubMed]
- Sweet DG, Carnielli V, Greisen G, et al. European Consensus Guidelines on the Management of Respiratory Distress Syndrome - 2019 Update. Neonatology 2019;115:432-50. [Crossref] [PubMed]
- Cai H, Jiang L, Liu Y, et al. Development and verification of a risk prediction model for bronchopulmonary dysplasia in very low birth weight infants. Transl Pediatr 2021;10:2533-43. [Crossref] [PubMed]
- Valenzuela-Stutman D, Marshall G, Tapia JL, et al. Bronchopulmonary dysplasia: risk prediction models for very-low-birth-weight infants. J Perinatol 2019;39:1275-81. [Crossref] [PubMed]

