Constructing a precise prediction model for screening growth hormone deficiency in children in a single laboratory setting: using the serum IGF-1-to-age ratio as a promising predictive marker
Original Article

Shichao Qiu, Ting Zhao, Yihua Lian, Chao Liu

Department of Endocrinology, Genetics and Metabolism, Xi’an Children’s Hospital, Xi’an, China

Contributions: (I) Conception and design: C Liu; (II) Administrative support: None; (III) Provision of study materials or patients: S Qiu; (IV) Collection and assembly of data: S Qiu, T Zhao, Y Lian, C Liu; (V) Data analysis and interpretation: S Qiu; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Chao Liu, MD. Department of Endocrinology, Genetics and Metabolism, Xi’an Children’s Hospital, No. 69, Xijuyuan Road, Lianhu District, Xi’an 710002, China. Email: leo_2599@126.com.

Background: Growth hormone deficiency (GHD) requires accurate diagnosis, but the current gold standard, the growth hormone (GH) stimulation test, is highly invasive and resource-intensive, thus necessitating effective screening methods. Insulin-like growth factor 1 (IGF-1) is the most commonly utilized predictive marker for GHD. However, the consistency and accuracy of serum IGF-1 tests can be affected by patient characteristics and technical issues in blood analysis, which complicates its use as a reliable standalone marker. This study investigated the performance of the IGF-1/age ratio as a predictive marker for GHD and constructed a precise prediction model to mitigate these sources of variability in a single laboratory setting.

Methods: This cross-sectional study included 352 children aged 3 to 9 years who presented to Xi’an Children’s Hospital for short stature and underwent GH stimulation testing, which served as the gold standard for GHD diagnosis (peak GH cutoff of 10 µg/L). Key exclusion criteria were body mass index (BMI) ≥25 kg/m², history of GH therapy, secondary sexual characteristics, or chronic systemic diseases. Serum IGF-1 levels were measured using the IMMULITE® 2000 Immunoassay System at our single center. The predictive performance of the IGF-1/age ratio was evaluated both alone and in combination with other markers. A precise prediction model was developed using logistic regression, with feature selection guided by the lasso method, and its performance was assessed using the area under the curve (AUC) and calibration analysis. We designed a reference heatmap to facilitate its clinical use.

Results: A total of 117 participants (33.2%) exhibited GHD based on the GH stimulation test. The GHD group exhibited significantly lower levels of IGF-1 (82.19 vs. 155.69 ng/mL, P<0.001) and insulin-like growth factor binding protein 3 (IGFBP-3; 3.63 vs. 4.09 µg/mL, P<0.001) and a significantly greater bone age delay (calculated as bone age − chronological age; −1.64 vs. −1.06 years, P<0.001) compared to the non-GHD group. Among all evaluated markers, the IGF-1/age ratio demonstrated the highest predictive performance, with an AUC of 0.921 [95% confidence interval (CI): 0.894–0.947]. Combined use with other markers further improved the performance. The prediction model using IGFBP-3, IGF-1/IGFBP-3 ratio, IGF-1/age ratio, and BMI showed good discrimination (AUC of 0.936, 95% CI: 0.913–0.960) and good calibration (Hosmer-Lemeshow test P value of 0.74).

Conclusions: Based on data from this single-center study, our results suggest that the IGF-1/age ratio is a promising predictive marker that may be used to effectively stratify GHD risk and facilitate fast screening prior to definitive GH stimulation testing.

Keywords: Growth hormone deficiency (GHD); insulin-like growth factor 1 (IGF-1); insulin-like growth factor binding protein 3 (IGFBP-3); biomarkers; prediction model


Submitted Jul 18, 2025. Accepted for publication Nov 07, 2025. Published online Dec 26, 2025.

doi: 10.21037/tp-2025-481


Highlight box

Key findings

• This study identified the insulin-like growth factor 1 (IGF-1)/age ratio as a strong predictive marker for growth hormone deficiency (GHD) in children with short stature, demonstrating high discriminative power (area under the curve =0.921).

• The study further developed a clinically applicable heatmap for the stratification of GHD risk based on IGF-1 levels and age.

What is known and what is new?

• IGF-1 is a commonly used biomarker for GHD, but its predictive accuracy is limited by biological variability and assay inconsistencies; the IGF-1 standard deviation score (SDS) is widely used but requires large population data and standardized reference ranges.

• The IGF-1/age ratio offers a practical alternative to IGF-1 SDS without the need for large datasets. It performs comparably or better in predicting GHD and is easier to implement in single-laboratory settings.

What is the implication, and what should change now?

• This study supports the IGF-1/age ratio as a reliable and accessible tool for GHD screening, especially in settings lacking standardized reference data for SDS. Clinical laboratories should consider adopting this predictive marker and establishing local calibrations for it. The proposed prediction model and heatmap can guide pre-test risk assessment, reduce unnecessary growth hormone stimulation tests and improve diagnostic efficiency.


Introduction

Short stature may be caused by a number of conditions, including growth hormone deficiency (GHD), idiopathic short stature, children born small for gestational age, Turner syndrome, and other disorders (1). Among these, GHD is a condition caused by insufficient production of growth hormone (GH) in the pituitary gland and is the most common defect in the GH/insulin-like growth factor-1 (IGF-1) axis. Accurate diagnosis of GHD in children with short stature is essential, as it may indicate the presence of concomitant pituitary hormone deficiencies or central nervous system tumors. Additionally, children with GHD generally exhibit a more favorable response to GH treatment than those with short stature from other etiologies (2).

The gold standard test for diagnosing GHD is the GH stimulation test (3), which measures the response of the hypothalamus and pituitary gland to various stimuli. However, children and their families may have low tolerance for the GH stimulation test due to the frequent blood sampling the procedure requires. Additionally, the test is associated with various adverse reactions and risks, making diagnosis challenging (4). It has been suggested that GH stimulation tests should only be performed when the results would decisively influence the decision to initiate treatment (5).

The first step in assessing the GH/IGF axis is typically to measure IGF-1 levels, as it is the most commonly used marker for screening GHD and monitoring GH therapy in clinical practice. Children with normal IGF-1 levels may not require GH stimulation tests, thus reducing risks, saving medical resources, lowering costs, and avoiding unnecessary testing (6). Other auxiliary predictive markers can also complement GH stimulation tests. Insulin-like growth factor binding protein-3 (IGFBP-3) is another useful marker for assessing abnormal GH secretion (7). According to a meta-analysis of 12 studies (8), the area under the summary receiver operating characteristic (ROC) curve for IGFBP-3 in diagnosing GHD is 0.80. The serum IGF-1 to IGFBP-3 ratio has also been proposed as a potential predictive marker for GHD (9,10).

However, the results of individual publications on these markers varied widely and were conflicting. These discrepancies may be attributable to different diagnostic criteria, lack of agreement in IGF-1 and IGFBP-3 measurements, or differences in the study populations. Regarding the lack of agreement in measurements, a recent report that evaluated the consistency of mass spectrometry and chemiluminescence immunoassay for serum IGF-1 measurement found a Cohen’s kappa of 0.68, indicating imperfect agreement between the two methods (11). The consistency and accuracy of IGF-1 tests can be affected by both patient characteristics and technical issues in blood analysis. The level of IGF-1 is influenced by various patient characteristics, including age, gender, pubertal stage, peak height velocity, and nutritional status (6), as well as physiological factors such as circadian variations in GH levels (12). Variability in IGF-1 levels has also been linked to the use of different immunoassay platforms (13), which may yield poor agreement due to factors such as variations in antibody specificity, calibration techniques, extraction methods, discrepancies in international reference standards among assay kits (14,15), and inconsistencies in results across different batches (16).

Using IGF-1 as a predictive marker requires addressing the factors that influence its measurement consistency and accuracy. Variability arising from patient characteristics can be mitigated by constructing prediction models that incorporate IGF-1 along with other relevant patient factors, rather than relying on IGF-1 alone. Additionally, challenges posed by technical variations in blood analysis can be addressed by recalibrating the constructed prediction model to the specific settings of the analysis. It has also been suggested that each laboratory should establish its own assay-specific and population-standardized cut-off limits for the biomarkers (10,17).

The IGF-1 standard deviation score (SDS) is a commonly used indirect measure of GH secretory status that accounts for the dependency of serum IGF-1 levels on age and gender, though it is not recommended in a consensus statement on standardization of IGF-1 assays (18). However, establishing the skewness-median-coefficient of variation parameters required for SDS calculation requires IGF-1 measurements from a large sample, which makes it infeasible to establish localized reference values in a single laboratory setting. To address this limitation and offer a practical alternative for local settings, we propose the IGF-1/age ratio as a surrogate marker for IGF-1 SDS. This ratio eliminates the requirement for extensive population reference data and complex parameter calculation, highlighting its potential as a reliable and practical predictive marker in clinical practice.

The current study aimed to address the factors that influence the measurement consistency and accuracy of IGF-1 by constructing a prediction model for screening GHD prior to conducting GH stimulation tests. Data were collected from a single center to create localized models, thereby minimizing variability that may arise from differences in equipment and techniques across multiple sites. By focusing on a single laboratory, we established a tailored prediction model specifically calibrated to the local setting and its unique characteristics. This design intentionally prioritizes high local precision and reliability over broad external generalizability, recognizing the substantial site-specific variability inherent in IGF-1 measurement. We present this article in accordance with the TRIPOD reporting checklist (available at https://tp.amegroups.com/article/view/10.21037/tp-2025-481/rc).


Methods

Study design

This cross-sectional observational study used retrospectively collected data from January 2022 to January 2023 from the pediatric endocrinology department of a tertiary hospital in China. The study involved no interaction with subjects and no access to identifiable private information and was granted exemption by the Institutional Review Board of Xi’an Children’s Hospital. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Individual consent for this analysis was waived due to the retrospective nature of the study.

Children admitted for short stature were included if they were 3 to 9 years old. Children were excluded if: (I) their IGF-1 or IGFBP-3 levels were not measured; (II) they had previously received GH therapy; (III) they had a body mass index (BMI) ≥25 kg/m²; (IV) they had developed secondary sexual characteristics; (V) they had special facial features often associated with genetic syndromes, atypical body shapes, skeletal deformities, or chondrodysplasia; (VI) they had chronic systemic diseases (malnutrition, heart disease, liver disease, kidney failure, or lung disease); (VII) their short stature was due to psychosomatic factors or familial dwarfism; or (VIII) their short stature was due to intrauterine growth retardation, being born small for gestational age, or hypothyroidism.

Short stature was defined as height below −2 standard deviations (SD) or the third percentile of the normal growth curve for children of the same age and gender in China (19). GHD was further diagnosed using the following criteria (20): (I) height velocity ≤5 cm/year; (II) serum GH peak <10 µg/L in two different GH stimulation tests (stimulation with arginine or clonidine); (III) delayed bone age compared to chronological age (bone age delay = bone age − chronological age); (IV) serum IGF-1 level below normal; (V) symmetric short stature and infantile facial features; and (VI) normal intellectual development.

Biochemical assays

IGF-1 levels were measured using the IMMULITE® 2000 Immunoassay System (Siemens, Germany). Assays were performed according to the manufacturer’s instructions. The IGF-1 assay was calibrated, with intra-assay and inter-assay coefficients of variation of 2.4–6.3% and 3.0–7.6%, respectively. The stimulation test was started at 06:00 after overnight fasting. Blood samples were taken at 0, 30, 60, 90, and 120 minutes. Clonidine (4 µg/kg, capped at 150 µg) and 10% arginine (0.5 g/kg, capped at 30 g) were administered as stimuli for the GH secretion test.

SDS calculation

The SDS values for height, body weight, and BMI were calculated by comparing individual measurements to age- and sex-specific reference values established in China (19,21). The SDS of serum IGF-1 was calculated based on data and method from a study in China (6), which included a sample of 3,753 children aged 1 to 19 years from 11 provinces and municipalities. The calculation utilized the skewness (L), median (M), and coefficient of variation (S) parameters using the formula below, where $Y_{\text{IGF-1}}$ denotes the measured serum IGF-1 level.

$$\text{IGF-1 SDS} = \frac{\left(Y_{\text{IGF-1}}/M\right)^{L} - 1}{L \times S}$$
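As a minimal sketch of this calculation, the function below applies the LMS formula to a single measurement. The function name and the L, M, and S values in the usage line are hypothetical placeholders, not the reference parameters from (6). The logarithmic branch for L = 0 is the usual limiting case of the LMS method and is an addition here, not something stated in the text.

```python
import math

def igf1_sds(y_igf1, l, m, s):
    """LMS standard deviation score for a measured IGF-1 level (ng/mL).

    l, m, s are the age- and sex-specific skewness, median, and
    coefficient of variation taken from a reference table.
    """
    if y_igf1 <= 0 or m <= 0 or s <= 0:
        raise ValueError("y_igf1, m and s must be positive")
    if l == 0:
        # Limiting case of the LMS formula as L approaches 0 (assumption)
        return math.log(y_igf1 / m) / s
    return ((y_igf1 / m) ** l - 1.0) / (l * s)

# Hypothetical parameters, for illustration only
example_sds = igf1_sds(82.0, l=0.5, m=150.0, s=0.35)
```

A measurement equal to the reference median gives an SDS of 0, and values below the median give a negative SDS.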

Sample size and missing data

As a cross-sectional observational study, no a priori power analysis was conducted for sample size estimation. The final cohort comprised 352 pediatric patients, representing all eligible cases that met the predefined inclusion and exclusion criteria and were evaluated in the pediatric endocrinology clinic of our institution from 2022 to 2023. All variables in the study were derived from the unified export of hospital electronic medical records and laboratory information systems. Cases with missing values were excluded, and the cases included in the analysis had complete data; therefore, no missing data imputation was performed.

Statistical analysis

All statistical analyses were performed using R (version 4.1.2). Continuous variables were presented as mean ± SD and compared using unpaired t-tests. Categorical variables were presented as count (%) and compared using χ² tests. A P value of <0.05 was considered statistically significant.

The relationships between IGF-1, IGFBP-3, and age were examined through linear regression models for both GHD and non-GHD participants. Specifically, we analyzed the following dependent-independent variable pairs: IGF-1 vs. IGFBP-3, IGF-1 vs. age, and IGFBP-3 vs. age. To compare the slopes between GHD and non-GHD groups, we combined the participants and included a categorical variable indicating diagnosis (GHD vs. non-GHD), along with an interaction term between diagnosis and the independent variable. The significance of the interaction terms was assessed to determine whether the slopes differed significantly between the two groups.
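Written out for the IGF-1 vs. age pair, the pooled model with an interaction term can be expressed as follows. The symbols $\beta_0$ to $\beta_3$, the indicator $D_i$, and the error term $\varepsilon_i$ are notation introduced here for illustration and do not come from the original analysis.

```latex
\mathrm{IGF\text{-}1}_i = \beta_0 + \beta_1\,\mathrm{Age}_i + \beta_2\,D_i
  + \beta_3\,(\mathrm{Age}_i \times D_i) + \varepsilon_i,
\qquad
D_i = \begin{cases} 1 & \text{GHD} \\ 0 & \text{non-GHD} \end{cases}
```

Under this coding, the slope of IGF-1 on age is $\beta_1$ in the non-GHD group and $\beta_1 + \beta_3$ in the GHD group, so the test of $\beta_3 = 0$ is the test of equal slopes.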

ROC analysis was conducted to evaluate the discrimination of each of the five individual predictive markers: IGF-1, IGFBP-3, IGF-1/IGFBP-3 ratio, IGF-1/age ratio, and IGF-1 SDS. Sensitivity analysis was performed to assess the consistency of the predictive markers’ discrimination across different subgroups. Two stratifications were evaluated: by gender, to account for patient characteristics, and by admission period, to account for potential inter-batch variability. To define the admission periods, participants were ordered by admission date and divided into three groups of equal size.
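The subgroup analysis by admission period can be sketched as below. This is an illustrative reimplementation in Python rather than the R code used in the study. The AUC is computed with the rank-based (Mann-Whitney) definition, and the way participants are assigned to three ordered groups when the total is not divisible by three is an assumption. All function names and data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a GHD case (label 1) has a higher
    score than a non-GHD case (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def admission_periods(admission_dates, n_periods=3):
    """Order participants by admission date and assign period 1..n_periods."""
    order = sorted(range(len(admission_dates)),
                   key=lambda i: admission_dates[i])
    periods = [0] * len(admission_dates)
    for rank, idx in enumerate(order):
        periods[idx] = rank * n_periods // len(order) + 1
    return periods

def auc_by_period(scores, labels, admission_dates):
    """AUC of a marker within each admission period."""
    periods = admission_periods(admission_dates)
    result = {}
    for period in sorted(set(periods)):
        rows = [i for i, p in enumerate(periods) if p == period]
        result[period] = auc([scores[i] for i in rows],
                             [labels[i] for i in rows])
    return result

# Toy data: a lower IGF-1/age ratio points to GHD, so the score is -ratio
toy_ratio = [10, 12, 15, 14, 20, 25, 30]
toy_label = [1, 1, 1, 0, 0, 0, 0]
toy_auc = auc([-r for r in toy_ratio], toy_label)
```

Negating the ratio keeps the convention that a higher score means higher risk, which is needed because lower IGF-1/age values were associated with GHD.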

To generate the heatmap for GHD risk using IGF-1, the IGF-1 levels were categorized into groups using cut-off points of 75, 100, 125, 150, and 200 ng/mL. Children were grouped into 1-year age categories. For each IGF-1 level group, we calculated the proportion of children with GHD at each age. The cut-off values for risk stratification were determined based on the principle of the pyramid. A risk group was assigned to each IGF-1 level group at each age: 0–20% indicating low risk, 21–75% indicating moderate risk, 76–95% indicating high risk, and 96% to 100% indicating very high risk. Using age as the x-axis and IGF-1 level group as the y-axis, each cell in the heatmap represents a group of children jointly determined by age and IGF-1 level. The color in each cell denotes the risk group.
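The construction of the heatmap cells can be sketched as follows. The cut-off points and risk bands are those given above. The handling of values that fall exactly on a boundary, the use of completed years as the age category, and all names and toy records are assumptions made for illustration.

```python
import bisect
from collections import defaultdict

IGF1_CUTS = [75, 100, 125, 150, 200]  # ng/mL, cut-off points from the text
IGF1_GROUPS = ["<75", "75-<100", "100-<125", "125-<150", "150-<200", ">=200"]

def igf1_group(igf1):
    # Assumption: a value equal to a cut-off point falls in the higher group
    return IGF1_GROUPS[bisect.bisect_right(IGF1_CUTS, igf1)]

def risk_group(ghd_proportion):
    # Assumption: the upper bounds 20%, 75% and 95% are inclusive
    if ghd_proportion <= 0.20:
        return "low"
    if ghd_proportion <= 0.75:
        return "moderate"
    if ghd_proportion <= 0.95:
        return "high"
    return "very high"

def heatmap_cells(records):
    """records: iterable of (age_years, igf1_ng_ml, has_ghd) tuples.

    Returns {(age category, IGF-1 group): risk group}.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [GHD count, total count]
    for age, igf1, has_ghd in records:
        cell = (int(age), igf1_group(igf1))  # assumption: completed years
        counts[cell][0] += 1 if has_ghd else 0
        counts[cell][1] += 1
    return {cell: risk_group(ghd / total)
            for cell, (ghd, total) in counts.items()}

# Toy records, not study data
toy_records = [(5.2, 60.0, True), (5.8, 70.0, True), (5.5, 180.0, False),
               (5.1, 190.0, True), (5.9, 160.0, False), (5.3, 170.0, False)]
cells = heatmap_cells(toy_records)
```

Cells that contain few children give unstable proportions, so a local implementation would need to decide how to treat sparse cells.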

To evaluate the combined use of the predictive markers, we generated all possible combinations of the markers, excluding instances where both the IGF-1/age ratio and IGF-1 SDS were included simultaneously. We assessed the IGF-1/age ratio as an alternative marker to IGF-1 SDS, opting to preclude their concurrent use to avoid redundancy. For each combination, a logistic regression model was fitted, after which the area under the curve (AUC) from the ROC analysis was used to evaluate the performance.
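The enumeration of marker combinations can be illustrated as below, assuming that the candidate set is the five individual markers evaluated in the ROC analysis. The model fitting step is omitted, and the names are hypothetical.

```python
from itertools import combinations

MARKERS = ["IGF-1", "IGFBP-3", "IGF-1/IGFBP-3 ratio",
           "IGF-1/age ratio", "IGF-1 SDS"]

def marker_combinations(markers=MARKERS):
    """All non-empty marker sets, skipping any set that contains both
    the IGF-1/age ratio and IGF-1 SDS."""
    kept = []
    for size in range(1, len(markers) + 1):
        for combo in combinations(markers, size):
            if "IGF-1/age ratio" in combo and "IGF-1 SDS" in combo:
                continue  # the two markers are treated as alternatives
            kept.append(combo)
    return kept

candidate_sets = marker_combinations()
```

Under this assumption, five markers yield 31 non-empty sets, of which 8 contain both excluded markers, leaving 23 candidate sets.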

To construct the prediction model, lasso regression was conducted for feature selection due to its ability to prevent overfitting by constraining the model complexity and its ability to effectively handle multicollinearity by selecting one variable from a group of correlated predictors. From the variables IGF-1, IGFBP-3, IGF-1/IGFBP-3 ratio, IGF-1/age ratio, height SDS, BMI, and gender, four features were selected: IGFBP-3, IGF-1/IGFBP-3 ratio, IGF-1/age ratio, and BMI. ROC analysis was conducted to evaluate the discrimination of the model. A calibration plot with the Hosmer-Lemeshow test was used to show the calibration. The optimal cut-off point was defined by maximizing the Youden index. Using the cut-off point, model performance measures including sensitivity, specificity, positive predictive value, and negative predictive value were calculated. Sensitivity analyses were conducted to evaluate the model’s performance using alternative GH peak cut-offs of 7 µg/L and 5 µg/L for diagnosing GHD. ROC analysis was employed to assess the robustness of the model’s discriminatory ability across these varying diagnostic criteria.
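The selection of the cut-off point by the Youden index can be sketched as follows. The sketch assumes that a child is classified as GHD when the predicted probability is at or above the cut-off. The function name and toy data are hypothetical.

```python
def youden_cutoff(probabilities, labels):
    """Return (cutoff, sensitivity, specificity) that maximizes the
    Youden index J = sensitivity + specificity - 1.

    Assumption: a case is predicted positive when probability >= cutoff.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    best = None
    for cutoff in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, labels)
                 if y == 1 and p >= cutoff)
        tn = sum(1 for p, y in zip(probabilities, labels)
                 if y == 0 and p < cutoff)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]

# Toy predicted probabilities and true labels (1 = GHD), not study data
toy_prob = [0.05, 0.10, 0.20, 0.30, 0.60, 0.80, 0.90]
toy_true = [0, 0, 0, 1, 0, 1, 1]
cutoff, sens, spec = youden_cutoff(toy_prob, toy_true)
```

Because the Youden index weights sensitivity and specificity equally, a screening application that prioritizes sensitivity might prefer a lower cut-off than the one this rule selects.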


Results

Baseline characteristics

Overall, 352 children were included in the analysis, with 117 (33.2%) in the GHD group and 235 (66.8%) in the non-GHD group. Baseline characteristics of the study participants are shown in Table 1. Statistically significant between-group differences were observed in several variables, including IGF-1, IGFBP-3, and IGF-1/IGFBP-3 ratio.

Table 1

Baseline characteristics of the study population

| Characteristics | Overall (N=352) | GHD (N=117) | Non-GHD (N=235) | P value |
|---|---|---|---|---|
| Age (years) | 5.91±1.49 | 6.18±1.47 | 5.77±1.48 | 0.01 |
| Gender | | | | 0.01 |
| Female | 157 (44.6) | 41 (35.0) | 116 (49.4) | |
| Male | 195 (55.4) | 76 (65.0) | 119 (50.6) | |
| Height (m) | 1.06±0.09 | 1.07±0.09 | 1.05±0.09 | 0.03 |
| Height SDS | −2.01±0.54 | −2.07±0.54 | −1.99±0.54 | 0.21 |
| Body weight (kg) | 17.00±3.17 | 17.63±3.23 | 16.68±3.09 | 0.008 |
| Body weight SDS | −1.36±0.48 | −1.33±0.43 | −1.37±0.50 | 0.48 |
| BMI (kg/m²) | 15.11±0.96 | 15.26±0.84 | 15.03±1.01 | 0.03 |
| BMI SDS | −0.17±0.68 | −0.08±0.57 | −0.22±0.72 | 0.07 |
| Bone age (years) | 4.66±1.50 | 4.54±1.50 | 4.72±1.50 | 0.30 |
| BA delay (BA – CA) (years) | −1.25±0.97 | −1.64±0.74 | −1.06±1.01 | <0.001 |
| Growth hormone peak (ng/mL) | 10.65±5.31 | 6.62±2.36 | 12.66±5.23 | <0.001 |
| IGF-1 (ng/mL) | 131.26±57.13 | 82.19±27.76 | 155.69±52.06 | <0.001 |
| IGFBP-3 (μg/mL) | 3.94±1.00 | 3.63±0.84 | 4.09±1.05 | <0.001 |
| IGF-1/IGFBP-3 ratio (×1,000) | 33.73±13.29 | 23.04±6.88 | 39.06±12.48 | <0.001 |

Data are presented as n (%) or mean ± standard deviation. BA, bone age; BMI, body mass index; CA, chronological age; GHD, growth hormone deficiency; IGF-1, insulin-like growth factor 1; IGFBP-3, insulin-like growth factor binding protein 3; SDS, standard deviation score.

Varying relationships between IGF-1 and age in GHD versus non-GHD

As shown in Figure 1, relationships between variables were modeled using linear regression separately in GHD and non-GHD participants. The results for IGF-1 plotted against age indicated slopes of 3.9 for GHD participants and 12 for non-GHD participants, with a statistically significant difference in slope (P for interaction =0.02). For IGF-1 against IGFBP-3, the slope was 13 for GHD participants and 23 for non-GHD participants; however, this difference was not statistically significant (P for interaction =0.06). Similarly, the results for IGFBP-3 against age revealed no significant difference in slopes between GHD and non-GHD participants (P for interaction =0.42).

Figure 1 Modeling of linear relationships between variables in GHD vs. non-GHD participants: IGF-1 against IGFBP-3 (A), IGF-1 against age (B), and IGFBP-3 against age (C). GHD, growth hormone deficiency; IGF-1, insulin-like growth factor 1; IGFBP-3, insulin-like growth factor binding protein 3.

IGF-1/age ratio is a good predictive marker of GHD

We evaluated the predictive value of different indicators using ROC analysis. The ROC curves (Figure 2) indicate that the IGF-1/age ratio has the highest discrimination, with an AUC of 0.921 [95% confidence interval (CI): 0.894–0.947], followed by IGF-1 SDS with an AUC of 0.909 (95% CI: 0.880–0.938) and IGF-1, also with an AUC of 0.909 (95% CI: 0.879–0.939). Sensitivity analyses were performed to assess the consistency of the markers’ predictive value in different subgroups. The results (Figure S1) showed that discrimination was consistent across genders and admission periods.

Figure 2 Receiver operating characteristic curves of different predictive markers for predicting GHD. GHD, growth hormone deficiency; IGF-1, insulin-like growth factor 1; IGFBP-3, insulin-like growth factor binding protein 3; SDS, standard deviation score.

To facilitate the clinical use of the IGF-1/age ratio as a GHD predictive marker, we designed a reference heatmap. We categorized the participants into groups using their age and serum IGF-1 levels and calculated the proportion of GHD participants in each group. Cut-offs were set at 20%, 75%, and 95% to classify the proportions into GHD risk groups of low risk, moderate risk, high risk, and very high risk. The constructed reference heatmap is shown in Figure 3. For a child with short stature, the measured IGF-1 level can be readily mapped to a GHD risk group once age is taken into account.

Figure 3 Reference heatmap for GHD risk. GHD, growth hormone deficiency; IGF-1, insulin-like growth factor 1.

Combined use of the predictive markers improves the discrimination

For combined use of the markers, the AUC and 95% CIs were compared and shown in Figure 4. Except for IGFBP-3 and the IGF-1/IGFBP-3 ratio when used alone, all combinations of predictors showed good discrimination (AUCs >0.90). Models including the IGF-1/age ratio as a predictor showed good discrimination, ranging from 0.921 (95% CI: 0.894–0.947) when used alone to 0.936 (95% CI: 0.912–0.959) when used in combination with IGF-1, IGFBP-3, and IGF-1/IGFBP-3 ratio. The results confirmed the predictive value of the IGF-1/age ratio. Furthermore, the combined use of the predictive markers can enhance discrimination.

Figure 4 AUCs of logistic regression models predicting GHD vs. non-GHD using different combinations of predictive markers. AUC, area under the curve; CI, confidence interval; GHD, growth hormone deficiency; IGF-1, insulin-like growth factor 1; IGFBP-3, insulin-like growth factor binding protein 3; SDS, standard deviation score.

Prediction model for GHD screening

IGFBP-3, IGF-1/IGFBP-3 ratio, IGF-1/age ratio, and BMI were selected as features for prediction using lasso regression. A logistic regression model was fitted, and the parameters are shown in Table 2. Lower IGFBP-3, IGF-1/IGFBP-3 ratio, and IGF-1/age ratio were significantly associated with increased odds of GHD. A higher BMI was associated with increased odds of GHD; however, the association was not statistically significant. Of note, children with BMI ≥25 kg/m² were excluded because GH responses to stimulation testing may be blunted (10). Based on the model, the risk of a child with short stature being diagnosed with GHD can be calculated using the following formula:

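$$P = \frac{1}{1 + e^{-z}}, \qquad z = \ln(188.5490) + \ln(0.8165) \times \text{IGF-1/age ratio} + \ln(0.5024) \times \text{IGFBP-3} + \ln(0.8774) \times \text{IGF-1/IGFBP-3 ratio} + \ln(1.3442) \times \text{BMI}$$

Here each coefficient is the natural logarithm of the corresponding odds ratio in Table 2, and the IGF-1/IGFBP-3 ratio is expressed ×1,000 as in Table 2.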

Table 2

Logistic regression model parameters

| Predictors | Odds ratio | 95% CI | P value |
|---|---|---|---|
| IGFBP-3 (μg/mL) | 0.5024 | (0.3195, 0.7636) | 0.002 |
| IGF-1/IGFBP-3 ratio (×1,000) | 0.8774 | (0.8136, 0.9402) | <0.001 |
| IGF-1/age ratio (ng/mL/year) | 0.8165 | (0.7402, 0.8919) | <0.001 |
| BMI (kg/m²) | 1.3442 | (0.9166, 1.9886) | 0.13 |
| Intercept | 188.5490 | (0.4424, 108,267.0811) | 0.09 |

BMI, body mass index; CI, confidence interval; IGF-1, insulin-like growth factor 1; IGFBP-3, insulin-like growth factor binding protein 3.
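A minimal sketch of how the model could be evaluated for one child is given below. It assumes that the coefficients of the linear predictor are the natural logarithms of the odds ratios in Table 2, that the predicted risk is the logistic function of that predictor, and that the IGF-1/IGFBP-3 ratio (×1,000) equals IGF-1 in ng/mL divided by IGFBP-3 in µg/mL. The function and variable names are illustrative.

```python
import math

# Odds ratios as reported in Table 2
ODDS_RATIOS = {
    "igfbp3": 0.5024,                   # per µg/mL
    "igf1_igfbp3_ratio_x1000": 0.8774,  # per unit of the ×1,000 ratio
    "igf1_age_ratio": 0.8165,           # per ng/mL/year
    "bmi": 1.3442,                      # per kg/m²
}
INTERCEPT_ODDS = 188.5490  # exponentiated intercept from Table 2

def ghd_probability(igf1, igfbp3, age, bmi):
    """Predicted GHD probability.

    igf1 in ng/mL, igfbp3 in µg/mL, age in years, bmi in kg/m².
    Assumption: coefficients are ln(odds ratio) from Table 2.
    """
    predictors = {
        "igfbp3": igfbp3,
        # ng/mL divided by µg/mL is numerically the ×1,000 ratio (assumption)
        "igf1_igfbp3_ratio_x1000": igf1 / igfbp3,
        "igf1_age_ratio": igf1 / age,
        "bmi": bmi,
    }
    log_odds = math.log(INTERCEPT_ODDS) + sum(
        math.log(ODDS_RATIOS[name]) * value
        for name, value in predictors.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

# Group means from Table 1 used as illustrative inputs, not real patients
p_ghd_means = ghd_probability(igf1=82.19, igfbp3=3.63, age=6.18, bmi=15.26)
p_non_ghd_means = ghd_probability(igf1=155.69, igfbp3=4.09, age=5.77, bmi=15.03)
```

With the group means from Table 1 as inputs, this sketch returns a probability above the reported cut-off of 0.232 for the GHD means and below it for the non-GHD means. Because the published odds ratios are rounded, the results are approximate.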

The model shows good discrimination (Figure 5A; AUC of 0.936, 95% CI: 0.913–0.960) and good calibration (Figure 5B, Hosmer-Lemeshow test P value of 0.74). At a cut-off value of 0.232, the model has a sensitivity of 0.957, a specificity of 0.796, a positive predictive value of 0.700, and a negative predictive value of 0.974. Sensitivity analyses evaluating the model’s performance with alternative GH peak cut-offs revealed an AUC of 0.822 (95% CI: 0.862–0.903) for the 7 µg/L cut-off and 0.814 (95% CI: 0.866–0.917) for the 5 µg/L cut-off, indicating strong, although somewhat diminished, discriminatory ability for both thresholds.
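The reported performance measures can be cross-checked against a 2×2 table. The sketch below assumes counts of 112 true positives and 187 true negatives. These counts were back-calculated here from the reported sensitivity, specificity, and group sizes (117 GHD, 235 non-GHD) and are not stated in the text.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Back-calculated counts (assumption): 117 GHD = 112 TP + 5 FN,
# 235 non-GHD = 187 TN + 48 FP
metrics = diagnostic_metrics(tp=112, fp=48, fn=5, tn=187)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

Under these assumed counts, the four measures round to the reported values of 0.957, 0.796, 0.700, and 0.974, and 5 of the 117 children with GHD would fall below the cut-off.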

Figure 5 Model evaluation: receiver operating characteristic curve analysis for discrimination (A) and calibration plot for calibration (B). In (A), the red lines indicate the sensitivity and specificity at the selected cut-off.

Discussion

In this study, we examined the predictive utility of the IGF-1/age ratio for GHD and developed a prediction model based on its performance in a single laboratory setting. Our results indicate that the IGF-1/age ratio is a reliable predictive marker. Furthermore, our prediction model, which incorporates the IGF-1/age ratio alongside other markers, demonstrated excellent discrimination and robust calibration.

Although IGF-1 is the most widely studied predictive marker for GHD, its clinical usefulness remains disputed. Studies have shown inconsistent and conflicting results (8), which may be explained by the different diagnostic criteria for GHD, a lack of agreement in IGF-1 measurements, or differences in the study populations. A recent review highlighted the uncertainty in diagnosing GHD using stimulated peak GH levels (22). The authors noted that proposed diagnostic thresholds range from 5 to 10 µg/L and emphasized that none of these cut-off values are supported by strong evidence. This undermines the consistency of the gold standard across studies. The most widely accepted cut-off value is 10 µg/L, which is also recommended by clinical guidelines in China. Our study therefore adopted a cut-off value of 10 µg/L (20), which is in line with the current state of research. Unlike anthropometric measures, such as height and weight, which can be measured accurately, IGF-1 measurement is subject to methodological and biological variability (23). One study investigated the performance of the IGF-1 assay by sending the same sample to 23 centers for measurement and found a 2.5-fold variation between laboratories (14). Both the diagnosis of GHD and the measurement of IGF-1 are influenced by population characteristics. IGF-1 levels have been shown to be affected by factors such as age, gender, pubertal stage, peak height velocity, and nutrition (6). GH levels are affected by physiological factors such as age, gender, pubertal status, and body composition, as well as pathological factors such as diabetes, renal failure, and hyperthyroidism (24). These findings underscore the necessity for a more precise approach to diagnosing GHD using IGF-1, both in more specific laboratory settings and in more precisely defined cohorts. It is thus crucial for each laboratory to establish its own assay-specific and population-standardized cut-off limits for biomarkers (10,17).

In clinical research, there is a constant balance between the aims of precision and generalizability. Precision is about the accuracy of study findings within specific clinical contexts. Generalizability, meanwhile, is the ability to extend study findings to a wider population beyond the immediate sample. There is a trade-off between these two goals: broadening the sample to enhance generalizability can increase variability, potentially diminishing precision. To achieve better generalizability, a model should be developed in a representative cohort and extensively validated in a large population. Despite such efforts, few models have achieved the objective of being generalizable with well-preserved discrimination and calibration (25). Some researchers believe that prediction models are never truly validated (26), as there may be variations in their performance across different locations, settings, and times. A paradigm of recurring local validation has been proposed to validate models while protecting against performance-disruptive data variability (27), which relies on regular and recurrent site-specific reliability tests. Instead of focusing on generalizable models, some authors have suggested that researchers share protocols to train and evaluate models locally (28). This paradigm, geared towards precision, emphasizes the importance of taking site variability into account when deploying prediction models. Following this idea, the present study presents a prediction model in a single laboratory setting.

SDS is widely used for growth monitoring, and IGF-1 SDS has been used to adjust for age and gender. However, the calculation of SDS is parametric and requires reference values established from large population data, which makes it impractical in single laboratory settings. According to previous research (6), median IGF-1 levels show a complex relationship with age. However, the relationship is almost linear before the development of secondary sexual characteristics. Furthermore, median IGF-1 levels were similar in girls and boys. These observations justify our belief that the IGF-1/age ratio can be used to reflect IGF-1 levels in our study population of children aged 3 to 9 years. It does not require additional parameters for calculation and is therefore easy to use. Our study suggests that the IGF-1/age ratio is a promising predictive marker for GHD with high discrimination (AUC =0.921). The difference in AUC between the IGF-1/age ratio and IGF-1 SDS was 0.0115 (95% CI: −0.0032 to 0.0263; P=0.1254), and the difference between the IGF-1/age ratio and IGF-1 was 0.0117 (95% CI: −0.0134 to 0.0369; P=0.36). Thus, the discriminative power of the IGF-1/age ratio was numerically higher, but the differences were not statistically significant. Sensitivity analyses showed that discriminatory ability was retained across different GH peak cut-offs for GHD diagnosis, with AUC values of 0.822 and 0.814 for the 7 and 5 µg/L thresholds, respectively. Calibration and accuracy were not evaluated in the sensitivity analyses, as they are strongly influenced by the incidence rate and the model’s cut-off point. Thus, recalibration may be necessary when applying the model in different clinical scenarios. Using the IGF-1/age ratio, together with IGFBP-3 and BMI, we developed our prediction model for GHD screening, which showed good discrimination and calibration, demonstrating the feasibility of developing site-specific models with limited data. This shows that the IGF-1/age ratio, whether used alone or in combination with other characteristics, is useful for GHD screening. Because the GH stimulation test is poorly tolerated, a simple blood marker of this kind is valuable for screening.
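
As an illustration of how a laboratory might check the marker on its own records, the following minimal Python sketch computes the IGF-1/age ratio and a rank-based (Mann-Whitney) AUC. It is not the analysis code used in this study. The function names, the units (ng/mL per year), and the six example records are hypothetical and were invented for illustration; the sketch also assumes that a lower ratio points towards GHD.

```python
from typing import Sequence


def igf1_age_ratio(igf1_ng_ml: float, age_years: float) -> float:
    """Serum IGF-1 divided by chronological age (assumed units: ng/mL per year)."""
    if age_years <= 0:
        raise ValueError("age must be positive")
    return igf1_ng_ml / age_years


def auc_lower_is_positive(scores: Sequence[float], is_ghd: Sequence[bool]) -> float:
    """Rank-based AUC: probability that a randomly chosen GHD child has a
    lower score than a randomly chosen non-GHD child (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, is_ghd) if y]
    neg = [s for s, y in zip(scores, is_ghd) if not y]
    if not pos or not neg:
        raise ValueError("need at least one child in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical records (IGF-1 in ng/mL, age in years, GHD status); not study data.
children = [
    (60.0, 4.0, True),
    (75.0, 6.0, True),
    (90.0, 8.0, True),
    (120.0, 4.0, False),
    (150.0, 6.0, False),
    (100.0, 8.0, False),
]
ratios = [igf1_age_ratio(igf1, age) for igf1, age, _ in children]
labels = [ghd for _, _, ghd in children]
auc = auc_lower_is_positive(ratios, labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of children, which is acceptable for a single-center sample of a few hundred records; a statistical package would normally be used to obtain confidence intervals and to compare AUCs between markers.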

Our findings suggest that the IGF-1/age ratio is a robust marker with strong discriminatory power across various diagnostic criteria for GHD. However, although our sensitivity analyses support the model’s robustness, recalibration will be necessary at each individual site to ensure accurate application of the model, because the incidence of GHD and population characteristics can vary substantially across clinical settings. We recognize that the generalizability of our findings to other sites remains an open question. Nevertheless, the IGF-1/age ratio can be validated retrospectively as a diagnostic tool without additional data collection, making this a feasible option for various laboratories. In our study, we validated the idea of building a site-specific model in a precisely defined population (Chinese children with short stature aged 3 to 9 years). We believe that, as time and conditions change, the model should be regularly recalibrated and even refitted to remain useful. For laboratories that have not established their own reference standards for calculating IGF-1 SDS, we suggest evaluating the discriminative power of the IGF-1/age ratio and its usefulness in predicting GHD. This may contribute to its wider applicability and enhance the collective understanding of its effectiveness in diagnosing GHD across diverse populations. While further work is needed to evaluate the generalizability of our models, we believe that combining adaptable markers such as the IGF-1/age ratio with precise, site-specific calibration has the potential to improve upon current approaches to GHD screening.
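
One simple form that site-level recalibration could take is an intercept update ("recalibration-in-the-large"), sketched below in Python. This is an assumption for illustration, not a protocol defined in this study: the predicted probabilities and the local GHD rate of 0.20 are hypothetical, and the function names are our own. The sketch shifts every prediction by a constant on the logit scale until the mean predicted risk matches the locally observed rate, which is the condition satisfied by a maximum-likelihood intercept update in logistic regression.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def intercept_shift(probs: Sequence[float], observed_rate: float) -> float:
    """Find delta such that mean(sigmoid(logit(p) + delta)) equals
    observed_rate, using bisection (the mean is increasing in delta)."""
    if not 0.0 < observed_rate < 1.0:
        raise ValueError("observed_rate must lie strictly between 0 and 1")
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        mean_pred = sum(sigmoid(logit(p) + mid) for p in probs) / len(probs)
        if mean_pred < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def recalibrate(probs: Sequence[float], delta: float) -> List[float]:
    """Apply the logit-scale shift to every predicted probability."""
    return [sigmoid(logit(p) + delta) for p in probs]


# Hypothetical predictions from the original model at a new site.
original = [0.05, 0.10, 0.30, 0.60, 0.90]
local_ghd_rate = 0.20  # hypothetical observed GHD proportion at that site

delta = intercept_shift(original, local_ghd_rate)
adjusted = recalibrate(original, delta)
print(round(sum(adjusted) / len(adjusted), 3))
```

Because the shift is monotone, the ranking of children (and therefore the AUC) is unchanged; only the absolute risk estimates move. A site whose predictions are also too extreme or too flat would additionally need a slope update or a full refit.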

In our study, IGFBP-3, when used alone, showed low discrimination (AUC =0.635), which is consistent with previous reports that IGFBP-3 has low sensitivity in diagnosing childhood-onset GHD and is not a good predictive marker for GHD (29).

A key clinical goal in developing this model was to safely reduce the burden of unnecessary GH stimulation tests. We quantified this utility by selecting an optimal screening threshold (cut-off =0.232). At this threshold, the model had a specificity of 0.796, which would spare 187 of the 235 non-GHD cases the invasive test. At the same time, the model maintained a high sensitivity of 0.957, so that only 5 of the 117 GHD cases were incorrectly excluded. This combined performance supports the model’s potential to reduce invasive procedures and optimize healthcare resource utilization in the diagnostic pathway.
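
The counts quoted above can be checked with a few lines of arithmetic. The Python sketch below rebuilds the 2×2 table from the four numbers given in the text (117 GHD cases, 235 non-GHD cases, 5 missed cases, 187 correctly excluded) and recomputes sensitivity and specificity. The share of children falling below the cut-off is derived here for illustration and is not a figure reported in the analysis.

```python
# Counts taken from the text at the screening threshold (cut-off = 0.232).
n_ghd, n_non_ghd = 117, 235
false_negatives = 5    # GHD cases incorrectly excluded
true_negatives = 187   # non-GHD cases correctly excluded

# Remaining cells of the 2x2 table follow by subtraction.
true_positives = n_ghd - false_negatives
false_positives = n_non_ghd - true_negatives

sensitivity = true_positives / n_ghd
specificity = true_negatives / n_non_ghd

# Derived quantity (not reported in the text): children below the cut-off,
# who would not be referred for the GH stimulation test.
screened_out = true_negatives + false_negatives
share_screened_out = screened_out / (n_ghd + n_non_ghd)

print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
print(f"below cut-off = {screened_out} of {n_ghd + n_non_ghd} ({share_screened_out:.1%})")
```

Both recomputed values round to the reported sensitivity (0.957) and specificity (0.796), so the counts and rates in the paragraph are mutually consistent.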

We recognize that our study has several limitations. First, the retrospective, cross-sectional design makes the study susceptible to the selection bias inherent in cohorts from a single tertiary center. This single-center design, although strategically chosen to maximize local precision by controlling for inter-assay IGF-1 variability, introduces an unavoidable trade-off with external generalizability and limits the model’s immediate applicability to settings with different assay platforms or patient demographics. Second, although the number of GHD cases (N=117) is statistically adequate relative to the four variables used in our final parsimonious model, the total sample size (N=352) is relatively small for large-scale prediction model development. This may affect the certainty of risk estimates at the boundaries of the predictor space and highlights the current lack of external validation. Finally, while the model’s discriminatory power (AUC) is robust across varying cutoffs, its calibration must be reassessed and potentially adjusted upon implementation in external populations, because the calibration of a prediction model depends on the prevalence of the outcome in the population to which it is applied.

These limitations highlight important avenues for future research. While the current single-center design maximized internal precision, the definitive next step is validation of the IGF-1/age ratio in diverse external cohorts. Specifically, further studies are necessary to evaluate whether the IGF-1/age ratio maintains its predictive performance in children outside the 3–9-year age range, and to assess its transferability to settings with different patient demographics or assay platforms. We also aim to assess the model’s performance when integrated with dynamic clinical factors such as growth velocity or more detailed bone age delay parameters, and, finally, to develop standardized recalibration protocols for adapting the model to such settings.


Conclusions

Recent research has highlighted the variability in serum IGF-1 measurements due to patient characteristics and technical aspects of blood analysis. Consequently, there is a growing consensus that each laboratory should establish its own assay-specific and population-standardized reference ranges for biomarkers. In this context, the present study describes the IGF-1/age ratio as a promising predictive marker for GHD. This ratio demonstrated robust performance and consistency within a single laboratory setting. Moreover, it allows for age adjustment without requiring extensive reference data, which is typically necessary for calculating IGF-1 SDS. Using the IGF-1/age ratio, we developed a prediction model for rapid GHD screening, showing good discrimination and calibration. To optimize precision in specific clinical environments, we suggest that individual laboratories recalibrate this model to their particular settings.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tp.amegroups.com/article/view/10.21037/tp-2025-481/rc

Data Sharing Statement: Available at https://tp.amegroups.com/article/view/10.21037/tp-2025-481/dss

Peer Review File: Available at https://tp.amegroups.com/article/view/10.21037/tp-2025-481/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tp.amegroups.com/article/view/10.21037/tp-2025-481/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study involved no interaction with subjects and no access to identifiable private information and was granted exemption by the Institutional Review Board of Xi’an Children’s Hospital. Individual consent for this analysis was waived due to the retrospective nature.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Collett-Solberg PF, Jorge AAL, Boguszewski MCS, et al. Growth hormone therapy in children; research and practice - A review. Growth Horm IGF Res 2019;44:20-32.
  2. Cianfarani S. From Replacement to Tailoring: Evolving Concepts in the Therapy for Short Stature. Horm Res Paediatr 2025; Epub ahead of print.
  3. Grimberg A, DiVall SA, Polychronakos C, et al. Guidelines for Growth Hormone and Insulin-Like Growth Factor-I Treatment in Children and Adolescents: Growth Hormone Deficiency, Idiopathic Short Stature, and Primary Insulin-Like Growth Factor-I Deficiency. Horm Res Paediatr 2016;86:361-97.
  4. Albrecht A, Penger T, Marx M, et al. Short-term adverse effects of testosterone used for priming in prepubertal boys before growth hormone stimulation test. J Pediatr Endocrinol Metab 2018;31:21-4.
  5. Yau M, Rapaport R. Growth Hormone Stimulation Testing: To Test or Not to Test? That Is One of the Questions. Front Endocrinol (Lausanne) 2022;13:902364.
  6. Cao B, Peng Y, Song W, et al. Pediatric Continuous Reference Intervals of Serum Insulin-like Growth Factor 1 Levels in a Healthy Chinese Children Population - Based on PRINCE Study. Endocr Pract 2022;28:696-702.
  7. Ranke MB. Insulin-like growth factor binding-protein-3 (IGFBP-3). Best Pract Res Clin Endocrinol Metab 2015;29:701-11.
  8. Shen Y, Zhang J, Zhao Y, et al. Diagnostic value of serum IGF-1 and IGFBP-3 in growth hormone deficiency: a systematic review with meta-analysis. Eur J Pediatr 2015;174:419-27.
  9. Giannakopoulos A, Efthymiadou A, Chrysis D. Insulin-like growth factor ternary complex components as biomarkers for the diagnosis of short stature. Eur J Endocrinol 2021;185:629-35.
  10. Haj-Ahmad LM, Mahmoud MM, Sweis NWG, et al. Serum IGF-1 to IGFBP-3 Molar Ratio: A Promising Diagnostic Tool for Growth Hormone Deficiency in Children. J Clin Endocrinol Metab 2023;108:986-94.
  11. Chen JJ, Gao XY, Cao BY, et al. Consistency evaluation of 2 methods in detecting serum insulin-like growth factor I in children. Zhonghua Er Ke Za Zhi 2022;60:781-5.
  12. Bioletto F, Varaldo E, Gasco V, et al. Central and peripheral regulation of the GH/IGF-1 axis: GHRH and beyond. Rev Endocr Metab Disord 2025;26:321-42.
  13. Moncrieffe D, Cox HD, Carletta S, et al. Inter-Laboratory Agreement of Insulin-like Growth Factor 1 Concentrations Measured Intact by Mass Spectrometry. Clin Chem 2020;66:579-86.
  14. Pokrajac A, Wark G, Ellis AR, et al. Variation in GH and IGF-I assays limits the applicability of international consensus criteria to local practice. Clin Endocrinol (Oxf) 2007;67:65-70.
  15. Huang R, Shi J, Wei R, et al. Challenges of insulin-like growth factor-1 testing. Crit Rev Clin Lab Sci 2024;61:388-403.
  16. Algeciras-Schimnich A, Bruns DE, Boyd JC, et al. Failure of current laboratory protocols to detect lot-to-lot reagent differences: findings and possible solutions. Clin Chem 2013;59:1187-94.
  17. Deng Y, Li Q, Shen H, et al. Effect of different ages of healthy children as controls on the accuracy of IGF-1 and IGFBP-3 in the diagnosis of GHD. Int J Lab Med 2020;41:2021-4.
  18. Clemmons DR. Consensus statement on the standardization and evaluation of growth hormone and insulin-like growth factor assays. Clin Chem 2011;57:555-9.
  19. Li H, Ji CY, Zong XN, et al. Height and weight standardized growth charts for Chinese children and adolescents aged 0 to 18 years. Zhonghua Er Ke Za Zhi 2009;47:487-92.
  20. Chinese Pediatric Endocrine Genetics and Metabolism Group of the Chinese Medical Association Pediatrics Branch. Diagnosis and Treatment Guidelines for Short Stature Children. Chinese Journal of Pediatrics 2008;46:428-30.
  21. Li H, Ji CY, Zong XN, et al. Body mass index growth curves for Chinese children and adolescents aged 0 to 18 years. Zhonghua Er Ke Za Zhi 2009;47:493-8.
  22. Yuen KCJ, Johannsson G, Ho KKY, et al. Diagnosis and testing for growth hormone deficiency across the ages: a global view of the accuracy, caveats, and cut-offs for diagnosis. Endocr Connect 2023;12:e220504.
  23. Kos S, Cobbaert CM, Kuijper TM, et al. IGF-1 and IGF-1 SDS - fit for purpose? Eur J Endocrinol 2019;181:L1-4.
  24. Müller EE, Locatelli V, Cocchi D. Neuroendocrine control of growth hormone secretion. Physiol Rev 1999;79:511-607.
  25. Gulati G, Upshaw J, Wessler BS, et al. Generalizability of Cardiovascular Disease Clinical Prediction Models: 158 Independent External Validations of 104 Unique Models. Circ Cardiovasc Qual Outcomes 2022;15:e008487.
  26. Van Calster B, Steyerberg EW, Wynants L, et al. There is no such thing as a validated prediction model. BMC Med 2023;21:70.
  27. Youssef A, Pencina M, Thakur A, et al. External validation of AI models in health should be replaced with recurring local validation. Nat Med 2023;29:2686-7.
  28. Miller K. Healthcare Algorithms Don’t Always Need to Be Generalizable. Stanford HAI. Available online: https://hai.stanford.edu/news/healthcare-algorithms-dont-always-need-be-generalizable (accessed September 22, 2023).
  29. Tillmann V, Buckler JM, Kibirige MS, et al. Biochemical tests in the diagnosis of childhood growth hormone deficiency. J Clin Endocrinol Metab 1997;82:531-5.
Cite this article as: Qiu S, Zhao T, Lian Y, Liu C. Constructing a precise prediction model for screening growth hormone deficiency in children in a single laboratory setting: using the serum IGF-1-to-age ratio as a promising predictive marker. Transl Pediatr 2025;14(12):3231-3243. doi: 10.21037/tp-2025-481
