Extract with care: the imperative of rigor, transparency, and reproducibility in meta-analysis
Dear Editor, I read with great interest the article “Association of serum interleukin-17 level and Mycoplasma pneumoniae pneumonia in children: a systematic review and meta-analysis” by Leerach et al. (1). The premise of the study is certainly compelling—after all, the notion that a single cytokine might help elucidate a common yet clinically challenging condition is both attractive and relevant. However, this promise also demands methodological rigor. In the following comments, I address several concerns that not only pertain to the present work but also highlight broader issues regarding the standards we apply when conducting and interpreting meta-analyses of diagnostic biomarkers. I offer these reflections with the intent of contributing constructively to future research in this evolving field.
The meta-analysis incorporates studies that rely on markedly heterogeneous diagnostic criteria for Mycoplasma pneumoniae pneumonia (MPP)—ranging from serological immunoglobulin M (IgM) or immunoglobulin G (IgG) assays to real-time polymerase chain reaction (PCR), as well as clinical and radiographic findings. In several instances, these modalities are combined without clarification as to which served as the reference standard. Although this variability is acknowledged descriptively in Tab. 1, it is not addressed analytically. The authors proceed as though all diagnostic approaches were equivalent, without stratifying by method or discussing their differing profiles. This lack of methodological consistency may inadvertently introduce uncontrolled heterogeneity, thereby weakening the interpretability of the pooled estimates. A more transparent definition—or at least a thoughtful discussion—of the reference standard would enhance the rigor and clinical applicability of the findings.
While I recognize that the included studies are predominantly exploratory and that the overall aim of the review is to assess potential biomarker associations, the manuscript frequently employs terminology suggestive of clinical utility or predictive performance—terms that typically require a different evidentiary and methodological framework. In this context, closer alignment with established standards for diagnostic test accuracy (DTA) meta-analyses—such as PRISMA-DTA and the Cochrane Handbook—would strengthen the methodological rigor and interpretability of the findings (2,3). For instance, the use of the QUADAS-2 tool, which is specifically designed for evaluating the risk of bias in diagnostic accuracy studies, would be more appropriate than the Newcastle-Ottawa Scale, which lacks validation in this domain (4). These considerations, while technical, are crucial to ensure the transparency, reliability, and proper inference of any claimed diagnostic or prognostic value.
The statistical approach adopted in the review may also blur the line between statistical significance and diagnostic utility. While the comparison of mean IL-17 levels between groups reveals statistically significant differences, this alone does not establish clinical applicability (3,5). Diagnostic accuracy requires a defined threshold and estimates of sensitivity, specificity, or area under the curve (AUC)—none of which are provided. Notably, although the title refers to an “association” between IL-17 and MPP, no statistical measure of association (e.g., odds ratio or regression analysis) is reported. If such modelling was not feasible due to data limitations, an explicit statement to that effect would have helped clarify the scope and interpretation of the findings. Importantly, a strong statistical association between a biomarker and disease status does not necessarily translate into adequate diagnostic performance; a variable may be highly associated with an outcome yet still yield poor sensitivity and specificity when used for classification purposes (6).
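The gap between association and classification can be illustrated with the identity that links the odds ratio of a dichotomized marker to its true-positive rate (TPR) and false-positive rate (FPR): OR = [TPR/(1 − TPR)] / [FPR/(1 − FPR)]. The following minimal Python sketch solves this identity for TPR. The function name and the example odds ratio are illustrative assumptions, not values taken from the review.

```python
def sensitivity_from_odds_ratio(odds_ratio: float, false_positive_rate: float) -> float:
    """Return the sensitivity (TPR) implied by an odds ratio at a given FPR.

    Assumes a binary (dichotomized) marker, so that
    OR = [TPR / (1 - TPR)] / [FPR / (1 - FPR)].
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0 < false_positive_rate < 1:
        raise ValueError("false_positive_rate must lie strictly between 0 and 1")
    # Odds of a positive test among cases = OR * odds of a positive test among controls.
    odds_in_cases = odds_ratio * false_positive_rate / (1 - false_positive_rate)
    return odds_in_cases / (1 + odds_in_cases)


if __name__ == "__main__":
    # Hypothetical example: an odds ratio of 3 at 90% specificity (FPR = 0.10).
    tpr = sensitivity_from_odds_ratio(3.0, 0.10)
    print(f"Implied sensitivity: {tpr:.2f}")
```

Under these assumptions, an odds ratio that would usually be described as a strong association corresponds to a sensitivity of only 25% at 90% specificity, which is the kind of discrepancy discussed in reference (6).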
The approach to statistical modelling also warrants closer scrutiny. According to the authors, a random-effects model was applied only when heterogeneity exceeded an I² of 25% and the P value from Cochran’s Q test was below 0.1, an arbitrary threshold that does not align with current meta-analytical standards, particularly in biomarker research where between-study variability is to be expected (3). More fundamentally, this strategy contradicts established guidance. As noted in the Cochrane Handbook (Section 10.10.4.1), “the choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity” (7). Applying fixed-effect models by default in the presence of low or borderline heterogeneity risks underestimating variance and overstating precision. Furthermore, the exclusive use of the DerSimonian-Laird method (as shown in the review’s Fig. 4) is suboptimal, especially in settings with limited sample sizes or high heterogeneity, as in the present case. More robust alternatives, such as restricted maximum likelihood (REML) with the Hartung-Knapp adjustment for confidence intervals, would have provided more reliable variance estimates and a more appropriate reflection of uncertainty in the pooled results (3,8).
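To make the difference between the two variance estimates concrete, the sketch below computes a DerSimonian-Laird random-effects estimate and then the Hartung-Knapp standard error for the same pooled value. It is a simplified illustration only: REML estimation is not implemented (the Hartung-Knapp adjustment is applied to the DerSimonian-Laird τ² for brevity), the function name is an assumption, and the input data are hypothetical rather than extracted from any study.

```python
import math


def dersimonian_laird_with_hk(effects, variances):
    """Pool study effects with DerSimonian-Laird and add the Hartung-Knapp SE.

    effects   -- per-study effect estimates (e.g., mean differences)
    variances -- per-study sampling variances (same order as effects)
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies and matching input lengths")

    # Fixed-effect (inverse-variance) step, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # DerSimonian-Laird moment estimator of the between-study variance.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights and pooled estimate.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re

    # Conventional SE versus Hartung-Knapp SE (weighted residual variance).
    se_dl = math.sqrt(1.0 / sum_w_re)
    resid = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w_re, effects))
    se_hk = math.sqrt(resid / ((k - 1) * sum_w_re))

    return {"pooled": pooled, "tau2": tau2, "se_dl": se_dl, "se_hk": se_hk}


if __name__ == "__main__":
    # Hypothetical effects and variances for five studies.
    result = dersimonian_laird_with_hk(
        [0.2, 0.5, 1.4, 0.3, 2.0], [0.04, 0.05, 0.06, 0.04, 0.08]
    )
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

The Hartung-Knapp confidence interval would then be the pooled estimate plus or minus the Hartung-Knapp standard error multiplied by the 0.975 quantile of a t distribution with k − 1 degrees of freedom, rather than by 1.96; with few studies that quantile is considerably larger than the normal one.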
The assessment of publication bias in the review also raises methodological concerns. Funnel plots and asymmetry tests are applied, despite the inclusion of fewer than ten studies in the meta-analyses (as shown in the review’s Figs. 8,9), which falls below the recommended threshold for these methods. As outlined in the Cochrane Handbook (Section 13), “As a rule of thumb, tests for funnel plot asymmetry should be used only when there are at least 10 studies included in the meta-analysis, because when there are fewer studies the power of the tests is low” (7). Applying these techniques under suboptimal conditions may obscure rather than clarify the presence of bias. Furthermore, although potential publication bias is acknowledged, it is not explored further—for instance, through a trim-and-fill analysis—resulting in a methodological inconsistency. Recognizing bias without evaluating its impact risks creating an impression of analytical rigor while leaving a key threat to validity unresolved (9).
My greatest concern, however, lies with data extraction and transparency.
The authors report that all IL-17 measurements were standardized to pg/mL, but they do not explain how this was accomplished, despite some studies presenting values in units such as pg/L or ng/mL. Without a clear and explicit conversion method, unit misalignment could seriously distort the pooled estimates and impede reproducibility.
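A transparent conversion step need not be elaborate. As a minimal sketch, the Python snippet below maps a handful of concentration units to pg/mL using fixed factors; the table of supported units and the function name are assumptions made for illustration and do not describe what the review authors did.

```python
# Multiplicative factors that convert a concentration to pg/mL.
# The set of supported units is an illustrative assumption.
FACTORS_TO_PG_PER_ML = {
    "pg/mL": 1.0,
    "ng/mL": 1000.0,  # 1 ng = 1000 pg
    "pg/L": 0.001,    # 1 L = 1000 mL
    "ng/L": 1.0,      # 1000 pg per 1000 mL
}


def to_pg_per_ml(value: float, unit: str) -> float:
    """Convert a concentration to pg/mL, failing loudly on unknown units."""
    try:
        factor = FACTORS_TO_PG_PER_ML[unit]
    except KeyError:
        raise ValueError(f"Unsupported unit: {unit!r}") from None
    return value * factor


if __name__ == "__main__":
    # Hypothetical values, not taken from any included study.
    print(to_pg_per_ml(2.0, "ng/mL"))
    print(to_pg_per_ml(500.0, "pg/L"))
```

Reporting such a table alongside the extracted data would allow readers to verify each converted value against its source.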
A case in point is Chen et al. (10): the original article reports IL-17A serum levels as 2.0±0.65 pg/mL for healthy controls and 2.4±0.93 pg/mL for MPP patients, with both values expressed as mean ± standard error (SE) and a sample size of n=20 per group. However, the meta-analysis reports these same means with unclarified standard deviations (SD), 2.0±2.71 pg/mL and 2.4±3.87 pg/mL, respectively, without any methodological explanation or transparent conversion strategy. If the reported SE values were converted to SD using the standard formula ($\mathrm{SD} = \mathrm{SE} \times \sqrt{n}$), the expected SDs would be approximately 2.91 and 4.16. The mismatch between these expected values and those reported in the meta-analysis raises concerns about potential data extraction errors or misapplication of statistical formulas. Such discrepancies can substantially impact inverse-variance weighting in the DerSimonian-Laird model, ultimately distorting the pooled effect estimates.
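The arithmetic behind this check is short enough to reproduce. The sketch below applies SD = SE × √n to the values quoted above for Chen et al. and, as an additional diagnostic, back-calculates the sample size that the SDs reported in the meta-analysis would imply. The function names are illustrative, and the back-calculation is offered only as a consistency check, not as an explanation of how the reported SDs were obtained.

```python
import math


def se_to_sd(standard_error: float, n: int) -> float:
    """Convert a standard error of the mean to a standard deviation."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return standard_error * math.sqrt(n)


def implied_n(reported_sd: float, standard_error: float) -> float:
    """Sample size that would make reported_sd consistent with standard_error."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (reported_sd / standard_error) ** 2


if __name__ == "__main__":
    # Values quoted in the text for Chen et al. (n = 20 per group).
    for label, se, reported_sd in [("controls", 0.65, 2.71), ("MPP", 0.93, 3.87)]:
        expected = se_to_sd(se, 20)
        n_implied = implied_n(reported_sd, se)
        print(f"{label}: expected SD {expected:.2f}, "
              f"reported SD {reported_sd:.2f}, implied n {n_implied:.1f}")
```

Both reported SDs are consistent with a sample size of roughly 17 per group rather than the 20 stated in the original article, which underlines why the conversion should be documented.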
An even more striking example comes from Zhang et al. (11). Although the original publication compares refractory and general MPP groups plus healthy controls, none of the numerical values used in the meta-analyses (Figs. 4-7) appear in any tables, figures, or supplementary data of the primary paper (11). This raises the possibility that values were imputed, misattributed, or derived without clear justification or documentation.
Additionally, in the case of Fan et al. (12), the original study reports IL-17 values (median, range in pg/mL) of 0.24 (0.2–0.27) for the control group (n=20) and 0.68 (0.52–12.43) for the MPP group (n=48). However, in Figs. 4 and 5 of the meta-analysis, the review authors report mean ± SD values of 0.24±0.02 for the control group and 3.58±2.97 for the MPP group. Reproducing these conversions using validated methods suggests that these values likely correspond to the approach proposed by Hozo et al. (13). Nevertheless, the authors of the review do not mention in the manuscript that they applied this method—or any alternative approach (e.g., Luo et al., Wan et al.)—nor do they provide any justification for doing so. More importantly, they appear to disregard one of the foundational caveats outlined by Hozo et al., who explicitly state: “Even for the skewed distributions we tested, it seems like that for a larger sample size (usually more than 25) simply replacing sample mean with the reported median is the best estimate of the sample mean. It gives assurance to meta-analysts that simple replacement of mean with medians in meta-analysis is a viable option” (13). Therefore, 0.68±2.97 would likely have been a more appropriate conversion for the MPP group than the one presented. Lastly, following the method proposed by Shi et al. (14), the original MPP data (0.68 [0.52–12.43]) are significantly skewed, and thus it is not recommended to apply normal-based transformation methods in this case.
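The correspondence with Hozo et al. can be checked directly. The sketch below implements the range-based estimators as they are commonly cited from that paper: mean ≈ (a + 2m + b)/4, with the SD taken from a variance formula for n ≤ 15, from range/4 for 15 < n ≤ 70, and from range/6 for n > 70, where a, m, and b denote the minimum, median, and maximum. The function name and the exact placement of the sample-size cut-offs are assumptions to be verified against reference (13) before reuse.

```python
def hozo_mean_sd(minimum: float, median: float, maximum: float, n: int):
    """Estimate mean and SD from median and range (after Hozo et al., 2005).

    Sample-size cut-offs follow the commonly cited rules and should be
    checked against the original paper.
    """
    if not minimum <= median <= maximum:
        raise ValueError("expected minimum <= median <= maximum")
    mean = (minimum + 2 * median + maximum) / 4
    value_range = maximum - minimum
    if n <= 15:
        variance = ((minimum - 2 * median + maximum) ** 2 / 4 + value_range ** 2) / 12
        sd = variance ** 0.5
    elif n <= 70:
        sd = value_range / 4
    else:
        sd = value_range / 6
    return mean, sd


if __name__ == "__main__":
    # Median and range quoted in the text for Fan et al.
    for label, a, m, b, n in [("controls", 0.20, 0.24, 0.27, 20),
                              ("MPP", 0.52, 0.68, 12.43, 48)]:
        mean, sd = hozo_mean_sd(a, m, b, n)
        print(f"{label}: mean {mean:.4f}, SD {sd:.4f}")
```

Under these assumptions the estimates are 0.2375 and 0.0175 for controls and 3.5775 and 2.9775 for the MPP group, which agree with the 0.24±0.02 and 3.58±2.97 shown in the meta-analysis up to rounding in the last digit.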
Lastly, the degree of heterogeneity in some of the meta-analytical models is extreme (I²>99%) yet remains largely unexplored. Several potential sources of this variability are readily identifiable, for instance the inclusion of adult populations in some studies (e.g., Wang et al.) (15), which directly contradicts the review’s own inclusion criteria (“II: conducted in children; V: age of healthy control group matched with subjects”). Specifically, although Wang et al. state in the abstract that healthy children were used as controls, the main text (research participants section) reveals that the control group consisted of adults with a mean age (SD) of 28.6 (6.81). These issues highlight the critical importance of performing full-text data extraction, ideally by at least two independent reviewers, to ensure consistent application of eligibility criteria and to reduce the risk of misclassification bias. Additionally, there is substantial heterogeneity in the diagnostic methods used to define MPP across studies. Despite these clear sources of heterogeneity, no meta-regression or subgroup analyses, aside from those concerning antibiotic treatment and MPP severity, are performed. Key factors such as assay method, study design, biological sample type, or risk of bias are not explored, even though such analyses could have yielded important insights. Addressing this heterogeneity is essential to ensure the robustness and interpretability of meta-analytic results (3). Sensitivity analyses such as leave-one-out analysis could also have helped assess the influence of individual studies on the pooled estimates. For example, Figs. 4 and 5 clearly show that the study by Wang et al. (15) deviates substantially from the results reported by other studies, which would have warranted further exploration. Furthermore, all included studies were conducted in Asia (nine in China and one in South Korea), which may affect the generalizability of the findings. Differences in healthcare systems, diagnostic protocols, population genetics, and environmental exposures could significantly limit the applicability of these findings to other clinical contexts.
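A leave-one-out analysis of the kind mentioned above is simple to set up. The sketch below re-pools the remaining studies after omitting each one in turn, reporting the DerSimonian-Laird estimate and Higgins' I² each time. The study labels, effects, and variances are hypothetical and chosen only to mimic a single outlying study; they are not the review's data, and the function names are illustrative.

```python
def pool_random_effects(effects, variances):
    """Return (DerSimonian-Laird pooled estimate, I-squared in percent)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies and matching input lengths")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Higgins' I-squared: share of total variability beyond chance.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, i2


def leave_one_out(labels, effects, variances):
    """Re-pool the data once per study, omitting that study each time."""
    results = []
    for i, label in enumerate(labels):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        pooled, i2 = pool_random_effects(rest_effects, rest_variances)
        results.append((label, pooled, i2))
    return results


if __name__ == "__main__":
    # Hypothetical data in which "Study E" is a clear outlier.
    labels = ["Study A", "Study B", "Study C", "Study D", "Study E"]
    effects = [0.50, 0.60, 0.40, 0.55, 5.00]
    variances = [0.04, 0.04, 0.04, 0.04, 0.04]
    for label, pooled, i2 in leave_one_out(labels, effects, variances):
        print(f"omit {label}: pooled {pooled:.3f}, I2 {i2:.1f}%")
```

In this constructed example, heterogeneity collapses only when the outlying study is omitted, which is the pattern such an analysis is designed to reveal.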
In short, while the review addresses a timely and clinically meaningful question, its methodological execution presents several limitations that undermine the trustworthiness and interpretability of its conclusions. This case highlights how easily secondary data synthesis can stray from the foundations of primary research and how crucial it is to maintain that connection through rigorous methodology, transparent reporting, and safeguards for reproducibility. That said, the authors’ efforts are acknowledged, and the topic they have chosen—highlighting the role of IL-17 in pediatric MPP—is of unquestionable clinical relevance. The comments presented here are offered in a constructive spirit to strengthen future research practices in pediatric meta-analyses and prevent the premature clinical adoption of conclusions not fully supported by the underlying evidence. Upholding high methodological standards will ultimately serve to enhance the reliability, applicability, and impact of evidence synthesis in this important and evolving field.
Acknowledgments
None.
Footnote
Provenance and Peer Review: This article was a standard submission to the journal. The article did not undergo external peer review.
Funding: None.
Conflicts of Interest: The author has completed the ICMJE uniform disclosure form (available at https://tp.amegroups.com/article/view/10.21037/tp-2025-337/coif). The author has no conflicts of interest to declare.
Ethical Statement: The author is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Leerach N, Sitthisak S, Kitti T, et al. Association of serum interleukin-17 level and Mycoplasma pneumoniae pneumonia in children: a systematic review and meta-analysis. Transl Pediatr 2024;13:1588-99.
2. McInnes MDF, Moher D, Thombs BD, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA 2018;319:388-96.
3. Deeks JJ, Bossuyt PM, Leeflang MM, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. 1st ed. Chichester (UK): John Wiley & Sons; 2023.
4. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529-36.
5. Reitsma JB, Glas AS, Rutjes AW, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005;58:982-90.
6. Pepe MS, Janes H, Longton G, et al. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol 2004;159:882-90.
7. Higgins JPT, Thomas J, Chandler J, editors. Cochrane Handbook for Systematic Reviews of Interventions. Version 6.5 (updated February 2024). Cochrane; 2024. Available online: https://www.cochrane.org/authors/handbooks-and-manuals/handbook
8. IntHout J, Ioannidis JP, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol 2014;14:25.
9. Shi L, Lin L. The trim-and-fill method for publication bias: practical guidelines and recommendations based on a large database of meta-analyses. Medicine (Baltimore) 2019;98:e15987.
10. Chen X, Liu F, Zheng B, et al. Exhausted and Apoptotic BALF T Cells in Proinflammatory Airway Milieu at Acute Phase of Severe Mycoplasma Pneumoniae Pneumonia in Children. Front Immunol 2021;12:760488.
11. Zhang WH, Zhou MP, Zou YY, et al. The predictive values of soluble B7-DC in children with refractory mycoplasma pneumoniae pneumonia. Transl Pediatr 2023;12:396-404.
12. Fan H, Lu B, Yang D, et al. Distribution and Expression of IL-17 and Related Cytokines in Children with Mycoplasma pneumoniae Pneumonia. Jpn J Infect Dis 2019;72:387-93.
13. Hozo SP, Djulbegovic B, Hozo I. Estimating the mean and variance from the median, range, and the size of a sample. BMC Med Res Methodol 2005;5:13.
14. Shi J, Luo D, Wan X, et al. Detecting the skewness of data from the five-number summary and its application in meta-analysis. Stat Methods Med Res 2023;32:1338-60.
15. Wang Z, Bao H, Liu Y, Wang Y, Qin J, Yang L. Interleukin-23 derived from CD16+ monocytes drives IL-17 secretion by TLR4 pathway in children with mycoplasma pneumoniae pneumonia. Life Sci 2020;258:118149.