Comprehensive analysis of a lipid metabolism-related gene signature for ulcerative colitis
Highlight box
Key findings
• Identification of lipid metabolism-related genes (LMGs): seven hub LMGs (MTMR2, ABCD3, IMPA1, NR3C2, ETNK1, ACADSB, and MINPP1) were found to be strongly correlated with ulcerative colitis (UC) progression.
• Diagnostic biomarkers: (I) machine learning identified five key genes (i.e., ABCD3, NR3C2, CD38, ALOX15, and PIGN); (II) ABCD3 [area under the curve (AUC) =0.9185] and NR3C2 (AUC =0.9025) demonstrated excellent diagnostic performance.
• Immune dysregulation in UC: immune infiltration and single-cell RNA-sequencing (scRNA-seq) revealed increased T cells and inflammatory populations linked to lipid metabolism and pro-inflammatory pathways (TNF/NF-κB).
What is known, and what is new?
• Previous studies have identified biomarkers but have not focused on LMGs. Machine learning has been used in UC research but not to identify LMGs. We identified ABCD3 and NR3C2 as novel diagnostic biomarkers with high accuracy. We linked lipid metabolism dysregulation to immune cell infiltration and the inflammatory response. We showed the utility of integrative bioinformatics for UC biomarker discovery.
What is the implication, and what should change now?
• ABCD3 and NR3C2 could serve as non-invasive diagnostic biomarkers for UC. Targeting lipid metabolism pathways (e.g., fatty acid oxidation) may offer new therapeutic strategies.
• Multi-omics validation in larger cohorts and experimental validation in UC are needed. Mechanistic studies need to be conducted on how lipid metabolism influences immune dysregulation. Diagnostic panels based on these LMGs need to be developed. Research on the use of lipid-modulating drugs [e.g., peroxisome proliferator activated receptor gamma (PPARγ) agonists] in the treatment of UC needs to be conducted.
Introduction
Ulcerative colitis (UC) is a chronic inflammatory disease of the intestine with a high incidence rate, and represents a substantial global health burden (1). UC is associated with an elevated risk of developing colorectal cancer, and can significantly affect patients’ quality of life (2). Despite ongoing research, the pathophysiology of UC remains largely unclear. Multiple factors have been suggested to contribute to the pathogenesis of UC, including genetic susceptibility, dysfunctional immune responses, environmental influences, inflammation, oxidative stress, and abnormalities in lipid metabolism (3-5). The complex intestinal microenvironment further complicates the diagnosis and treatment of UC (6). Therefore, reliable diagnostic biomarkers need to be identified and the underlying etiology of UC needs to be elucidated to identify which patients might benefit from clinical interventions.
Abnormal lipid metabolism is closely linked to the onset of various inflammatory diseases, including UC. Studies have shown that the inhibition of fatty acid binding protein 5 (FABP5) exerts anti-inflammatory effects in UC, primarily by reducing the infiltration of inflammatory macrophages (7). A high-fat diet has been associated with the induction of inflammation in UC (8). Phosphatidylcholine metabolism disruption has been implicated in the progression of UC (9). Improved lipid metabolism in colonic contents has been shown to result in better clinical outcomes in obese mice with UC (5). In addition, secondary bile acids and short-chain fatty acids are known to regulate colonic cell inflammation and proliferation (10).
In UC, chronic inflammation affects all aspects of the colon, typically beginning with mucosal inflammation in the rectum, and progressing proximally in a continuous manner (11). Reducing inflammation and enhancing the intestinal barrier function are also critical in ameliorating UC symptoms (12). Histological improvements in patients treated with mirikizumab have been correlated with more favorable UC outcomes (13). Previous studies have suggested several biomarkers for UC, such as plasma calprotectin as an indicator of disease activity (14), the systemic immune-inflammation index as a potential diagnostic and disease-monitoring tool (15), and prostaglandin E-major urinary metabolite as a surrogate marker of disease activity (16). However, few studies have specifically explored lipid metabolism-related gene (LMG) signatures as potential biomarkers in UC.
In the present study, we identified and validated novel signature genes using weighted gene co-expression network analysis (WGCNA), machine-learning algorithms, receiver operating characteristic (ROC) curve analysis, and single-cell RNA sequencing (scRNA-seq). Our findings highlight the significant role of LMGs in the pathogenesis of UC, and offer a preliminary framework for combination therapies targeting active UC. We present this article in accordance with the TRIPOD reporting checklist (available at https://tp.amegroups.com/article/view/10.21037/tp-2025-161/rc).
Methods
Data sources
The GSE126124 dataset (17), containing gene expression profiles, was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/gds), and used as the training dataset in this study. The dataset comprised 18 UC blood samples, 39 control blood samples, 18 UC tissue samples, and 21 control tissue samples. The GSE92415 (18) and GSE87466 (19) datasets were used as validation cohorts. The GSE162335 dataset (20) was used for the scRNA-seq analysis.
LMGs were retrieved from the Molecular Signature Database (MsigDB, https://www.gsea-msigdb.org/gsea/msigdb) using the following keyword categories: “lipid”, “lipid metabolism”, “metabolism of lipid”, “fat metabolism”, “fatty acid metabolism”, and “metabolism of fat”. A total of 744 LMGs were identified and are listed in the table available at https://cdn.amegroups.cn/static/public/tp-2025-161-1.xlsx. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
WGCNA
To construct a weighted correlation network of the prognostic genes, the WGCNA package in R software was used to assess the co-expression relationships between all the genes in the adjacency matrix using Pearson correlation coefficients. The co-expression similarity matrix ($S_{ij}$) was calculated as the absolute Pearson correlation between each gene pair ($x_i$ and $y_j$), and the formula was expressed as follows: $S_{ij} = |\mathrm{cor}(x_i, y_j)|$. A weighted adjacency matrix ($a_{ij}$) was calculated by raising $S_{ij}$ to a soft-thresholding power $\beta = 20$, according to the formula: $a_{ij} = S_{ij}^{\beta}$. After computing the topological overlap matrix, hierarchical clustering was performed on the adjacency matrix with the minimum module size set to 60. Functional modules were identified based on the topological overlap within and between the modules. Gene significance (GS), which was defined as the correlation between each gene and the clinical traits (GS >0.4), and module membership (MM), which was defined as the correlation between each gene and its corresponding module (MM >0.8), were used to identify the hub genes in each module (21).
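The soft-thresholding step and the GS/MM hub filter described above can be illustrated with a minimal sketch in plain Python. It is not the WGCNA R implementation: the gene names, expression values, and function names are hypothetical, and applying the GS and MM cut-offs to absolute values is an assumption, as the text does not state how negative correlations were handled.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def soft_threshold_adjacency(expr, beta=20):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for each gene pair."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes for h in genes if g != h
    }

def select_hub_genes(gs, mm, gs_cut=0.4, mm_cut=0.8):
    """Keep genes whose |GS| and |MM| both exceed the cut-offs (assumed absolute)."""
    return sorted(g for g in gs if abs(gs[g]) > gs_cut and abs(mm[g]) > mm_cut)

# Hypothetical toy expression profiles (four samples per gene).
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [1.0, 3.0, 2.0, 4.0],  # correlation of 0.8 with geneA
}
adjacency = soft_threshold_adjacency(toy_expr)

# Hypothetical GS and MM values for three genes.
toy_gs = {"g1": 0.5, "g2": 0.3, "g3": 0.9}
toy_mm = {"g1": 0.85, "g2": 0.95, "g3": 0.7}
hubs = select_hub_genes(toy_gs, toy_mm)
```

With a power of 20, a pairwise correlation of 0.8 is reduced to an adjacency of about 0.01, which shows how strongly a high soft-thresholding power suppresses moderate correlations.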
Immune infiltration analysis
To investigate immune cell infiltration in UC, the Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) algorithm was applied to estimate the proportions of 22 immune cell types in each sample using the LM22 leukocyte gene signature with 1,000 permutations. Additionally, the xCell algorithm was used to quantify immune infiltration in the UC microenvironment using the single-sample gene set enrichment analysis method. This approach ranks gene expression levels and estimates immune cell composition. A Spearman correlation analysis was performed to evaluate the association between the identified LMGs and immune cell infiltration. Visualization was conducted using boxplots and heatmaps generated using the “ggplot2” R package.
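The Spearman correlation between a gene's expression and an estimated immune cell fraction can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study; the expression values and cell fractions are hypothetical.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical per-sample values: gene expression and CD8 T-cell fraction.
gene_expression = [5.1, 3.2, 4.4, 2.0, 1.5]
cd8_fraction = [0.30, 0.18, 0.25, 0.12, 0.15]
rho = spearman(gene_expression, cd8_fraction)
```

Because the statistic depends only on ranks, it is insensitive to the scale differences between expression values and deconvolution-derived proportions.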
Screening for novel biomarkers
The gene expression data from the GSE126124 dataset (comprising 18 UC blood samples, 39 control blood samples, 18 UC tissue samples, and 21 control tissue samples) were subjected to data correction. The differentially expressed genes (DEGs) between the UC and control groups were identified based on the following criteria: P<0.05 and |log2 fold change (FC)| >1.5 (which was equivalent to a 2.8-FC). The overlapping genes among the LMGs and DEGs from the blood samples, and the DEGs from the tissue samples were selected for further analysis.
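The DEG thresholds and the three-way overlap described above can be illustrated with a short Python sketch. The gene identifiers and statistics are hypothetical placeholders, and the function names are introduced here for illustration only.

```python
def fold_change_from_log2(log2_fc):
    """Convert a log2 fold change into a linear fold-change magnitude."""
    return 2 ** abs(log2_fc)

def select_degs(stats, p_cut=0.05, log2fc_cut=1.5):
    """Return genes with P < p_cut and |log2FC| > log2fc_cut.

    `stats` maps gene -> (log2FC, P value).
    """
    return {
        gene for gene, (log2_fc, p_value) in stats.items()
        if p_value < p_cut and abs(log2_fc) > log2fc_cut
    }

# Hypothetical differential-expression statistics: gene -> (log2FC, P value).
tissue_stats = {
    "GENE1": (-2.1, 0.001),
    "GENE2": (1.8, 0.20),   # fails the P-value criterion
    "GENE3": (1.6, 0.01),
    "GENE4": (0.4, 0.001),  # fails the fold-change criterion
}
blood_stats = {
    "GENE1": (-1.7, 0.003),
    "GENE3": (1.2, 0.01),   # fails the fold-change criterion
    "GENE4": (2.0, 0.001),
}
lipid_genes = {"GENE1", "GENE3"}  # hypothetical LMG list

# Genes that are LMGs and DEGs in both tissue and blood.
candidates = select_degs(tissue_stats) & select_degs(blood_stats) & lipid_genes
threshold_fold_change = fold_change_from_log2(1.5)  # 2 ** 1.5, about 2.83
```

The last line reproduces the stated equivalence: a log2FC of 1.5 corresponds to a fold change of approximately 2.8.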
Two machine-learning approaches were used to further screen the key LMGs. The least absolute shrinkage and selection operator (LASSO) logistic regression model was applied using the “glmnet” R package with the minimum lambda criterion for optimal feature selection. The random-forest analysis was performed using the “randomForest” R package to evaluate gene importance. The ROC curve analysis was conducted to assess the diagnostic value of the selected genes, with the area under the curve (AUC) representing diagnostic performance. The final novel genes were identified by intersecting the results using a Venn diagram. GSE92415 and GSE87466 were used to perform the validation of machine-learning models.
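For a single gene, the AUC can be understood as the probability that a randomly chosen UC sample scores higher than a randomly chosen control. The sketch below computes it by pairwise comparison in plain Python; it illustrates the statistic only and is not the workflow used in the study. The scores and labels are hypothetical. For a gene that is downregulated in UC, the negated expression value could be used as the score.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. `labels` uses 1 for UC and 0 for control.
    """
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases for control in controls
    )
    return wins / (len(cases) * len(controls))

# Hypothetical scores for three UC samples and three controls.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(toy_scores, toy_labels)  # 7 of 9 pairs are ranked correctly
```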
Functional enrichment analysis
A Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was conducted using Metascape (http://metascape.org/) with a significance threshold of P<0.05. This analysis provided insight into the functional roles of the DEGs in UC. The “ggplot2” R package was used to visualize the results.
Acquisition and processing of single-cell analysis
To further explore the function of the novel LMGs, scRNA-seq data from the GSE162335 dataset were analyzed. After quality control and filtering, 55,931 cells expressing 24,442 genes were retained using the “Seurat” R package. Dimensionality reduction was performed using principal component analysis and uniform manifold approximation and projection (UMAP) to identify distinct cell clusters. A total of 17 cellular subpopulations were identified with a resolution parameter of 0.3. The cell clusters were annotated based on cell-type-specific markers from CellMarker (http://117.50.127.228/CellMarker/), PanglaoDB (https://panglaodb.se/), and previous literature (3,22). Violin and dot plots were generated using the “ggplot2” R package. The correlation analysis between gene expression and cell function was performed using the “AUCell” R package, which scores each cell by calculating the AUC of the recovery of the target gene set within that cell's expression ranking. We used the AUCell analysis to assess cellular activity in the scRNA-seq data, with a primary focus on two key pathways: lipid metabolism and the inflammatory response.
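The idea behind rank-based gene set scoring can be sketched as follows. This is a simplified illustration in plain Python and not the AUCell implementation: the normalization, the choice of `top_n`, and the toy expression values are assumptions made for the example.

```python
def recovery_auc(cell_expr, gene_set, top_n):
    """Normalized area under the gene set recovery curve for one cell.

    Genes are ranked by decreasing expression; the curve counts how many
    gene set members have been recovered at each of the first `top_n` ranks.
    """
    ranked = sorted(cell_expr, key=cell_expr.get, reverse=True)[:top_n]
    hits, area = 0, 0
    for gene in ranked:
        if gene in gene_set:
            hits += 1
        area += hits
    # Best case: all gene set members occupy the very top ranks.
    n_set = min(len(gene_set), top_n)
    max_area = sum(min(rank, n_set) for rank in range(1, top_n + 1))
    return area / max_area if max_area else 0.0

# Hypothetical gene set and two hypothetical cells.
lipid_set = {"CPT1A", "HMGCR"}
cell_high = {"CPT1A": 9, "ACTB": 8, "HMGCR": 7, "GAPDH": 3, "TNF": 1}
cell_low = {"CPT1A": 1, "HMGCR": 2, "ACTB": 9, "GAPDH": 8, "TNF": 7}

score_high = recovery_auc(cell_high, lipid_set, top_n=4)
score_low = recovery_auc(cell_low, lipid_set, top_n=4)
```

A cell in which the gene set members sit near the top of the expression ranking receives a score close to 1, whereas a cell in which they rank low receives a score close to 0.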
Statistical analysis
All the data are presented as the mean ± standard deviation (SD). The statistical analyses were performed using SAS version 9.4. One-way analysis of variance was applied for group comparisons. Survival analyses were conducted using Kaplan-Meier curves with log-rank tests. The categorical variables were analyzed using the chi-square test. The DEGs between two groups were identified using the Wilcoxon test for continuous variables. Comparisons among multiple groups with equal variances were further evaluated using the Kruskal-Wallis test with Dunn’s post-hoc test. A P value <0.05 was considered statistically significant.
Results
LMGs can be considered potential markers for UC
The flowchart of this study is presented in Figure 1A. We retrieved the RNA-sequencing data from a total of 96 UC-related samples, including 57 blood samples (18 UC blood samples and 39 control blood samples) and 39 tissue samples (18 UC tissue samples and 21 control tissue samples). The DEGs were identified based on the following criteria: P<0.05 and |log2FC| ≥1.5. A total of 2,769 DEGs were identified in the tissue samples, of which 1,148 were upregulated and 1,621 were downregulated (Figure 1B and table available at https://cdn.amegroups.cn/static/public/tp-2025-161-2.xlsx). In the blood samples, 1,805 DEGs were detected, of which 665 were upregulated and 1,140 were downregulated (Figure 1C and table available at https://cdn.amegroups.cn/static/public/tp-2025-161-3.xlsx). Among the tissue DEGs, blood DEGs, and LMGs, 16 overlapping genes were identified (as shown in the Venn diagram in Figure 1D). The gene expression heatmap of these 16 novel LMGs revealed that ALOX15, ACSL1, HK2, CD38, FCER1G, and HK3 were highly expressed in UC, while the remaining 10 genes were lowly expressed (Figure 1E). The KEGG analysis showed that the dysregulated LMGs were enriched in pathways such as fatty acid metabolism, natural killer cell-mediated cytotoxicity, calcium signaling, and adipocytokine signaling pathways (Figure 1F).
Co-expression network of the LMGs
The co-expression network was constructed using the “WGCNA” R package. A soft-thresholding power of 20 was selected for dynamic module identification (Figure 2A). Based on this threshold, a total of nine co-expression modules were identified (Figure 2B). The clustering of module eigengenes is shown in Figure 2C, and the number of genes in each module is presented in Figure 2D. The black and red modules were significantly positively correlated with UC progression, while the blue, brown, green, grey, turquoise, and yellow modules were negatively correlated with UC (Figure 2E). Subsequently, a Pearson correlation analysis was performed to examine the relationship between GS and MM. The results revealed significant correlations in the brown module (cor =0.85, P<0.001; Figure 2F), blue module (cor =1.00, P<0.001; Figure 2G), green module (cor =0.85, P<0.001; Figure 2H), and yellow module (cor =0.73, P<0.001; Figure 2I). Ultimately, seven overlapping genes were identified by intersecting the genes from these four modules with the 16 novel LMGs for subsequent analysis (Figure 2J).
Investigation of immunity factors
To further investigate the extent of immune cell infiltration in UC, the CIBERSORT algorithm was applied using the “CIBERSORT” R package. The results indicated that eight types of immune cells (i.e., naïve B cells, plasma cells, CD8 T cells, regulatory T cells, macrophages, resting dendritic cells, resting mast cells, and activated mast cells) exhibited significantly different abundances between the control and UC groups (Figure 3A). A correlation analysis was also conducted between the immune cell populations and key genes, including MTMR2 (Figure 3B), ABCD3 (Figure 3C), IMPA1 (Figure 3D), NR3C2 (Figure 3E), ETNK1 (Figure 3F), ACADSB (Figure 3G), and MINPP1 (Figure 3H). Among them, ABCD3 showed the strongest correlation with immune cells, followed by ETNK1 and NR3C2. The complete results, including the non-significant associations, are set out in Table S1.
To further validate the immune-related findings, the xCell algorithm, a machine learning-based deconvolution method that infers cell-type abundances from transcriptomic data, was used to quantify the immune microenvironment in UC. The immune cell abundances for the control (Figure 4A) and UC groups (Figure 4B) were compared, and 16 immune-related pathways were found to differ significantly between the two groups. Various cell types, including astrocytes, CD4 effector memory T cells, dendritic cells, epithelial cells, fibroblasts, macrophages, monocytes, neutrophils, natural killer T cells, plasma cells, smooth muscle cells, T-helper type 2 cells, preadipocytes, and mesenchymal stem cells, exhibited elevated abundance in the UC samples (Figure 4C). Finally, the correlations between gene expression and immune cell abundance are presented in Figure 4D. Notably, the CD8 naïve T cells and natural killer cells showed positive correlations with the expression levels of the seven identified genes (Figure 4D).
Screening hub genes by machine-learning algorithms and ROC curve analysis
To identify the potential hub LMGs relevant to UC, two machine-learning algorithms were applied. LASSO regression was performed on the 16 candidate LMGs using the “glmnet” R package. A minimum lambda value of 0.009229 was determined [log(λ) =−4.685; Figure 5A]. The LASSO model had an AUC value of 0.958, indicating strong predictive performance (Figure 5B). Under the optimal lambda criterion (λ=0.009229), 11 key genes were identified by LASSO. Subsequently, a random-forest algorithm was applied to the same 16 LMGs. This model had an AUC of 0.858 (Figure 5C,5D). The genes were ranked based on their importance scores (median importance =2.2506), and eight genes were identified as key genes. By intersecting the results of both machine-learning approaches, five consensus hub genes (i.e., NR3C2, ABCD3, CD38, ALOX15, and PIGN) were selected for further analysis (Figure 5E,5F).
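The reported value of log(λ) and the consensus step can be checked with a few lines of Python. The natural logarithm is assumed here, as it reproduces the reported figure. The five consensus genes are taken from the text; the remaining members of the LASSO and random-forest sets are not listed in this section, so placeholder names are used and should not be read as real genes.

```python
import math

# Reported optimal penalty; the natural log should be close to -4.685.
lambda_min = 0.009229
log_lambda = math.log(lambda_min)

# The five consensus hub genes named in the text.
REPORTED_CONSENSUS = {"NR3C2", "ABCD3", "CD38", "ALOX15", "PIGN"}

# Placeholder names stand in for the unlisted genes (hypothetical).
lasso_genes = REPORTED_CONSENSUS | {f"LASSO_ONLY_{i}" for i in range(1, 7)}
forest_genes = REPORTED_CONSENSUS | {f"RF_ONLY_{i}" for i in range(1, 4)}

# Consensus hub genes are those selected by both methods.
consensus = lasso_genes & forest_genes
```

The placeholder sets are sized to match the reported counts (11 genes from LASSO and eight from the random forest), so that the intersection logic mirrors the Venn diagram step.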
To validate the classification performance and gene expression patterns, we tested both machine-learning models using an external dataset (GSE87466). The AUC of the random-forest model was 0.783, while that of the LASSO model was 0.925 (Figure S1A-S1D). The expression patterns of the five hub LMGs were consistent with those in the training dataset; that is, ABCD3, NR3C2, and PIGN were significantly downregulated in the UC group, while CD38 and ALOX15 were upregulated (Figure S1E-S1I). Another dataset, GSE92415, was also used to validate the models and gene expression. The LASSO model again showed superior diagnostic performance (AUC =0.923) compared to the random-forest model (AUC =0.831) (Figure S2A-S2D). In this validation set, consistent with the findings from the training set, ABCD3 and NR3C2 were downregulated (Figure S2E,S2F), while CD38 was upregulated (Figure S2G) in UC. The expression level of ALOX15 did not differ significantly between the UC patients and controls (P=0.19; Figure S2H). PIGN demonstrated significantly lower expression in the UC group than the control group (P=0.02; Figure S2I).
To further assess the diagnostic potential of the LMGs, ROC curve analyses were performed using the “pROC” R package. Among the 16 candidate LMGs, ABCD3 had the highest diagnostic value (AUC =0.9185; Figure 6A). The remaining genes had the following diagnostic values: ACADSB: AUC =0.8508 (Figure 6B), ACSL1: AUC =0.7048 (Figure 6C), ADPRM: AUC =0.8155 (Figure 6D), ALOX15: AUC =0.7439 (Figure 6E), CD38: AUC =0.8648 (Figure 6F), ETNK1: AUC =0.8401 (Figure 6G), FCER1G: AUC =0.7918 (Figure 6H), HK2: AUC =0.7857 (Figure 6I), HK3: AUC =0.8654 (Figure 6J), IMPA1: AUC =0.8142 (Figure 6K), MINPP1: AUC =0.8106 (Figure 6L), MTMR2: AUC =0.8879 (Figure 6M), NR3C2: AUC =0.9025 (Figure 6N), PGAP1: AUC =0.8558 (Figure 6O), and PIGN: AUC =0.8661 (Figure 6P). In clinical diagnostics, the AUC summarizes the trade-off between sensitivity and specificity across all classification thresholds; an AUC >0.8 indicates excellent discriminative ability, and an AUC >0.9 indicates outstanding diagnostic performance.
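The AUC values reported above can be grouped by the stated interpretation thresholds with a short Python sketch. The AUC values are taken from the text; the band labels and variable names are introduced here for illustration.

```python
# Single-gene AUC values as reported in the text.
reported_auc = {
    "ABCD3": 0.9185, "ACADSB": 0.8508, "ACSL1": 0.7048, "ADPRM": 0.8155,
    "ALOX15": 0.7439, "CD38": 0.8648, "ETNK1": 0.8401, "FCER1G": 0.7918,
    "HK2": 0.7857, "HK3": 0.8654, "IMPA1": 0.8142, "MINPP1": 0.8106,
    "MTMR2": 0.8879, "NR3C2": 0.9025, "PGAP1": 0.8558, "PIGN": 0.8661,
}

def auc_band(auc):
    """Map an AUC to the interpretation bands stated in the text."""
    if auc > 0.9:
        return "outstanding"
    if auc > 0.8:
        return "excellent"
    return "below 0.8"

# Group the genes by band, keeping alphabetical order within each band.
bands = {"outstanding": [], "excellent": [], "below 0.8": []}
for gene in sorted(reported_auc):
    bands[auc_band(reported_auc[gene])].append(gene)

best_gene = max(reported_auc, key=reported_auc.get)
```

Under these thresholds, only ABCD3 and NR3C2 fall in the highest band, 10 genes fall between 0.8 and 0.9, and four genes (ACSL1, ALOX15, FCER1G, and HK2) fall below 0.8.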
Enrichment of the hub LMGs in UC at the single-cell level
To further investigate the biological relevance of the hub LMGs and understand their cell-type-specific functions, scRNA-seq data from the GSE162335 dataset, including both the control and UC samples, were analyzed. The cell types were annotated using canonical markers referenced from the CellMarker database, the PanglaoDB database, and previous studies (Figure 7A). The total cell number in the UC and control samples was assessed, and the UC samples were significantly enriched in immune cells, including B cells, CD8 T cells, and regulatory T cells. An increase in inflammatory and neutrophil cell populations was also observed in the UC samples compared to the control samples (Figure 7B). Figure 7C shows the expression of the markers in the differential cell subtypes. Among the hub LMGs, CD38 was highly expressed in dendritic cells, naïve B cells, and mast cells; NR3C2, IMPA1, and ETNK1 were primarily enriched in CD8 T cells; and ABCD3, IMPA1, ETNK1, and CD38 were highly expressed in inflammatory cells (Figure 7D). A further evaluation of immune and inflammatory cell abundance between the UC and control groups was conducted (Figure 8A). The Z-score analysis by cell subtype showed that T cells and inflammatory cells were significantly enriched in the UC samples (Figure 8B). Chi-square tests confirmed the significantly higher numbers of inflammatory-related cells (P<0.001; Figure 8C), T cells (P=0.04; Figure 8D), and B cells and plasma cells (P=0.01; Figure 8E) in the UC samples.
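A chi-square comparison of cell counts between two groups can be sketched as follows. This is a minimal Pearson chi-square test for a 2×2 table without continuity correction, written in plain Python; it is assumed that the comparison takes this form, and the counts are hypothetical, not those from the study.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and P value (1 df) for a 2x2 table.

    No continuity correction is applied. For one degree of freedom the
    upper-tail probability equals erfc(sqrt(chi2 / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are UC and control; columns are cells of the
# type of interest and all other cells.
toy_counts = [[30, 70], [10, 90]]
chi2_stat, p_val = chi_square_2x2(toy_counts)
```

With single-cell data, the very large number of cells means that small differences in proportion can reach statistical significance, so effect sizes are worth reporting alongside the P values.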
To elucidate the functional interplay between the cellular phenotypes and molecular signatures in the pathogenesis of UC, we performed an in-depth analysis of the single-cell data using the “AUCell” R package, a robust computational framework for gene set activity quantification. Our approach specifically focused on the following two critical pathways: (I) lipid metabolism; and (II) the inflammatory response. In the lipid metabolism pathway (Figure S3A,S3B), goblet cells (mean activity ± SD =0.0282±0.0101), neutrophils (mean activity ± SD =0.0280±0.0095), and epithelial cells (mean activity ± SD =0.0279±0.0963, Figure S3C) exhibited significantly higher lipid metabolic activity compared to the other populations (P<0.001, Kruskal-Wallis test with Dunn’s post-hoc test), particularly in genes regulating fatty acid oxidation (CPT1A; Figure S3D) and cholesterol biosynthesis (HMGCR; Figure S3D). In the inflammatory response pathway (Figure S3E,S3F), the fibroblast cells showed the highest cell activity in pro-inflammatory signaling (Figure S3G), including TNF signaling, nuclear factor kappa B (NF-κB) signaling, and cytokine signaling (Figure S3H-S3J). The fibroblast cells (mean activity ± SD =0.0777±0.0042) showed significant enrichment in TNF signaling; the inflammatory cells (mean activity ± SD =0.1071±0.0071) and the neutrophils (mean activity ± SD =0.0680±0.0053) showed significant enrichment in NF-κB signaling; and the mast cells showed enrichment in cytokine signaling (mean activity ± SD =0.0859±0.0138) (Table S2).
The expression of the signaling genes was examined in the scRNA-seq data, including TNF signaling (Figure S3K), NF-κB signaling (Figure S3L), and cytokine signaling (Figure S3M). Additionally, the expression of the five novel genes was validated in the scRNA-seq data. Notably, CD38 was more highly expressed in multiple cell types (Figure S3N). The correlation results revealed a weak but statistically significant positive correlation between TNF signaling and CD38 (cor =0.0244, P=0.01; Figure S4A). However, no significant correlation was found between the lipid metabolism scores and the novel genes, including ABCD3 (P=0.51; Figure S4B), ALOX15 (P=0.46; Figure S4C), CD38 (P=0.94; Figure S4D), NR3C2 (P=0.45; Figure S4E), and PIGN (P=0.92; Figure S4F).
Discussion
Due to the dysregulation of lipid metabolism pathways in UC, multiple mediators cannot be properly regulated and homeostasis cannot be maintained, which in turn triggers pathological processes such as inflammatory cascades, apoptotic signaling, and immune responses (23). Studies have shown that lipid metabolism plays a central role in the differentiation and function of T lymphocytes, contributing to the sustained activation of immune responses and the persistence of adaptive immune responses over time (24). Targeting lipid metabolism could thus offer a promising avenue for the development of novel therapies for active UC. Moreover, a low diagnostic rate often delays remission in the early stages of UC (25). Thus, the identification of effective and diagnostic lipid metabolism-related biomarkers could greatly assist in monitoring the progression of UC and preventing disease exacerbation. In this study, we used comprehensive, integrative bioinformatics approaches to screen potential LMGs and conducted a correlation analysis to evaluate their synergistic effects.
Lipid metabolism plays a crucial role in UC through multiple mechanisms, including immune modulation, inflammatory responses, and microbial interactions. Research has reported that sphingosine-1-phosphate in sphingolipid metabolism triggers pro-inflammatory macrophages and induces T-helper 17 cell differentiation in UC (26). Cholesterol-25-hydroxylase in the lipid metabolism pathway helps to regulate the inflammatory response, promoting cell migration in various types of immune cells by binding to relevant receptors in UC (27). Although cholesterol-25-hydroxylase could regulate the cholesterol metabolism in a tissue-specific fashion, little is known about whether cholesterol-25-hydroxylase can help regulate the inflammatory responses in UC patients by affecting the bile acid pathway. The oleoylethanolamide anti-inflammatory effects are mediated by the TLR4 axis, which causes the downstream inhibition of NF-κB-MyD88-dependent and NLRP3 inflammation pathways (28). Desterke et al. examined the metabolic score to determine the implications of heterogeneous metabolic pathway deregulations in the digestive tract of patients with UC; however, their study was limited by its small cohort (24).
In quiescent UC, endoscopic wound scores were shown to prolong the wound healing process due to altered disease-specific lipidomic trajectories (29). However, it is important to note that the applied wound healing assay does not mimic actual inflammation during regular flare-ups of UC. Further, the dysregulation of lipid metabolism drastically affects the physiology of the host and microorganisms in inflammatory bowel disease (IBD) (30). Drug treatment significantly reduced the relative abundances of Helicobacter and Streptococcus while promoting the proliferation of Limosilactobacillus and Akkermansia, which could improve the symptoms of UC (23). Thus, the intimate interactions of intestinal lipids with host cells that have been implicated in the pathogenesis of intestinal inflammation might aid in the identification of novel biomarkers and therapeutic targets in UC.
In this study, the WGCNA revealed seven hub LMGs that were significantly correlated with UC progression, and the machine-learning approaches (LASSO regression and random-forest algorithms) identified five consensus hub genes. The subsequent ROC curve analyses showed their diagnostic utility, including their strong discriminatory ability to differentiate between the UC and control samples. Consistent with our findings, other studies have used machine learning to identify biomarkers with high diagnostic accuracy (31,32). For example, Hong et al. identified diagnostic biomarkers related to macrophage infiltration and the pathogenesis of UC using LASSO, support vector machine, and Swarm-based Feature Selection (SWSFS) algorithms (33). Machine-learning techniques have been widely used to analyze large, high-dimensional imaging datasets in UC diagnosis (34). Wang et al. selected significantly different biomarkers by identifying the model with the highest predictive accuracy among four machine-learning algorithms (35). To further explore UC-associated gene features, our study intersected genes from four WGCNA modules and identified overlapping genes. Similarly, previous studies have conducted WGCNAs for module classification and the identification of endoplasmic reticulum stress-associated genes in UC (36). The combined application of WGCNA and random-forest algorithms has also been used to identify gene signatures involved in UC exacerbation (37). These findings are consistent with our results, which identified five hub LMGs associated with UC progression through WGCNA and multiple machine-learning approaches.
Our findings suggest that NR3C2, ABCD3, CD38, ALOX15, and PIGN may serve as promising diagnostic biomarkers, and play significant roles in the pathogenesis of UC as validated by multiple machine-learning analyses. The integration of LASSO and random-forest models has notable advantages for IBD prediction and therapeutic development. The LASSO method is particularly effective in selecting sparse yet informative gene sets from high-dimensional data. Our 11-gene signature, identified via LASSO regression, achieved an AUC of 0.958, indicating its strong diagnostic potential and suitability for developing cost-effective diagnostic assays. For instance, in non-alcoholic fatty liver disease, the LASSO method was used to identify feature genes and analyze their relationship with steatohepatitis histology (38). In IBD, LASSO and similar algorithms have been used to identify common diagnostic markers (39), and to screen for key risk factors in both Crohn’s disease and UC (39).
Conversely, the random-forest model effectively captures nonlinear interactions between genes and clinical variables. Its built-in feature importance metrics (e.g., the Gini index) help prioritize biologically interpretable targets. In our study, the random-forest model ranked eight key LMGs by importance in UC. Similarly, the random-forest model has been used to analyze metabolite associations with disease progression in UC (40). Further, random-forest models can stratify patients into molecular subtypes with different drug response profiles, aiding in clinical trial design. For example, Jangi et al. trained a random-forest model using fungal abundance data and successfully classified patients with active versus quiescent UC, highlighting the importance of Candida abundance in model performance (41). This underscores the potential of the random-forest model to identify therapeutic targets in UC. Building on this framework, we intend to extend our analysis to scRNA-seq data to uncover cell-type-specific therapeutic targets, and to validate these models across multi-ethnic cohorts to enhance their generalizability.
NR3C2 encodes the mineralocorticoid receptor and mediates aldosterone’s effects on salt and water balance (42). NR3C2 has been shown to have good diagnostic efficacy (AUC =0.908) in invasive breast carcinoma (43), which aligns with our findings. In our study, the ROC curve analysis showed that NR3C2 had strong diagnostic ability (AUC =0.9025). It has also been reported that NR3C2 is significantly downregulated in colon cancer tissues compared to adjacent non-cancerous tissues, influencing tumor proliferation and invasiveness. Similarly, low expression levels of NR3C2 have been observed in non-small cell lung cancer (NSCLC) tissue samples, supporting its role as a diagnostic biomarker (44). These findings are consistent with our data, which revealed that NR3C2 was expressed at low levels in both the UC tissue and blood samples (Figure 1B,1C). However, the role of NR3C2 in UC has been largely unexplored. One study showed that mice lacking NR3C2 in group 3 innate lymphoid cells (ILC3s) exhibited reduced IL-17 expression in IBD (45).
ABCD3 is part of the ABC transporter superfamily (46). It plays a role in the peroxisomal import of fatty acids and contributes to energy metabolism regulation (47). Hepatic peroxisomal ABCD3 has been considered a marker for peroxisome proliferation (48). Recent studies showed that the ABCD3 score combined with brain computed tomography perfusion data improved the prediction of cerebral infarction (49), and had excellent diagnostic accuracy in distinguishing between subtypes of peroxisomal disorders (50). In colon cancer, ABCD3 expression was found to be significantly higher in normal tissues than adenocarcinoma tissues (51), which is consistent with our observations. CD38 encodes a non-lineage-restricted type II transmembrane glycoprotein involved in intracellular calcium signaling (52). Its expression level in B cells has been recognized as a critical factor in clinical diagnostics (53). For instance, CD38 positivity is useful in diagnosing Burkitt lymphoma (54). Active UC has been associated with impaired regulatory control in B cells, particularly with an increase in CD38+ B cells (3). Moreover, samples from IBD patients showed elevated levels of CD38+ T cells, including regulatory T cells that produce inflammatory cytokines, compared to controls (3).
ALOX15 belongs to the lipoxygenase family and acts on polyunsaturated fatty acids to produce bioactive lipid mediators (55). It has been shown to function as a diagnostic gene capable of distinguishing among subtypes in asthma (56), and as a novel marker for chronic endometritis (57). PIGN is involved in the biosynthesis of glycosylphosphatidylinositol anchors (58). The PIGN gene is essential for maintaining mitotic integrity and chromosomal stability (59).
Numerous studies have shown that signaling pathways play pivotal roles in the onset and progression of many diseases, including UC (60,61). Our findings revealed that the dysregulated genes were enriched in pathways such as fatty acid metabolism, calcium signaling, and ABC transporters. Calcium signaling involving S100 proteins has been reported to contribute to inflammation in UC in a Ca2+-dependent manner (62). Our scRNA-seq analysis is consistent with this, showing increased numbers of inflammatory cells in the UC samples compared with the control samples. Additionally, we found that CD8+ T cells and regulatory T cells were significantly correlated with the hub LMGs (Figure 3). UC-associated CD8+ effector T cells have been shown to induce tissue damage and may adopt regulatory functions to control excessive inflammation (63). A previous immune infiltration analysis also found that CD8+ T cells were more abundant in UC patients (64). Together, these results provide insight into how these cells and genes may contribute to the pathogenesis of UC.
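As an illustration of how a correlation between an immune cell population and a hub gene can be quantified, the sketch below computes a Spearman rank correlation between a gene's expression and an estimated cell fraction across samples. The choice of Spearman correlation is an assumption made for this example (the correlation method is not specified in this section), and the function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values (not data from this study): expression of one
# gene and the estimated CD8+ T cell fraction in the same five samples.
gene_expression = [5.0, 4.2, 3.9, 3.1, 2.5]
cd8_fraction = [0.05, 0.08, 0.07, 0.12, 0.15]

rho = spearman(gene_expression, cd8_fraction)
print(f"Spearman rho = {rho:.2f}")
```

A negative coefficient in such an analysis would indicate that samples with lower expression of the gene tend to have a larger estimated fraction of the cell type. A significance test and correction for multiple comparisons would be needed before drawing conclusions.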
We successfully identified differentially expressed hub LMGs; however, our study had some limitations. First, the analysis was based solely on bioinformatics, and potential biases might exist due to differences in microarray platforms and statistical methods. Second, research linking these hub genes to UC is limited, and their functional roles and molecular mechanisms have not yet been validated through in vivo or in vitro experiments. Therefore, further experimental studies need to be conducted to substantiate these findings.
Conclusions
We initially identified 16 LMGs that were differentially expressed in the UC tissue and blood samples, and enriched in key biological functions and signaling pathways. Using WGCNA, we further identified seven novel LMGs (i.e., ABCD3, ACADSB, ETNK1, IMPA1, MINPP1, MTMR2, and NR3C2) with immune relevance and potential diagnostic utility. Using the LASSO and random forest machine-learning algorithms, five hub genes (i.e., ABCD3, NR3C2, CD38, ALOX15, and PIGN) were ultimately identified. The scRNA-seq analysis revealed an enrichment of inflammatory and T cell populations in the UC samples. Together, our results suggest that ABCD3 and NR3C2 play vital roles in the pathogenesis of UC and thus hold promise as therapeutic targets.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tp.amegroups.com/article/view/10.21037/tp-2025-161/rc
Peer Review File: Available at https://tp.amegroups.com/article/view/10.21037/tp-2025-161/prf
Funding: This work was supported by
Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://tp.amegroups.com/article/view/10.21037/tp-2025-161/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Voelker R. What Is Ulcerative Colitis? JAMA 2024;331:716.
2. Wangchuk P, Yeshi K, Loukas A. Ulcerative colitis: clinical biomarkers, therapeutic targets, and emerging treatments. Trends Pharmacol Sci 2024;45:892-903.
3. Mitsialis V, Wall S, Liu P, et al. Single-Cell Analyses of Colon and Blood Reveal Distinct Immune Cell Signatures of Ulcerative Colitis and Crohn's Disease. Gastroenterology 2020;159:591-608.e10.
4. Hu L, Wu S, Shu Y, et al. Impact of Maternal Smoking, Offspring Smoking, and Genetic Susceptibility on Crohn's Disease and Ulcerative Colitis. J Crohns Colitis 2024;18:671-8.
5. Zhong Y, Xiao Q, Huang J, et al. Ginsenoside Rg1 Alleviates Ulcerative Colitis in Obese Mice by Regulating the Gut Microbiota-Lipid Metabolism-Th1/Th2/Th17 Cells Axis. J Agric Food Chem 2023;71:20073-91.
6. Liang Y, Li Y, Lee C, et al. Ulcerative colitis: molecular insights and intervention therapy. Mol Biomed 2024;5:42.
7. Xu J, Zheng B, Xie C, et al. Inhibition of FABP5 attenuates inflammatory bowel disease by modulating macrophage alternative activation. Biochem Pharmacol 2024;219:115974.
8. Basson AR, Chen C, Sagl F, et al. Regulation of Intestinal Inflammation by Dietary Fats. Front Immunol 2020;11:604989.
9. Yu T, Wu L, Zhang T, et al. Insights into Q-markers and molecular mechanism of Sanguisorba saponins in treating ulcerative colitis based on lipid metabolism regulation. Phytomedicine 2023;116:154870.
10. Kumar SS, Fathima A, Srihari P, et al. Host-gut microbiota derived secondary metabolite mediated regulation of Wnt/β-catenin pathway: a potential therapeutic axis in IBD and CRC. Front Oncol 2024;14:1392565.
11. Gros B, Kaplan GG. Ulcerative Colitis in Adults: A Review. JAMA 2023;330:951-65.
12. Wang Y, Zhang J, Zhang B, et al. Modified Gegen Qinlian decoction ameliorated ulcerative colitis by attenuating inflammation and oxidative stress and enhancing intestinal barrier function in vivo and in vitro. J Ethnopharmacol 2023;313:116538.
13. Magro F, Pai RK, Kobayashi T, et al. Resolving Histological Inflammation in Ulcerative Colitis With Mirikizumab in the LUCENT Induction and Maintenance Trial Programmes. J Crohns Colitis 2023;17:1457-70.
14. Temido MJ, Peixinho M, Cunha R, et al. Plasma calprotectin as a biomarker of inflammatory activity in ulcerative colitis. Med Clin (Barc) 2025;164:168-72.
15. Yan J, Deng F, Tan Y, et al. Systemic immune-inflammation index as a potential biomarker to monitor ulcerative colitis. Curr Med Res Opin 2023;39:1321-8.
16. Hagiwara SI, Abe N, Hosoi K, et al. Utility of a rapid assay for prostaglandin E-major urinary metabolite as a biomarker in pediatric ulcerative colitis. Sci Rep 2023;13:9898.
17. Zhang H, Mo Y, Wang L, et al. Potential shared pathogenic mechanisms between endometriosis and inflammatory bowel disease indicate a strong initial effect of immune factors. Front Immunol 2024;15:1339647.
18. Zhu J, Wu Y, Ge X, et al. Discovery and Validation of Ferroptosis-Associated Genes of Ulcerative Colitis. J Inflamm Res 2024;17:4467-82.
19. Tian L, Gao H, Yao T, et al. Interactions between NAD+ metabolism and immune cell infiltration in ulcerative colitis: subtype identification and development of novel diagnostic models. Front Immunol 2025;16:1479421.
20. Devlin JC, Axelrad J, Hine AM, et al. Single-Cell Transcriptional Survey of Ileal-Anal Pouch Immune Cells From Ulcerative Colitis Patients. Gastroenterology 2021;160:1679-93.
21. Zheng H, Liu H, Li H, et al. Characterization of stem cell landscape and identification of stemness-relevant prognostic gene signature to aid immunotherapy in colorectal cancer. Stem Cell Res Ther 2022;13:244.
22. Saul D, Leite Barros L, Wixom AQ, et al. Cell Type-Specific Induction of Inflammation-Associated Genes in Crohn's Disease and Colorectal Cancer. Int J Mol Sci 2022;23:3082.
23. Huang H, Jiang J, Fan Y, et al. Non-targeted metabolomics and pseudo-targeted lipidomics combined with gut microbes reveal the protective effects of Causonis japonica (Thunb.) Raf. in ulcerative colitis mice. Front Cell Infect Microbiol 2024;14:1397735.
24. Desterke C, Fu Y, Francés R, et al. Metabolic Transcriptional Activation in Ulcerative Colitis Identified Through scRNA-seq Analysis. Genes (Basel) 2024;15:1412.
25. Strande V, Lund C, Hagen M, et al. Clinical course of ulcerative colitis: Frequent use of biologics and low colectomy rate first year after diagnosis-results from the IBSEN III inception cohort. Aliment Pharmacol Ther 2024;60:357-68.
26. Ma Y, Zhang X, Xuan B, et al. Disruption of CerS6-mediated sphingolipid metabolism by FTO deficiency aggravates ulcerative colitis. Gut 2024;73:268-81.
27. Zhong G, He C, Wang S, et al. Research progress on the mechanism of cholesterol-25-hydroxylase in intestinal immunity. Front Immunol 2023;14:1241262.
28. Lama A, Provensi G, Amoriello R, et al. The anti-inflammatory and immune-modulatory effects of OEA limit DSS-induced colitis in mice. Biomed Pharmacother 2020;129:110368.
29. Bjerrum JT, Wang Y, Zhang J, et al. Lipidomic Trajectories Characterize Delayed Mucosal Wound Healing in Quiescent Ulcerative Colitis and Identify Potential Novel Therapeutic Targets. Int J Biol Sci 2022;18:1813-28.
30. Kayama H, Takeda K. Emerging roles of host and microbial bioactive lipids in inflammatory bowel diseases. Eur J Immunol 2023;53:e2249866.
31. Jahagirdar V, Bapaye J, Chandan S, et al. Diagnostic accuracy of convolutional neural network-based machine learning algorithms in endoscopic severity prediction of ulcerative colitis: a systematic review and meta-analysis. Gastrointest Endosc 2023;98:145-154.e8.
32. Wu X, Zhang T, Zhang T, et al. The impact of gut microbiome enterotypes on ulcerative colitis: identifying key bacterial species and revealing species co-occurrence networks using machine learning. Gut Microbes 2024;16:2292254.
33. Hong S, Wang H, Chan S, et al. Identifying Macrophage-Related Genes in Ulcerative Colitis Using Weighted Coexpression Network Analysis and Machine Learning. Mediators Inflamm 2023;2023:4373840.
34. Kulkarni C, Liu D, Fardeen T, et al. Artificial intelligence and machine learning technologies in ulcerative colitis. Therap Adv Gastroenterol 2024;17:17562848241272001.
35. Wang Z, Wang Y, Yan J, et al. Analysis of cuproptosis-related genes in Ulcerative colitis and immunological characterization based on machine learning. Front Med (Lausanne) 2023;10:1115500.
36. Deng B, Liao F, Liu Y, et al. Comprehensive analysis of endoplasmic reticulum stress-associated genes signature of ulcerative colitis. Front Immunol 2023;14:1158648.
37. Wang Y, Zhuang H, Jiang XH, et al. Corrigendum: Unveiling the key genes, environmental toxins, and drug exposures in modulating the severity of ulcerative colitis: a comprehensive analysis. Front Immunol 2023;14:1323997.
38. Zhang Z, Wang S, Zhu Z, et al. Identification of potential feature genes in non-alcoholic fatty liver disease using bioinformatics analysis and machine learning strategies. Comput Biol Med 2023;157:106724.
39. Sun HW, Zhang X, Shen CC. The shared circulating diagnostic biomarkers and molecular mechanisms of systemic lupus erythematosus and inflammatory bowel disease. Front Immunol 2024;15:1354348.
40. Bourgonje AR, Ibing S, Livanos AE, et al. Distinct perturbances in metabolic pathways associate with disease progression in inflammatory bowel disease. J Crohns Colitis 2025;19:jjaf082.
41. Jangi S, Hsia K, Zhao N, et al. Dynamics of the Gut Mycobiome in Patients With Ulcerative Colitis. Clin Gastroenterol Hepatol 2024;22:821-830.e7.
42. Heydarpour M, Parksook WW, Pojoga LH, et al. Mineralocorticoid Receptor and Aldosterone: Interaction Between NR3C2 Genetic Variants, Sex, and Age in a Mixed Cohort. J Clin Endocrinol Metab 2024;110:e140-9.
43. Lu J, Hu F, Zhou Y. NR3C2-Related Transcriptome Profile and Clinical Outcome in Invasive Breast Carcinoma. Biomed Res Int 2021;2021:9025481.
44. Sun YY, Gao HC, Guo P, et al. Identification of NR3C2 as a functional diagnostic and prognostic biomarker and potential therapeutic target in non-small cell lung cancer. Cancer Innov 2024;3:e122.
45. Zhao R, Hong L, Shi G, et al. Mineralocorticoid promotes intestinal inflammation through receptor dependent IL17 production in ILC3s. Int Immunopharmacol 2024;130:111678.
46. Ranea-Robles P, Chen H, Stauffer B, et al. The peroxisomal transporter ABCD3 plays a major role in hepatic dicarboxylic fatty acid metabolism and lipid homeostasis. J Inherit Metab Dis 2021;44:1419-33.
47. Kawaguchi K, Imanaka T. Substrate Specificity and the Direction of Transport in the ABC Transporters ABCD1-3 and ABCD4. Chem Pharm Bull (Tokyo) 2022;70:533-9.
48. Mitra R, Adhikari R, Davis SS, et al. Distinct peroxisome populations differentially respond to alcohol-associated hepatic injury. Mol Biol Cell 2024;35:ar156.
49. Liu S, Chen T, Wu W. Predictive value of whole-brain CT perfusion combined with ABCD3 score for short-term secondary cerebral infarction after TIA. Front Neurol 2023;14:1244014.
50. Kawai H, Takashima S, Ohba A, et al. Development of a system adapted for the diagnosis and evaluation of peroxisomal disorders by measuring bile acid intermediates. Brain Dev 2023;45:58-69.
51. Zhang Y, Zhang Y, Wang J, et al. Abnormal expression of ABCD3 is an independent prognostic factor for colorectal cancer. Oncol Lett 2020;19:3567-77.
52. Nabar NR, Heijjer CN, Shi CS, et al. LRRK2 is required for CD38-mediated NAADP-Ca(2+) signaling and the downstream activation of TFEB (transcription factor EB) in immune cells. Autophagy 2022;18:204-22.
53. Zeng F, Zhang J, Jin X, et al. Effect of CD38 on B-cell function and its role in the diagnosis and treatment of B-cell-related diseases. J Cell Physiol 2022;237:2796-807.
54. Liu Y, Bian T, Zhang Y, et al. A combination of LMO2 negative and CD38 positive is useful for the diagnosis of Burkitt lymphoma. Diagn Pathol 2019;14:100.
55. Yan W, Cui X, Guo T, et al. ALOX15 Aggravates Metabolic Dysfunction-Associated Steatotic Liver Disease in Mice with Type 2 Diabetes via Activating the PPARγ/CD36 Axis. Antioxid Redox Signal 2025;43:37-55.
56. Ding X, Qin J, Huang F, et al. The combination of machine learning and untargeted metabolomics identifies the lipid metabolism-related gene CH25H as a potential biomarker in asthma. Inflamm Res 2023;72:1099-119.
57. Oshina K, Kuroda K, Nakabayashi K, et al. Gene expression signatures associated with chronic endometritis revealed by RNA sequencing. Front Med (Lausanne) 2023;10:1185284.
58. Khalifa HM, Alkayyat H, Jadah RHSH. A Case Report of a Child With Rare Phosphatidylinositol Glycan Anchor Biosynthesis Class N (PIGN) Gene Mutation With Hypotonia, Epilepsy, and Global Developmental Delay. Cureus 2025;17:e80072.
59. Teye EK, Lu S, Chen F, et al. PIGN spatiotemporally regulates the spindle assembly checkpoint proteins in leukemia transformation and progression. Sci Rep 2021;11:19022.
60. Li B, Wang Y, Jiang X, et al. Natural products targeting Nrf2/ARE signaling pathway in the treatment of inflammatory bowel disease. Biomed Pharmacother 2023;164:114950.
61. Shen J, Cheng J, Zhu S, et al. Regulating effect of baicalin on IKK/IKB/NF-kB signaling pathway and apoptosis-related proteins in rats with ulcerative colitis. Int Immunopharmacol 2019;73:193-200.
62. Yang Y, Hua Y, Zheng H, et al. Biomarkers prediction and immune landscape in ulcerative colitis: Findings based on bioinformatics and machine learning. Comput Biol Med 2024;168:107778.
63. Corridoni D, Antanaviciute A, Gupta T, et al. Single-cell atlas of colonic CD8(+) T cells in ulcerative colitis. Nat Med 2020;26:1480-90.
64. Huang J, Zhang J, Wang F, et al. Comprehensive analysis of cuproptosis-related genes in immune infiltration and diagnosis in ulcerative colitis. Front Immunol 2022;13:1008146.
(English Language Editor: L. Huleatt)

