Article Text
Abstract
Advancements in omics technologies and artificial intelligence (AI) methodologies are fuelling our progress towards personalised diagnosis, prognosis and treatment strategies in hepatology. This review provides a comprehensive overview of the current landscape of AI methods used for analysis of omics data in liver diseases. We present an overview of the prevalence of different omics levels across various liver diseases, as well as categorise the AI methodology used across the studies. Specifically, we highlight the predominance of transcriptomic and genomic profiling and the relatively sparse exploration of other levels such as the proteome and methylome, which represent untapped potential for novel insights. Publicly available database initiatives such as The Cancer Genome Atlas and The International Cancer Genome Consortium have paved the way for advancements in the diagnosis and treatment of hepatocellular carcinoma. However, comparably large omics datasets remain limited for other liver diseases. Furthermore, the application of sophisticated AI methods to handle the complexities of multiomics datasets requires substantial data to train and validate the models and faces challenges in achieving bias-free results with clinical utility. Strategies to address the paucity of data and capitalise on opportunities are discussed. Given the substantial global burden of chronic liver diseases, it is imperative that multicentre collaborations be established to generate large-scale omics data for early disease recognition and intervention. Exploring advanced AI methods is also necessary to maximise the potential of these datasets and improve early detection and personalised treatment strategies.
- CHRONIC LIVER DISEASE
- ALCOHOLIC LIVER DISEASE
- NONALCOHOLIC STEATOHEPATITIS
- ACUTE LIVER FAILURE
- HEPATOCELLULAR CARCINOMA
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Key messages
AI methods for extracting insights from complex omics data in liver diseases are still emerging, presenting a significant opportunity for novel developments.
Large, well-curated, and diverse public datasets are crucial for developing AI methods that generalize across populations.
Open sourcing data and code, along with standardized reporting of model performance, is essential for ensuring AI model reproducibility and reliability.
Integrating omics data with clinical, radiology, and histopathology information can significantly enhance the prediction accuracy and clinical relevance of AI models.
Explainability is key to driving the adoption of AI in early diagnosis and personalized treatment, especially for complex models.
Introduction
Technological advancements have improved our capacity to generate and analyse emerging large-scale omics modalities. This surge in data volume has necessitated the development of artificial intelligence (AI) techniques tailored for analysis of multidimensional biological datasets1 2 to help identify biomarkers and other signals.
Recent improvements in AI methods have been driven by deep learning (DL) models, themselves enabled by the development of graphics processing units.3 DL has led to breakthroughs in many practical applications, including in biomedicine, such as the prediction of protein folding in AlphaFold.4 Given the increasing availability of omics data, a similar stride forward is possible in understanding liver diseases. For example, AI methods integrating clinical and omics data offer the potential for rapid diagnosis of liver disease and severity of fibrosis.
In this paper, we provide an overview of the current landscape of AI tools applied to omics data in studying liver diseases. We focus on studies considering individual molecular levels as well as the hierarchy of molecular levels across different liver diseases, along with phenotypic and clinical information. We emphasise novel AI approaches devised to address common challenges encountered in liver disease studies and highlight research trends. We discuss a range of diagnostic and prognostic models along with a few therapeutic models to cover the current landscape of this area. Diagnostic models are particularly interesting as early detection has the potential to guide personalised treatment strategies. We highlight common data analysis patterns and discuss recent breakthroughs using DL-based approaches in and beyond liver diseases. We also briefly discuss recent advancements in AI models, methods and learning paradigms in related research areas, such as medical imaging, protein structure prediction and drug discovery, which have the potential to significantly improve liver disease applications.
AI in omics data
Omics analysis: from traditional statistics to AI-driven insights
Given the central role of the liver in metabolic and immune functions, the integration of different omics modalities could provide a comprehensive understanding of the underlying regulatory mechanisms. Whereas statistical approaches were traditionally employed, data-driven AI models have now become more ubiquitous for omics data analysis. These models are adept at handling the intricate, high-dimensional and noisy omics datasets. DL models, a class of neural network-based AI models that use many layers, are particularly effective at learning complex relationships within high-dimensional data and are increasingly used for omics data. Further, integrating clinical and structural data with omics can enable the identification of molecular subtypes and patient strata, leading to more targeted and effective treatments.5 6 Current AI models are capable of learning complex functions that map omics and clinical features to outcomes. Figure 1 illustrates the emerging approach where an AI model prediction is treated as a biomarker.
The most common approach in the application of AI to omics data is to first use statistical methods such as differential analysis to identify a subset of relevant features, followed by development of an AI model that makes predictions from these features. Feature selection or dimensionality reduction is often necessary given the high-dimensional nature of omics data. The choice of algorithms for feature selection and modelling is typically influenced by the data’s structure, its specific characteristics and the analysis objective. Table 1 presents some algorithm choices that align with these data nuances and problem contexts. While table 1 is not comprehensive and does not recommend using these algorithms exclusively for the conditions highlighted, it demonstrates that a variety of potential solutions are available for a given problem. For instance, when the number of samples is small, linear models, which are less prone to overfitting, serve as a good starting point. However, with appropriate data augmentation to generate synthetic samples, other alternatives, such as random forests or deep neural networks, can also be considered. Data augmentation, though, is not straightforward and requires careful consideration to ensure that the synthetic samples are realistic within the context of the domain and the problem. While this practice is common in image and natural language processing (NLP), the complexity of omics data presents a challenge that necessitates further research. Therefore, it is essential to consider the trade-offs between computational efficiency, interpretability and predictive performance when selecting algorithms. Evaluating multiple methods and validating them through techniques like cross-validation can help achieve reliable and robust models.
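As an illustrative sketch of this filter-then-model pattern (using synthetic data in place of a real omics matrix; the feature counts and model choices here are arbitrary), a univariate filter can be embedded in a cross-validated pipeline so that feature selection is refit within each fold:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an omics matrix: 80 samples x 2000 features,
# with only the first 20 features carrying signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2000))
y = (X[:, :20].sum(axis=1) + rng.normal(scale=2.0, size=80) > 0).astype(int)

# Univariate filter (analogous to differential analysis) feeding a linear model.
pipe = Pipeline([
    ("filter", SelectKBest(f_classif, k=50)),   # keep the top 50 features
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
```

Placing the filter inside the pipeline, rather than selecting features on the full dataset beforehand, is what keeps the cross-validated AUROC estimate honest.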
While end-to-end AI approaches that combine the feature selection and modelling steps have been proposed, statistical feature selection often infuses domain knowledge such as pathways and gene networks that are challenging to integrate without a strong prior. Statistical methods such as differential analysis consider each feature independently, whereas AI models attempt to learn a non-linear relationship between multiple features, their combinations and the outcome. Figure 2 describes a common data analysis pipeline, where the modelling step integrates a small subset of selected features across several modalities in the molecular hierarchy. Interpretability and generalisability are two essential aspects of the AI model in clinical applications (figure 2) and are further discussed in the ‘Common pitfalls of applying AI to omics data’ section. In scenarios where model interpretability is important, preprocessing steps like feature selection or dimensionality reduction are still valuable.
Due to the challenges and costs associated with collecting and processing patient samples, omics datasets are often smaller compared with those used in other successful AI application areas such as computer vision and NLP. Developing AI methods for such high-dimensional data with small sample sizes is challenging. Self-supervised and transfer learning approaches can address some of these challenges. Figure 3 illustrates a few of the most widely used learning paradigms that are often referred to in this paper. Some other popular paradigms, such as reinforcement learning (RL) and federated learning (FL), are not shown here. In practice, these approaches are often combined to achieve specific results. For instance, one can use unsupervised pretraining followed by supervised fine-tuning for transfer learning when a large unlabelled dataset and a small labelled dataset are available.
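A loose analogue of unsupervised pretraining followed by supervised fine-tuning can be sketched with classical tools (synthetic data; PCA stands in for a learned encoder, which is a deliberate simplification of the deep learning setting):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Large unlabelled pool and a small labelled cohort from the same feature space.
X_unlab = rng.normal(size=(1000, 500))
X_lab = rng.normal(size=(40, 500))
y_lab = rng.integers(0, 2, size=40)

# "Pretraining": learn a low-dimensional representation from unlabelled data.
enc = PCA(n_components=10).fit(X_unlab)

# "Fine-tuning": train a supervised head on the encoded labelled samples.
clf = LogisticRegression(max_iter=1000).fit(enc.transform(X_lab), y_lab)
proba = clf.predict_proba(enc.transform(X_lab))[:, 1]
```

The point of the pattern is that the representation is learned where data are plentiful and only the small labelled set is spent on the supervised head.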
Data-driven feature selection and biomarker discovery
Knowledge-driven statistical feature selection is widely used in omics data analysis. Linear techniques such as principal component analysis7 and linear discriminant analysis (LDA)8 are commonly used to address multicollinearity. Multidimensional scaling (MDS)9 aims to preserve pairwise distances, and thus neighbour relationships, in a lower-dimensional space. Non-linear dimensionality reduction techniques, such as t-distributed Stochastic Neighbour Embedding and Uniform Manifold Approximation and Projection, seek a balance between preserving the global shape of the data and local relationships within the data in the projected lower-dimensional space.10 However, the resulting projected feature spaces, whether linear or non-linear, are often less interpretable, and thus primarily used for visualisation and validation purposes.
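The visualisation-oriented use of these projections can be sketched as follows (synthetic clustered data standing in for expression profiles; parameter choices are illustrative):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Three synthetic "subtypes" in a high-dimensional, expression-like space.
X, labels = make_blobs(n_samples=150, n_features=200, centers=3, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)            # linear projection
X_tsne = TSNE(n_components=2, perplexity=30,
              random_state=0).fit_transform(X)          # non-linear embedding
```

Both outputs are two-dimensional coordinates suitable for scatter plots coloured by subtype; neither axis carries a direct biological interpretation, which is why such embeddings are rarely used as model inputs.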
Linear variable selection methods11 12 alongside non-linear strategies such as tree-based methods13 and support vector machine (SVM) feature importance14 offer the advantage of preserving the original features and are therefore widely used as a precursor to the modelling step.
Unsupervised DL-based autoencoders (AEs) are designed to learn a compressed, lower-dimensional representation of input data. The compressed data are then used to reconstruct the original input, ensuring that vital features necessary for reconstruction are captured by the lower-dimensional feature space. AE variants, such as variational AE (VAE), have shown promise in extracting more interpretable low-dimensional latent spaces from gene expression profiles for cancer classification.15 AEs are often used to pretrain DL models ultimately used for supervised, semisupervised or transfer learning.16 17
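A dedicated DL framework would normally be used for AEs; as a minimal self-contained sketch, an MLP trained to reconstruct its own input behaves like a small non-variational AE, with its hidden activations serving as the compressed representation (synthetic data; layer size arbitrary):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 100))

# An MLP trained to map X back to X approximates a (non-variational)
# autoencoder; the 8-unit hidden layer is the bottleneck representation.
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X, X)

# Recover latent codes with a manual forward pass through the encoder half.
latent = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])
```

In practice the latent matrix, not the raw features, would be passed downstream for clustering or supervised modelling, as in the VAE-based cancer classification work cited above.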
Molecular subtyping
Supervised AI models can be trained on molecular, clinical and phenotypic data using known subtypes to classify patients into diagnostic categories. Such subtyping can aid in biomarker discovery and potentially guide personalised preventive measures.18 Gao et al19 developed a DL-based supervised cancer subtyping framework trained on established subtypes based on the consensus clustering molecular subtyping approaches.
Unsupervised AI techniques, such as clustering algorithms, are vital for identifying novel subgroups within heterogeneous datasets.16 20–23 The Cancer Genome Atlas (TCGA) network used unsupervised clustering methods on multiplatform data to identify molecular subtypes in breast cancer samples.24 Similar clustering approaches have been employed to integrate multiple omics modalities from primary tumours and paired para-tumoural tissue for the identification of three hepatocellular carcinoma (HCC) subtypes with distinct clinical prognoses.18
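A minimal version of this unsupervised subtype discovery, with the number of clusters chosen by silhouette score (one common heuristic; the consensus clustering used by TCGA is considerably more involved), might look like:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic multiomics-like matrix with three latent subtypes.
X, _ = make_blobs(n_samples=120, n_features=50, centers=3, random_state=0)

# Sweep candidate cluster counts and keep the best-scoring partition.
best_k, best_s = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    s = silhouette_score(X, labels)
    if s > best_s:
        best_k, best_s = k, s
```

The recovered clusters would then be inspected against clinical outcomes, as in the HCC subtyping study, to test whether the data-driven groups carry prognostic meaning.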
AI in liver diseases
In this section, we review AI methods for the diagnosis, prognosis and treatment of liver diseases. Liver fibrosis, a hallmark of chronic liver diseases, develops variably in different individuals in the setting of chronic liver diseases, with its severity spanning from minimal fibrosis to decompensated cirrhosis. Omics technologies could help in the detection and treatment of early-stage fibrosis to slow disease progression, reduce the risk of end-stage liver diseases, such as cancer, and decrease mortality. Table 2 summarises the studies considered in this review. An extended version of this table, containing the year of publication, type of clinical model and specimen/tissue information, is available as online supplemental table 1. When choosing studies to include, we considered omics diversity, algorithmic novelty, novel biological discoveries and cohort sizes as important factors. We use the self-reported area under the receiver operating characteristic curve (AUROC) statistic as a measure of the overall accuracy of the method. However, we note that this statistic is not the only possible measure of accuracy and is highly dependent not only on the positive but also the negative examples used in the study. For each liver disease considered, the degree of overlap in results across studies varies. Although common biomarkers are used, specific outcomes and conclusions often differ due to variations in study design, population characteristics, disease prevalence and the statistical or AI methods employed. The selected studies represent a subset of the work being carried out on this topic and are not a systematic review. Figure 4 illustrates the range of AI methods applied across various liver diseases over the last 10 years.
Supplemental material
Metabolic dysfunction-associated steatotic liver disease
Metabolic dysfunction-associated steatotic liver disease (MASLD), in which fat accumulates in the liver, is one of the most prevalent liver diseases worldwide. In recent years, metabolite and serum lipid profiling have emerged as important tools to diagnose metabolic dysfunction and assess disease severity in MASLD, including conditions such as active inflammation seen in metabolic dysfunction-associated steatohepatitis (MASH) and fibrosis.
Machine learning methodologies have been used to investigate metabolomics and pathophysiology of MASLD. McGlinchey et al25 used Gaussian mixture model-based clustering to group lipids and investigated their association with metabolites and clinical variables using a correlation network. Using their network, they identified unique metabolites linked to steatosis, MASH and fibrosis. They developed random forest (RF) binary classifiers to model fibrosis progression, distinguishing between early and advanced stages, with a median AUROC of 0.77 for the F0–F1 vs F2–F4 task and a median AUROC of 0.78 for the F0–F2 vs F3–F4 task. Their findings highlighted a critical metabolic transition from stages F2 to F3 in MASLD pathogenesis, emphasising the significance of oxidative stress.
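The RF-with-cross-validated-AUROC pattern used in several of these studies can be sketched as follows (synthetic features and labels, not the studies' data or hyperparameters):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
# Stand-in for a lipid/metabolite matrix with a binary fibrosis-stage label.
X = rng.normal(size=(120, 60))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=120) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
aurocs = cross_val_score(rf, X, y, cv=5, scoring="roc_auc")
median_auroc = float(np.median(aurocs))
```

Reporting the median (or mean) over folds, as McGlinchey et al do, gives a more stable estimate than a single train/test split on cohorts of this size.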
Quesada-Vázquez et al26 employed the Boruta algorithm, a supervised feature selection method based on RF, to identify metabolomic signatures of steatosis and used the limma linear regression model27 for transcriptomic analysis. Through Shapley additive explanations (SHAP) analysis, they found plasma histidine to be strongly inversely associated with steatosis (SHAP mean value=0.777) and linked it to a hepatic transcriptomic signature involving insulin signalling, inflammation and trace amine-associated receptor 1.
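SHAP attributes a model's prediction to individual features; as a lighter-weight stand-in for illustration, permutation importance likewise ranks features by how much shuffling them degrades the model (synthetic data, with one feature deliberately carrying the signal):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 10))
# Feature 3 plays the role of a strongly (inversely) associated metabolite.
y = -2.0 * X[:, 3] + rng.normal(scale=0.5, size=150)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
top_feature = int(np.argmax(imp.importances_mean))  # expected: index 3
```

Unlike SHAP, permutation importance gives only a global ranking, not per-sample attributions, but the interpretive workflow (fit, attribute, inspect the top features biologically) is the same.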
Perakakis et al28 developed non-invasive diagnostic models that integrate lipids, glycans and hormones to diagnose MASH using SVM with recursive feature elimination. They identified a combination of lipids that could accurately diagnose the presence of liver fibrosis with 98% accuracy. Chiappini et al29 also developed an RF model to differentiate MASH from various MASLD stages using lipid profiling. They highlighted dysregulation in the fatty acid synthesis metabolic pathway as a diagnostic of MASH.
In a paediatric study, Khusial et al30 combined the plasma metabolome with clinical variables in an RF model, identifying a panel of amino acids consisting of serine, leucine/isoleucine and tryptophan to screen for MASLD in youth (AUROC=0.94). Apart from plasma metabolome, recent studies have systematically compared lipidomic profiles from urinary extracellular vesicles (EV), identifying lipid signatures diagnostic of MASH.31 Moolla et al32 also differentiated early (F0–F2) from advanced (F3–F4) fibrosis using a generalised matrix learning vector quantisation analysis on urinary metabolomic data (AUROC=0.92).
The association between the human gut microbiota and its contribution to the pathogenesis of MASLD has also been explored through machine learning methodologies. In a study by Wang et al,33 predictive taxonomic and functional signatures were identified from publicly available MASLD shotgun metagenomic datasets to elucidate links to MASH progression. To further investigate the diagnostic accuracy of their microbiome signature, stochastic gradient-boosting machine (GBM) learning models were trained to identify the severity of MASH, achieving AUROCs ranging from 0.7 to 0.94. Oh et al34 identified a gut microbiome-derived signature that, combined with patient age, accurately detected cirrhosis in a diverse MASH-cirrhosis population compared with non-MASH controls. Their RF model, trained on shotgun metagenomic and untargeted metabolomic profiles, achieved an AUROC of 0.91.
Using RF modelling, Sharpton et al35 identified a stool metagenomic signature comprising 13 discriminatory species to detect decompensated MASH cirrhosis. In their study, Saboo et al36 analysed stool and saliva microbiomes of cirrhosis patients, highlighting the superior predictive value of stool microbiota for cirrhosis progression over saliva microbiota. Their RF-based classification model based on stool microbiota achieved the highest AUROC of 0.78 in distinguishing patients with cirrhosis from controls.
The gut microbiome has also been investigated in a prognostic model to understand MASH development. Leung et al37 examined serum and stool samples and characterised both metagenomic and metabolomic changes during MASH progression in a community-based cohort. They developed interpretable models using RF, with AUROCs of 0.72–0.80, integrating baseline microbial signatures to identify patients at risk of developing MASLD 4 years after baseline.
Other omics modalities have emerged in recent literature as potential diagnostic and prognostic markers in MASLD. Eslam et al38 combined routinely available clinical and laboratory data with IFNL genotype for fibrosis prediction using decision trees. Govaere et al39 used logistic regression (LR) to model the progression of MASH, discriminating mild from advanced disease (MASH F0/F1 vs MASH F≥2) with an AUROC of 0.86. Baboota et al40 identified GREM1 as a strong predictor of fibrosis using conditional RF and gradient boosting, highlighting senescence markers as crucial drivers of MASH in humans.
In proteomic-based studies, Feng et al41 used LASSO regression with 10-fold cross-validation to pinpoint urinary proteins associated with MASH and fibrosis stages. Their model distinguished advanced fibrosis (AUROC of 0.92) better than early fibrosis stages (AUROC of 0.86). Luo et al42 achieved an AUROC of 0.78 using an elastic-net algorithm based on a 4-protein or 12-protein panel to distinguish advanced fibrosis among MASH patients.
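The LASSO-style sparse panel selection with cross-validation described above can be sketched with an L1-penalised logistic regression (synthetic proteomic-style data; the resulting panel size will differ from the cited studies):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(5)
# Proteomic-style matrix: 100 samples x 300 proteins, 5 informative.
X = rng.normal(size=(100, 300))
y = (X[:, :5].sum(axis=1) + rng.normal(size=100) > 0).astype(int)

# The L1 (LASSO-style) penalty, tuned by 10-fold CV, drives most
# coefficients to zero, leaving a sparse candidate protein panel.
clf = LogisticRegressionCV(Cs=10, cv=10, penalty="l1",
                           solver="liblinear", max_iter=1000).fit(X, y)
panel = np.flatnonzero(clf.coef_[0])   # indices of the selected proteins
```

The sparsity is the point clinically: a handful of proteins can be measured with a targeted assay, whereas a dense 300-feature model cannot.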
An increasing number of studies have combined multiomics technologies to find diagnostic markers of MASLD.43 44 Atabaki-Pasdar et al43 applied machine learning methods, on patients with or without type 2 diabetes from the Innovative Medicines Initiative diabetes research on patient stratification Consortium cohort,45 to devise an early diagnostic model for MASLD. A Pearson correlation matrix, with a pairwise correlation threshold of r>0.8, was used for the feature selection of clinical variables while LASSO-based feature selection was performed for dimension reduction at each of the omics levels. An RF model developed using the preselected features, which consisted of both clinical and omics variables, achieved an AUROC of 0.84. Wood et al44 further assessed the performance of steatosis classifier models in the setting of obesity using LR on each omics domain alone and found that proteomics achieved the highest AUROC of 0.913, followed by phenomic data at 0.892 and PNPLA3 genotyping data at only 0.596. Combining all markers selected from each individual data domain achieved an AUROC of 0.935 for the diagnosis of hepatic steatosis.
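The pairwise-correlation pre-filter (r>0.8) described above can be approximated with a simple greedy pass over the correlation matrix (toy data with one deliberately redundant variable; the cited study's exact procedure may differ):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 8))
X[:, 1] = X[:, 0] + rng.normal(scale=0.05, size=50)  # near-duplicate variable

# Drop one of every pair of features whose |Pearson r| exceeds 0.8.
corr = np.corrcoef(X, rowvar=False)
keep = []
for j in range(X.shape[1]):
    if all(abs(corr[j, k]) <= 0.8 for k in keep):
        keep.append(j)
X_filtered = X[:, keep]
```

This keeps the first member of each correlated group, so feature 1 (the engineered duplicate of feature 0) is removed while the independent variables survive.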
Overall, AI algorithms integrating omics and clinical data show significant potential to inform personalised diagnostic, prognostic and therapeutic strategies in MASH.
Alcohol-associated liver disease
Alcohol-associated liver disease (ALD) encompasses a range of liver injuries due to alcohol consumption, spanning from hepatic steatosis to advanced stages such as alcoholic hepatitis (AH) and alcohol-associated cirrhosis (AC), which can progress to acute-on-chronic liver failure (ACLF). Despite advances, the molecular pathophysiology of ALD remains incompletely understood, highlighting the necessity for non-invasive diagnostic methods to screen individuals at risk, notably those with a history of alcohol misuse. Omics data and machine learning methodologies have been explored as a source of potential diagnostic markers in ALD.
In a cohort of 7115 individuals over ~15 years, Liu et al46 explored the use of microbiome-augmented gradient boosting algorithms to predict liver disease incidence, particularly ALD. Gut microbial composition was assessed across various taxonomic levels. Prediction models for incident ALD were developed, using microbial features down to the species level, achieving an AUROC of 0.895.
Niu et al47 used plasma proteomic profiling through mass spectrometry to develop a diagnostic test for early fibrosis detection in patients with ALD. Their LR models, trained on a subset of 22 proteins, achieved better performance than 15 standard clinical tests in predicting significant fibrosis, mild inflammatory activity and any liver steatosis, achieving a mean AUROC of 0.90.
In their study, Listopad et al48 applied multiple machine learning models, such as LR, SVM and k-nearest neighbours (k-NN) to transcriptomic and proteomic data from liver tissue and peripheral blood mononuclear cells (PBMCs) to differentiate AH, AC and healthy controls. Integrating transcriptomic and proteomic data yielded superior performance in both liver tissue and PBMCs compared with individual omics-based classification methods. In PBMCs, the combined model achieved an AUROC of 0.96, outperforming the AUROCs of 0.89 for each individual omics-based classification.
Trépo et al49 developed gene-signature Model for End-Stage Liver Disease (gs-MELD), a scoring system combining baseline liver gene expression patterns of 123 genes with the MELD score to differentiate severe AH patients based on survival risk. The gs-MELD score was crafted using regression coefficients obtained from a multivariable Cox regression model.
Primary sclerosing cholangitis
Primary sclerosing cholangitis (PSC) is a rare cholestatic liver disease with a poorly understood pathogenesis and limited therapeutic options. It is characterised by bile stasis and progresses to fibrosis and cirrhosis, at times culminating in hepatic insufficiency.
Iwasawa et al50 investigated the salivary microbiota to distinguish salivary microbial communities of paediatric-onset PSC patients from healthy controls. For the classification problem, a combination of genera and species was used in RF models to distinguish the PSC group with an AUROC of 0.74.
Bile acids (BA) play a pivotal role in nutrient absorption and act as crucial signalling molecules, regulating hepatic metabolism and modulating immune responses. While altered BA homeostasis is an intrinsic facet of cholestatic liver diseases, the clinical utility of plasma BA assessment in PSC remains understudied.
Mousa et al51 used a combination of univariate Cox proportional hazard and multivariable gradient boosting machine models to evaluate whether plasma BA was an effective predictor of hepatic decompensation in PSC patients. Their model achieved a high concordance statistic (C-statistic) of 0.86 on a validation cohort.
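The C-statistic reported here generalises AUROC to time-to-event data; a simplified implementation (ignoring the tied-time and censoring subtleties that survival libraries handle) illustrates the pairwise-concordance idea:

```python
import numpy as np

def concordance_index(time, event, risk):
    """Simplified C-statistic: among comparable pairs (the earlier time is an
    observed event), count how often the higher predicted risk failed first."""
    n_conc, n_comp = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:   # i failed before j
                n_comp += 1
                if risk[i] > risk[j]:
                    n_conc += 1.0
                elif risk[i] == risk[j]:
                    n_conc += 0.5                      # ties get half credit
    return n_conc / n_comp

# Toy cohort: times to decompensation, event indicators, predicted risks.
time = np.array([2.0, 5.0, 3.0, 8.0])
event = np.array([1, 1, 0, 1])
risk = np.array([0.9, 0.4, 0.6, 0.1])
c = concordance_index(time, event, risk)   # risks perfectly ordered here
```

A C-statistic of 0.5 corresponds to random risk ordering and 1.0 to perfect ordering, so the 0.86 reported by Mousa et al indicates strong discrimination on the validation cohort.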
Hepatitis B virus and hepatitis C virus
Although viral hepatitis incidence has decreased due to curative treatments and vaccines for hepatitis C virus (HCV) and hepatitis B virus (HBV), respectively, they remain the leading causes of chronic liver diseases. Viral hepatitis progresses rapidly, often culminating in cirrhosis, HCC and end-stage liver disease.
Treatment for chronic HCV infection has improved rapidly since the introduction of all-oral, interferon-free direct-acting antiviral (DAA) regimens. DAAs have significantly higher cure rates (sustained virological response, SVR) compared with previous interferon-based therapies. Cure rates with DAAs exceed 90%–95% across different HCV genotypes and patient populations, including those with cirrhosis and other comorbidities. However, patient-specific comorbidity burden, prior treatment exposure and challenges with adherence can affect the effectiveness of DAAs. Park et al evaluated multiple machine learning approaches and found GBM the most effective in generating prediction risk scores for identifying patients with chronic hepatitis C infection at high risk of DAA treatment failure and in determining predictors associated with DAA treatment failure.52 Feldman et al present a machine-learning-based approach to predict patient characteristics associated with the need for an extended duration of DAA therapy given a particular DAA regimen. Their optimal predictive model, using XGBoost, had an AUC of 0.745.53
In recent times, AI-assisted drug discovery has streamlined and enhanced the drug discovery process. Particularly, AI has been used to identify new uses for existing drugs. One such study by Kamboj et al used SVM, RF, k-NN and artificial neural network and quantitative structure–activity relationship approaches to predict repurposed drugs targeting HCV non-structural proteins.54
Wu et al55 implemented an attention-based DL model to predict HBV genomic integration sites, unveiling novel insights into HBV-induced cancer. Their binary classifier, incorporating convolutional layers and attention modules, effectively predicted HBV integration sites by identifying key genomic regions crucial for classification.
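The attention mechanism at the core of such models can be illustrated in a few lines of NumPy: one-hot-encoded sequence positions attend to each other via scaled dot-product attention (random projection weights here; a real model would learn them and combine attention with convolutional layers):

```python
import numpy as np

rng = np.random.default_rng(7)

def one_hot(seq):
    """Encode a DNA string as a (len, 4) one-hot matrix (A, C, G, T)."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, idx[base]] = 1.0
    return out

def attention(x, d=8):
    """Single-head scaled dot-product attention over sequence positions."""
    wq, wk, wv = (rng.normal(size=(x.shape[1], d)) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over positions
    return weights @ v, weights

x = one_hot("ACGTACGTTTAACCGG")
out, w = attention(x)
```

The attention weight matrix `w` is what gives such models their interpretability: inspecting which genomic positions a prediction attends to is how Wu et al identify the regions driving integration-site classification.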
In a proteomic study by Estevez et al,56 an RF model analysed patterns in 51 common cytokines. Their findings revealed distinct cytokine profiles in patients with HBV or HCV infection, particularly in the context of the presence of HCC.
HCC and intrahepatic cholangiocarcinoma
While the burden of HBV and HCV-associated HCC has been declining with effective antiviral therapy, HCC incidence related to metabolic syndrome is expected to rise due to the dramatic increase in the prevalence of MASH in the general population.57
Omics data and machine learning methodologies are a growing field of study in primary liver malignancies and their potential application in translational research is broad, from screening, to diagnostics, to prognostic clinical tests.
Hershberger et al58 identified machine-learning-derived salivary metabolite combinations as promising biomarkers for HCC detection, highlighting saliva analysis as a potential diagnostic tool for liver diseases. Furthermore, volatile organic compounds (VOCs) in patient breath offer valuable insights into disease processes, capable of distinguishing disease status. Leveraging machine learning with breath VOCs holds the potential for developing improved, non-invasive screening tools for chronic liver disease and primary and secondary liver tumours.59
Lewinska et al60 employed multivariate analysis to evaluate the diagnostic potential of altered serum lipidome in MASLD-associated HCC, revealing depleted unsaturated fatty acids and elevated triglycerides as indicative of deregulated metabolic networks in MASLD-HCC. Wu et al61 enhanced serum lipid profiling accuracy with graphitised carbon matrix, offering superior diagnostic utility for liver cancer compared with conventional matrices.
The primary factors behind HCC are genetic and epigenetic changes. Lu et al62 integrated clinical features and genetic biomarkers to establish a decision tree-based model for HCC detection postviral eradication. Their prospective study included 55 HCV patients with advanced fibrosis achieving sustained virological response. The model achieved 95.7% accuracy in HCC detection, identifying TAS1R3, FOSL1 and ABCA3 as key predictors. The transcriptome serves as a pivotal link between cellular phenotype and genetic tumour biology, containing RNA-encoded information transcribed from DNA. Unlike the relatively stable genome, the transcriptome is dynamic, responding actively to physiological or pathological conditions. Choi et al63 introduced a network-based DL model, adapted from the Word2Vec framework, to analyse gene expression data sourced from TCGA. This approach identified distinct gene modules significantly linked to HCC prognosis. Along with transcriptomic data, Chaudhary et al64 used microRNA and methylation data in an AE-based DL framework to identify two HCC subpopulations with differential survival. Their results consistently linked the more aggressive subtype with higher TP53 inactivation mutation frequencies while the moderate subtype exhibited activated metabolism-related pathways, including drug metabolism, amino acid and fatty acid metabolism.
Recent single-cell transcriptomic studies have enabled in-depth analysis of the immune landscape. LASSO regression was used to identify T-cell exhaustion signatures, characterising the immune landscape and predicting HCC prognosis.65 Gong et al66 uncovered the critical role of neutrophils in the HCC tumour microenvironment. They developed a neutrophil-derived signature via the partition around medoids algorithm, followed by Cox analysis, integrating the features by fitting 101 models within a leave-one-out cross-validation framework.
As ageing-related genes are closely related to the occurrence and development of cancer, building prognostic models with them is gaining popularity. Zhang et al,67 using gene expression data, developed an HCC senescence score model for prognostic prediction. Using a combination of feature selection methods and a LASSO-penalised Cox regression model, they identified the genes CDCA8, CENPA, SPC25 and TTK as associated with HCC progression. In another study, LASSO regression was used to develop a prognostic model consisting of three ageing-related genes, TFDP1, NDRG1 and FXR1.68
Liu et al69 identified a set of nine efferocytosis-related genes and their regulatory influence on HCC immunotherapy to construct a prognostic model for HCC. Ma et al70 developed a consensus clustering approach to identify the impact of tumour cell state heterogeneity and functional clonality on HCC and intrahepatic cholangiocarcinoma (iCCA) patient prognosis.
Data-driven omics-based biomarkers have the potential to guide personalised therapeutic strategies in oncology. Genomic predictors could be used to tailor chemotherapy, immunotherapy and combined treatments for more effective therapeutic response. Retrospective studies have shown that gene expression profiles and other molecular signatures derived from omics data can predict patient response to therapies like anti-PD1 and anti-CTLA-4, which are used in immunotherapy. Hectors et al found that the expression of the immunotherapy target PD-L1 at the protein level, as well as PD-1 and CTLA4 at the mRNA level, correlated with radiomic features of HCC.71 Chen et al evaluated the stemness index based on transcriptome data to find subtypes within patients with HCC.72 Their LR-based classifier achieved an AUC of 0.918 in identifying patients who are more likely to respond to immunotherapy based on stemness features. These AI tools would need to be further validated to permit deployment into clinical practice, but nonetheless show the promise of integrating clinical and omics data using AI to guide a personalised therapeutic approach. Although not using omics, DL architectures, such as transformers, have been used to study HCC. Sato et al73 used a transformer model to predict overall survival in patients with HCC treated by radiofrequency ablation (RFA). Wang et al74 used a transformer to study associations between MRI and microvascular invasion in patients with solitary HCC. Further, combining different treatment modalities, such as systemic therapies with local treatments like RFA or transarterial chemoembolisation (TACE), is an emerging strategy aimed at improving therapeutic outcomes and overcoming resistance mechanisms in HCC and is discussed in the ‘Omics data integration with biomedical images’ section.
Acute liver failure
Acute liver failure (ALF) is a rare clinical manifestation of rapid and severe liver dysfunction in patients without pre-existing liver disease. Challenges associated with building AI tools for ALF include its sudden onset which limits the availability of comprehensive molecular profiles of affected patients. AI tools for ALF must address the need for short-term risk prediction, patient stratification and evaluation of therapeutic responses to improve patient outcomes.
Sharma et al75 identified the top five metabolites through multiomics analysis and assessed their potential in predicting early mortality or the need for emergency liver transplantation in patients with ALF. Five machine learning algorithms, namely RF, k-NN, SVM, classification and regression trees (CART) and LDA, were evaluated for classification based on metabolite expression levels. While models trained on individual metabolites exhibited significant variation, the combined model utilising all five metabolites achieved the highest AUC of 0.933 on the validation set.
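Such head-to-head benchmarking of classifiers is straightforward to reproduce in principle. Below is a minimal sketch comparing the same five algorithm families by cross-validated AUROC; the data are synthetic stand-ins for the five metabolite levels, not the study's cohort.

```python
# Sketch: compare RF, k-NN, SVM, CART and LDA by cross-validated AUROC.
# Synthetic data stand in for the five metabolite expression levels.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=120, n_features=5, n_informative=4,
                           n_redundant=0, random_state=0)

models = {
    "RF": RandomForestClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(),
    "SVM": SVC(probability=True, random_state=0),
    "CART": DecisionTreeClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}

# 5-fold cross-validated AUROC for each candidate model
scores = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
          for name, m in models.items()}
for name, auc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {auc:.3f}")
```

As in the cited study, the point of such a comparison is to select the family that generalises best before combining features into a final model.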
Drug-induced hepatotoxicity, also a leading cause of ALF, is categorised as acute or chronic depending on the duration of injury and histological damage location. Jin et al76 developed an interpretable XGBoost-SHAP model integrating rat liver gene expression profiles, drug labels and clinical reports to evaluate and predict drug-induced liver injury (DILI). Moore et al77 investigated SNP impact on DILI susceptibility using linear models such as multivariate adaptive regression splines (MARS), multifactor dimensionality reduction (MDR) and LR. They found that SNP–SNP interactions significantly predicted DILI chronicity.
Acute-on-chronic liver failure
ACLF is characterised by rapid clinical deterioration in patients with chronic liver disease. While omics-based studies have increased, the rapidly evolving clinical course of ACLF poses challenges in identifying its pathophysiological process and developing prognostic models.78
A study by Zhang et al79 used an RF model in conjunction with differential analysis (t-test followed by false discovery rate-based sorting) to prioritise metabolite biomarkers that are found relevant in both approaches. For diagnosing pre-ACLF patients, they combined two metabolites, pipecolate and gamma-carboxyethyl hydroxychroman (γ-CEHC), with clinical parameters using an LR model. Their combined diagnostic model achieved an AUROC of 0.88 on a validation dataset.
Further, they developed a prognostic model, with metabolites comprising ureidopropionate, N-acetyl-aspartyl-glutamate and pipecolate, to predict 90-day mortality in patients with ACLF, which achieved an AUROC of 0.83 on a validation dataset.
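The differential-analysis step used in such studies, a per-feature t-test followed by false discovery rate control, can be sketched as follows. The data here are synthetic, and the Benjamini-Hochberg procedure is assumed as the FDR method; the cited study does not specify its exact correction.

```python
# Sketch: per-metabolite t-tests followed by Benjamini-Hochberg FDR
# control. Synthetic data; the first 10 features carry a true signal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_metabolites = 200
cases = rng.normal(0.0, 1.0, size=(30, n_metabolites))
controls = rng.normal(0.0, 1.0, size=(30, n_metabolites))
cases[:, :10] += 1.5          # spike in a true signal for 10 metabolites

# Two-sample t-test per metabolite
pvals = stats.ttest_ind(cases, controls, axis=0).pvalue

# Benjamini-Hochberg adjusted p-values (step-up procedure)
order = np.argsort(pvals)
ranked = pvals[order] * n_metabolites / (np.arange(n_metabolites) + 1)
adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
qvals = np.empty_like(adjusted)
qvals[order] = np.clip(adjusted, 0, 1)

significant = np.where(qvals < 0.05)[0]
print(f"{len(significant)} metabolites pass FDR < 0.05")
```

Candidates passing the FDR cut-off would then be intersected with the model-based (eg, RF importance) ranking, as in the study above.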
Acute cellular rejection
There is a growing body of studies focusing on molecular profiling of liver allografts for early diagnosis of outcomes, such as acute cellular rejection (ACR). ACR commonly occurs within the initial 90 days postliver transplantation, marked by cytotoxic T cell infiltration into the transplanted liver tissue. In a paediatric study, Ningappa et al80 applied an integrative machine learning method merging transcriptomics data with the human protein interactome. Through a LASSO-based feature selection approach, they pinpointed network module signatures linked to rejection in paediatric liver transplantation with a median AUROC of 0.7. Their comparison of network-based signatures to gene-based signatures obtained through the differential expression of genes (DEGs) demonstrates how these network modules may be more suitable for targeting with antirejection drugs.
Omics data integration with multiple modalities
Recent studies have emphasised the significance of integrating omics data with additional modalities like clinical variables, as well as radiology and histopathology images. These studies demonstrate the consequential enhancements in prediction accuracy and clinical relevance, affirming the superiority of the integrated approach.
Omics data integration with clinical variables
Integration of clinical variables with omics data has been shown to significantly improve accuracy in liver disease characterisation and diagnosis across various studies. By combining information from both clinical assessments and omics profiling, researchers have achieved enhanced predictive performance, allowing for more accurate diagnosis and prognosis evaluation.
The RF model, analysing gut microbiome changes during MASLD progression alongside integrated microbial and clinical features (body mass index, age, high-density lipoprotein, fasting insulin), demonstrated an improved AUROC of 0.8 compared with both clinical prognostic models (AUROCs of 0.58–0.60) and models relying solely on microbial signatures (AUROC of 0.72).37 Similar improvements were demonstrated with lipids and polar metabolite integration in MASLD25 and microbiome data integration in ALD.46 Oh et al combined a gut microbiome-derived metagenomic signature with patient age and serum albumin levels to detect cirrhosis. They found that addition of serum aspartate aminotransferase levels enabled discrimination of cirrhosis from earlier fibrosis stages.34
In their multiomics study, Atabaki-Pasdar et al43 found that integrating genetic, transcriptomic, proteomic and metabolomic with clinical variables yielded better performance (AUROC 0.84) compared with using clinical variables alone (AUROC 0.82) in predicting liver fat quantity.
Omics data integration with biomedical images
Histopathology plays a vital role in LD diagnosis and characterisation. DL-based histopathology image analysis studies often leverage omics data for validation. Huang et al81 used a VGG-16 network-based CNN and Raman spectroscopy to distinguish between HCC and iCCA and leveraged tissue metabolomics for validation. Calderaro et al82 developed an attention-based DL model to reclassify combined HCC-iCCA as either HCC or iCCA using whole slide images. Clinical and biological relevance of the predictions were validated using survival prognosis and spatial transcriptomic analysis. Conway et al83 used a pretrained deep CNN model to extract human interpretable features from histopathology images and correlated them with transcriptomic profiles to reveal a five-gene signature that distinguishes stage F3 from F4 fibrosis.
Radiogenomics, a promising field that correlates radiomics features with other omics modalities, has gained significant attention. Several studies have successfully used radiogenomics to decode the molecular profile of HCC.71 84–86 A recent study87 applied radiogenomics to predict mutations in the PI3K signalling pathway.
While surgical resection and liver transplantation remain the primary curative treatments for HCC, radiofrequency ablation (RFA) is a cornerstone for image-guided ablation in non-surgical early-stage HCC. For intermediate-stage HCC, TACE is the most widely used treatment. Boldanova et al59 integrated clinical observations with genetic and radiological insights to forecast patient responses to TACE. They used RF to identify potent features, such as tumour morphology and genetic markers, for predicting TACE response.
For those who cannot undergo the above treatments for HCC, targeted therapies, immunotherapy and other systemic treatment options have become invaluable, particularly for treating advanced stages of HCC. Tian et al combined 14 features—9 selected from a three-dimensional CNN trained on MRI images, and 5 radiomic features—to predict PD-L1 levels using an SVM classifier. Their combined model performed the best with an AUC of 0.897 while the DL-based feature selection model alone and the radiomics-based feature selection model alone had AUCs of 0.852 and 0.794, respectively.88
Common pitfalls of applying AI to omics data
Depending on the question and AI approach, careful design of training and validation datasets and pipelines is crucial. In clinical contexts, factors such as data stratification, randomisation, cross-validation strategy and prevention of preprocessing leaks are especially important.
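A concrete way to prevent preprocessing leaks is to fit all preprocessing inside each cross-validation fold. The sketch below, on synthetic data, bundles the scaler and model into a scikit-learn pipeline so that test folds never influence the normalisation statistics, and uses stratified folds to preserve class balance.

```python
# Sketch: leak-free cross-validation. The scaler is refitted inside
# each fold via a Pipeline, so test-fold statistics never leak into
# training. Synthetic data stand in for an omics feature matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Wrong: scaling the full matrix before splitting leaks information.
# Right: bundle preprocessing with the model so it is fitted per fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds preserve the class balance in each split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"AUROC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```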
Whalen et al89 identify key challenges in applying AI in genomics, highlighting the importance of addressing distributional differences, such as batch effects, to prevent inflated performance estimates on test data and ensure model generalisability. Harmonising data distributions becomes crucial when integrating multiple omics modalities.
Functional interdependencies between genes and proteins and confounding factors like age are common in omics data. Addressing confounders necessitates precise data stratification, while correlated features are managed through selection or grouping. Ensemble methods such as RF, as well as DL models, are inherently adept at learning from correlated features.
Missing data are a significant challenge in omics analysis, stemming from technical limitations, experimental constraints and sample variability. In single-cell RNA-seq, a missing transcript may reflect the cell’s state or result from technical artefacts such as low sequencing depth or library preparation problems.90 Strategies for addressing missing data range from basic imputation methods like mean or median to more advanced techniques such as k-NN and iterative imputation. Emerging approaches include transfer learning-based neural networks and deep neural network tensor factorisation for imputation, offering promising solutions for omics data analysis.91 92
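The basic and advanced imputation strategies mentioned above can be compared directly. The sketch below uses a synthetic matrix with roughly 10% of values missing at random and measures how well median, k-NN and iterative imputation reconstruct the held-out entries.

```python
# Sketch: compare median, k-NN and iterative imputation on a synthetic
# 'expression' matrix with ~10% of entries missing at random.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
mask = rng.random(X.shape) < 0.1      # ~10% missing entries
X_missing = X.copy()
X_missing[mask] = np.nan

imputers = {
    "median": SimpleImputer(strategy="median"),
    "k-NN": KNNImputer(n_neighbors=5),
    "iterative": IterativeImputer(random_state=0),
}
for name, imp in imputers.items():
    X_filled = imp.fit_transform(X_missing)
    # Reconstruction error on the entries that were masked out
    rmse = np.sqrt(np.mean((X_filled[mask] - X[mask]) ** 2))
    print(f"{name}: RMSE on held-out entries = {rmse:.3f}")
```

In real omics data the masked truth is unavailable, so such benchmarks are typically run by artificially masking observed values, as done here.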
Class imbalance is a common challenge in omics due to the rarity of certain conditions, which can bias models. Strategies to address this include sample weighting, oversampling or undersampling and data augmentation. Specifically, data augmentation involves generating synthetic data to supplement existing datasets using generative models, such as generative adversarial networks or VAEs. Lee et al93 used a generative neural network to differentiate common differentially expressed genes from context-specific transcriptional patterns. While generative models can help address limited samples or missing data, the augmented data must undergo careful validation to ensure its biological relevance.
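Two of these strategies, class weighting and minority oversampling, can be sketched as follows on synthetic data with a 5% positive class; augmentation with generative models is beyond the scope of this snippet.

```python
# Sketch: two imbalance-handling strategies on synthetic data with a
# 5% positive class: class weighting and random minority oversampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Strategy 1: weight samples inversely to class frequency
weighted = LogisticRegression(class_weight="balanced", max_iter=1000)
weighted.fit(X_tr, y_tr)

# Strategy 2: oversample the minority class before fitting
rng = np.random.default_rng(0)
minority = np.where(y_tr == 1)[0]
extra = rng.choice(minority, size=(y_tr == 0).sum() - len(minority))
idx = np.concatenate([np.arange(len(y_tr)), extra])
oversampled = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])

for name, m in [("weighted", weighted), ("oversampled", oversampled)]:
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```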
The explainability of AI models is crucial in the context of healthcare applications, particularly for black-box models. Understanding how an AI model reaches its decisions is essential for gaining confidence in it, justifying AI-augmented decision-making and to understand its biases. This is also a key step in improving our mechanistic understanding of omics data analysis, where biological processes are inherently complex. Tree-based methods like gradient boosting and decision trees offer inherent mechanisms for assessing feature importance, making them valuable not only for prediction but also for interpretability tasks.13 94 Techniques such as partial dependence plots further aid in visualising the relationships modelled by these methods.95
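The inherent feature-importance mechanism of tree-based models can be illustrated with a short sketch; here the 'expression' data are synthetic and the informative features are placed in the first three columns by construction.

```python
# Sketch: inherent feature importance of a tree-based model. A
# gradient boosting classifier ranks synthetic 'gene' features by how
# much they contribute to its splits. With shuffle=False, the three
# informative features occupy columns 0-2 by construction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances sum to 1 across features
ranking = np.argsort(model.feature_importances_)[::-1]
for i in ranking[:5]:
    print(f"feature {i}: importance {model.feature_importances_[i]:.3f}")
```

Impurity-based importances are a useful first look, but they can favour high-cardinality features; partial dependence plots and permutation-based measures complement them, as noted above.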
For explaining the output of any AI model in a model-agnostic manner, SHAP is widely applicable, generating both global and sample-level explanations.96 Variants of SHAP, including DeepSHAP and DeepLIFT, have been used for studying molecular features.97 98 Attention-based mechanisms, which are inherently explainable, have found applications in DNA sequence and multiomics data analysis.99 In the context of omics data integration, there is an opportunity to further develop DL-based approaches that are intrinsically self-explanatory7 or can rationalise their decisions.8
AI beyond liver diseases
The rapid development of AI methods has created opportunities to tackle the unique challenges in biological data analysis, offering innovative approaches to overcome them. In this section, we discuss some of these advances and highlight their potential in liver disease diagnosis and treatment.
In scenarios with limited training data, transfer learning has emerged as a powerful approach as demonstrated by Lu et al using whole slide images.100 Pretraining on large-scale data allows a model to learn general knowledge, which is then fine-tuned for specific tasks. This approach generalises well to unseen data and reduces the need for extensive data labelling. While image data are abundant, similar strategies are being explored for omics data using large datasets like TCGA. For instance, Wang et al demonstrated a pretrained multiomics network using gene expression, DNA methylation, gene mutation and copy number variation data from 32 pan-cancers in the TCGA database. This network could be fine-tuned for downstream tasks in oncology, even with incomplete data modalities.101 Developing open-source, pretrained models for omics data will allow researchers to use DL techniques, even with limited liver disease-specific data.
The pretraining phase in transfer learning often employs supervised learning with labels. However, obtaining and annotating labelled data can be resource-intensive and time-consuming. Moreover, labels already available in general datasets may not be well suited for the ultimate learning objective. In such cases, self-supervised learning, a form of unsupervised learning that generates training labels from unlabelled data to learn representations, can be used. Zhou et al,102 for instance, pretrained a foundation model on 1.6 million unlabelled retinal images through self-supervision before adapting the model for various disease detection, from sight-threatening eye diseases to complex systemic disorders such as heart failure and myocardial infarction. Similar approaches can be adopted for omics data on conditions, such as ALF, where samples are rare and limited. The inherent characteristics of an omics modality can be learnt from unannotated data and then fine-tuned on annotated samples.
The main challenge in self-supervised learning lies in developing methods to create labels from unlabelled data. One technique to achieve this is contrastive learning, which involves using pairs of positive (similar) and negative (dissimilar) samples derived from the unlabelled data to learn representations. This method ensures that similar objects are mapped closer together in the learnt latent space than dissimilar ones. Tian et al103 leverage contrastive learning for predicting the origins of primary tumours in cancers, using cytological images. Yang et al104 used a scalable contrastive learning framework to analyse single-cell multimodal data at a 10 million cell scale, applying it to a COVID-19 immune cell dataset and by comparing it to a reference atlas of healthy and infected samples.
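The core contrastive objective can be written in a few lines. The sketch below implements an InfoNCE-style loss in plain numpy on synthetic embeddings; real pipelines add trainable encoders and data augmentations to generate the positive pairs.

```python
# Sketch: InfoNCE-style contrastive loss. Embeddings of positive pairs
# are pulled together; all other samples in the batch act as negatives.
# Embeddings here are synthetic, not produced by a trained encoder.
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Average InfoNCE loss over a batch of (anchor, positive) pairs."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature          # cosine similarity matrix
    # Row i's positive sits in column i; other columns are negatives
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned = info_nce(x, x + 0.01 * rng.normal(size=(8, 16)))   # lower loss
random_pairs = info_nce(x, rng.normal(size=(8, 16)))         # higher loss
print(f"aligned pairs loss: {aligned:.3f}")
print(f"random pairs loss:  {random_pairs:.3f}")
```

Minimising this loss maps similar objects closer together in the latent space than dissimilar ones, which is the property described above.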
Transformers, a type of model architecture foundational in NLP and computer vision, are now being adapted for use in omics and clinical applications.105 The attention mechanism in transformers allows the model to assess different data segments, capturing relationships among tokens throughout the sequence, regardless of their distance from each other. These models can capture (learn) spatial relationships, such as between image regions or among molecules in omics profiles and temporal patterns in longitudinal data. Liu et al106 developed a transformer model that integrates multiomics data with a criss-cross attention mechanism to capture interactions between biological pathways and data types, showing potential for cancer screening.
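At the heart of these models is scaled dot-product attention, which can be sketched directly. The token embeddings below are synthetic; real transformers add learned query/key/value projections, multiple heads and positional information.

```python
# Sketch: scaled dot-product attention. Each token's output is a
# weighted mix of all tokens' values, with weights derived from
# query-key similarity, regardless of token distance.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise similarities
    # Numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))     # e.g. 6 'gene' tokens, 8-dim each
out, w = attention(tokens, tokens, tokens)   # self-attention

# Each row of the attention map is a probability distribution
print(np.round(w.sum(axis=1), 6))
```

The attention map `w` is also what makes these models inherently inspectable: each row shows how strongly one token attends to every other.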
Translation approaches, developed for translating text or data from one form to another, such as from text to speech, have also proven valuable in biological data analysis. Schmauch et al107 employed a DL model to predict transcriptome profiles from histology images. Their model effectively correlates specific molecular signatures with morphological patterns. In medicine, translation models offer significant advantages in multimodal analysis. First, they facilitate the integration of diverse data types like omics, clinical measurements and imaging data, uncovering intricate relationships that enhance our understanding of diseases. Second, they address a common challenge in multimodal data analysis: the presence of misaligned datasets where certain measurements or modalities are missing for many patients. Translation models can effectively impute missing modalities, thereby enriching the dataset and improving diagnostic and prognostic capabilities.
RL is a learning paradigm that has shown success in tasks requiring decision-making over multiple steps. In RL, an agent learns to make decisions by interacting with its environment. At each step, the agent seeks to maximise its cumulative future rewards by exploring various decisions and assessing their long-term consequences. RL has achieved notable successes, such as surpassing human performance in complex games like AlphaGo.108 In the medical field, Barata et al109 applied RL to the diagnosis of skin cancer. They conceptualised diagnosis as a series of decisions, where each decision, whether correct or incorrect, carries a reward or penalty based on expert judgments. Through interactions with the environment, the RL agent learns optimal decision-making strategies for different diagnostic scenarios.
Typically, AI has addressed generalisability issues through a centralised approach by aggregating data from multiple sources. However, this centralised approach is not always feasible and raises data privacy concerns. FL offers a solution by sharing model updates instead of raw data. This approach enables AI models to learn from diverse datasets that may otherwise be inaccessible, which is particularly advantageous in clinical medicine. FL shows potential in addressing health disparities, supporting underserved populations and advancing research on rare diseases. Importantly, FL enhances patient privacy by minimising the need to transfer or duplicate sensitive data. For instance, Pati et al110 illustrate how AI models can perform brain tumour segmentation and diagnosis by leveraging patient data across 71 sites worldwide, employing FL to protect patient confidentiality without sharing personal data.
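The canonical FL aggregation step, federated averaging (FedAvg), is simple to state: each site trains locally and shares only its parameters, which the server combines weighted by local sample counts. A minimal sketch, with hypothetical site sizes and parameter vectors:

```python
# Sketch: federated averaging (FedAvg). Each site shares only its
# locally trained parameters; the server averages them weighted by
# local sample counts. Site sizes and parameters are hypothetical.
import numpy as np

def fedavg(site_params, site_sizes):
    """Weighted average of per-site parameter vectors."""
    weights = np.asarray(site_sizes, dtype=float)
    weights = weights / weights.sum()
    return sum(w * p for w, p in zip(weights, site_params))

# Three hypothetical sites with different cohort sizes; the raw
# patient-level data never leave the sites.
params = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 300, 600]
global_params = fedavg(params, sizes)
print(global_params)   # pulled towards the largest site's parameters
```

In a full FL round, the averaged parameters are broadcast back to the sites for further local training, and the cycle repeats.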
Generative AI is a family of models that are capable of synthesising previously unseen data. Transformers, for instance, are a commonly used architecture to develop generative models. Cui et al111 demonstrate a generative pretraining workflow, trained on over 33 million cells, specifically for single-cell omics data, adapting the transformer architecture to simultaneously learn cell and gene representations. One natural application of these methods is to augment training datasets, thereby improving the robustness and generalisation of other AI models. Generative AI, however, has applications that go much further. In particular, their ability to generate novel hypotheses has tremendous potential in drug discovery and molecule design. AlphaFold,112 the current state-of-the-art in protein structure prediction, uses a class of generative AI called diffusion models.
Discussion and conclusions
This review focused on a wide range of omics modalities that are being studied across liver diseases to improve our understanding of the underlying pathophysiological mechanisms. However, most individual studies explore only a few omics modalities. Figure 5 shows that transcriptomics is the most extensively studied, followed by metabolomics and lipidomics. Practical challenges, including high costs, limited long-term follow-up data and missing data, limit the exploration to a few omics layers. Given the nature of existing omics datasets, there is a growing need for AI techniques capable of distilling knowledge from disparate and sparse datasets. AI methods such as transfer learning, which use large datasets for pretraining before adaptation to a different domain, could be useful in situations with limited sample sizes. The success of models built on benchmark datasets like TCGA underscores the need for creating and expanding public curated datasets.
While our review has primarily focused on the scientific advancements, it is essential to establish protocols, ethical assessments and regulatory oversight for using AI in clinical settings. AI holds great promise for improving outcomes in liver diseases, but it is not yet commonly used in clinical settings. Several challenges, chief among them limited generalisability, have restricted the applicability of AI models outside their training domain and cohort. Patient populations vary widely across demographics. If the training data are not representative of the broader population, the model's generalisability in a real-world setting degrades significantly.
Most studies rely on external validation cohorts for benchmarking. Studies without a separate validation cohort often perform a k-fold cross-validation and/or random split of the discovery cohort into training and test sets. While this is an essential first step, most of these cohorts are still relatively small (<500 patients) and do not reflect the diversity in the global population. Ensuring models are validated on large and diverse datasets that match the conditions expected during the clinical deployment of the model, also known as data stratification, is essential. The performance of most current models is likely to diminish when assessed prospectively using ‘real-world’ data. Further, most current AI models treat health as a single time point event, whereas diagnosis requires consideration of a patient’s history. Therefore, more large-scale multicentre prospective studies with standardised omics data acquisition techniques are needed to ensure further development of AI methods in liver diseases. The development of extensive, well-curated and well-phenotyped datasets is essential for creating multimodal AI models. This is crucial because no level of technical sophistication can extract information that is absent from the data. Standardised validation datasets are also necessary to effectively assess model performances in real-world settings and to compare algorithms.
While efforts are underway to ensure data availability, model and code availability is lagging in healthcare. To improve confidence in these models, results reported in manuscripts should be easily reproducible, which requires that the source code, trained models, and training and validation datasets be open sourced.
Given the large variations in how results are reported, we chose AUROC as it is the most consistently reported metric across studies. We also used a combination of AUROC and the size of the validation cohort to consider the strength of a study—a high AUROC on a small dataset is not guaranteed to translate to equivalent results in real-world settings. Model performance measures, such as AUROC or the concordance index, are useful in evaluating the ability of a model to discriminate between classes; however, relying on discrimination as the sole performance measure can be limiting. Metrics such as the net benefit, decision curve analysis and cost-effectiveness analysis also provide valuable insights into the clinical utility and impact of a model. This underscores the necessity of developing standardised, problem-specific reporting standards. Such practices are well established in fields like computer vision for tasks such as object detection and segmentation.113
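The complementary nature of discrimination and clinical-utility metrics can be made concrete. The sketch below computes AUROC alongside the net benefit used in decision curve analysis, on synthetic predictions; the net benefit formula shown is the standard per-patient form.

```python
# Sketch: AUROC (discrimination) alongside the net benefit used in
# decision curve analysis. Predictions here are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

def net_benefit(y_true, y_prob, threshold):
    """Net benefit at a given risk threshold, on a per-patient scale:
    TP/n - FP/n * (threshold / (1 - threshold))."""
    n = len(y_true)
    pred = y_prob >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
prob = np.clip(y * 0.4 + rng.random(500) * 0.6, 0, 1)  # informative scores

print(f"AUROC: {roc_auc_score(y, prob):.3f}")
for t in (0.1, 0.3, 0.5):
    print(f"net benefit at threshold {t}: {net_benefit(y, prob, t):+.3f}")
```

A decision curve plots net benefit across thresholds and compares the model against 'treat all' and 'treat none' strategies, directly addressing clinical utility rather than discrimination alone.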
Regulatory and ethical challenges such as privacy, susceptibility to misinterpretation and fairness are critical concerns when deploying AI systems in a clinic. Further research from the machine learning community is required to address concerns such as amplification of biases. It is important for these models to be able to identify and report their own deficiencies and accurately estimate their confidence in decisions. The safe integration of clinical AI ultimately relies on well-informed clinician-users, who play a crucial role in identifying and reporting emerging issues and ensuring the safe usage of deployed models.114 Finally, a practical barrier to deploying AI models in clinical settings is the differences in standards and systems for data collection and treatment across institutions. Effectively integrating AI models in the clinical workflow of an institution requires specially trained individuals.
There are common biomarkers, such as alanine aminotransferase, aspartate aminotransferase, gamma-glutamyl transferase, albumin and bilirubin, which are frequently used across different studies on liver diseases. However, differences in study design, population characteristics, prevalence of the disease in the studied population and the statistical or AI methods employed, contribute to large variations in results and conclusions. To decipher consistent trends from these studies, further systematic reviews and meta-analyses studies are needed.
A critical advancement in the field will be the development of frameworks that enable integration of pretrained models from routinely measured parameters, such as clinical data and imaging data, with more specialised modalities such as omics. Such cross-disciplinary frameworks are essential for holistically capturing the signatures of a disease propagating across the system.
Overall, there has been significant progress in the development of AI methods applied to omics data. The next generation of AI models integrating clinical, omics and other types of data, all provide a tremendous opportunity to personalise the care of liver patients and identify individualised strategies to improve their health trajectories. As these AI paradigms continue to evolve, their integration into liver disease research and clinical practice holds great promise for improving diagnosis, treatment and patient outcomes, paving the way for more personalised and effective care.
Ethics statements
Patient consent for publication
Acknowledgments
The authors would like to thank Surabie Sivanendran for her contributions towards extracting feature selection methods for some studies. The authors would also like to thank David Huang for his help in initial shortlisting of radiogenomics studies.
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Footnotes
Contributors SG, MBr and MBh conceptualised the review; SG wrote the manuscript and created the figures and table; SG, XZ and MA researched data for the article and edited the manuscript; MBr and MBh provided critical feedback, edited and helped shape the manuscript.
Funding This work was supported by the University of Toronto’s Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a programme of Schmidt Futures, granted to Soumita Ghosh. MBr is a CIFAR Chair in Artificial Intelligence.
Competing interests None declared.
Provenance and peer review Commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.