Identifying transcriptional miRNA biomarkers by integrating high-throughput sequencing and real-time PCR data

doi:10.1016/j.ymeth.2012.10.005

Methods

Volume 59, Issue 1, January 2013, Pages 154-163

https://doi.org/10.1016/j.ymeth.2012.10.005

Abstract

Using both high-throughput sequencing and real-time PCR, the miRNA transcriptome can be analyzed in complementary ways. We describe the necessary bioinformatics pipeline, including software tools, and key methodological steps in the process, such as adapter removal, read mapping, normalization, and multiple testing issues for biomarker identification. The methods are exemplified by the analysis of five favorable (event-free survival) vs. five unfavorable (died of disease) neuroblastoma tumor samples with a total of over 188 million reads.

Highlights

► Reproducible analysis workflow for miRNA expression by high-throughput sequencing.
► How to address the challenges raised by mapping short miRNA reads.
► Special treatment for dinucleotide color space mapping.
► Biomarker candidate discovery by robust normalization and differential expression analysis.
► Validation with RT-qPCR Cq-values.

Introduction

Current high-throughput sequencing (HTS) technologies offer the opportunity to characterize the genomic, epigenomic and transcriptomic state of a tumor. Here we focus on the bioinformatic methodology of characterizing the microRNA (miRNA) transcriptome of a sample. Functional miRNAs regulate the translation and cleavage of mRNAs by sequence-specific interaction with the 3′ UTR, reviewed in [1]. MiRNAs are involved in the regulation of many physiological processes, including differentiation, development and apoptosis [2]. In cancer, miRNAs may exert oncogenic function by inhibiting tumor suppressor genes or may act as tumor suppressors by inhibiting oncogenes [3]. The goal is to identify putative biomarkers, i.e., miRNAs that are differentially expressed between tumor and surrounding tissue, or between low-risk and high-risk subtypes of the tumor. In a previous study, we found differential miRNA expression between favorable versus unfavorable neuroblastoma subtypes [4].

In this article, we discuss the fundamental challenges involved in estimating expression values from short RNA reads and describe the computational pipeline for obtaining a ranked list of differentially expressed miRNAs from the raw sequence reads. Expression levels of these biomarker candidates should be confirmed by RT-qPCR, and we discuss how logarithmic HTS expression values correspond to negative $\Delta C_{q}$ values. To provide some guidance for other experiments, we illustrate each step with numerical examples from our previous neuroblastoma study [4].
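The correspondence between logarithmic HTS expression values and negative $\Delta C_{q}$ values can be illustrated with a small sketch. It assumes ideal amplification (template doubling in every PCR cycle), under which relative abundance is proportional to $2^{-\Delta C_{q}}$, so that log2 expression and $-\Delta C_{q}$ differ only by an additive constant. All numbers and function names below are made up for illustration and are not taken from the neuroblastoma study.

```python
import math


def neg_delta_cq(cq_target, cq_reference):
    """Return -dCq, where dCq = Cq(target) - Cq(reference)."""
    return -(cq_target - cq_reference)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical values for four miRNAs in one sample (illustration only).
normalized_counts = [64, 256, 1024, 4096]  # normalized HTS read counts
cq_target = [26.0, 24.0, 22.0, 20.0]       # RT-qPCR Cq of each miRNA
cq_reference = 18.0                        # Cq of an assumed reference assay

log2_expr = [math.log2(c) for c in normalized_counts]
neg_dcq = [neg_delta_cq(cq, cq_reference) for cq in cq_target]

# Under ideal doubling, the two scales differ only by a constant offset.
offsets = [e - d for e, d in zip(log2_expr, neg_dcq)]
r = pearson(log2_expr, neg_dcq)

print("log2 expression:", log2_expr)
print("-dCq:           ", neg_dcq)
print("offsets:        ", offsets)
print("Pearson r:      ", r)
```

With real data the amplification efficiency is below 100% and both measurements are noisy, so one would expect a positive correlation rather than an exact linear relationship.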

Section snippets

Datasets

The dataset of our previous study consists of five low-risk patients (neuroblastoma stage 1, no MYCN amplification, event-free survival [EFS], labeled 552–556) and five high-risk patients (neuroblastoma stage 4 with MYCN amplification, died of disease [DoD], labeled 557–561). This dataset can be retrieved from the NCBI Sequence Read Archive [5] using Accession No. SRA009986.

In the form presented here, the pipeline expects color space [6] FASTA (.csfasta) and quality (.qual) files, which is the

Fundamental challenges

Before we discuss the steps of the pipeline, we highlight three fundamental difficulties that arise during the analysis of short (mi)RNA reads.

Automated miRNA expression analysis

Computing the expression profiles of all miRNAs from raw sequence reads involves several steps. In addition to a textual description, a well-documented and formalized workflow description is necessary to reproduce each single step. We use the workflow system snakemake [7] to describe the bioinformatics pipeline in a way that is both formal and human-readable and can be visualized in graphical form (cf. Section 4.1).

We then describe each key step in more detail. For each dataset (patient), the

Discussion and conclusion

As discussed in Section 3, the short reads typical of miRNA datasets pose several challenges that distinguish their analysis from that of typical RNA-seq experiments. Attention to detail in every analysis step is crucial for obtaining accurate expression level estimates. Examples include embedding the mature miRNA reference sequences into their adapter context when mapping in color space, and using robust, non-invasive normalization methods.
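To see why the adapter context matters in color space, recall that in dinucleotide (2-base) color encoding each color describes a pair of adjacent bases. The color at the junction between the miRNA and the adapter therefore depends on both sequences and cannot be derived from the mature miRNA alone. The sketch below shows this with toy sequences. The sequences and the function name are hypothetical, and the sketch does not reproduce how the pipeline actually constructs its reference.

```python
# Standard 2-base color code: with A=0, C=1, G=2, T=3, the color of a
# dinucleotide is the bitwise XOR of the two base indices.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq):
    """Encode a nucleotide string as a string of colors (0-3).

    A sequence of n bases yields n - 1 colors, one per adjacent base pair.
    """
    return "".join(
        str(BASE_INDEX[a] ^ BASE_INDEX[b]) for a, b in zip(seq, seq[1:])
    )


# Toy sequences for illustration only (not real miRNA or adapter sequences).
mirna = "ACGGTCA"
adapter = "GATTC"

colors_mirna = to_color_space(mirna)
colors_adapter = to_color_space(adapter)
colors_joint = to_color_space(mirna + adapter)

# The junction color is determined by the last miRNA base and the first
# adapter base; it is absent from both separately encoded sequences.
junction = to_color_space(mirna[-1] + adapter[0])

print("miRNA alone:     ", colors_mirna)
print("adapter alone:   ", colors_adapter)
print("miRNA + adapter: ", colors_joint)
print("junction color:  ", junction)
```

A read that runs from the miRNA into the adapter contains the junction color, so a reference built from the mature miRNA sequence alone would lack the color needed to align that position.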

We provide an optimized automated pipeline, as

Acknowledgments

S.R., J.K., A.S. gratefully acknowledge funding by the DFG Collaborative Research Center (Sonderforschungsbereich, SFB) 876 “Providing Information by Resource-Constrained Data Analysis” within projects TB1 and C1 (http://sfb876.tu-dortmund.de).

References (24)

D.P. Bartel, Cell (2004)
M.I. Abouelhoda et al., J. Discrete Algorithms (2004)
J. Winter et al., Nat. Cell Biol. (2009)
R. Garzon et al., Annu. Rev. Med. (2009)
J.H. Schulte et al., Nucleic Acids Res. (2010)
R. Leinonen et al., Nucleic Acids Res. (2011)
H. Breu, A Theoretical Understanding of 2 Base Color Codes and Its Application to Annotation, Error Detection, and...
J. Köster et al., Bioinformatics (2012)
M. Martin, EMBnet.journal (2011)
H. Li et al., Bioinformatics (2009)
H. Li et al., Bioinformatics (2009)
A.R. Quinlan et al., Bioinformatics (2010)
