Elsevier

Methods

Volume 59, Issue 1, January 2013, Pages 154-163
Identifying transcriptional miRNA biomarkers by integrating high-throughput sequencing and real-time PCR data

https://doi.org/10.1016/j.ymeth.2012.10.005

Abstract

Using both high-throughput sequencing and real-time PCR, the miRNA transcriptome can be analyzed in complementary ways. We describe the necessary bioinformatics pipeline, including software tools, and key methodological steps in the process, such as adapter removal, read mapping, normalization, and multiple testing issues for biomarker identification. The methods are exemplified by the analysis of five favorable (event-free survival) vs. five unfavorable (died of disease) neuroblastoma tumor samples with a total of over 188 million reads.

Highlights

► Reproducible analysis workflow for miRNA expression by high-throughput sequencing.
► How to address the challenges raised by mapping short miRNA reads.
► Special treatment for dinucleotide color space mapping.
► Biomarker candidate discovery by robust normalization and differential expression analysis.
► Validation with RT-qPCR Cq-values.

Introduction

Current high-throughput sequencing (HTS) technologies offer the opportunity to characterize the genomic, epigenomic and transcriptomic state of a tumor. Here we focus on the bioinformatic methodology of characterizing the microRNA (miRNA) transcriptome of a sample. Functional miRNAs regulate the translation and cleavage of mRNAs by sequence-specific interaction with the 3′ UTR, reviewed in [1]. MiRNAs are involved in the regulation of many physiological processes, including differentiation, development and apoptosis [2]. In cancer, miRNAs may exert oncogenic function by inhibiting tumor suppressor genes or may act as tumor suppressors by inhibiting oncogenes [3]. The goal is to identify putative biomarkers, i.e., miRNAs that are differentially expressed between tumor and surrounding tissue, or between low-risk and high-risk subtypes of the tumor. In a previous study, we found differential miRNA expression between favorable versus unfavorable neuroblastoma subtypes [4].

In this article, we discuss the fundamental challenges involved in estimating expression values from short RNA reads and describe the computational pipeline for obtaining a ranked list of differentially expressed miRNAs from the raw sequence reads. Expression levels of these biomarker candidates should be confirmed by RT-qPCR, and we discuss how logarithmic HTS expression values correspond to negative ΔCq values. To provide some guidance for other experiments, we illustrate each step with numerical examples from our previous neuroblastoma study [4].
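The correspondence between logarithmic expression values and negative ΔCq values can be illustrated with a small sketch. Each PCR cycle ideally doubles the amount of product, so a miRNA that is twice as abundant should cross the detection threshold one cycle earlier, and a log2 expression ratio should match a difference in −ΔCq. The sketch assumes perfect amplification efficiency and a reference Cq for normalization. The function names, the pseudocount and all numbers are made up for illustration and are not taken from the study.

```python
import math


def neg_delta_cq(cq_target, cq_reference):
    """Return -ΔCq, where ΔCq = Cq(target) - Cq(reference)."""
    return -(cq_target - cq_reference)


def log2_ratio_from_cq(cq_a, cq_ref_a, cq_b, cq_ref_b):
    """log2 expression ratio of sample A over sample B from RT-qPCR data.

    Assumes 100% amplification efficiency (one cycle = factor 2).
    """
    return neg_delta_cq(cq_a, cq_ref_a) - neg_delta_cq(cq_b, cq_ref_b)


def log2_ratio_from_counts(count_a, count_b, pseudocount=1.0):
    """log2 expression ratio from normalized read counts.

    The pseudocount (an assumption of this sketch) avoids log2(0).
    """
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


# Made-up example: the miRNA is four times as abundant in sample A.
hts = log2_ratio_from_counts(799, 199)           # (799+1)/(199+1) = 4
qpcr = log2_ratio_from_cq(24.0, 20.0, 26.0, 20.0)  # ΔCq 4 vs. ΔCq 6
print(hts, qpcr)
```

In this idealized case both measurements give a log2 ratio of 2. Real data would show scatter around this relationship, for example because amplification efficiency is below 100%.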

Datasets

The dataset of our previous study consists of five low-risk patients (neuroblastoma stage 1, no MYCN amplification, event-free survival [EFS], labeled 552–556) and five high-risk patients (neuroblastoma stage 4 with MYCN amplification, died of disease [DoD], labeled 557–561). This dataset can be retrieved from the NCBI Sequence Read Archive [5] using Accession No. SRA009986.

In the form presented here, the pipeline expects color space [6] FASTA (.csfasta) and quality (.qual) files, which is the

Fundamental challenges

Before we discuss the steps of the pipeline, we highlight three fundamental difficulties that arise during the analysis of short (mi)RNA reads.

Automated miRNA expression analysis

Computing the expression profiles of all miRNAs from raw sequence reads involves several steps. In addition to a textual description, a well-documented and formalized workflow description is necessary to reproduce each single step. We use the workflow system snakemake [7] to describe the bioinformatics pipeline in a way that is both formal and human-readable and can be visualized in graphical form (cf. Section 4.1).
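The idea of a formalized workflow can be sketched as a dependency graph that is resolved into an execution order. The following is a minimal illustration in plain Python using the standard library, not snakemake syntax: snakemake itself describes rules by their input and output files and infers dependencies from file names. The step names are hypothetical and only loosely follow the steps named in the abstract.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "remove_adapters": set(),
    "map_reads": {"remove_adapters"},
    "count_mirnas": {"map_reads"},
    "normalize": {"count_mirnas"},
    "differential_expression": {"normalize"},
}


def execution_order(dependencies):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PIPELINE)
for step in order:
    print(step)
```

Writing the dependencies down explicitly means that the order of execution is derived rather than remembered, which is what makes each step reproducible.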

We then describe each key step in more detail. For each dataset (patient), the

Discussion and conclusion

As discussed in Section 3, the short reads typical of miRNA datasets pose several challenges that distinguish their analysis from that of typical RNA-seq experiments. Attention to detail in every analysis step, such as embedding the mature miRNA reference sequences into their adapter context when mapping in color space or using robust non-invasive normalization methods, is crucial for obtaining accurate expression level estimates.
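Why the adapter context matters in color space can be seen from the dinucleotide encoding itself: each color encodes a pair of adjacent bases, so the color at the miRNA/adapter boundary depends on both sequences. The sketch below uses the standard two-base encoding (A=0, C=1, G=2, T=3, color = XOR of the two codes). The sequences are made up, and the leading primer base of real color space reads is omitted for simplicity.

```python
# Two-bit base codes; the color of a dinucleotide is the XOR of its codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a nucleotide sequence as one color per adjacent base pair."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


mirna = "TGAGG"    # made-up stand-in for the 3' end of a mature miRNA
adapter = "CGCCT"  # made-up stand-in for the start of the adapter

alone = to_colors(mirna)                 # colors within the miRNA only
in_context = to_colors(mirna + adapter)  # miRNA embedded in adapter context

# The color spanning the last miRNA base and the first adapter base
# exists only in the embedded reference.
boundary_color = in_context[len(mirna) - 1]
print(alone, in_context, boundary_color)
```

A read that extends from the miRNA into the adapter contains the boundary color. A reference built from the mature miRNA sequence alone has no counterpart for it, whereas the embedded reference does.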

We provide an optimized automated pipeline, as

Acknowledgments

S.R., J.K., and A.S. gratefully acknowledge funding by the DFG Collaborative Research Center (Sonderforschungsbereich, SFB) 876 “Providing Information by Resource-Constrained Data Analysis” within projects TB1 and C1 (http://sfb876.tu-dortmund.de).

References (24)

  • D.P. Bartel, Cell (2004)
  • M.I. Abouelhoda et al., J. Discrete Algorithms (2004)
  • J. Winter et al., Nat. Cell Biol. (2009)
  • R. Garzon et al., Annu. Rev. Med. (2009)
  • J.H. Schulte et al., Nucleic Acids Res. (2010)
  • R. Leinonen et al., Nucleic Acids Res. (2011)
  • H. Breu, A Theoretical Understanding of 2 Base Color Codes and Its Application to Annotation, Error Detection, and...
  • J. Köster et al., Bioinformatics (2012)
  • M. Martin, EMBnet.journal (2011)
  • H. Li et al., Bioinformatics (2009)
  • H. Li et al., Bioinformatics (2009)
  • A.R. Quinlan et al., Bioinformatics (2010)