
  • Letter

Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals

Abstract

There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts (refs 1–4). However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise (refs 5, 6). Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units intervening known protein-coding loci. Our approach identified 1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation for the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NFκB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.
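The idea of keeping only chromatin domains that lie between known protein-coding loci can be illustrated with a small interval filter. This is a minimal sketch, not the authors' pipeline: the interval representation, the function names `overlaps` and `intergenic_domains`, and the toy coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intergenic_domains(domains: List[Interval], genes: List[Interval]) -> List[Interval]:
    """Keep only the domains that overlap no known protein-coding locus."""
    return [d for d in domains if not any(overlaps(d, g) for g in genes)]


# Toy data (invented coordinates, not taken from the paper).
domains = [("chr1", 100, 500), ("chr1", 2000, 2600), ("chr2", 50, 300)]
genes = [("chr1", 400, 900), ("chr2", 1000, 1500)]

result = intergenic_domains(domains, genes)
print(result)
```

The quadratic scan is adequate for a toy example; a genome-scale analysis would more likely use sorted intervals or an interval tree.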


Figure 1: Intergenic K4–K36 domains produce multi-exonic RNAs.
Figure 2: lincRNA K4–K36 domains do not encode proteins and are conserved in their exons and promoters.
Figure 3: lincRNAs show strong associations with other lincRNAs and with several biological processes.


Accession codes

Primary accessions

Gene Expression Omnibus

Data deposits

Microarray data have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE13765.

References

  1. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004)

  2. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005)

  3. Kapranov, P. et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916–919 (2002)

  4. Rinn, J. L. et al. The transcriptional activity of human chromosome 22. Genes Dev. 17, 529–540 (2003)

  5. Ponjavic, J., Ponting, C. P. & Lunter, G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 17, 556–565 (2007)

  6. Struhl, K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103–105 (2007)

  7. Brannan, C. I., Dees, E. C., Ingram, R. S. & Tilghman, S. M. The product of the H19 gene may function as an RNA. Mol. Cell. Biol. 10, 28–36 (1990)

  8. Brown, C. J. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991)

  9. Lee, J. T., Davidow, L. S. & Warshawsky, D. Tsix, a gene antisense to Xist at the X-inactivation centre. Nature Genet. 21, 400–404 (1999)

  10. Sotomaru, Y. et al. Unregulated expression of the imprinted genes H19 and Igf2r in mouse uniparental fetuses. J. Biol. Chem. 277, 12474–12478 (2002)

  11. Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007)

  12. Willingham, A. T. et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309, 1570–1573 (2005)

  13. Wang, J. et al. Mouse transcriptome: neutral evolution of ‘non-coding’ complementary DNAs. Nature 431, 1–2, doi:10.1038/nature03016 (2004)

  14. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)

  15. Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A. & Enright, A. J. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34, D140–D144 (2006)

  16. Tam, O. H. et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534–538 (2008)

  17. Watanabe, T. et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453, 539–543 (2008)

  18. Clamp, M. et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl Acad. Sci. USA 104, 19428–19433 (2007)

  19. Lin, M. F. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007)

  20. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005)

  21. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genet. 38, 626–635 (2006)

  22. Su, A. I. et al. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA 99, 4465–4470 (2002)

  23. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)

  24. Tanay, A., Sharan, R. & Shamir, R. Discovering statistically significant biclusters in gene expression data. Bioinformatics 18 (Suppl. 1), S136–S144 (2002)

  25. Chang, H. Y. et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc. Natl Acad. Sci. USA 102, 3738–3743 (2005)

  26. Carrio, M., Arderiu, G., Myers, C. & Boudreau, N. J. Homeobox D10 induces phenotypic reversion of breast tumor cells in a three-dimensional culture model. Cancer Res. 65, 7177–7185 (2005)

  27. Ventura, A. et al. Cre-lox-regulated conditional RNA interference from transgenes. Proc. Natl Acad. Sci. USA 101, 10380–10385 (2004)

  28. Loh, Y. H. et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nature Genet. 38, 431–440 (2006)

  29. Ivanova, N. et al. Dissecting self-renewal in stem cells with RNA interference. Nature 442, 533–538 (2006)

  30. Zhao, J., Sun, B. K., Erwin, J. A., Song, J. J. & Lee, J. T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008)


Acknowledgements

We would like to thank our colleagues at the Broad Institute, especially J. P. Mesirov for discussions and statistical insights, X. Xie for statistical help with conservation analyses, J. Robinson for visualization help, M. Ku, E. Mendenhall and X. Zhang for help generating ChIP samples, and N. Novershtern and A. Levy for providing transcription factor lists. M. Guttman is a Vertex scholar; I.A. acknowledges the support of the Human Frontier Science Program Organization. This work was funded by Beth Israel Deaconess Medical Center, the National Human Genome Research Institute, and the Broad Institute of MIT and Harvard.

Author Contributions J.L.R., E.S.L., A.R. and M. Guttman conceived and designed experiments. The manuscript was written by M. Guttman, A.R., J.L.R. and E.S.L. J.L.R., I.A., C.F., D.F., M.H., B.W.C., J.P.C. and M. Guttman performed molecular biology experiments. All data analyses were performed by M. Guttman in conjunction with M. Garber (conservation analyses), M.F.L. (codon substitution frequency), T.S.M. (ChIP-seq data), O.Z. (motif analysis) and M.N.C. (lincRNA genomic location analysis). Reagents were provided by M. Garber (pre-published conservation analysis tools); T.J. and D.F. (p53 wild-type and knockout MEFs); N.H., A.R. and I.A. (dendritic cell stimulated time course); B.E.B. (ChIP data); R.J., B.W.C. and J.P.C. (luciferase assays); and M.K. and M.F.L. (codon substitution frequency code).

Author information

Corresponding author

Correspondence to John L. Rinn.

Supplementary information

Supplementary Figures

This file contains Supplementary Figures 1-11 with Legends (PDF 2081 kb)

Supplementary Information

This file contains Supplementary Methods and Supplementary References (PDF 147 kb)

Supplementary Table 1

In Supplementary Table 1 the K4-K36 domain coordinates are shown and the K4-K36 enriched domains in the 4 mouse cell types are listed. Coordinates are indicated in mouse genome build MM8. (XLS 107 kb)

Supplementary Table 2

In Supplementary Table 2 the lincRNA Exon Coordinates and Pi LOD Enrichment Score are shown. lincRNA exons defined by NimbleGen tiling microarrays are listed in mouse genome build MM9. Each exon has an associated Pi LOD Enrichment Score (Methods) reported. (XLS 174 kb)

Supplementary Table 3

In Supplementary Table 3 the characteristic properties of lincRNAs are shown. (DOC 36 kb)

Supplementary Table 4

In Supplementary Table 4 the PCR validation primer sequences are shown. Primer sequences used for validation of lincRNA expression by PCR and qPCR are reported. (XLS 31 kb)

Supplementary Table 5

In Supplementary Table 5 the Northern blot analysis probe sequences and primers are shown. Primers and amplicons for Northern blot analyses are provided. The correct file for Supplementary Table 5 was uploaded on 4th March, 2009. (XLS 27 kb)

Supplementary Table 6

In Supplementary Table 6 the Codon Substitution Frequency (CSF) Scores are shown. The CSF score for each K4-K36 domain is provided. Coordinates are reported in mouse genome build MM9. An updated version of Supplementary Table 6 was uploaded on 4th March, 2009. (XLS 122 kb)

Supplementary Table 7

In Supplementary Table 7 the Exon conservation for lincRNAs and other annotations is shown. Pi LOD Enrichment scores are provided for lincRNA exons and other annotations compared in the text. The coordinates are provided in mouse genome build MM9, and the max 12-mer LOD score and the randomized average max 12-mer LOD score are indicated along with the enrichment score. (XLS 836 kb)

Supplementary Table 8

In Supplementary Table 8 the lincRNA Promoter Conservation is shown. Pi LOD Enrichment scores are provided for each lincRNA promoter region, protein coding gene promoters, and random intergenic regions. Coordinates are provided in Mouse genome build MM9. (XLS 634 kb)

Supplementary Table 9

In Supplementary Table 9 the Human and Mouse orthologous lincRNAs are shown. lincRNAs defined in Human Lung Fibroblasts were lifted into the mouse genome (MM8) and enrichment statistics were computed for Mouse Lung Fibroblasts (Methods). The enrichment p-values and fold are indicated. (XLS 28 kb)

Supplementary Table 10

In Supplementary Table 10 the lincRNA expression across mouse tissue compendium is shown. lincRNA expression levels across various mouse cell types, tissues, and conditions are provided. The values are log values of the relative expression of each lincRNA. (XLS 420 kb)

Supplementary Table 11

In Supplementary Table 11 the Gene Set Enrichment Analysis (GSEA) association matrix is shown. Functional associations between lincRNAs (columns) and MSigDB terms (rows) are indicated. Positive association is indicated by a 1, negative association is indicated by a −1, and no association is indicated by a 0. (TXT 6203 kb)
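A matrix encoded this way (terms as rows, lincRNAs as columns, entries of 1, −1 or 0) can be queried with a few lines of code. The sketch below is illustrative only: the tab-delimited layout, the `term` header name, the function name `associations`, and the toy rows are assumptions, since the actual file format is not described here beyond its being a TXT file.

```python
import csv
import io

# Toy matrix (invented values): rows are terms, columns are lincRNAs.
# A tab-delimited layout with a header row is an assumption.
toy_matrix = (
    "term\tlincRNA_A\tlincRNA_B\n"
    "CELL_CYCLE\t1\t0\n"
    "IMMUNE_RESPONSE\t-1\t1\n"
    "PLURIPOTENCY\t0\t1\n"
)


def associations(handle, lincrna):
    """Return (positive_terms, negative_terms) for one lincRNA column."""
    positive, negative = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        value = int(row[lincrna])
        if value == 1:
            positive.append(row["term"])
        elif value == -1:
            negative.append(row["term"])
        # A value of 0 means no association, so the term is skipped.
    return positive, negative


pos_a, neg_a = associations(io.StringIO(toy_matrix), "lincRNA_A")
print(pos_a, neg_a)
```

For a real file, `io.StringIO(toy_matrix)` would be replaced by an open file handle, with the column and header names adjusted to whatever the file actually uses.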

Supplementary Table 12

In Supplementary Table 12 the P53 regulated lincRNAs upon DNA Damage Induction are shown. lincRNAs that temporally increase in P53 wild-type cells compared with P53 knockout cells upon stimulation with DNA damage are indicated along with their expression levels across the DNA damage time course. (XLS 26 kb)

Supplementary Table 13

In Supplementary Table 13 the P53 Motif Enrichments in induced lincRNAs are shown. P53 motif scores are provided for each lincRNA promoter along with the sequence of the best motif hit and its conservation. P53 induced lincRNAs are indicated in the last column. (XLS 347 kb)

Supplementary Table 14

In Supplementary Table 14 the NFKB regulated lincRNAs are shown. lincRNAs that are differentially expressed in TLR4 stimulation of BMDC cells compared with unstimulated BMDC cells are provided. (XLS 23 kb)

Supplementary Table 15

In Supplementary Table 15 the ES cell lincRNAs bound by Oct4 and/or Nanog are shown. The coordinates of the lincRNAs bound by Oct4/Nanog in ES cells are provided. (XLS 17 kb)

Supplementary Table 16

In Supplementary Table 16 the functional association of lincENC1 is shown. GSEA results for lincENC1 are provided for both profiled exons in the transcript. (XLS 23 kb)

Supplementary Table 17

In Supplementary Table 17 the Enrichment of Gene Ontology (GO) terms for lincRNA neighbors is shown. Significant GO terms (FDR < 0.05) are indicated along with their associated p-values. (XLS 22 kb)


About this article

Cite this article

Guttman, M., Amit, I., Garber, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009). https://doi.org/10.1038/nature07672


  • DOI: https://doi.org/10.1038/nature07672
