Trends in Genetics
Volume 24, Issue 7, July 2008, Pages 344-352

Review
Tuning in to the signals: noncoding sequence conservation in vertebrate genomes

https://doi.org/10.1016/j.tig.2008.04.005

Aligning and comparing genomic sequences enables the identification of conserved sequence signatures and can enrich for coding and noncoding functional regions. In vertebrates, the comparison of human and rodent genomes and the comparison of evolutionarily distant genomes, such as human and pufferfish, have identified specific sets of ‘ultraconserved’ sequence elements associated with the control of early development. However, is this just the tip of a ‘conservation iceberg’ or do these sequences represent a specific class of regulatory element? Studies on the zebrafish phox2b gene region and the ENCODE project suggest that many regulatory elements are not highly conserved, posing intriguing questions about the relationship between noncoding sequence conservation and function and the evolution of regulatory sequences.

Section snippets

Plundering the junk in our genome

Our understanding of the functional elements within the human genome was until recently largely restricted to protein-coding sequences. However, this fraction represents just a few percent of the total amount of DNA, fuelling a fierce debate as to the extent of the function of the remainder (Box 1). Now that we have access not only to the sequence of the human genome, but to the genomes of many other vertebrates with which to compare it, we can begin to explore the remaining 98%. However, apart

Lessons from the ENCODE project

Recently, more sophisticated algorithms have been developed to profile multiple sequence alignments, primarily across mammals; these include PhastCons [33], BinCons [24] and GERP [34]. Although distinct from one another, they all use some form of phylogenetic modelling and compare local rates of evolution (substitutions), either at individual base positions or across small windows, with calculated neutral rates across the length of the alignment for each species. These methods were used to
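The window-based comparison described above can be illustrated with a deliberately simplified sketch. It is not PhastCons, BinCons or GERP: those tools fit phylogenetic models, whereas this example only computes per-column identity in a toy alignment and flags windows that exceed an assumed neutral baseline. The function names, the toy alignment, and the `neutral_identity` and `margin` values are all hypothetical choices made for illustration.

```python
from collections import Counter


def column_identity(column):
    """Fraction of non-gap bases in a column that match the most common base."""
    bases = [b for b in column.upper() if b != "-"]
    if not bases:
        return 0.0
    return Counter(bases).most_common(1)[0][1] / len(bases)


def window_scores(alignment, window):
    """Mean column identity for each sliding window across the alignment."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    per_column = [
        column_identity("".join(row[i] for row in alignment))
        for i in range(length)
    ]
    return [
        sum(per_column[i:i + window]) / window
        for i in range(length - window + 1)
    ]


def conserved_windows(alignment, window, neutral_identity, margin):
    """Start positions of windows scoring at least `margin` above the
    assumed neutral identity (a stand-in for a modelled neutral rate)."""
    scores = window_scores(alignment, window)
    return [i for i, s in enumerate(scores) if s - neutral_identity >= margin]


# Toy three-species alignment (hypothetical data): the first eight
# columns are identical, the last four are variable.
alignment = [
    "ACGTACGTAAGC",
    "ACGTACGTTTGA",
    "ACGTACGTCAGT",
]

# Assumed neutral identity of 0.6 and margin of 0.3, chosen arbitrarily.
print(conserved_windows(alignment, window=4, neutral_identity=0.6, margin=0.3))
```

In this toy case only windows lying wholly within the invariant block are flagged. A realistic analysis would replace the fixed baseline with a neutral rate estimated from the alignment and the species phylogeny.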

‘Raising the bar’ – increasing the stringency, decreasing the numbers

The alternative to casting a wide net to try to capture all sequences is to target a much smaller set of sequences and aim to determine their function more thoroughly. In this way, one can design different, more intensive ways of understanding functional mechanisms and ultimately build paradigms that can be applied back to the genome as a whole.

Two different approaches have been used to identify a more focused subset of conserved sequences and some attempts have been made to characterize these

Conserved noncoding elements in invertebrate genomes

Although CNEs and noncoding UCEs are among the most highly conserved sequences in vertebrate genomes, they have no apparent orthologous sequences in invertebrate genomes [36,38,40]. However, whole genome comparisons of invertebrate genomes have revealed that they also contain numerous highly conserved noncoding elements [33,40,41]. The phenomenon of ultraconservation of noncoding DNA elements is therefore not a vertebrate novelty but a feature that might be common among all metazoan genomes.

In silico identification of motifs and transcription factor binding sites

The prediction of sequence signatures or language within different sets of conserved noncoding sequences is difficult because we have little knowledge of their subsequence composition (this assumes, given lengths of hundreds of bases, that each conserved sequence is not a single functional unit) and because for the majority, function is unknown. Nevertheless, recent studies have demonstrated that using limited functional information, such as the tissues in which a conserved element can act as a

The evolution of conserved noncoding sequences

When did functional noncoding sequences first appear and have they changed throughout evolution? Tracing the evolutionary ancestry of noncoding sequences has presented several challenges. Unlike the comparison of coding sequences, where the patterns of substitutions and insertions/deletions (indels) can be weighted according to their overall effect on the protein-coding product, the comparison of noncoding sequences generally lacks proven informative grammar. To overcome this, Kim and Pritchard

What is the relationship between sequence conservation and function?

Although homology-based searching has proved a powerful approach, it has led to an overemphasis on the importance of sequence conservation in defining functional elements in the genome. Doubtless, certain types of noncoding sequence, for example UCEs and CNEs, exhibit very high levels of identity. By contrast, other noncoding regions, such as promoters, are poorly conserved and yet clearly have important functional roles. A recent study deleted four different UCEs from mice without any apparent

Concluding remarks: learning the language(s)

Clearly we have a lot to learn about the nature of conserved noncoding elements in the genome. Their identification has provided a starting point for the identification of functional elements in vertebrate genomes. This has proved a robust and in some cases, highly successful, strategy. However, two common misconceptions arise from this somewhat adventitious approach. First, not all functional elements can be captured by this means, not least because sequence conservation has low resolving

Acknowledgements

T.V. is funded by a Marie Curie Intra-European Fellowship for Career Development.

References (63)

  • E. Birney. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007)
  • D.L. Gumucio. Phylogenetic footprinting reveals a nuclear protein which binds to silencer sequences in the human gamma and epsilon globin genes. Mol. Cell. Biol. (1992)
  • R.C. Hardison. Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res. (1997)
  • J.C. Oeltjen. Large-scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. Genome Res. (1997)
  • M.A. Ansari-Lari. Comparative sequence analysis of a gene-rich cluster at human chromosome 12p13 and its syntenic region in mouse chromosome 6. Genome Res. (1998)
  • W. Jang. Comparative sequence of human and mouse BAC clones from the mnd2 region of chromosome 2p13. Genome Res. (1999)
  • G.G. Loots. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science (2000)
  • R.E. Ellsworth. Comparative genomic sequence analysis of the human and mouse cystic fibrosis transmembrane conductance regulator genes. Proc. Natl. Acad. Sci. U. S. A. (2000)
  • A.M. Mallon. Comparative genome sequence analysis of the Bpa/Str region in mouse and Man. Genome Res. (2000)
  • P. Dehal. Human chromosome 19 and related regions in mouse: conservative and lineage-specific evolution. Science (2001)
  • L.A. Pennacchio et al. Genomic strategies to identify mammalian regulatory sequences. Nat. Rev. Genet. (2001)
  • U. DeSilva. Generation and comparative analysis of approximately 3.3 Mb of mouse genomic sequence orthologous to the region of human chromosome 7q11.23 implicated in Williams syndrome. Genome Res. (2002)
  • A. Toyoda. Comparative genomic sequence analysis of the human chromosome 21 Down syndrome critical region. Genome Res. (2002)
  • R.H. Waterston. Initial sequencing and comparative analysis of the mouse genome. Nature (2002)
  • I. Dubchak. Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. (2000)
  • E.H. Margulies. Identification and characterization of multi-species conserved sequences. Genome Res. (2003)
  • J.W. Thomas. Comparative analyses of multi-species sequences from targeted genomic regions. Nature (2003)
  • S. Brenner. Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature (1993)
  • S. Aparicio. Detecting conserved regulatory elements with the model genome of the Japanese puffer fish, Fugu rubripes. Proc. Natl. Acad. Sci. U. S. A. (1995)
  • M.A. Nobrega. Scanning human gene deserts for long-range enhancers. Science (2003)
  • E.S. Lander. Initial sequencing and analysis of the human genome. Nature (2001)