Annu. Rev. Genet. 2004. 38:525–52. doi: 10.1146/annurev.genet.38.072902.091216. Copyright © 2004 by Annual Reviews. All rights reserved.
First published online as a Review in Advance on July 14, 2004
METAGENOMICS: Genomic Analysis
of Microbial Communities
Christian S. Riesenfeld,1,2 Patrick D. Schloss,1
and Jo Handelsman1,2
Department of Plant Pathology,1 Microbiology Doctoral Training Program,2 University of
Wisconsin-Madison, Madison, Wisconsin 53706; email: email@example.com
Key Words microbial ecology, environmental genomics, community genomics, culture-independent, and unculturable bacteria
Uncultured microorganisms comprise the majority of the planet’s biological diversity. Microorganisms represent two of the three domains of life and contain vast diversity that is the product of an estimated 3.8 billion years of evolution. In many environments, as many as 99% of the microorganisms cannot be cultured by standard techniques, and the uncultured fraction includes diverse organisms that are only distantly related to the cultured ones. Therefore, culture-independent methods are essential to understand the genetic diversity, population structure, and ecological roles of the majority of microorganisms. Metagenomics, or the culture-independent genomic analysis of an assemblage of microorganisms, has potential to answer fundamental questions in microbial ecology. This review describes progress toward understanding the biology of uncultured Bacteria, Archaea, and viruses through metagenomic analyses.
CONTENTS
INTRODUCTION
METAGENOMICS DEFINED
LINKING PHYLOGENY AND FUNCTION WITHIN SPECIES
  Phylogenetic Anchors
  Function Then Phylogeny
  Phylogeny Then Function
  Acidobacterium Phylogeny and Function
  Archaeal Phylogeny and Function
INTRODUCTION
Obtaining bacteria in pure culture is typically the first step in investigating bacterial processes. However, standard culturing techniques account for 1% or less of the bacterial diversity in most environmental samples (2). Although some significant breakthroughs have resulted from recent attempts to culture the as-yet-unculturable bacteria (56, 89, 99, 127), a suite of culture-independent techniques is needed to complement efforts to culture the thousands or millions of unknown species in the environment.
A new era of microbial ecology was initiated when sequencing of ribosomal RNAs and the genes encoding them was introduced to describe uncultured bacteria in the environment. The first approach was to sequence clones from a 5S rRNA cDNA library derived from the symbiotic community within the tubeworm Riftia pachyptila (109). Variations of this method generated a set of culture-independent techniques to (a) reconstruct phylogenies, (b) compare microbial distributions among samples using either nucleotide sequence or restriction fragment length polymorphisms (RFLPs), and (c) quantify the relative abundance of each taxonomic group using membrane hybridization or fluorescent in situ hybridization (2, 47, 57, 78–80).
The most startling result of the many microbial diversity studies that have employed 16S rRNA culture-independent methods is the richness of the uncultured microbial world. As of April 1, 2004, GenBank contained 21,466 16S rRNA genes from cultured prokaryotes and 54,655 from uncultured prokaryotes, according to the search terms described by Rappé & Giovannoni (90), and many of those from uncultured organisms affiliate with phyla that contain no cultured members. When Woese (121) originally proposed a 16S rRNA-based phylogeny, 12 bacterial phyla were recognized, each with cultured representatives. Since then, 14 additional phyla with cultured representatives have been identified. In addition, 16S rRNA gene sequence analysis suggests 26 candidate phyla that have no known cultured representatives (90). Therefore, half of the known microbial phyla have no cultured representatives.
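The tallies in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted above; the variable names are illustrative and are not taken from any database or tool.

```python
# Counts quoted in the text (GenBank, as of April 1, 2004).
cultured_16s = 21_466
uncultured_16s = 54_655

# Phylum counts quoted in the text.
original_phyla = 12        # recognized by Woese, all with cultured members
added_cultured_phyla = 14  # identified since then, with cultured members
candidate_phyla = 26       # known only from 16S rRNA gene sequences

# Share of GenBank 16S rRNA gene entries that come from uncultured prokaryotes.
uncultured_fraction = uncultured_16s / (cultured_16s + uncultured_16s)

# Share of known phyla that have no cultured representative.
cultured_phyla = original_phyla + added_cultured_phyla
total_phyla = cultured_phyla + candidate_phyla
candidate_fraction = candidate_phyla / total_phyla

print(f"16S rRNA genes from uncultured prokaryotes: {uncultured_fraction:.1%}")
print(f"Phyla with cultured representatives: {cultured_phyla} of {total_phyla}")
print(f"Phyla without cultured representatives: {candidate_fraction:.0%}")
```

On these numbers, roughly seven in ten of the deposited 16S rRNA gene sequences come from uncultured prokaryotes, and the 26 candidate phyla are exactly half of the 52 phyla counted.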
Among the phyla that contain cultured members, a few contain many isolates and the rest contain too few to represent the full spectrum of diversity in the phylum. For example, Hugenholtz (53) found that 97% of prokaryotes deposited in the Australian Culture of Microorganisms in 2001 were members of just four phyla: the Proteobacteria (54%), Actinobacteria (23%), Firmicutes (14%), and Bacteroidetes (6%). Within GenBank, 76% of the 16S rRNA gene sequences of cultured prokaryotes are from these four groups. But other phyla may be more diverse, prevalent, and ecologically consequential in the environment. 16S rRNA gene sequences from the Acidobacterium phylum are among the most abundant in clone libraries obtained from soil and have been found in all soils examined, suggesting that the Acidobacteria play important roles in soil ecosystems. However, of the 684 Acidobacterium 16S rRNA gene sequences in GenBank, only 19 (2.8%) are from cultured isolates, providing an inadequate collection to describe the physiological diversity of the phylum. Other than 16S rRNA gene sequences, little is known about the bacteria within the 22 poorly cultured phyla and 26 candidate phyla.
Many terms, such as unculturable, uncultivated, as yet uncultured, and not yet cultured, are used to refer to microorganisms that we know of only through culture-independent means. In this review, we refer to them as uncultured.
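The culture-collection and Acidobacterium figures quoted above are internally consistent, as the following minimal sketch shows. It uses only numbers stated in the text; the dictionary and variable names are illustrative.

```python
# Percent of culture-collection deposits per phylum, as quoted from Hugenholtz (53).
collection_share = {
    "Proteobacteria": 54,
    "Actinobacteria": 23,
    "Firmicutes": 14,
    "Bacteroidetes": 6,
}
top_four_total = sum(collection_share.values())

# Acidobacterium 16S rRNA gene sequences in GenBank, as quoted in the text.
acido_total = 684
acido_cultured = 19
acido_cultured_fraction = acido_cultured / acido_total

# Phyla with cultured members (12 original + 14 added) minus the four dominant ones.
poorly_cultured_phyla = (12 + 14) - len(collection_share)

print(f"Top four phyla: {top_four_total}% of deposits")
print(f"Acidobacterium sequences from cultured isolates: {acido_cultured_fraction:.1%}")
print(f"Poorly cultured phyla: {poorly_cultured_phyla}")
```

The four percentages sum to the stated 97%, 19 of 684 rounds to the stated 2.8%, and removing the four dominant phyla from the 26 with cultured members leaves the 22 poorly cultured phyla.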
Describing the phylogenetic diversity of uncultured microorganisms is only the first step. A greater challenge is to assign ecological roles to them. The uncultured microbiota must play pivotal roles in natural environmental processes and are a large untapped resource for biotechnology applications. Exploiting the rich microbial biodiversity for enzyme and natural product discovery is an active research area that has been reviewed elsewhere (39, 45, 46, 65, 66, 77, 97, 104). This review discusses the application of culture-independent genomics-based approaches to understand the genetic diversity, population structure, and ecology of complex microbial assemblages (26, 93, 94).
METAGENOMICS DEFINED
“Metagenomics” describes the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample (Figure 1) (45).
Other terms have been used to describe the same method, including environmental DNA libraries (110), zoolibraries (55), soil DNA libraries (68), eDNA libraries (13), recombinant environmental libraries (22), whole genome treasures (77), community genome (114), whole genome shotgun sequencing (115), and probably others. In this review, we use metagenomics to describe work that has been presented with all of these names because it is the most commonly used term (15, 27, 35, 59–61, 65, 66, 82, 105, 107, 117, 118), was used for the title of the first international conference on the topic (“Metagenomics 2003” held in Darmstadt, Germany), and is the focus of an upcoming issue of the journal Environmental Microbiology. The definition applied here excludes studies that use PCR to amplify gene cassettes (52) or random PCR primers to access genes of interest (17, 32), since these methods do not provide genomic information beyond the genes that are amplified. Many environments have been the focus of metagenomics, including soil, the oral cavity, feces, and aquatic habitats, as well as the hospital metagenome, a term intended to encompass the genetic potential of organisms in hospitals that contribute to public health concerns such as antibiotic resistance and nosocomial infections (20).
Figure 1 Metagenomics involves constructing a DNA library from an environment’s microbial population and then analyzing the functions and sequences in the library.
The concept of cloning DNA directly from an environment was initially suggested by Pace (79) and first implemented by Schmidt et al. (106), who constructed a λ phage library from a seawater sample and screened it for 16S rRNA genes.
Advances by the DeLong group in cloning DNA directly from seawater provided the landmark work that launched the field (110). Development of metagenomic analyses of soil was slower than with seawater because of the technical challenges of cloning DNA from the complex matrix of soil, which contains many compounds that bind to DNA or inhibit the enzymatic reactions required for cloning. Significant progress has been made, producing libraries that have substantially advanced understanding of the functions in the soil community (96). The past eight years have witnessed an explosion of interest and activity in metagenomics, accompanied by advances in technology that have facilitated studies at a scale that was not feasible when the field began. For example, the seminal paper in 1996 by Stein et al. (110) reported the sequencing and reconstruction of a 40-kb fragment from an uncultured marine archaeon, which was a major undertaking at the time. In 2004, Venter et al. (115) reported their attempt to sequence the entire metagenome of the Sargasso Sea by obtaining over 1 million kb of nonredundant sequence. The advances in sequencing technology have expanded the approaches and questions that can be considered with metagenomics, providing access to a staggering amount of genomic information. Metagenomic technology has been successful at all scales: it has been used to study single genes (e.g., cellulases, 48), pathways (e.g., antibiotic synthesis, 96), organisms (e.g., Archaea, 110), and communities (e.g., acid mine drainage biofilm, 114). Approaches that involve massive sequencing to capture entire communities will likely become more common with further advances in sequencing technology.
LINKING PHYLOGENY AND FUNCTION WITHIN SPECIES
Phylogenetic Anchors
The first metagenomic studies aimed to link a function with its phylogenetic source, providing information about one species within a community. One of the challenges with this approach is to link a phenotype with the identity of the original host. Three approaches have been taken: (a) screen a metagenomic library for a phenotype and then attempt to determine the phylogenetic origin of the cloned DNA (Table 1), (b) screen clones for a specific phylogenetic anchor (e.g., 16S rRNA) or gene and then sequence the entire clone and search for genes of interest among the genes flanking the anchor (Table 2), or (c) sequence the entire metagenome and identify interesting genes and phylogenetic anchors in the resulting reconstructed genomes (Table 3).
Function Then Phylogeny
Diverse activities have been discovered by functional analysis of metagenomic libraries. New antibiotics (11–14, 36, 68, 96, 119, 120), hydrolytic and degradative enzymes (21, 48–50, 59, 60, 91, 96, 117), biosynthetic functions (31, 61), antibiotic resistance enzymes (22, 92), and membrane proteins (69) have been identified. The diversity of functionally active clones discovered in metagenomic libraries validates the use of functional screens as one means to characterize the libraries. Antimicrobial screens have revealed new antibiotics such as terragine (119), turbomycin A and B (36), and acyl tyrosines (13), as well as previously described antibiotics such as indirubin (68) and violacein (12). Most of these compounds are structurally based on common cell substituents, such as amino acids, and none requires more than a few genes for its synthesis. The goal of identifying new polyketide, macrolide, and peptide antibiotics (45) may require different methods. Enhancing expression of genes in metagenomic libraries may lead to discovery of a wider array of natural products. This will be accomplished by moving the libraries into alternative hosts, such as Streptomyces, which was the basis for discovery of terragine (119). Alternative hosts may enhance gene expression or provide starting materials that Escherichia coli does not contain. E. coli can be engineered to express a wider range of functions by introducing genes encoding new sigma factors, rare tRNAs, or functions required to synthesize starting materials for antibiotic biosynthesis that are deficient in E. coli. Alternatively, sequences that carry conserved regions of genes associated with antibiotic biosynthesis, such as the polyketide synthases and peptide synthetases, may be identified by sequence-based screens that do not require heterologous gene expression. This approach successfully identified clones carrying a novel hybrid polyketide synthase-peptide synthetase gene cluster from an uncultured bacterial symbiont of a beetle (81).
TABLE 1 Metagenomics discovery based on functional screens
Novel enzymes have been revealed in metagenomic libraries by screening clones directly for activity (49, 50, 96). Pigments have been identified by visual inspection (12, 36, 68). These methods require handling individual clones, usually in an array format. Because the frequency of active clones is low, high-throughput methods are essential for efficient screening. Selecting for the ability to grow on hydroxyl-butyrate as the sole carbon and nitrogen source proved a powerful way to recover clones carrying new degradative enzymes (49), and selection for antibiotic resistance identified new antibiotic resistance determinants from soil (22, 92) and from oral flora (27).
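The need for high-throughput screening when active clones are rare can be illustrated with a simple probability sketch. It assumes each clone is independently active with the same probability; the hit frequency used in the example is a hypothetical value chosen for illustration and is not reported in this review. The function names are likewise illustrative.

```python
import math


def prob_at_least_one(hit_frequency: float, n_clones: int) -> float:
    """Probability that a screen of n_clones finds at least one active clone,
    assuming independent clones that are each active with hit_frequency."""
    return 1.0 - (1.0 - hit_frequency) ** n_clones


def clones_needed(hit_frequency: float, confidence: float = 0.95) -> int:
    """Smallest number of clones whose screen finds at least one active clone
    with probability >= confidence, under the same independence assumption."""
    if not 0.0 < hit_frequency < 1.0:
        raise ValueError("hit_frequency must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    # Solve 1 - (1 - f)^N >= c for N; log1p keeps precision for very small f.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-hit_frequency))


# Hypothetical hit frequency: one active clone per 10,000 screened.
assumed_frequency = 1e-4
n = clones_needed(assumed_frequency, confidence=0.95)
print(f"Clones to screen for a 95% chance of at least one hit: {n}")
print(f"Check: P(at least one hit) = {prob_at_least_one(assumed_frequency, n):.3f}")
```

Under this model the required library size grows roughly in inverse proportion to the hit frequency, which is why selections that eliminate inactive clones, such as growth on a sole nutrient source or survival on an antibiotic, are so much more efficient than inspecting clones one at a time.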