Microbiome research has expanded rapidly over the last decade, and the methodologies used to characterize and measure microbiomes have expanded with it. Microbiomes of everything from soil to the human GI tract are now characterized with advanced technologies that sequence and identify the microbial species present. This article reviews the most common schemes of microbiome analysis and examines the factors that determine the best technical approach to some common questions. Amplicon-based sequencing versus whole genome shotgun sequencing is discussed, as are the various sequencing platforms available. Finally, data analysis approaches and computational programs are reviewed. Microbiome research may be in its infancy, but the technologies available to study it are powerful and diverse.
Complex microbial communities inhabit environments ranging from the human body to the Earth’s soil. A collection of microbes inhabiting a certain environment, including bacteria, fungi, viruses, archaea, and protozoa, is referred to as a microbiota or microbiome. In humans, several microbiomes exist, including those of the skin, mouth, GI tract, lungs and reproductive tract. Microbiomes often include both microbes that benefit the host, for example by maintaining a regular immune response [2], and microbes that have the potential to harm it. Beneficial bacteria may provide nutrients from food otherwise indigestible to humans or prevent colonization by harmful bacteria. Whereas a microbiome is the total biomolecular repertoire of a microbial community, including its DNA, RNA, metabolites and immediate surroundings, a metagenome is the total genomic potential of that community.
Basic research, epidemiology, clinical research, diagnostics and therapeutic studies are all exploring the human microbiome. The Human Microbiome Project (HMP), the results of which were published in 2012, brought microbiome research to the forefront, and the field has grown substantially since [3]. The HMP identified approximately 10,000 organisms and found that different body sites have unique microbial communities and that race, gender, weight, age and ethnicity all influence the microbiome [3]. Since 2012, over seven thousand studies of microbiomes have included analyses of microbial communities of the mouth, skin, vagina, GI tract, and more. Epidemiological microbiome studies range from examining how the gut microbiome may affect fractures in older men [4] to how C-section delivery alters infant microbiome composition [5]. Clinical diagnostic applications of microbiome analysis are also being developed. For instance, the human microbiome has been proposed as a biomarker for hepatocellular carcinoma [6], and studies have progressed from correlating microbiome composition with inflammatory bowel disease to developing microbial-based therapies for it [7] or for immunotherapy-refractory melanoma patients [8]. While applications range from basic research to biomarkers of disease, the methods used to study microbiomes share many processes. This review describes the major methods of microbiome analysis, including the workflows, the sequencing techniques, and the computational models and software packages used to analyze the vast amount of sequence data created. For more information about microbiome research tools, animal models, and machine learning approaches to data analysis, please refer to the recent review by Peña and Hanson [9].
Regarding metagenomic studies of the microbiome, multiple gut metagenomes were used to identify generalizable bacterial enterosignatures, which were dominated by Bacteroides, Firmicutes, Prevotella, Bifidobacterium, or Escherichia [10]. The data analysis revealed that the Bacteroides-associated enterosignature is central to the resilience of gut microbiomes, while combinations with other enterosignatures are often complementary. In addition, the described model effectively identifies atypical gut microbiomes associated with changes in host health status.
In addition, meta-omics analysis, which included both metatranscriptomics and metabolomics, has been used to analyze samples from patients with irritable bowel syndrome (IBS) [11]. The results of the study have revealed the abundance of Bacteroides dorei and several metabolites, such as increased tyramine and decreased gentisate. Furthermore, multi-omics analysis detected the upregulation of enzymes involved in fructose metabolism. Moreover, diarrhea-predominant IBS was characterized by elevated bile acids, polyamines, fructose, mannose and polyol metabolism compared to constipation-predominant IBS.
Depending on the research question being asked, all or only a single component of a microbiome may be studied. If the goal is to determine what types of bacteria are present in a particular microbiome, then 16S rRNA sequencing is a very common technique that can identify the order of most of the bacteria in a sample, and the genus and species in particular circumstances. Shotgun metagenomics approaches also measure DNA sequences but sequence all of the DNA in a sample rather than a targeted marker gene. If the research question instead focuses on mRNA expression levels in the microbiome, metatranscriptomics approaches are appropriate. Marker gene analysis such as 16S sequencing, shotgun metagenomics and metatranscriptomics all employ next generation sequencing methods but use different software to analyze the sequences. In contrast, metaproteomic and metabolomic approaches use liquid chromatography and mass spectrometry to measure proteins or metabolites, respectively. Metaproteomic methods allow the study of the proteins present in a microbiome, while metabolomics approaches study the metabolites present in, and the metabolism of, a microbiome. See Figure 1 for a diagram of these approaches [1]. In addition, phage-specific approaches may be used for studies directed toward elucidating the viral composition of a microbiome [12].
Two fundamental options for microbiome research depend on whether the research question concerns structural aspects of the microbiome, such as what bacteria are present and in what abundance, or functional aspects, such as what the community does. For structural microbiome studies, marker gene analysis (also known as amplicon-based analysis) is the standard approach. This targeted sequencing method includes 16S rRNA sequencing for bacteria [8, 13], 18S rRNA sequencing for eukaryotes such as fungi [14], and internal transcribed spacer (ITS) region sequencing, depending on which types of organisms are being studied in a microbiome sample. The 16S rRNA gene has both highly conserved regions, to which primers are targeted, and hypervariable regions. Sequencing of 16S rRNA can be used to measure phylogenetic relationships across different taxa, but may not be able to differentiate between closely related species. Sequencing of alternative markers, such as the ITS region or housekeeping genes like recA or rpoD, often gives better strain-level resolution, with the drawback that the primers are not as universal as those for 16S [15]. The sequencing data gained from this targeted gene analysis can be used to assign an organism to a specific taxonomy and to count the frequency of a group of organisms.
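The counting step at the end of a marker gene analysis can be illustrated with a minimal sketch. It assumes that each read has already been given a taxon label by an upstream classifier; the function name `relative_abundance` and the genus labels in the example are hypothetical and serve only as an illustration.

```python
from collections import Counter

def relative_abundance(assignments):
    """Return {taxon: fraction of reads} from a list of per-read taxon labels."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical per-read genus assignments (illustrative data only).
reads = ["Bacteroides", "Bacteroides", "Prevotella", "Escherichia", "Bacteroides"]
profile = relative_abundance(reads)
```

Because the values are fractions of the total read count, such a profile describes relative rather than absolute abundance, which is one reason why comparisons between samples are usually made on normalized data.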
| Kit or method | Company | Sample type or purpose | Reference |
|---|---|---|---|
| Purelink Microbiome DNA purification kit | Invitrogen | Stool | [8] |
| GenElute Bacterial Genomic DNA Kit | MilliporeSigma | Milk | [16] |
| lyPMA Method: osmotic lysis & propidium monoazide | n/a | Depletes host DNA | [17] |
| NEBNext® Microbiome DNA Enrichment Kit | New England BioLabs | Enriches for prokaryotic DNA | [18] |
| NucleoSpin® Soil kit | Macherey–Nagel | Manure | [19] |
| PowerSoil® DNA isolation kit | QIAGEN MO BIO Laboratories | Environmental samples, soil, milk, stool | [20, 21] |
| QIAamp® Fast DNA Stool Mini Kit | QIAGEN | Stool, milk | [16, 22] |
| TIANamp Stool DNA Kit | TIANGEN | Colorectal tissues | [23] |
| UltraClean™ Fecal DNA Kit | MO BIO Laboratories | Stool, gut, cecal digesta and biosolid samples | [24] |
Conversely, for functional studies of the microbiome, whole genome shotgun (WGS) metagenomics approaches are more commonly used. Metagenomic studies provide a more complete picture of what a total community looks like and give both taxonomic and functional profiles. In shotgun metagenomics, the total DNA of a sample is fragmented, prepared as a library, and sequenced. The benefit of this approach is twofold. First, it captures genetic information from all the organisms present in a microbiome, including any that are unknown or unculturable. Second, metagenomics allows us to study the microorganisms in their natural state. Metagenomics can be used to study all types of microorganisms, including bacteria, fungi and viruses, in one run, which saves time and reduces costs. In addition, metagenomics can provide information on what organisms are present and their relative frequencies, as amplicon-based sequencing does, but it also provides information about the functions of the microbial community. One potential problem with WGS is that a good assembly may be difficult to achieve with very diverse samples. In such cases, an enrichment culture or single cell experiment may improve the assembly.
With regard to the metaproteomics-related approach, a new method to evaluate the proteome-level functional redundancy in the human gut microbiome using metaproteomics has detected significant functional redundancy and increased nestedness in the human gut proteomic networks [25]. The authors reported that gut inflammation and exposure to specific xenobiotics can significantly affect the gut microbiome without robust modifications of taxonomic diversity.
Data-independent acquisition with Parallel Accumulation-Serial Fragmentation (DIA-PASEF) mass spectrometry (MS) has been applied to evaluate the proteins of microbiomes and their interactions with the host [26]. The study showed that PASEF increased peptide quantification up to fivefold, widened the dynamic range towards low-abundance proteins and increased the quantification of proteins with unknown functions. The method also allowed the profiling of 131 additional functional pathways. The results showed significant enrichment of two bacterial classes upon pain, an increase in metabolic activities related to chronic pain, and pain-induced dynamics of proteome complexes involved in the network between the host immune system and the gut microbiome.
Both amplicon-based sequencing and shotgun metagenomics approaches to microbiome research require the preparation of a sample. Amplicon-based sequencing requires intact DNA, while shotgun metagenomics approaches use DNA that has been sheared into small fragments. Samples may be taken from any microbiome site, from the mouth to the GI tract. Next, DNA must be isolated. Many approaches to DNA isolation have been used for microbiome research, and many companies offer kits for the extraction of DNA from microbial samples. Refer to Table 1 for some common DNA isolation techniques and kits. Different DNA extraction approaches may yield different microbiome profiles [16]. Typically, host (human) DNA is removed so that sequencing capacity is not wasted on the DNA of the host. One must choose a DNA isolation kit or technique appropriate for the organism under study. For example, CA Douglas et al compared the DNeasy PowerLyzer PowerSoil DNA Isolation kit, the Sigma-Aldrich GenElute Bacterial Genomic DNA Kit, the QIAamp DNA Stool Mini Kit, the QIAamp DNA Stool Mini Kit with bead beating, and a phenol-chloroform-based method for DNA extraction from breast milk, and found that the methods yielded different microbiome compositions and differing inclusion of spurious contaminants [16].
| Platform | System | Read Length (bp) | Strengths | Reference |
|---|---|---|---|---|
| Sanger | ABI 3500, 3730 | 50-1000 | Accuracy | [27] |
| Roche 454 | 454 Life Sciences, 454-FLX | | | [28] |
| Illumina | MiSeq, HiSeq, NextSeq, NovaSeq | 30-300 | High Throughput | [21, 29] |
| IonTorrent | PGM, S5, Proton | To 200 or 400 | Speed | [30] |
| Pacific Biosciences | PacBio RSII, Sequel | 10,000-60,000 | Long Read Length | [31] |
| Oxford Nanopore | MinION | To 100,000 | Portable, Long Read Length | [32] |
In culture-dependent methods of microbiome analysis, samples are grown in culture media or on agar. A major drawback of this method is that many microbes are anaerobic and cannot be reliably cultured with current technology and approaches. In contrast, culture-independent methods of microbiome analysis require no growth or culture phase. These methods are well suited to determining the abundance of microbes in a community or sample directly. The majority of microbiome studies take samples directly from a host or environment and proceed with DNA isolation without culturing.
DNA sequencing was initially performed with the Sanger method, which was the primary technique between 1975 and 2005. This sequencing method produces DNA sequences of 500-1000 bp but is costly and time consuming. The need for rapid sequencing of vast numbers of samples was met by pyrosequencing, a second-generation sequencing technique that produces short sequencing reads in high volume. 454 Life Sciences introduced pyrosequencing technology in 2005; it produced millions of sequences per run at one one-hundredth of the price per read of Sanger sequencing, and it launched next-generation sequencing (NGS) [33]. Since 2005, several other NGS platforms have been developed. The major advantages of NGS over Sanger sequencing are its cost-effectiveness and high throughput. See Table 2 for NGS platforms, read lengths and strengths of each system. The Illumina systems have dominated NGS because their costs are even lower than those of pyrosequencing and they generate billions of reads [34]. See Findley and Harrison, 2014, for an in-depth review of next-generation sequencing technologies [35]. The sequencing technology used for a research project should be chosen based on several parameters, including throughput, read length, error rate and availability of reference strains. Throughput is the amount of sequence that can be generated for a certain cost. Read length may be short or long, and longer reads are easier to assemble. Error rates are less important for survey-type experiments, but a low error rate is important for population genomics. Finally, the availability of reference strains may affect the bioinformatic approaches used to analyze the sequence data downstream. Typically, for 16S survey approaches, the Illumina MiSeq works well. It has somewhat lower throughput than the newer Illumina platforms, but this is sufficient for survey applications.
For WGS approaches, the Illumina HiSeq or NextSeq systems provide the necessary throughput and acceptable error rates. The Pacific Biosciences platform is especially good for genome-level data since it generates very long reads, but it is lower in throughput.
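The trade-off between throughput and read length can be made concrete with the standard estimate of average sequencing depth, which is the number of reads multiplied by the read length and divided by the genome size. The sketch below is only a rough planning aid: the function names and the example values are hypothetical, and in a real metagenome the effective genome size depends on how many organisms are present and how abundant each one is.

```python
import math

def expected_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage: (reads x read length) / genome size."""
    return num_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest whole number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

# Hypothetical example: a single 5 Mb bacterial genome and 150 bp reads.
depth = expected_coverage(1_000_000, 150, 5_000_000)
n_reads = reads_needed(30, 150, 5_000_000)
```

With these illustrative numbers, one million 150 bp reads correspond to an average depth of 30 over a 5 Mb genome. A low-abundance community member receives only a small share of the reads, so its depth is proportionally lower.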
Because population-level transcriptomic measurements capture only the average behavior of a bacterial population, a droplet-based high-throughput single-microbe RNA-seq assay (smRandom-seq) has been developed using random primers for in situ cDNA generation, droplets for single-microbe barcoding and CRISPR-based rRNA depletion for mRNA enrichment [36]. The single-microbe RNA-seq assay showed high species specificity and sensitive gene detection, identified transcriptome changes and verified resistant subpopulations with specific gene expression patterns and metabolic pathways.
Bioinformatics approaches for microbiome analysis are critical to elucidating the information contained within a sample. Computational algorithms handle tasks such as the identification of PCR duplicates and allow visualization of the data to provide a manageable result. Several sequence analysis algorithms are used in microbiome studies, including Usearch, BLAST and BLAST-like programs, RDP Classifier and Pplacer [37]. Recently, pipelines for 16S amplicon sequencing were compared and found to have various strengths and weaknesses; for example, Mothur performed well at the OTU level, but with lower specificity than pipelines working at the Amplicon Sequence Variant (ASV) level [38]. Different analysis methods yield varied results [39].
A recent study evaluated 324 available R packages for microbiome analysis and classified them according to several categories, including diversity, difference, biomarker, correlation and network, and functional prediction [40]. In addition, the authors analyzed the advantages and weaknesses of the integrated R packages (phyloseq, microbiome, MicrobiomeAnalystR, Animalcules, microeco, and amplicon) for microbiome research. Overall, the study presents an effective pipeline for microbiome studies, with multiple examples of R code.
A systematic evaluation of gut microbiome-based machine-learning classifiers for 20 diseases demonstrated significant predictive accuracies in intra-cohort validation, but low accuracies in cross-cohort validation, except for intestinal diseases [41]. The authors reported higher validation performance for classifiers using metagenomic data than for those using 16S amplicon data in intestinal diseases. In addition, the study estimated the cross-cohort marker consistency using a Marker Similarity Index.
| Tool | Supplier | Strengths | Reference |
|---|---|---|---|
| QIIME / QIIME2 | Open source | High-Throughput, Identification of large-scale patterns | [21] |
| Mothur | Open source | OTU, alpha and beta diversity | [42] |
| Ion Reporter™ | Thermo Fisher Scientific | Genus/Species level identification | [43] |
| SequenceMatch Algorithm | Ribosomal Database Project | Very sensitive, good for low copy number species detection, good for matching filtered reads to bacteria | [43] |
| HUMAnN2 | Open source | Metagenome Functional Profiling | [44] |
| MetaPath | Open source | Identification of metabolic subnetworks | [45] |
| Usearch | Open source | Speed, clustering at lower identities | [29, 46] |
| BaseSpace | Illumina | Demultiplexing | [47] |
| RDP Classifier | Open source | Taxonomic assignment | [48] |
| Phlan | Open source | Phylogenetic analysis | [49] |
| Pplacer | Open source | Puts sequence on a fixed reference phylogenetic tree, visualization | [50] |
| BLAST | NCBI | Sequence comparison | [51, 52] |
| Canu | Open source | Assembles sequence from PacBio or Oxford Nanopore | [53, 54] |
| MyCC | Open source | Automated binning | [55] |
Taxonomic information such as operational taxonomic units (OTUs) is typically reported in microbiome analyses. An OTU is a group of very similar sequences from a certain group of microorganisms. Estimating the abundance of certain species down to the strain level is possible with some NGS techniques and algorithms. Functional information, such as abundance versus function, is also generated. Amplicon Sequence Variants (ASVs) are a more recent approach to analysis: they infer the biological sequences present in the sample prior to the introduction of amplification and sequencing errors, and can distinguish variants [56]. ASV-based analysis offers better resolution and accuracy than OTU-based analysis and addresses several of its limitations [57].
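The idea of grouping very similar sequences into OTUs can be sketched with a simple greedy, centroid-based clustering. This is a simplified illustration and not the algorithm of any particular tool: it assumes equal-length sequences and compares them position by position, whereas real pipelines use alignment-based identity. The 97% threshold is the conventional cutoff for 16S OTUs, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otu_clustering(sequences, threshold=0.97):
    """Assign each sequence to the first centroid it matches, else start a new OTU."""
    centroids = []  # one representative sequence per OTU
    otus = []       # member lists, parallel to centroids
    for seq in sequences:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[i].append(seq)
                break
        else:
            # No existing centroid was similar enough: seq founds a new OTU.
            centroids.append(seq)
            otus.append([seq])
    return otus

# Illustrative 100 bp sequences (made-up data).
seq_a = "ACGT" * 25
seq_b = "TT" + seq_a[2:]        # 2 mismatches to seq_a, identity 0.98
seq_c = "T" * 10 + seq_a[10:]   # 8 mismatches to seq_a, identity 0.92
otus = greedy_otu_clustering([seq_a, seq_b, seq_c])
```

In this example the first two sequences fall into one OTU and the third founds a second one. The result of a greedy scheme depends on the input order, which is one of the limitations that ASV-based methods are intended to avoid.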
Aligning DNA sequencing reads to a reference genome makes assembly much easier. De novo assembly takes significantly more computing time and can be very challenging with short-read technology. Reads are binned by some genomic property such as GC content, and a de Bruijn graph is typically used. The assembler searches for overlaps among the reads and builds a graph of the assembly, but loops or inaccuracies are possible, and the computation can exhaust the available computing resources. One approach to meeting this challenge is the use of cross-platform assemblies, which is useful when a reference genome is not available. PacBio long reads provide a scaffold, which potentially contains many errors because of the platform's higher error rate. Illumina is then used to generate short reads at higher throughput and with lower error rates. The shorter reads are aligned onto the long reads to create a more accurate assembly [53, 58].
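The de Bruijn graph mentioned above can be illustrated with a minimal sketch. Each read is broken into overlapping k-mers, and every k-mer becomes an edge from its first k-1 bases to its last k-1 bases. The function name and the two toy reads are hypothetical, and a real assembler adds error correction, graph simplification and path finding on top of this structure.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix; repeats are kept
            # so that edge multiplicity reflects how often a k-mer was seen.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

# Two overlapping toy reads (made-up data).
reads = ["ACGTC", "CGTCA"]
graph = de_bruijn_graph(reads, k=3)
```

Walking the edges from `AC` through `CG`, `GT` and `TC` to `CA` spells out the merged sequence `ACGTCA`. Repeated sequence in a genome creates nodes with several outgoing edges, which is the source of the loops described in the text.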
Functional experiments use binning, in which data are assigned to specific populations or operational taxonomic units (OTUs) to learn what the community looks like. Binning may be done either before assembly, with reads, or after assembly, with contigs. A program assigns fragments to an OTU, and the bin is based on alignment or on some genomic property. Another approach to this phase of the analysis is MyCC, which is based on genomic signatures. It performs dimension reduction, clustering and correction of clusters based on a set of marker genes [55].
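Binning on a genomic property can be sketched with two simple signatures: GC content and tetranucleotide frequency. This is an illustration of the general idea only and does not reproduce MyCC or any other tool; the function names, the number of bins and the toy contigs are hypothetical, and real binners combine such signatures with coverage information and clustering.

```python
from collections import Counter
from itertools import product

def gc_bin(seq, n_bins=10):
    """Index of the GC-content bin (0 .. n_bins-1) that a sequence falls into."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    # Integer arithmetic avoids floating-point surprises at bin boundaries.
    return min(gc * n_bins // len(seq), n_bins - 1)

def tetranucleotide_profile(seq):
    """Normalized frequency of every 4-mer over A, C, G, T, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    return [counts[k] / total if total else 0.0 for k in kmers]

def bin_by_gc(contigs, n_bins=10):
    """Group contig names by GC-content bin."""
    bins = {}
    for name, seq in contigs.items():
        bins.setdefault(gc_bin(seq, n_bins), []).append(name)
    return bins

# Toy contigs (made-up data, far shorter than real contigs).
contigs = {
    "contig_1": "GCGCGCGCAT",
    "contig_2": "ATATATATGC",
    "contig_3": "GGCCGGCCAA",
}
bins = bin_by_gc(contigs)
signature = tetranucleotide_profile("ACGTACGTACGT")
```

Here the two GC-rich contigs land in the same bin and the AT-rich contig in another. The 256-value tetranucleotide profile is the kind of higher-dimensional signature to which dimension reduction and clustering can then be applied.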
Methodologies used to study microbiomes have advanced rapidly in the last decade. More optimized sequencing technologies and more specific computational models will likely continue to be developed by the community of scientists studying microbiomes, and approaches to elucidating the composition of microbiomes and the genomes of the strains within them will continue to improve. This review provides a basis for understanding many of the common scientific methodologies used in microbiome research, from sample harvest and sequencing to the basics of analysis.
- Feye K, Rubinelli P, Chaney W, Pavlidis H, Kogut M, Ricke S. The Preliminary Development of an in vitro Poultry Cecal Culture Model to Evaluate the Effects of Original XPC™ for the Reduction of Campylobacter jejuni and Its Potential Effects on the Microbiota. Front Microbiol. 2019;10:3062.
- Yang Y, Sitanggang N, Kato N, Inoue J, Murakami T, Watanabe T, et al. Beneficial effects of protease preparations derived from Aspergillus on the colonic luminal environment in rats consuming a high-fat diet. Biomed Rep. 2015;3:715-720.
- Margulies M, Egholm M, Altman W, Attiya S, Bader J, Bemben L, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376-80.
- Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol. 1990;215:403-10.
- Materials and Methods [ISSN : 2329-5139] is a unique online journal with regularly updated review articles on laboratory materials and methods. If you are interested in contributing a manuscript or suggesting a topic, please leave us feedback.

