Advances in next-generation sequencing and computation have elucidated the impact of human genomic variation; however, the impact of variation in the human microbiome remains largely unexplored. Clinical and environmental researchers now have the high-resolution tools to explore the emerging science of microbial communities and the ecosystems they inhabit. CHI’s Dynamics of the Microbiome on Health and Disease conference focuses on understanding the role of the microbiome, offering new insights into disease processes and supporting the discovery of new therapeutic strategies.
Tuesday, August 20
7:30 am Breakfast Technology Workshop (Sponsorship Opportunity Available)
8:15 Chairperson’s Remarks
Aleksandar Milosavljevic, Ph.D., Professor, Molecular and Human Genetics, Baylor College of Medicine
» Featured Presentation
8:20 Principled Probabilistic Machine Learning Models for Analyzing Microbiome Time-Series Data
Georg K. Gerber, M.D., Ph.D., MPH, Instructor in Pathology, Harvard Medical School; Co-Director, Center for Clinical and Translational Metagenomics, Director, Computational Unit, Associate Pathologist, Center for Advanced Molecular Diagnostics, Pathology, Brigham and Women’s Hospital
Although longitudinal microbiome data could yield new insights into the mechanisms by which the microbiota interact over time with immunologic and other dynamically varying host responses, methods for analyzing these data remain underdeveloped. To address this challenge, we present a machine learning framework that couples continuous-time dynamical models with Bayesian dimensionality adaptation methods to simultaneously infer time-dependent signatures for individual taxa and assign taxa to functional groups. This framework enables several new types of analyses, including quantitation of the diversity of time-dependent microbial responses to perturbations, estimation of the time required for members of host ecosystems to equilibrate after a perturbation event, and automatic identification of sub-communities of microbes within the larger ecosystem that respond to perturbations in a coordinated way. We demonstrate our tools on two studies measuring changes over time in human or mouse gut flora in response to antibiotic pulses or challenge with an enteric pathogen. We also present extensions to the framework, including methods for automated experimental design that use data from pilot studies to optimize designs for larger studies, and methods for inferring latent continuous-time processes that can be correlated with time-dependent host physiologic or immunologic responses.
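To make "time required to equilibrate after a perturbation" concrete, here is a deliberately simple sketch. It is not the Bayesian dynamical framework described in the abstract: it assumes that one taxon's log-abundance relaxes exponentially back to a known baseline, fits the relaxation time constant `tau` by least squares on the log-deviation, and reports the time for the deviation to shrink to a chosen fraction (`tolerance`) of its initial size. The function names and the synthetic data are hypothetical.

```python
import math
from typing import Sequence


def fit_relaxation_time(times: Sequence[float], values: Sequence[float],
                        baseline: float) -> float:
    """Estimate tau in |x(t) - baseline| = A * exp(-t / tau).

    Fits ln|x(t) - baseline| = ln(A) - t / tau by ordinary least squares.
    Assumes the baseline is known and every value differs from it.
    """
    logs = [math.log(abs(v - baseline)) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var  # slope of the log-deviation equals -1 / tau
    return -1.0 / slope


def equilibration_time(tau: float, tolerance: float = 0.05) -> float:
    """Time for the deviation to decay to `tolerance` times its initial size."""
    return tau * math.log(1.0 / tolerance)


# Synthetic (made-up) log-abundances of one taxon after an antibiotic pulse.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
values = [2.0 + 4.0 * math.exp(-t / 3.0) for t in times]
tau = fit_relaxation_time(times, values, baseline=2.0)
print(round(tau, 3), round(equilibration_time(tau), 2))
```

A model of the kind the abstract describes would instead infer such time scales jointly across taxa, with uncertainty, rather than fitting each curve in isolation.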
8:50 Genboree Workbench and Network for Integrative Microbiome Analysis
Aleksandar Milosavljevic, Ph.D., Professor, Molecular and Human Genetics, Baylor College of Medicine
Next-generation sequencing is providing access to multiple layers of “omic” information, including genomic, epigenomic, transcriptomic and metagenomic layers. We present tools for microbiome analysis (16S and metagenome sequencing) deployed through the Genboree Workbench along with the toolsets for other “omic” assays. We also review the need for virtual integration of physically distributed data, tools and computing resources and the solution offered by the Genboree Network.
9:20 An EM Algorithm for Maximum Likelihood Estimation of Community Composition from Short-Read Sequencing Data
John Novembre, Ph.D., Associate Professor, Human Genetics, University of Chicago
Estimating the relative abundance of strains is a central challenge in microbiome studies. Using an EM-based algorithm that explicitly models sequencing error and multinomial sampling of reads, and that in effect makes “soft” assignments of reads to strains, we are able to estimate strain abundances with higher accuracy than methods that use “hard” assignments. We demonstrate the method’s performance in various scenarios using simulated and real data.
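The idea of "soft" assignment can be illustrated with a generic EM for mixture proportions. This is a minimal sketch, not the speaker's implementation: it assumes reads of known start position on each strain's reference, a toy substitution-only error model with a single `error_rate`, and hypothetical names (`read_log_likelihood`, `em_abundance`). The E-step computes, for every read, the posterior probability that it came from each strain; the M-step sets each strain's abundance to its expected share of reads.

```python
import math
from typing import Dict, List, Tuple


def read_log_likelihood(read: str, start: int, genome: str,
                        error_rate: float) -> float:
    """log P(read | strain) under a toy substitution-only error model.

    Each base matches the reference with probability 1 - e; otherwise it is
    one of the three other bases, each with probability e / 3.
    """
    ll = 0.0
    for offset, base in enumerate(read):
        if genome[start + offset] == base:
            ll += math.log(1.0 - error_rate)
        else:
            ll += math.log(error_rate / 3.0)
    return ll


def em_abundance(reads: List[Tuple[str, int]], strains: Dict[str, str],
                 error_rate: float = 0.01, n_iter: int = 200) -> Dict[str, float]:
    """EM estimate of strain relative abundances with soft read assignments."""
    names = list(strains)
    # One row per read, one column per strain: log P(read | strain).
    table = [[read_log_likelihood(seq, start, strains[s], error_rate)
              for s in names] for seq, start in reads]
    pi = [1.0 / len(names)] * len(names)  # start from uniform abundances
    for _ in range(n_iter):
        expected = [0.0] * len(names)
        for row in table:
            # E-step: posterior probability that this read came from each strain.
            logs = [math.log(max(p, 1e-300)) + ll for p, ll in zip(pi, row)]
            top = max(logs)  # subtract the max before exponentiating (log-sum-exp)
            weights = [math.exp(x - top) for x in logs]
            total = sum(weights)
            for j, w in enumerate(weights):
                expected[j] += w / total
        # M-step: abundance = expected fraction of reads from each strain.
        pi = [e / len(table) for e in expected]
    return dict(zip(names, pi))


strains = {"A": "ACGTACGTAC", "B": "ACGTTCGTAC"}  # differ only at position 4
reads = ([("ACGTACGTAC", 0)] * 7 + [("ACGTTCGTAC", 0)] * 3
         + [("ACGT", 0)] * 2)  # the last two reads fit both strains equally
print(em_abundance(reads, strains))
```

The two ambiguous reads are split between strains in proportion to the current abundance estimates instead of being forced onto one strain, so they do not distort the estimate, which stays close to 0.7 for strain A.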
9:50 Selected Oral Poster Presentation: A Novel Approach to Unknown Pathogen Detection in Clinical Samples
Sadie La Bauve, Ph.D., Bioenergy and Defense Technologies, Sandia National Laboratories
Critical to effective treatment of infection with unknown viruses is detection and identification early in the course of disease. Even for well-characterized pathogens, this is challenging with existing methods because of low titers during the early stages of infection. Pathogen identification by second-generation sequencing of clinical samples requires no foreknowledge of the agent and is relatively sensitive, but the large amount of microbiome-derived noise in the resulting data, and the small number of reads specific to the pathogen itself, currently limit its utility. We will address this problem by physically separating infected from uninfected cells using vital dyes, hyperspectral microscopy, multivariate analysis, classification algorithms and microfluidics. We hypothesize that fluorescent signatures specific to the infected state derive from the innate antiviral immune response and pathogen-induced cytopathic effects. We will deep sequence the RNA of individual cells and use those data to proportionally subtract the microbiome present in uninfected cells from that of infected cells. We present preliminary data on infected-cell identification and single-cell RNA-seq. We envision that the methods we are developing stand to revolutionize early diagnosis of unknown pathogen infection.
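One plausible reading of "proportionally subtract" is sketched below, under stated assumptions: per-taxon read counts are available for infected and uninfected cells, the uninfected (background) profile is scaled to the infected library's total depth, and whatever remains in excess is kept as a pathogen candidate. The function name and the taxon labels (including `virus_X`) are hypothetical, and the abstract does not specify the actual normalization used.

```python
from typing import Dict


def subtract_background(infected: Dict[str, int],
                        uninfected: Dict[str, int]) -> Dict[str, float]:
    """Remove the depth-scaled background profile from the infected profile.

    Returns taxa with counts in excess of the scaled background, sorted from
    largest to smallest excess.
    """
    total_bg = sum(uninfected.values())
    if total_bg == 0:
        # No background to subtract: every infected-cell count is "excess".
        residual = {t: float(c) for t, c in infected.items() if c > 0}
    else:
        # Scale background counts to the infected library's sequencing depth.
        scale = sum(infected.values()) / total_bg
        residual = {}
        for taxon, count in infected.items():
            excess = count - uninfected.get(taxon, 0) * scale
            if excess > 0:  # zero or negative excess is clipped (taxon dropped)
                residual[taxon] = excess
    return dict(sorted(residual.items(), key=lambda kv: kv[1], reverse=True))


infected = {"Bacteroides": 500, "Escherichia": 300, "virus_X": 200}
uninfected = {"Bacteroides": 1000, "Escherichia": 700, "Prevotella": 300}
print(subtract_background(infected, uninfected))
```

Scaling by total depth is the simplest choice; because pathogen reads themselves inflate the infected library's total, a more careful version might normalize on a set of taxa assumed to be unaffected by infection.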
10:05 Coffee Break in the Exhibit Hall with Poster Viewing
11:00 SPA: A Short Peptide Assembler for Metagenomic Data
Shibu Yooseph, Ph.D., Associate Professor, Informatics, J. Craig Venter Institute
The availability of full-length protein sequences is extremely beneficial to any metagenomic data analysis, as it allows an accurate reconstruction of the functional and metabolic potential of the microbial community being studied. However, inferring long protein sequences from contigs obtained by de novo assembly of nucleotide reads is hampered by the fact that metagenomic assemblies are often highly fragmented. We describe a new method for reconstructing complete protein sequences directly from metagenomic data generated using next-generation sequencing technologies. Our framework is based on a novel Short Peptide Assembler (SPA) that uses a de Bruijn graph formulation to assemble protein sequences from their constituent peptide fragments identified on short reads. Using large simulated and real metagenomic datasets, we show that our method outperforms the alternative approach of identifying genes on nucleotide sequence assemblies and generates longer protein sequences that can be analyzed more effectively.
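The de Bruijn formulation mentioned above can be illustrated on amino acid strings. This is a generic sketch of de Bruijn graph construction and unitig spelling, not SPA itself (which presumably adds error handling and other refinements not described here): nodes are (k-1)-mers of the peptide fragments, each k-mer contributes an edge, and maximal non-branching paths are spelled out as assembled sequences. The fragments and the choice `k=4` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_graph(peptides: List[str], k: int) -> Dict[str, Set[str]]:
    """De Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Spell out maximal non-branching paths (unitigs) of the graph.

    Note: isolated cycles in which every node is 1-in/1-out are not reported.
    """
    indeg: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)

    def simple(node: str) -> bool:
        # A node that can be walked through without making a choice.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if simple(start):
            continue  # interior node; covered by a walk from a branch point
        for first in graph.get(start, ()):
            node = first
            path = start + node[-1]
            while simple(node):
                node = next(iter(graph[node]))
                path += node[-1]
            contigs.append(path)
    return sorted(contigs)


# Hypothetical overlapping peptide fragments of the protein "MKTAYIAKQR".
fragments = ["MKTAYIA", "TAYIAKQ", "YIAKQR"]
print(unitigs(build_graph(fragments, k=4)))
```

When fragments disagree (for example a variant residue), the graph branches and the walk stops at the branch point, which is why real assemblers add rules for resolving or collapsing such branches.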
11:30 Observing Both the Forest and the Trees: Building Metagenomic Analysis Workflows with MetAMOS
Todd Treangen, Ph.D., Senior Bioinformatics Scientist, National Biodefense Analysis and Countermeasures Center, Center for Bioinformatics and Computational Biology, University of Maryland
I will present MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, allowing the user to simultaneously observe the forest (relative abundance estimation, functional landscape) and the trees (genome assemblies, annotated genes, variant motifs). Additionally, MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost.
12:00 pm Technology Spotlight (Sponsorship Opportunity Available)
12:15 Close of Session
12:30 From Reads to Variants: Ten-Fold Reduction in Time and Cost with Improved Accuracy
Rupert Yip, Ph.D., Director, Product Marketing, Bina Technologies
Alignment and variant calling of raw NGS reads have been hampered by the cost of HPC hardware and of the bioinformatics personnel needed to support and maintain home-grown, open-source secondary analysis solutions. Such solutions can take up to weeks and cost thousands of dollars per analysis. We present a genomic analysis platform that reduces the time and cost of secondary analysis ten-fold while improving accuracy compared to standard pipelines. Our model achieves this cost reduction while preventing hardware obsolescence.
» Plenary Keynote Session
2:00 Chairperson’s Opening Remarks
Toby Bloom, Ph.D., Deputy Scientific Director, Informatics, New York Genome Center
2:10 A Revolution in DNA Sequencing Technologies: Challenges and Opportunities
Jeffery A. Schloss, Ph.D., Director, Division of Genome Sciences, National Human Genome Research Institute, National Institutes of Health
The initial sequencing of the human genome spurred an appetite for much more human sequence information to better understand the contributions of human sequence variation to health and disease. However, despite dramatic reductions during the Human Genome Project, the cost of sequencing was clearly too high to collect the very large numbers of human and numerous other organism genome sequences needed to achieve that understanding. In 2004, NHGRI launched parallel programs to reduce the cost of sequencing a mammalian genome initially by two (in five years), and eventually by four orders of magnitude (in ten years). This presentation will summarize the technologies that are in high-throughput use to produce stunning amounts of sequence and related data and novel biological insights, and will emphasize technologies currently emerging and on the horizon that may provide human genome sequence data with the nature, quality, cost and turnaround time needed for applications in research and medicine.
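The cost targets above can be unpacked with a little arithmetic. Assuming, purely for illustration, a constant annual reduction factor r (the abstract does not say how the reductions were paced, and no baseline cost is needed), both targets turn out to imply the same yearly rate, roughly a halving of cost every nine months:

```latex
\[
\text{2 orders of magnitude} = 10^{2} = 100\text{-fold (5 years)}, \qquad
\text{4 orders of magnitude} = 10^{4} = 10{,}000\text{-fold (10 years)}
\]
\[
r_{5} = \left(10^{2}\right)^{1/5} = 10^{0.4} \approx 2.51, \qquad
r_{10} = \left(10^{4}\right)^{1/10} = 10^{0.4} \approx 2.51
\]
\[
t_{1/2} = \frac{\log_{10} 2}{\log_{10} r} = \frac{\log_{10} 2}{0.4} \approx \frac{0.301}{0.4} \approx 0.75\ \text{years} \approx 9\ \text{months}
\]
```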
2:50 RNA is Everywhere: Characterizing the Spectra and Flux of RNA in Mammalian Circulation
David Galas, Ph.D., Principal Scientist, Pacific Northwest Diabetes Research Institute
The discovery of foreign RNA in blood and tissues of humans and mice raises many questions, including its origins, the mechanisms of its transport and stability and what, if any, functions it has. I will discuss what we know about circulating exRNA in human plasma and the use of NGS in the exploration of this new area of investigation in biology and medicine.
3:30 Refreshment Break in the Exhibit Hall with Poster Viewing
4:15 Genomics and the Single Cell
Sherman Weissman, Ph.D., Sterling Professor of Genetics and Medicine, Yale University School of Medicine
Studies of single cells are being approached by widely different methods, principally fluorescence microscopy (including super-high-resolution methods), cloning and expansion of single cells, and, most generally applicable, genomic-scale nucleic acid analyses. The last category includes single-cell DNA sequence analysis, gene expression analysis and, most recently, analyses of telomere length, DNA methylation and potentially closed regions of chromatin. In the near future, it may also be possible to combine several analyses of a single cell, including mRNA expression, genomic DNA methylation and protein secretion. These approaches will have major value for diverse fields, including molecular analysis of the early stages of development; the nature and heterogeneity of stem cells and transient repopulating cells in various systems, including the hematopoietic system; the nature and extent of neuronal heterogeneity; heterogeneity in neoplasia; and functional subsets of cells of the immune system. A substantial experimental challenge is to distinguish technical variation from stochastic and deterministic events in single cells. Another, broader challenge is to correlate genomic properties, whose measurement necessarily destroys the cell, with the functional properties and potential of the individual cell being analyzed. These issues will be discussed briefly in the presentation.
4:55 Genome Hacking
Yaniv Erlich, Ph.D., Principal Investigator, Whitehead Fellow, Whitehead Institute for Biomedical Research
Sharing sequencing datasets without identifiers has become a common practice in genomics. We developed a technique that uses entirely free, publicly accessible Internet resources to fully identify individuals in these studies. I will present a quantitative analysis of the probability of identifying U.S. individuals with this technique. In addition, I will demonstrate the power of our approach by tracing back the identities behind multiple whole-genome datasets in public sequencing repositories.
5:35 Close of Dynamics of the Microbiome on Health and Disease Conference / Short Course Registration Open