Functionalizing Lists of Unknown TB Entities (FLUTE)

Program Director: Eric J. Rubin, Harvard School of Public Health

Investigators: Sarah Fortune, Chris Sassetti, Kyu Y. Rhee, Sabine Ehrt, Dirk Schnappinger, Steven A. Carr, Jonathan Livny, James Sacchettini, Thomas R. Ioerger, Christoph Grundner

Funding Source: NIH U19 AI107774

Project Home Page:

Project Objectives: Mycobacterium tuberculosis is one of the most successful pathogens in the world, still responsible for millions of deaths each year. Nearly half of its protein-coding genes have unknown functions. FLUTE is a Functional Genomics Resource Center funded by NIAID, with the goal of defining functions for unknown ORFs, hypothetical genes, and non-coding RNAs in Mtb.

Project Data: PATRIC FTP Site, PATRIC Public Workspace


High-Resolution Phenotypic Profiling Defines Genes Essential for Mycobacterial Growth and Cholesterol Catabolism


The pathways that comprise cellular metabolism are highly interconnected, and alterations in individual enzymes can have far-reaching effects. As a result, global profiling methods that measure gene expression are of limited value in predicting how the loss of an individual function will affect the cell. In this work, we employed a new method of global phenotypic profiling to directly define the genes required for the growth of Mycobacterium tuberculosis. A combination of high-density mutagenesis and deep sequencing was used to characterize the composition of complex mutant libraries exposed to different conditions. This allowed the unambiguous identification of the genes that are essential for Mtb to grow in vitro, and proved to be a significant improvement over previous approaches. To further explore functions that are required for persistence in the host, we defined the pathways necessary for the utilization of cholesterol, a critical carbon source during infection. Few of the genes we identified had previously been implicated in this adaptation by transcriptional profiling, and only a fraction were encoded in the chromosomal region known to encode sterol catabolic functions. These genes comprise an unexpectedly large percentage of those previously shown to be required for bacterial growth in mouse tissue. Thus, this single nutritional change accounts for a significant fraction of the adaptation to the host. This work provides the most comprehensive genetic characterization of a sterol catabolic pathway to date, suggests putative roles for uncharacterized virulence genes, and precisely maps genes encoding potential drug targets.

Data sets and additional information can be found here.

Peptidoglycan synthesis in Mycobacterium tuberculosis is organized into networks with varying drug susceptibility


Peptidoglycan (PG), a complex polymer composed of saccharide chains cross-linked by short peptides, is a critical component of the bacterial cell wall. PG synthesis has been extensively studied in model organisms but remains poorly understood in mycobacteria, a genus that includes the important human pathogen Mycobacterium tuberculosis (Mtb). The principal PG synthetic enzymes have similar and, at times, overlapping functions. To determine how these are functionally organized, we carried out whole-genome transposon mutagenesis screens in Mtb strains deleted for ponA1, ponA2, and ldtB, major PG synthetic enzymes. We identified distinct factors required to sustain bacterial growth in the absence of each of these enzymes. We find that even the homologues PonA1 and PonA2 have unique sets of genetic interactions, suggesting there are distinct PG synthesis pathways in Mtb. Either PonA1 or PonA2 is required for growth of Mtb, but both genetically interact with LdtB, which has its own distinct genetic network. We further provide evidence that each interaction network is differentially susceptible to antibiotics. Thus, Mtb uses alternative pathways to produce PG, each with its own biochemical characteristics and vulnerabilities.

Data sets and additional information can be found here.

Comprehensive Essentiality Analysis of the Mycobacterium tuberculosis Genome via Saturating Transposon Mutagenesis


For decades, identifying the regions of a bacterial chromosome that are necessary for viability has relied on mapping integration sites in libraries of random transposon mutants to find loci that are unable to sustain insertion. To date, these studies have analyzed subsaturated libraries, necessitating the application of statistical methods to estimate the likelihood that a gap in transposon coverage is the result of biological selection and not the stochasticity of insertion. As a result, the essentiality of many genomic features, particularly small ones, could not be reliably assessed. We sought to overcome this limitation by creating a completely saturated transposon library in Mycobacterium tuberculosis. In assessing the composition of this highly saturated library by deep sequencing, we discovered that a previously unknown sequence bias of the Himar1 element rendered approximately 9% of potential TA dinucleotide insertion sites less permissive for insertion. We used a hidden Markov model of essentiality that accounted for this unanticipated bias, allowing us to confidently evaluate the essentiality of features that contained as few as 2 TA sites, including open reading frames (ORFs), experimentally identified noncoding RNAs, methylation sites, and promoters. In addition, several essential regions that did not correspond to known features were identified, suggesting uncharacterized functions that are necessary for growth. This work provides an authoritative catalog of essential regions of the M. tuberculosis genome and a statistical framework for applying saturating mutagenesis to other bacteria.
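
As a rough illustration of the gap-based reasoning described above, the sketch below lists the TA dinucleotide positions in a sequence and finds the longest run of consecutive TA sites with no observed insertions. It is a deliberately simplified stand-in, not the hidden Markov model used in the study, and it does not correct for the Himar1 sequence bias. The function names and the example inputs are hypothetical.

```python
def ta_sites(sequence):
    """Return 0-based start positions of every TA dinucleotide in the sequence."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def longest_empty_run(counts):
    """Return the length of the longest run of consecutive TA sites with zero insertions.

    `counts` holds one insertion (read) count per TA site, in genomic order.
    """
    best = 0
    current = 0
    for count in counts:
        current = current + 1 if count == 0 else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequence and per-site insertion counts.
    print(ta_sites("GTACCTAGTA"))
    print(longest_empty_run([3, 0, 0, 0, 2, 0]))
```

In a subsaturated library a long empty run may simply reflect chance, which is why the abstract stresses both saturation and an explicit statistical model.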

Data sets and additional information can be found here.

Statistical analysis of genetic interactions in Tn-Seq data


Tn-Seq is an experimental method for probing the functions of genes through construction of complex random transposon insertion libraries and quantification of each mutant’s abundance using next-generation sequencing. An important emerging application of Tn-Seq is for identifying genetic interactions, which involves comparing Tn mutant libraries generated in different genetic backgrounds (e.g. wild-type strain versus knockout strain). Several analytical methods have been proposed for analyzing Tn-Seq data to identify genetic interactions, including estimating relative fitness ratios and fitting a generalized linear model. However, these have limitations which necessitate an improved approach. We present a hierarchical Bayesian method for identifying genetic interactions through quantifying the statistical significance of changes in enrichment. The analysis involves a four-way comparison of insertion counts across datasets to identify transposon mutants that differentially affect bacterial fitness depending on genetic background. Our approach was applied to Tn-Seq libraries made in isogenic strains of Mycobacterium tuberculosis lacking three different genes of unknown function previously shown to be necessary for optimal fitness during infection. By analyzing the libraries subjected to selection in mice, we were able to distinguish several distinct classes of genetic interactions for each target gene that shed light on their functions and roles during infection.
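
The four-way comparison can be pictured with a simple point estimate: the change in a gene's insertion counts under selection in the knockout background, minus the same change in the wild-type background. The sketch below computes that difference of log2 fold changes. It is only an illustration of the comparison's structure, not the hierarchical Bayesian method described in the abstract, and it assigns no statistical significance. The function names, the pseudocount, and the example counts are assumptions made for this example.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of mean counts after vs. before selection (pseudocount avoids log of zero)."""
    return math.log2((mean(after) + pseudocount) / (mean(before) + pseudocount))


def interaction_delta(wt_before, wt_after, ko_before, ko_after, pseudocount=1.0):
    """Difference of log2 fold changes: knockout background minus wild-type background.

    Values near zero suggest no interaction; strongly negative values suggest the
    gene matters more for fitness when the knocked-out gene is absent.
    """
    ko_lfc = log2_fold_change(ko_before, ko_after, pseudocount)
    wt_lfc = log2_fold_change(wt_before, wt_after, pseudocount)
    return ko_lfc - wt_lfc


if __name__ == "__main__":
    # Hypothetical insertion counts at the TA sites of one gene.
    print(interaction_delta([7, 7], [7, 7], [7, 7], [1, 1]))
```

A point estimate like this ignores variability between TA sites and replicates, which is the kind of limitation that motivates a full statistical model.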

Data sets and additional information can be found here.

TnSeq datasets for 10 knockouts in Mtb

In this experiment, knockout strains for 10 genes in Mycobacterium tuberculosis H37Rv were assayed by TnSeq to identify conditionally essential genes (compared to WT, using the resampling statistical method in Transit). The target (knocked-out) genes are: Rv3594, Rv3717, Rv3811, Rv0307c, Rv3916c, Rv0950, Rv1096, Rv3684, Rv0954, and Rv3671c.
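
The resampling comparison mentioned above can be sketched, under the assumption that it behaves like a standard permutation test on the difference in mean insertion counts between two strains. This is a generic illustration written for this page, not Transit's actual implementation. The function name, parameters, and example counts are hypothetical.

```python
import random


def resampling_p_value(counts_a, counts_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in mean insertion counts.

    counts_a and counts_b hold the counts observed at the TA sites of one gene
    in two conditions (for example, WT and a knockout strain).
    """
    rng = random.Random(seed)
    n_a = len(counts_a)
    observed = abs(sum(counts_a) / n_a - sum(counts_b) / len(counts_b))
    pooled = list(counts_a) + list(counts_b)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled counts to the two conditions.
        rng.shuffle(pooled)
        perm_a = pooled[:n_a]
        perm_b = pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical counts: insertions tolerated in WT but lost in the knockout.
    wt_counts = [100] * 8
    ko_counts = [0] * 8
    print(resampling_p_value(wt_counts, ko_counts, n_permutations=2000))
```

A small p-value for a gene suggests that its insertion counts differ between the two strains by more than chance reassignment would explain, which is the basis for calling it conditionally essential.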

Data sets and additional information can be found here.