Downloads for "Mutant Phenotypes for Thousands of Bacterial Genes of Unknown Function"

by Morgan N. Price, Kelly M. Wetmore, R. Jordan Waters, Mark Callaghan, Jayashree Ray, Hualan Liu, Jennifer V. Kuehl, Ryan A. Melnyk, Jacob S. Lamson, Yumi Suh, Hans K. Carlson, Zuelma Esquivel, Harini Sadeeshkumar, Romy Chakraborty, Grant M. Zane, Benjamin E. Rubin, Judy D. Wall, Axel Visel, James Bristow, Matthew J. Blow, Adam P. Arkin, and Adam M. Deutschbauer


Nearly half of all protein-coding genes from bacterial genomes cannot be annotated with a specific function. To systematically explore the functions of these proteins, we generated saturated transposon mutant libraries from 32 diverse bacteria, and for each, we assayed the mutant phenotypes of 1,898 to 6,373 protein-coding genes across 26 to 129 conditions. From these data, we identified a mutant phenotype for 11,779 protein-coding genes that had not been annotated with a specific function. The majority of these genes (62%) had phenotypes that were either specific to a few conditions or were similar to those of another gene in the same bacterium. We show that these associations are informative for understanding protein function. For 2,316 of these poorly-annotated genes, the associations are conserved in other bacteria, which confirms that these associations are genuine. By combining these associations with comparative genomics, we proposed functions for uncharacterized protein families, we identified putative DNA repair proteins, and we improved the annotations of hundreds of transporters and catabolic enzymes. Across all sequenced bacteria, 12% of proteins that lack detailed annotations have a potential ortholog with a functional association in our dataset. Our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.

See bioRxiv preprint (with just 25 bacteria and a different title)

Data Downloads

The easiest way to view the data is with the Fitness Browser. You can also download the data for each organism individually, or as a single tarball for all genomes (large: 84 GB).

Information about the organisms and their genomes is also available for download.

Alternatively, you can download all of the data in the Fitness Browser from doi: 10.6084/m9.figshare.5134840

Note that for some organisms, the Fitness Browser contains additional experiments beyond those described here. For these organisms, the cofitness values shown in the Fitness Browser will not match those computed from the experiments described here.
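To illustrate why extra experiments change cofitness: cofitness between two genes is the Pearson correlation of their fitness values across experiments, so adding experiments changes the value. The sketch below uses made-up fitness values for two hypothetical genes (gene_a, gene_b); the numbers and names are assumptions for illustration only and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length fitness profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length profiles with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical fitness values (one per experiment) for two genes.
# These numbers are invented for illustration, not real data.
gene_a = [-2.0, -1.5, 0.1, 0.0, -0.2]
gene_b = [-1.8, -1.2, 0.0, 0.2, -0.1]

# Hypothetical additional experiments, standing in for the extra
# experiments that the Fitness Browser holds for some organisms.
extra_a = [0.1, -3.0]
extra_b = [-2.5, 0.2]

# Cofitness from the original set of experiments only.
cofit_original = pearson(gene_a, gene_b)

# Cofitness once the additional experiments are included.
cofit_with_extra = pearson(gene_a + extra_a, gene_b + extra_b)

print(f"cofitness, original experiments:   {cofit_original:.3f}")
print(f"cofitness, with extra experiments: {cofit_with_extra:.3f}")
```

The two printed values differ because the correlation is taken over different sets of experiments. To reproduce cofitness values from the downloads, restrict the calculation to the same experiments in both cases.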

Other Downloads