Mutant Phenotypes for Thousands of Bacterial Genes of Unknown Function

by Morgan N. Price, Kelly M. Wetmore, R. Jordan Waters, Mark Callaghan, Jayashree Ray, Hualan Liu, Jennifer V. Kuehl, Ryan A. Melnyk, Jacob S. Lamson, Yumi Suh, Hans K. Carlson, Zuelma Esquivel, Harini Sadeeshkumar, Romy Chakraborty, Grant M. Zane, Benjamin E. Rubin, Judy D. Wall, Axel Visel, James Bristow, Matthew J. Blow, Adam P. Arkin, and Adam M. Deutschbauer


One third of all protein-coding genes from bacterial genomes cannot be annotated with a function. To investigate these genes’ functions, here we collected genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions each. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. 2,316 of these poorly-annotated genes had associations that are of high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins and we proposed specific functions for poorly-annotated enzymes and transporters and for uncharacterized protein families. Our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.

See the article (paywalled), the final author version (free), or view the data in the Fitness Browser.

Data Downloads

Note added September 10, 2021. After publishing this data, Morgan Price and Adam Deutschbauer discovered that our stock solutions for sucrose and D-mannitol were problematic. In particular, Escherichia coli BW25113 is a K-12 strain (closely related to MG1655) and should not be able to grow on sucrose. In M9 media made with our original stock solution of sucrose, E. coli BW25113 grew, but in media made with a fresh stock solution, it did not. Similarly, growth of E. coli on mannitol should require the phosphotransferase uptake protein mtlA and the dehydrogenase mtlD. In our original fitness assays for E. coli, mtlA and mtlD were not important for growth on mannitol; instead, manX and manY, which encode the mannose phosphotransferase system, were important. When we repeated these experiments with a fresh stock solution for D-mannitol, we found that mtlA and mtlD were important for fitness, and manX and manY were not.

Please disregard all data from this publication regarding sucrose or D-mannitol. In the Fitness Browser, the problematic fitness experiments have been removed. As of September 2021, the data for these compounds in the Fitness Browser is from fresh stock solutions of sucrose and D-mannitol. We also checked that the data from the new experiments is consistent with prior knowledge of the utilization of these compounds. Finally, we checked the gene re-annotations that were related to sucrose or mannitol utilization.

You can download the data for each organism here:

or as a tarball for all genomes here (large! 84 GB)

You can get information about the organisms and their genomes here:

Alternatively, you can download all of the data in the Fitness Browser (as of June 2017) from doi: 10.6084/m9.figshare.5134840

Also note that for some organisms, the Fitness Browser contains additional experiments beyond those described here. For these organisms, the cofitness values in the Fitness Browser will not match the values in these downloads.
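To illustrate why additional experiments change cofitness, here is a minimal sketch. It assumes cofitness is the Pearson correlation between two genes' fitness values across experiments. The fitness values and variable names below are hypothetical and are not taken from the data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of fitness values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical fitness values for two genes in four experiments
# (standing in for the experiments described in the paper).
gene_a = [-2.0, -1.5, 0.1, 0.0]
gene_b = [-1.8, -1.6, 0.0, 0.2]
cofit_paper = pearson(gene_a, gene_b)

# The same genes with two extra hypothetical experiments appended
# (standing in for experiments added to the browser later).
gene_a_all = gene_a + [0.1, -2.5]
gene_b_all = gene_b + [-2.0, 0.1]
cofit_browser = pearson(gene_a_all, gene_b_all)

print(f"cofitness, paper experiments only: {cofit_paper:.2f}")
print(f"cofitness, with extra experiments: {cofit_browser:.2f}")
```

Because the correlation is computed over every experiment available for an organism, values computed from the downloads and values shown in the browser can differ whenever the sets of experiments differ.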

Other Downloads


Page by Morgan N. Price, Arkin group, June 2017