A Comparison of the Costs and Benefits of Bacterial Gene Expression
By Morgan N. Price, Kelly M. Wetmore, Adam M. Deutschbauer, and Adam P. Arkin.
To study how a bacterium allocates its resources, we compared the costs and benefits of most (86%) of the proteins in Escherichia coli K-12 during growth in minimal glucose medium. The cost or investment in each protein was estimated from ribosomal profiling data, and the benefit of each protein was measured by assaying a library of transposon mutants. We found that proteins that are important for fitness are usually highly expressed, and 95% of these proteins are expressed at above 13 parts per million (ppm). Conversely, proteins that do not measurably benefit the host (with a benefit of less than 5% per generation) tend to be weakly expressed, with a median expression of 13 ppm. In aggregate, genes with no detectable benefit account for 31% of protein production, or about 22% if we correct for genetic redundancy. Although some of the apparently unnecessary expression could have subtle benefits in minimal glucose medium, the majority of the burden is due to genes that are important in other conditions. We propose that at least 13% of the cell’s protein is "on standby" in case conditions change.
- E. coli fitness data for MOPS minimal + 2 g/L glucose, 200 mL culture
- E. coli fitness compendium
  - This is a tab-delimited table. The possibility of genetic redundancy was examined only for genes with key=Key.
- Comparison of fitness data to the ribosomal profiling of Li et al 2012
  - tab-delimited table
  - fmin is the fraction of protein monomers (from Table S1 of Li et al)
  - fweight is the fraction of protein production (weighted by #aa)
  - fit6A and fit6B are the fitness values at 5 generations (replicates A and B, MiSeq)
  - "setAS2 A" and "setAS3 B" are the t values at 5 generations (replicates A and B, MiSeq)
  - fit12A and fit12B are fitness values at 12 generations (replicates A and B, HiSeq)
  - setAIT095 and setAIT096 are the t values at 12 generations (replicates A and B, HiSeq)
  - t6 is the combined t value at 5 generations
  - t12_1 is the combined t value at 12 generations using the MiSeq data
  - t12_2 is the combined t value at 12 generations using the HiSeq data
  - class: 1 = essential, 2 = important for fitness, 3 = detrimental to fitness, 4 = no detectable phenotype, 5 = ambiguous
  - sick is TRUE for genes that lack fitness data but are required for normal growth in both LB and minimal media (Baba et al 2006)
- Manual classification of highly expressed genes with no phenotype in minimal glucose
- R image
- R code for figures
  - To be used with the R image. This also shows how the R image was created.
- The code for processing the fitness data is available here.
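As a rough illustration of how the comparison table could be used, the sketch below sums fweight within each class and reports the median fweight in ppm. It is not the authors' analysis, which is in the R code listed above. The inline rows are made up for illustration, the column name `gene` is an assumption, and treating empty or `NA` entries as missing is also an assumption about how the file encodes gaps. Only the column names fweight and class, and the class codes, come from the descriptions above.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up rows for illustration only; these are NOT values from the real table.
# "gene" is an assumed column name; "fweight" and "class" follow the column list.
SAMPLE_TSV = (
    "gene\tfweight\tclass\n"
    "geneA\t0.004\t1\n"
    "geneB\t0.002\t2\n"
    "geneC\t0.000013\t4\n"
    "geneD\t0.000005\t4\n"
    "geneE\t0.0001\t4\n"
    "geneF\tNA\t4\n"
)

# Class codes as given in the column descriptions.
CLASS_LABELS = {
    "1": "essential",
    "2": "important for fitness",
    "3": "detrimental to fitness",
    "4": "no detectable phenotype",
    "5": "ambiguous",
}


def summarize(handle):
    """Return {class label: (summed fweight, median fweight in ppm)}."""
    by_class = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        value = row["fweight"].strip()
        if value in ("", "NA"):  # assumed missing-value markers
            continue
        by_class[row["class"]].append(float(value))
    return {
        CLASS_LABELS.get(cls, cls): (sum(values), median(values) * 1e6)
        for cls, values in by_class.items()
    }


summary = summarize(io.StringIO(SAMPLE_TSV))
for label, (fraction, median_ppm) in summary.items():
    print(f"{label}: {fraction:.4%} of protein production, "
          f"median {median_ppm:.1f} ppm")
```

To run this against the downloaded file instead of the sample rows, pass an open file handle to `summarize`, for example `open(path, newline="")`, where `path` is wherever the table was saved locally.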