Data for "Indirect and Suboptimal Control of Gene Expression is Widespread in Bacteria"
Morgan N. Price, Adam M. Deutschbauer, Jeffrey M. Skerker, Kelly M. Wetmore, Troy Ruths, Jordan S. Mar, Jennifer V. Kuehl, Wenjun Shao, and Adam P. Arkin
Lawrence Berkeley Lab Physical Biosciences Division
UC Berkeley Dept. of Bioengineering
Energy Biosciences Institute
UCB Dept. of Molecular and Cell Biology
Molecular Systems Biology 9:660
Abstract:
Gene regulation in bacteria is usually described as an adaptive
response to an environmental change so that genes are expressed when
they are required. We instead propose that most genes are under
indirect control: their expression responds to signal(s) that are not
directly related to the genes' function. Indirect control should perform
poorly in artificial conditions, and we show that gene regulation
is often maladaptive in the laboratory.
In Shewanella oneidensis MR-1, 24% of genes are detrimental to
fitness in some conditions, and detrimental genes tend to be highly
expressed instead of being repressed when not needed.
In diverse bacteria, there is little
correlation between when genes are important for optimal growth or
fitness and when those genes are upregulated.
Two common types of indirect control are constitutive expression and regulation by growth rate;
these occur for genes with diverse functions and often seem to be suboptimal.
Because genes that have closely-related functions can have dissimilar expression patterns,
regulation may be suboptimal in the wild as well as in the laboratory.
Tab-delimited data for Shewanella oneidensis MR-1
- Protein-coding genes in S. oneidensis MR-1
- locusId -- VIMSS or MicrobesOnline identifier
- sysName -- systematic name or LocusTag
- name -- gene name
- desc -- gene description
- TIGR -- TIGR subrole, if any
- conserv -- core, HGT, or other
- confirmed_auxotroph -- TRUE if classified as a biosynthetic gene (by TIGR subrole) and sick in minimal lactate media but not in LB
- CorExprFitness -- the correlation between this gene's expression and fitness across 15 matched experiments
- EFcode -- constitutive (expression), ribo (growth-regulated), const_fit (no strong fitness effects), or variable (both expression and fitness vary in our 15 matched conditions)
- fit_Inosine ... fit_NAGfum -- per-gene fitness in 15 conditions. Also see these fitness experiments at MicrobesOnline.
- expr_Inosine ... expr_NAGfum -- log2 level of expression in 15 conditions (conditions are matched with the fitness data)
- growth_expr_lac020min ... growth_expr_LB540min -- log2 level of expression at various times or ODs during batch growth on lactate, N-acetylglucosamine, or LB
- Expression compendium for S. oneidensis MR-1 (23 MB)
- (This data is from MicrobesOnline, not from this study, but the normalization has changed slightly.)
- See protein-coding genes for metadata for each locusId.
- Fitness compendium for S. oneidensis MR-1 (12 MB)
- Tiling data for S. oneidensis MR-1 in minimal lactate media (185 MB)
- PROBE_ID -- arbitrary identifier
- scaffoldId -- 139 for the main chromosome; 140 for the megaplasmid
- strand -- the strand of the probe's sequence, the same as the strand of the RNA it assays (because we hybridize to 1st-strand cDNA)
- begin and end -- the extent of the probe
- match2 -- non-zero if this is a potentially cross-hybridizing probe.
- code -- 1 if coding, 0 if intergenic, -1 if antisense
- genomic -- log2(raw intensity) for a control hybridization with genomic DNA
- min -- log2(raw intensity) for a hybridization to 1st-strand cDNA from minimal media
- nA, nC, nG, nT -- nucleotide composition of the probe
- norm -- normalized log-level expression of the probe
- 5' RNASeq data for S. oneidensis MR-1 in minimal lactate media (18 MB)
- scaffoldId, strand, and start show the beginning of the reads (the putative 5' end of a transcript)
- n shows the number of reads at that location.
- Transcript start sites associated with genes in S. oneidensis MR-1
- n -- the number of reads from 5' RNASeq
- locusId -- the gene
- scaffoldId, strand, start -- the location of the transcript start
- at -- the location of the nearby rise in the tiling data
- r -- the "local correlation" of that rise (|r| near 1 is a very sharp rise)
- gstart -- the 5' end of the gene
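As an illustration of how the matched fit_ and expr_ columns of the protein-coding gene table relate to CorExprFitness, here is a minimal Python sketch. It assumes a Pearson correlation (the description above says only "correlation"), assumes no missing values, and uses made-up rows with three conditions instead of 15. The gene names, the column pair fit_Example/expr_Example, and all numbers are hypothetical.

```python
import csv
import io
import math

# Made-up rows for illustration only; the real table has 15 fit_/expr_ pairs.
# fit_Example/expr_Example, geneA, and geneB are hypothetical names.
SAMPLE = (
    "locusId\tsysName\tfit_Inosine\tfit_Example\tfit_NAGfum\t"
    "expr_Inosine\texpr_Example\texpr_NAGfum\n"
    "1\tgeneA\t-2.0\t-1.0\t0.0\t9.0\t10.0\t11.0\n"
    "2\tgeneB\t0.5\t0.5\t0.5\t8.0\t9.0\t10.0\n"
)

def pearson(xs, ys):
    """Pearson correlation, or None if either vector has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def expr_fitness_correlation(handle):
    """Map sysName to the correlation of expression with fitness."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Pair columns by condition name: fit_X goes with expr_X.
    conds = [c[len("fit_"):] for c in reader.fieldnames if c.startswith("fit_")]
    conds = [c for c in conds if "expr_" + c in reader.fieldnames]
    result = {}
    for row in reader:
        fit = [float(row["fit_" + c]) for c in conds]
        expr = [float(row["expr_" + c]) for c in conds]
        result[row["sysName"]] = pearson(expr, fit)
    return result

correlations = expr_fitness_correlation(io.StringIO(SAMPLE))
print(correlations)
```

To run this on the real file, pass an open file handle instead of the io.StringIO wrapper. Values computed this way may differ from the CorExprFitness column if the original analysis used a different correlation measure or filtering.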
Tab-delimited data for Zymomonas mobilis ZM4
- Data for genes in Z. mobilis ZM4 (only genes with both expression and fitness data are included).
- locusId -- VIMSS or MicrobesOnline identifier
- scaffoldId -- 277402 is the main chromosome
- begin, end, strand -- location of the gene
- name -- annotated name (if any)
- sysName -- systematic name or locus tag
- desc -- gene description
- type -- 1 for protein-coding, 5 for tRNA
- confirmed_auxotroph -- TRUE if classified as a biosynthetic gene (by TIGR subrole) and sick in minimal glucose medium (ZMMG) but not rich glucose medium (ZMRG)
- CorExprFitness -- the correlation between this gene's expression and fitness across 18 conditions
- EFcode -- constitutive (expression), ribo (growth-regulated), const_fit (no strong fitness effects), or variable (both expression and fitness vary in our 18 conditions)
- ZMMG.fit ... ZMRG1.fit -- per-gene fitness in 18 conditions. Some of the fitness values are averages across replicates or are not at exactly the concentration indicated here (see here).
- ZMMG.expr ... ZMRG1.expr -- log2 level of expression in those 18 conditions. (Here the concentrations are correct.)
- Expression data for Z. mobilis ZM4 (this includes all genes that we have expression data for and includes conditions that do not match the fitness data; these were used to help identify constitutive and growth-regulated genes)
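The per-gene columns of the Z. mobilis gene table can be summarized in a few lines. The Python sketch below counts protein-coding genes (type 1) in each EFcode class and averages their CorExprFitness. The rows are made up for illustration, and the sketch assumes that every row has a non-empty EFcode and a numeric CorExprFitness, which may not hold in the real file.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; locusId values and numbers are hypothetical.
SAMPLE = (
    "locusId\ttype\tCorExprFitness\tEFcode\n"
    "101\t1\t0.50\tvariable\n"
    "102\t1\t-0.25\tvariable\n"
    "103\t1\t0.00\tconstitutive\n"
    "104\t5\t0.75\tribo\n"
)

def summarize_by_efcode(handle):
    """Return {EFcode: (gene count, mean CorExprFitness)} for protein-coding genes."""
    values = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["type"] != "1":  # 1 = protein-coding, 5 = tRNA
            continue
        values[row["EFcode"]].append(float(row["CorExprFitness"]))
    return {code: (len(v), sum(v) / len(v)) for code, v in values.items()}

summary = summarize_by_efcode(io.StringIO(SAMPLE))
for code, (n, mean_r) in sorted(summary.items()):
    print(code, n, round(mean_r, 3))
```

The same function should work on the S. oneidensis gene table if the type filter is removed, since that table lists only protein-coding genes and has no type column.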
Tab-delimited data for Desulfovibrio alaskensis G20
- Protein-coding genes in D. alaskensis G20 (only genes with both expression and fitness data are included)
- locusId -- VIMSS or MicrobesOnline identifier
- sysName -- systematic name or locus tag
- begin, end, and strand -- location of the gene on the chromosome
- desc -- gene description
- erich and emin -- log2 expression in rich (LS4) and minimal (LS4D) media, respectively, from tiling data
- frich and fmin -- per-gene fitness in rich (LS4) and minimal (LS4D) media, respectively. These are the average of values from 1 or 2 transfers (see day 1 and day 2 at MicrobesOnline)
- confirmed_auxotroph -- TRUE if classified as a biosynthetic gene (by TIGR subrole) and sick in LS4D but not LS4
- Tiling data for D. alaskensis G20 in minimal and rich lactate media (176 MB)
- PROBE_ID -- arbitrary identifier
- strand -- the strand of the probe's sequence, the same as the strand of the RNA it assays (because we hybridize to 1st-strand cDNA)
- begin and end -- the extent of the probe
- match2 -- non-zero if this is a potentially cross-hybridizing probe.
- nA,nC,nG,nT -- nucleotide content of the probe
- genomic -- raw intensity from a control hybridization with genomic DNA
- min, rich -- raw intensity from cDNA from minimal or rich lactate medium
- min.norm, rich.norm -- normalized log2 expression in minimal or rich lactate medium
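The probe-level tiling columns can be combined with a gene's begin, end, and strand to get a per-gene expression estimate. The Python sketch below averages min.norm and rich.norm over probes that lie entirely inside a gene, are on the gene's strand, and have match2 equal to 0. This is one plausible summary, not necessarily how erich and emin were derived. The probe rows, the "+"/"-" strand encoding, and the literal "0" for match2 are assumptions.

```python
import csv
import io

# Made-up probes for illustration; coordinates, intensities, and the "+"/"-"
# strand encoding are assumptions, not taken from the real file.
PROBES = (
    "PROBE_ID\tstrand\tbegin\tend\tmatch2\tmin.norm\trich.norm\n"
    "p1\t+\t100\t159\t0\t10.0\t12.0\n"
    "p2\t+\t200\t259\t0\t11.0\t12.5\n"
    "p3\t+\t300\t359\t1\t15.0\t15.0\n"   # possible cross-hybridization
    "p4\t-\t220\t279\t0\t6.0\t6.0\n"     # opposite strand
    "p5\t+\t900\t959\t0\t8.0\t8.0\n"     # outside the gene
)

def gene_expression(handle, begin, end, strand):
    """Mean (min.norm, rich.norm) over clean same-strand probes inside a gene."""
    mins, richs = [], []
    for p in csv.DictReader(handle, delimiter="\t"):
        # Skip potentially cross-hybridizing probes and the other strand.
        if p["match2"] != "0" or p["strand"] != strand:
            continue
        # Keep only probes that fall entirely within the gene.
        if int(p["begin"]) < begin or int(p["end"]) > end:
            continue
        mins.append(float(p["min.norm"]))
        richs.append(float(p["rich.norm"]))
    if not mins:
        return None  # no usable probes for this gene
    return sum(mins) / len(mins), sum(richs) / len(richs)

result = gene_expression(io.StringIO(PROBES), begin=50, end=400, strand="+")
print(result)
```

Because the probe strand is the strand of the RNA it assays, matching on strand keeps antisense signal out of the gene's average.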
GEO Accessions
Links to Gene Expression Omnibus:
- Gene expression data for S. oneidensis MR-1, GSE39462
- High-resolution "tiling" microarray data for S. oneidensis MR-1 in minimal lactate media, GSE39468
- Transcript start sequencing (5' RNASeq) for S. oneidensis MR-1 in minimal lactate media, GSE39474
- Gene expression data for Z. mobilis ZM4, GSE39466
- High-resolution "tiling" microarray data for D. alaskensis G20 in supplemented or minimal lactate-sulfate media, GSE39471
Supplementary Table S1: Pairs of functionally-related genes in Shewanella oneidensis MR-1 that are not in the same operon and are not coexpressed.
See tab-delimited table.
We list pairs of genes that are cofit and in the same functional
category (TIGR subrole) but are not in the same operon, near each other in the genome, or coexpressed.
For each pair, we manually examined their annotations and their
fitness patterns to determine if they truly had closely-related
functions or not. For pairs of flagellar genes, we also report whether they are coregulated and in the same
"class" in Pseudomonas aeruginosa according to Dasgupta et al. 2003.
R source code and image
- AvgPool.R -- code for processing and normalizing fitness data
- exprVfitness.R -- code for making the plots
- utils.R -- utility functions
- R image (17 MB) with data for most of the analyses -- this can be loaded into R with load(). Note that the code has not been loaded; use source() for that.
Morgan Price, July 2012
Contact: morgannprice@yahoo.com, aparkin@lbl.gov