Genome-wide fitness data for Shewanella oneidensis MR-1
A paper about this data set: Evidence-Based Annotation of Gene Function in Shewanella oneidensis MR-1 Using Genome-Wide Fitness Profiling across 121 Conditions
Background
Shewanella oneidensis strain MR-1 (formerly known as S. putrefaciens) is a model organism for studying metal reduction: it can use a wide range of metal ions and solid metal oxides as electron acceptors, and it also grows aerobically. MR-1 belongs to the same class of bacteria as E. coli (the Gammaproteobacteria), but the two are not closely related: of the ~4,500 proteins in MR-1, only about a third have orthologs in E. coli. The MR-1 genome sequence was published in 2002, and the annotation has been curated since then. A few hundred papers have been published on MR-1, and hundreds of gene expression experiments are publicly available.
The Adam Arkin Lab at UC Berkeley has created a large collection of S. oneidensis MR-1 transposon insertion strains, each with a known insertion location and a known tag (barcode). These strains are combined into two pools, and each pool is grown under a given condition (often a stress) for ~6-8 generations. Stress experiments are typically performed in LB medium in well-shaken (aerobic) flasks, using a concentration of the stressor that reduces the growth rate about 2-fold.
The abundance of each tagged strain is measured with microarrays at the beginning and at the end of the experiment. The fitness of a strain is the log2 ratio of its abundance at the end versus the beginning. (This is not the same scale as fitness in population genetics.) The data are normalized so that the median strain has a fitness of 0. The fitness value of a gene is the average of the values for the insertions in that gene. The analysis assumes that an insertion inactivates the gene it lies in.
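The calculation just described (log2 ratio per strain, shift so the median strain is 0, then average over the insertions in each gene) can be sketched as follows. This is a minimal illustration, not the lab's AvgPool.R code: the plain-number abundances, the strain IDs, and the strain-to-gene dictionary are hypothetical stand-ins for the microarray signals and insertion map.

```python
"""Minimal sketch of strain and gene fitness as described on this page.

Assumptions (hypothetical, not from the original data): abundances are
positive numbers keyed by made-up strain IDs, and each strain maps to at
most one gene through a plain dict.
"""
import math
from statistics import mean, median


def strain_fitness(start, end):
    """log2(end/start) for each strain, shifted so the median strain is 0."""
    raw = {s: math.log2(end[s] / start[s]) for s in start}
    m = median(raw.values())
    return {s: v - m for s, v in raw.items()}


def gene_fitness(fitness, strain_to_gene):
    """Average the normalized fitness of all insertions within each gene."""
    by_gene = {}
    for strain, value in fitness.items():
        gene = strain_to_gene.get(strain)
        if gene is not None:  # strains outside genes are skipped
            by_gene.setdefault(gene, []).append(value)
    return {g: mean(vals) for g, vals in by_gene.items()}


if __name__ == "__main__":
    start = {"s1": 100, "s2": 100, "s3": 100, "s4": 100, "s5": 100}
    end = {"s1": 25, "s2": 100, "s3": 200, "s4": 100, "s5": 50}
    genes = {"s1": "geneA", "s5": "geneA", "s2": "geneB", "s4": "geneB", "s3": "geneC"}
    print(gene_fitness(strain_fitness(start, end), genes))
```

In this toy example geneA averages its two insertions (-2 and -1) to -1.5, which would count as a moderately sick gene on the scale described next.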
The reliability of these per-gene fitness values is estimated from their consistency across different insertions in the same gene and across the two pools. In a typical experiment, some strains are very sick (fitness < -2, implying little or no growth), some are moderately but significantly sick (fitness ~ -1), most have fitness near 0 (neutral), and a handful are advantaged (fitness ~ 1).
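One way to picture the between-pool consistency check is to correlate per-gene fitness from pool 1 with pool 2 and flag genes where the pools disagree. The sketch below does that; it is an illustration only, not the Z-score method in AvgPool.R, and the 1.0 log2-unit tolerance is a hypothetical choice.

```python
"""Sketch: agreement of per-gene fitness between the two pools.

Hypothetical choices: dict inputs {gene: fitness} and a tolerance of
1.0 log2 unit for calling a gene discordant between pools.
"""
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def pool_agreement(pool1, pool2, tolerance=1.0):
    """Return (correlation, discordant genes) over genes measured in both pools."""
    shared = sorted(set(pool1) & set(pool2))
    r = pearson([pool1[g] for g in shared], [pool2[g] for g in shared])
    discordant = [g for g in shared if abs(pool1[g] - pool2[g]) > tolerance]
    return r, discordant
```

A high correlation with few discordant genes suggests the fitness values for that experiment are reproducible; genes present in only one pool are simply left out of the comparison here.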
Tab-delimited files for download:
- Genes in MR-1
- scaffoldId: 139 for the main chromosome; 140 for the megaplasmid.
- locusId: locus identifier in MicrobesOnline, also referred to as VIMSS id, or as "gene" in the header of some tables.
- ec and ecName -- the b-number identifier and the gene name for the E. coli ortholog, if any.
- Descriptions of the 195 fitness experiments
- Chip: the identifier for the experiment; used as the column name in the data tables.
- Group and info: the type of experiment and a description of the condition.
- Unless otherwise specified, we use a defined medium with lactate, ammonia, sulfate, and phosphate, with growth in air (well shaken).
- Anaerobic experiments list an alternate electron acceptor and often list an alternate carbon source (otherwise lactate is the carbon source and electron donor).
- Stress experiments are in rich medium (LB). Some of these are plain LB experiments (no stress added). We generally add enough of the stressor to cut the growth rate in half.
- Fitness data for 195 experiments
- good: TRUE if the gene has at least one insertion within the central 5-80% of its length.
- Numeric column names are chip numbers (matching the Chip column of the experiment descriptions).
- The values are log2 ratios of the abundance at the end of the experiment versus the beginning, a span that usually covers 6-8 generations of growth. These "fitness values" are normalized so that the median is zero.
- More details on the 195 experiments:
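As a starting point for working with the downloaded fitness table, the following sketch reads the tab-delimited text, keeps rows whose good column is TRUE, and collects the numeric chip columns. It also illustrates the 5-80% rule behind the good flag. Several details are assumptions not stated on this page: the identifier column is taken to be "locusId" (some tables may call it "gene"), missing values are written as empty or NA, and the percentage is measured from the start codon, so minus-strand genes are counted from their end coordinate.

```python
"""Sketch: load the fitness table and illustrate the 'good' coverage rule.

Assumptions (hypothetical): header row with an ID column (default
"locusId"), a good column holding TRUE/FALSE, chip columns whose names are
all digits, and 5-80% measured from the start codon of the gene.
"""
import csv
import io


def load_good_genes(text, id_col="locusId"):
    """Return {gene id: {chip: fitness}} for rows where good == TRUE."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    result = {}
    for row in reader:
        if row["good"] != "TRUE":
            continue
        result[row[id_col]] = {
            chip: float(value)
            for chip, value in row.items()
            # numeric column names are chip numbers; skip missing values
            if chip.isdigit() and value not in (None, "", "NA")
        }
    return result


def in_central_region(pos, begin, end, strand="+", lo=0.05, hi=0.80):
    """True if an insertion at pos lies 5-80% of the way into the gene."""
    length = end - begin
    if strand == "-":
        frac = (end - pos) / length  # minus strand: start codon is at `end`
    else:
        frac = (pos - begin) / length
    return lo <= frac <= hi
```

For example, an insertion at position 150 in a gene spanning 100-200 lies halfway through the gene and passes the check, while one at 101 falls in the first 5% and does not.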
Viewing the data in MeV:
- Download MR1_fitness.mev.
- Run MeV and choose File / Load Data. Use Browse to select MR1_fitness.mev, select two-color array, uncheck load annotation, and click the upper-left-most expression value (under Fe(III)).
- Use Display / Set Color Scale Limits to check that the color scale limits are reasonable (e.g., -3 to 0 to 3).
- Use Display / Gene Row Labels to set "comb" as the gene row label (this will show the VIMSS/MicrobesOnline id, the SO number, the gene's name if any, and the gene's description).
- You might also want to use Display / Set Element Size and Analysis / Clustering / HCL.
R source code and image:
- R image (R-2.11, 199 MB)
- MR1.R -- code to create the image (especially see comment at top documenting the objects in the image)
- AvgPool.R -- code to compute fitness values and Z scores
- utils.R -- utilities used by AvgPool.R and MR1.R
- expr.R -- code to normalize expression data
Other resources:
Page by Morgan Price in the Arkin group
Fitness data collected by Adam M. Deutschbauer and others in the Arkin group