Genome-wide fitness data for Shewanella oneidensis MR-1

A paper about this data set: Evidence-Based Annotation of Gene Function in Shewanella oneidensis MR-1 Using Genome-Wide Fitness Profiling across 121 Conditions

Background

Shewanella oneidensis strain MR-1 (formerly known as S. putrefaciens) is a model organism for studying metal reduction, as MR-1 can utilize a wide range of metal ions and solid metals as electron acceptors and also grows aerobically. MR-1 is in the same division of bacteria as E. coli (the Gammaproteobacteria), but they are not closely related. Of the ~4,500 proteins in MR-1, only about a third have orthologs in E. coli. The MR-1 genome sequence was published in 2002 and the annotation has been curated since. A few hundred papers have been published on MR-1, and hundreds of gene expression experiments are publicly available.

The Adam Arkin Lab at UC Berkeley has created a large number of S. oneidensis MR-1 transposon insertion strains, each with a known insertion location and a known tag (barcode). These strains are combined into two pools, and each pool is grown under a given (stress) condition for ~6-8 generations. Typically, the stress experiments are performed in LB medium in well-shaken (aerobic) flasks, with the stressor at a concentration that reduces the growth rate about 2-fold.

The abundance of each tagged strain is measured with a microarray at the beginning and at the end of the experiment. The fitness of a strain is the log2 ratio of these abundances (end versus beginning). (This is not the same scale as fitness in population genetics.) The data are normalized so that the median strain has a fitness of 0. The fitness value of a gene is computed as the average of the values for the insertions in that gene. We assume that an insertion within a gene inactivates that gene.
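The calculation described above can be sketched in a few lines of Python. This is only an illustration of the log2 ratio, median normalization, and per-gene averaging; it is not the code in AvgPool.R, and the strain names, gene names, and abundances are made up for the example.

```python
import math
from statistics import mean, median


def strain_fitness(start_abundance, end_abundance):
    """Return log2(end / start) per strain, shifted so the median strain is 0."""
    raw = {
        strain: math.log2(end_abundance[strain] / start_abundance[strain])
        for strain in start_abundance
    }
    center = median(raw.values())
    return {strain: value - center for strain, value in raw.items()}


def gene_fitness(strain_fit, strain_to_gene):
    """Average the strain fitness values of all insertions in each gene."""
    by_gene = {}
    for strain, value in strain_fit.items():
        by_gene.setdefault(strain_to_gene[strain], []).append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


# Hypothetical abundances (arbitrary units) for five tagged strains.
start = {"s1": 100.0, "s2": 100.0, "s3": 100.0, "s4": 100.0, "s5": 100.0}
end = {"s1": 100.0, "s2": 25.0, "s3": 25.0, "s4": 200.0, "s5": 100.0}
# Hypothetical mapping from strain to the gene that carries the insertion.
genes = {"s1": "geneA", "s2": "geneB", "s3": "geneB", "s4": "geneC", "s5": "geneA"}

strain_fit = strain_fitness(start, end)
gene_fit = gene_fitness(strain_fit, genes)
```

In this toy input, both insertions in geneB drop 4-fold relative to the median strain, so geneB gets a fitness of -2, while geneC doubles and gets a fitness of 1.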

The reliability of these per-gene fitness values is estimated by looking at consistency across different insertions in the same gene and at consistency across the two pools. In a typical experiment, some strains are very sick (fitness < -2 implies little or no growth), some strains are moderately but significantly sick (fitness ~ -1), most strains have fitness near 0 (are neutral), and a handful of strains are advantaged (fitness ~ 1).
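One simple way to picture "consistency across insertions" is to compare a gene's mean fitness to the standard error of its insertions, and to compare the per-pool means. The sketch below does that; it is an assumption made for illustration and is not the Z score computed by AvgPool.R. The function names and all values are hypothetical.

```python
import math
from statistics import mean, stdev


def insertion_consistency(values):
    """Return (mean, standard error, mean / SE) for one gene's insertions.

    The ratio is None when it cannot be computed (fewer than two
    insertions, or no spread between them).
    """
    if len(values) < 2:
        return (values[0] if values else None), None, None
    avg = mean(values)
    std_err = stdev(values) / math.sqrt(len(values))
    ratio = avg / std_err if std_err > 0 else None
    return avg, std_err, ratio


def pool_difference(pool1_values, pool2_values):
    """Absolute difference between the mean fitness seen in each pool."""
    return abs(mean(pool1_values) - mean(pool2_values))


# Hypothetical fitness values for insertions in two genes.
consistent = [-2.1, -1.9, -2.0, -2.0]  # insertions agree closely
noisy = [-2.0, 0.5, 0.1, -0.6]         # insertions disagree

mean_c, se_c, ratio_c = insertion_consistency(consistent)
mean_n, se_n, ratio_n = insertion_consistency(noisy)
# First two insertions from one pool, last two from the other (made up).
pool_gap = pool_difference(consistent[:2], consistent[2:])
```

A gene whose insertions agree gets a small standard error and a large absolute ratio; a gene with scattered insertions gets a ratio near 0 even if one insertion looks very sick.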

Tab-delimited files for download:

Genes in MR-1
- scaffoldId: 139 for the main chromosome; 140 for the megaplasmid.
- locusId: locus identifier in MicrobesOnline, also referred to as VIMSS id, or as "gene" in the header of some tables.
- ec and ecName: the b-number identifier and the gene name for the E. coli ortholog, if any.
Descriptions of the 195 fitness experiments
- Chip: the identifier for the experiment; used as the column name in the data tables.
- Group and info: the type of experiment and a description of the condition.
  - Unless otherwise specified, we use a defined medium with lactate, ammonia, sulfate, and phosphate, and in air (well shaken).
  - Anaerobic experiments list an alternate electron acceptor and often list an alternate carbon source (otherwise lactate is the carbon source and electron donor).
  - Stress experiments are in rich media (LB). Some of these are plain LB experiments (no stress added). We generally add enough stress to cut the growth rate in half.
Fitness data for 195 experiments
- good: TRUE if we have insertion(s) within the central 5-80% of the gene.
- Numeric column names are chip numbers.
- The values are log2 ratios of the abundance at the end of the experiment versus the beginning, which is usually 6-8 generations of growth. These "fitness values" are normalized so that the median is zero.
More details on the 195 experiments:
- Z scores for the fitness values for 195 experiments
- The up-tag and down-tag pools
- The per-strain and per-pool fitness data for the 195 experiments
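As a rough guide to working with the fitness table, the sketch below reads a tab-delimited table with a locusId column, a good column, and numeric chip columns, keeps only rows where good is TRUE, and lists genes below a fitness threshold in one experiment. The sample rows, locus ids, and chip numbers are invented, and the exact header of the downloadable file is an assumption (some tables label the locus column "gene"), so adjust the column names to match the file you download.

```python
import csv
import io

# Made-up stand-in for the downloaded tab-delimited fitness table.
SAMPLE = (
    "locusId\tgood\t1001\t1002\n"
    "199001\tTRUE\t-2.31\t0.05\n"
    "199002\tFALSE\t0.40\tNA\n"
    "199003\tTRUE\t0.02\t-1.10\n"
)


def load_fitness(handle, id_column="locusId"):
    """Read the table, keeping only genes flagged good == TRUE.

    Returns {locus id: {chip number: fitness or None}}; "NA" or empty
    cells become None.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Numeric column names are taken to be chip numbers.
    chips = [name for name in reader.fieldnames if name.isdigit()]
    table = {}
    for row in reader:
        if row["good"] != "TRUE":
            continue
        table[row[id_column]] = {
            chip: None if row[chip] in ("NA", "") else float(row[chip])
            for chip in chips
        }
    return table


def sick_genes(table, chip, threshold=-1.0):
    """Return sorted locus ids whose fitness on one chip is below threshold."""
    return sorted(
        locus
        for locus, values in table.items()
        if values[chip] is not None and values[chip] < threshold
    )


fitness = load_fitness(io.StringIO(SAMPLE))
sick_1001 = sick_genes(fitness, "1001")
sick_1002 = sick_genes(fitness, "1002")
```

To use a real download, replace io.StringIO(SAMPLE) with an opened file handle, for example open(path, newline="") with the path to your local copy.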

Viewing the data in MeV:

Download MR1_fitness.mev.
Run MeV, use the File / Load Data command, use Browse to select MR1_fitness.mev, select two-color array, uncheck load annotation, and select the upper/left-most expression value (under Fe(III)).
Check that the color scheme limits are reasonable (i.e., -3 to 0 to 3) using Display / Set Color Scale Limits.
Use Display / Gene Row Labels to set "comb" as the gene row label (this will show the VIMSS/MicrobesOnline id, the SO number, the gene's name if any, and the gene's description).
You might also want to use Display / Set Element Size and Analysis / Clustering / HCL.

R source code and image:

R image (R-2.11, 199 MB)
MR1.R -- code to create the image (especially see comment at top documenting the objects in the image)
AvgPool.R -- code to compute fitness values and Z scores
utils.R -- utilities used by AvgPool.R and MR1.R
expr.R -- code to normalize expression data

Other resources:

Browse MR-1 fitness data (at MicrobesOnline)
- We have changed our normalization procedure since the 195 experiments were published, so the values on the web site do not exactly match the values in the tables.
Browse MR-1 expression data
A universal TagModule collection for parallel genetic analysis of microorganisms (Oh et al. 2010)
- This describes our methods, but we have made some changes since this paper was written: we are now using many more strains; we are using two pools rather than one; and we are using a single time point for most experiments.
Fitness data for E. coli
- Note that the E. coli data were collected with a rather different measurement technology (a camera measures the size of colonies on 1,000-well plates). Also, the E. coli values are significance scores, not fitness values.

Page by Morgan Price in the Arkin group
Fitness data collected by Adam M. Deutschbauer and others in the Arkin group