Data Analysis
Data analysis for the sorbitol, pH 8, 1 M salt, Nystatin, minimal medium and galactose conditions was performed as follows. For the control condition (YPD at 30°C), 10-15 hybridizations were collected for each generation time (5 and 15 generations). The data were centered using the mean intensity across the chip after censoring poor readings as described in the preprocessing section. For each generation time, a Gaussian distribution was fit to the base 10 logarithm of the signal intensity for each tag across all control hybridizations. In any particular experiment (defined as a condition measured at 5 or 15 generations), the likelihood of observing each tag's intensity under the control distribution was calculated. The fitness of a strain was then found by averaging the likelihoods of the 4 tags associated with that ORF (see below). For computational and data presentation purposes, the negative natural log of this average is taken and dubbed the 'fitness defect score'. Therefore, the larger the fitness defect score, the less likely the strain's tag intensities are under the control distribution, and the stronger the evidence that the strain has a significant growth defect in the condition tested. In statistical terms, this is a test of the null hypothesis that the intensity observed in the condition experiment arises from the distribution observed in normal YPD at 30°C. For complete data sets see:
http://genomics.lbl.gov/YeastFitnessData/websitefiles/cel_index.html.

Based on comparisons with known and observed biological significance, our cutoff for significance included strains with fitness defect scores greater than 20 in both 5-generation experiments and greater than 100 in both 15-generation experiments. This cutoff is stringent enough that we are confident these strains exhibited a reproducible decrease in fitness.

Because the genes required for growth in the lys-, trp- and thr- dropout media are confounded by those required for growth in minimal medium, a filter was applied to these data. This filtering was accomplished with a likelihood ratio test between each amino acid dropout condition and the minimal medium condition: if the likelihood ratio was greater than 15, we considered the strain to have a significant fitness defect in the dropout medium only.

To identify slow-growing strains in the control condition (YPD at 30°C), we calculated the ratio of the average intensity of a given strain taken directly after thawing from -80°C ('post -80°C thaw') to the average intensity of the same strain after ~10 generations of growth. The data for these experiments were generated from 10 independent experiments. The strains with the highest ratios at 10 generations are those that grow most slowly in YPD. By comparing these intensity ratios with the growth rates of individual strains measured from growth curves, we can estimate each strain's growth rate relative to wild type (see Figure S1).
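The fitness defect score and significance cutoff described above can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the analysis code actually used: the function names and the `target_mean` used for centering are illustrative, centering is assumed to be a multiplicative rescaling by the chip mean, and the per-tag "likelihood" is assumed to be the lower-tail probability of the observed log10 intensity under the fitted control Gaussian (the text does not say whether a tail probability or a density was used).

```python
import math
import statistics


def center_chip(intensities, target_mean=1000.0):
    """Rescale one chip so its mean intensity equals target_mean.

    ASSUMPTION: poor readings are already censored, and 'centering' means
    multiplicative scaling by the chip mean; target_mean is hypothetical.
    """
    chip_mean = statistics.mean(intensities)
    return [x * target_mean / chip_mean for x in intensities]


def fit_control(control_intensities):
    """Gaussian (mean, sd) of log10 intensity for one tag across control chips."""
    logs = [math.log10(x) for x in control_intensities]
    return statistics.mean(logs), statistics.stdev(logs)


def tag_likelihood(intensity, mu, sigma):
    """ASSUMPTION: lower-tail probability P(X <= log10 x) under N(mu, sigma^2)."""
    z = (math.log10(intensity) - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def fitness_defect_score(observed, controls):
    """observed[k]: intensity of tag k in the experiment;
    controls[k]: that tag's intensities in the control hybridizations."""
    likelihoods = [tag_likelihood(x, *fit_control(c)) for x, c in zip(observed, controls)]
    mean_likelihood = statistics.mean(likelihoods)
    # Negative natural log of the averaged likelihood; an underflowed average maps to +inf.
    return math.inf if mean_likelihood == 0.0 else -math.log(mean_likelihood)


def is_significant(scores_5gen, scores_15gen):
    """Cutoff from the text: >20 in both 5-generation and >100 in both 15-generation experiments."""
    return all(s > 20 for s in scores_5gen) and all(s > 100 for s in scores_15gen)
```

For example, if every tag's control intensities are 100 and 1000 (log10 mean 2.5), a strain observed at 10^2.5 in all four tags sits at the control median and scores ln 2 ≈ 0.69; lower observed intensities give larger scores.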
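The dropout-media filter and the slow-growth ratio from the Data Analysis section can be sketched in the same way. This is a hypothetical sketch: the text gives the threshold of 15 but not the exact form of the likelihood ratio, so it assumes the ratio is the strain's averaged tag likelihood in minimal medium divided by that in the dropout medium, which equals exp(score_dropout - score_minimal) when written in terms of fitness defect scores. Averaging each strain's intensities over the replicate experiments is also an assumption, and all function names are illustrative.

```python
import math
import statistics


def dropout_specific(score_dropout, score_minimal, threshold=15.0):
    """Flag a strain whose defect in a dropout medium (lys-, trp-, thr-)
    is not explained by its defect in minimal medium.

    ASSUMPTION: likelihood ratio = L_minimal / L_dropout, and each score is
    -ln(L), so the ratio equals exp(score_dropout - score_minimal).
    """
    log_ratio = score_dropout - score_minimal
    return log_ratio > math.log(threshold)  # compare in log space to avoid overflow


def slow_growth_ratio(thaw_intensities, gen10_intensities):
    """Average intensity straight after the -80 C thaw over the average after ~10 generations."""
    return statistics.mean(thaw_intensities) / statistics.mean(gen10_intensities)


def rank_slow_growers(strains):
    """strains: dict name -> (thaw_intensities, gen10_intensities).

    Returns strain names ordered from highest to lowest ratio, i.e. slowest growers first.
    """
    ratios = {name: slow_growth_ratio(t0, t10) for name, (t0, t10) in strains.items()}
    return sorted(ratios, key=ratios.get, reverse=True)
```

A strain that is depleted from the pool during the ~10 generations has a lower final intensity and therefore a higher ratio, which is why the ranking is in descending order.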
Tag and microarray Preprocessing
Each deletion strain is associated with 4 hybridization signals on the high-density oligonucleotide array: UPTAG sense, UPTAG antisense, DNTAG sense and DNTAG antisense. To identify tags that do not hybridize to the array well enough to make useful predictions, a distribution describing background tag behavior was generated using the ~8,000 tags on the array that are not associated with any strain. These oligonucleotides represent the background intensity of the array. A small fraction of these background oligonucleotides were found to cross-hybridize to tags in the strains; these tags were eliminated from further analysis. For each tag distribution (generated from 10 time-zero hybridizations), a two-sample Kolmogorov-Smirnov test of distributional similarity was applied [1]. The null hypothesis is that the two samples (background and tag) come from the same underlying distribution. Tags with a p-value greater than 0.05 by this test (1,663 tags) were considered too similar to background to yield predictive results and were eliminated from the analysis. An additional 77 tags were discarded because they hybridized with signal levels below that of the background distribution (signal intensity less than 400). Overall, 95.3% of the tags were included in the analysis.
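The tag filter described above can be sketched in Python. This is a minimal sketch, not the pipeline actually used: the two-sample Kolmogorov-Smirnov statistic is computed exactly, but the p-value uses the asymptotic Kolmogorov series with a commonly used small-sample correction, which is only approximate with 10 time-zero hybridizations per tag. Applying the 400 signal cutoff to the mean time-zero intensity is an assumption, and the function names are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-10:
            return min(max(total, 0.0), 1.0)
    return 1.0  # series did not converge: lam is tiny, samples are indistinguishable


def keep_tag(tag_intensities, background, alpha=0.05, min_signal=400.0):
    """Keep a tag only if it differs from background (p <= alpha) and its
    mean time-zero intensity reaches min_signal (ASSUMPTION: mean is used)."""
    d = ks_statistic(tag_intensities, background)
    p = ks_pvalue(d, len(tag_intensities), len(background))
    mean_signal = sum(tag_intensities) / len(tag_intensities)
    return p <= alpha and mean_signal >= min_signal
```

A tag whose time-zero intensities are interleaved with the background values gives a small D and a p-value near 1, so it is dropped; a tag that is clearly separated from background but whose mean signal is below 400 is dropped by the second condition.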
Comparison of expression data to fitness data
A Spearman rank correlation test was performed comparing the expression data of the genes that exhibited a significant increase in expression (as defined by the authors of the expression dataset) with the fitness data of the genes whose deletion strains exhibited a significant fitness defect; no correlation was found.
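A minimal sketch of the rank correlation described above, in Python. It assumes, hypothetically, that expression and fitness values are paired by gene name and correlated over the genes present in both datasets; the helper names are illustrative. Tied values receive average ranks, and rho is the Pearson correlation of those ranks, which is the standard definition of Spearman's rho.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos..end
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def paired_rho(expression, fitness):
    """ASSUMPTION: both inputs are dicts gene -> value; correlate shared genes only."""
    genes = sorted(expression.keys() & fitness.keys())
    return spearman_rho([expression[g] for g in genes], [fitness[g] for g in genes])
```

Because rho depends only on ranks, it is insensitive to the very different scales of expression ratios and fitness defect scores, but it still depends on which genes pass each dataset's significance cutoff.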
However, because the cutoffs for significance in both datasets are somewhat arbitrary, it is difficult to draw strong conclusions from statistical tests alone, and a subset of the data may still exhibit such a correlation. For these reasons, we opted not to include further statistical tests of association, as their results could be misleading. Instead, we report the percentage of significantly up-regulated genes that also have a significant fitness defect score (see Expression Comparison Table).

1. Chakravarti, I. M., Laha, R. G. & Roy, J. Handbook of Methods of Applied Statistics (John Wiley and Sons, 1967).