Greenbaum et al
transcriptome data set, which does not have this experimental bias. This may be related to the
enrichment of functional categories associated with cytoplasmic proteins, such as "protein
synthesis."
Limitations Given the Small Size of the Protein Abundance Data
Even with the extended coverage made possible by merging many datasets into our two
reference sets, the largest complication in our analysis remained the limited amount of
data. This limitation applied most obviously to the protein abundance measurements. In addition to
giving us fewer data points for our statistics, the small number of protein abundance measurements
potentially biased our statistical results towards certain protein families. The 181 proteins in Gprot
are certainly not a random selection from the possible 6280 in yeast; rather, they are skewed
towards well-studied, highly expressed proteins. Our methodology attempts to control for
this gene-selection bias through our enrichment formalism, which allows various aspects of the
bias to be gauged fairly precisely.
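To make the idea of gauging selection bias concrete, here is a minimal Python sketch. It assumes, as an illustrative choice rather than the exact formula used in this paper, that a category's enrichment in a population X is its relative over-representation with respect to the genome, E(F) = (X(F) - G(F)) / G(F). Here G(F) is the fraction of all genes in category F, and X(F) is either the unweighted fraction of measured proteins in F (which exposes the gene-selection bias) or the abundance-weighted fraction (which describes the cellular protein population). The functions `composition` and `enrichment`, the category labels, and the toy abundances are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional


def composition(genes: Iterable[str], category: Mapping[str, str],
                weight: Optional[Mapping[str, float]] = None) -> Dict[str, float]:
    """Fraction of a population in each category; unweighted if `weight` is None."""
    totals: Counter = Counter()
    for g in genes:
        # Each gene contributes 1 (counting) or its abundance (weighting).
        totals[category[g]] += 1.0 if weight is None else weight[g]
    norm = sum(totals.values())
    return {c: v / norm for c, v in totals.items()}


def enrichment(population: Dict[str, float],
               reference: Dict[str, float]) -> Dict[str, float]:
    """Relative over- (>0) or under- (<0) representation versus the reference."""
    return {c: (population.get(c, 0.0) - ref) / ref
            for c, ref in reference.items() if ref > 0}


# Hypothetical toy data: 8 genes in the "genome", 4 with measured abundances.
category = {"g1": "protein synthesis", "g2": "protein synthesis",
            "g3": "other", "g4": "other", "g5": "other",
            "g6": "other", "g7": "other", "g8": "other"}
genome = list(category)
abundance = {"g1": 60.0, "g2": 20.0, "g3": 10.0, "g4": 10.0}
measured = list(abundance)

genome_comp = composition(genome, category)
# Selection bias: which genes were measured at all, ignoring abundance.
selection_bias = enrichment(composition(measured, category), genome_comp)
# Abundance-weighted enrichment of the measured protein population.
weighted = enrichment(composition(measured, category, abundance), genome_comp)
print(selection_bias)
print(weighted)
```

With these toy numbers, the measured subset over-represents "protein synthesis" relative to the genome even before abundances are considered; that portion of the signal reflects which proteins happened to be measured, and is the part a bias correction has to account for.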
Our results will become more complete and definitive as larger proteomics datasets become
available, which we anticipate will happen soon (Smith 2000). However, we believe that
the essential formalism and approach developed here will remain relevant to future
datasets.
Although the translatome data set used in our study is small compared with the information available on
the genome and transcriptome, many protein features in both the translatome and the transcriptome
are dominated by the very highly expressed proteins (toward which the 2-DE experiments are biased).
Under these circumstances, it is often sufficient to examine this smaller number of dominating proteins
to approximately characterize the whole population. This is similar in spirit to the development of
the Codon Adaptation Index (CAI) for yeast (Sharp & Li 1987): although it is based on only 24 highly
expressed genes, it has proven robust in predicting expression levels across the entire genome.
By contrast, the experimental bias toward selecting proteins with particular biophysical properties
is of greater concern.
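The CAI analogy above rests on the idea that preferences learned from a few highly expressed genes transfer to the whole genome. The following Python sketch illustrates that mechanism under our reading of Sharp & Li (1987): each codon's relative adaptiveness w is its count in the reference set divided by the count of the most frequently used synonymous codon, and a gene's CAI is the geometric mean of w over its codons, skipping single-codon amino acids (Met, Trp) and stop codons. The floor of 0.01 for codons absent from the reference set and the one-sequence toy reference are assumptions for illustration only; they are not the original treatment of absent codons and not the 24-gene yeast reference set.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
# Met and Trp have a single codon (no choice to score); stops encode nothing.
EXCLUDED = {"M", "W", "*"}


def codons(seq: str) -> List[str]:
    """Split a coding sequence into in-frame codons, dropping a partial tail."""
    seq = seq.upper()
    return [seq[p:p + 3] for p in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference: Iterable[str],
                          floor: float = 0.01) -> Dict[str, float]:
    """w(codon) = count / count of the most-used synonymous codon in `reference`."""
    counts = Counter(c for s in reference for c in codons(s) if c in CODE)
    best: Dict[str, int] = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0), counts[codon])
    w: Dict[str, float] = {}
    for codon, aa in CODE.items():
        if aa in EXCLUDED or best[aa] == 0:
            continue  # amino acid unseen in the reference set: no information
        # Assumed floor so that unseen codons do not produce log(0).
        w[codon] = max(counts[codon] / best[aa], floor)
    return w


def cai(seq: str, w: Dict[str, float]) -> float:
    """Geometric mean of w over the scorable codons of `seq`."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    if not logs:
        raise ValueError("sequence has no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set: GCT is used twice, GCC once (both Ala).
w = relative_adaptiveness(["GCTGCTGCC"])
print(cai("GCTGCC", w))  # geometric mean of 1.0 and 0.5, about 0.707
```

The design point mirrors the argument in the text: the reference set only has to capture which synonymous codons dominate, so a small, well-chosen sample can score any gene.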
Future Directions
Besides repeating our computations as new data are released, we also hope to extend
this analysis to other organisms. Although we have so far limited our study to yeast gene
expression, expression experiments exist for other potential model organisms.
Moreover, we have limited ourselves to GeneChip experiments, but it may be worthwhile to
analyze cDNA microarray data sets (DeRisi et al. 1997; Cho et al. 1998; Winzeler et al. 1999).
These sizeable microarray data sets could be used to study changes in protein features over time.
Supplementary Material
Supplementary material is available at http://genecensus.org/expression/translatome