Greenbaum et al. 15 transcriptome data set, which does not have this experimental bias. This may be related to the enrichment of functional categories that are connected to cytoplasmic proteins, such as "protein synthesis."

Limitations Given the Small Size of the Protein Abundance Data

Even with the extended coverage made possible by merging many datasets into our two reference sets, the largest complication in our analysis was the limited amount of data. This applied most strongly to the protein abundance measurements. In addition to giving us fewer data points for our statistics, the small number of protein abundance measurements potentially biased our statistical results towards certain protein families. The 181 proteins in G_prot are certainly not a random selection from the possible 6280 in yeast. Rather, they are skewed towards well-studied proteins that are highly expressed. Our methodology attempts to control for this gene-selection bias through our enrichment formalism, which allows one to gauge various aspects of the bias rather precisely.

Our results will certainly be more complete and definitive when larger proteomics datasets become available, which we anticipate will happen soon (Smith 2000). However, we believe that the essential formalism and approach that we develop will remain relevant for all future datasets.

Although the translatome data we used in our study are small in comparison to the information on the genome and transcriptome, many protein features in both the translatome and the transcriptome are dominated by the very highly expressed proteins (towards which the 2-DE experiments are biased). Under this circumstance, it is often sufficient to look at this smaller number of dominating proteins to approximately characterize the whole population. This is similar in spirit to the development of the Codon Adaptation Index for yeast (Sharp & Li 1987).
While based on only 24 highly expressed proteins, the index has proven to be robust in predicting expression levels for the entire genome. In contrast, the experimental bias in the selection of proteins with particular biophysical properties should be of more concern.

Future Directions

Besides repeating our computations as new data are released, we also hope to expand this analysis to other organisms. While we have so far limited our study to yeast gene expression, there are other potential model organisms for which expression experiments exist. Moreover, we have also limited ourselves to Gene Chip experiments, but it may be worthwhile to analyze cDNA microarray data sets (DeRisi et al. 1997; Cho et al. 1998; Winzeler et al. 1999). We can use these sizeable microarray data sets to study changes in protein features over time.

Supplementary Material

Supplementary material is available at http://genecensus.org/expression/translatome