Greenbaum et al 
transcriptome data set, which does not have this experimental bias. This may be related to the
enrichment of functional categories that are connected to cytoplasmic proteins, such as "protein 
synthesis." 
 
Limitations Given the Small Size of the Protein Abundance Data 
Even with the extended coverage gained by merging many datasets into our two reference sets, the
largest complication in our analysis was the limited amount of data. This applied most of all to the
protein abundance measurements. In addition to giving us fewer data points for our statistics, the
small number of protein abundance measurements potentially biased our statistical results towards
certain protein families. The 181 proteins in Gprot are certainly not a random selection from the
possible 6280 in yeast. Rather, they are skewed towards well-studied proteins that are highly
expressed. Our methodology attempts to control for this gene-selection bias through our enrichment
formalism, which allows one to gauge various aspects of the bias rather precisely.
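
As a rough illustration of how such a selection bias can be quantified, the sketch below compares the frequency of a protein category in a small measured subset with its frequency in the whole genome, and asks how likely so large a count would be under random selection. This is a generic over-representation calculation, not the enrichment formalism developed in this paper. The function names and the category counts (40 of 181 versus 600 of 6280) are hypothetical and chosen only for the example. Only the totals 181 and 6280 come from the text.

```python
from math import comb


def fold_enrichment(k_sample, n_sample, k_genome, n_genome):
    """Ratio of a category's frequency in the sample to its frequency in the genome."""
    return (k_sample / n_sample) / (k_genome / n_genome)


def hypergeom_upper_tail(k_sample, n_sample, k_genome, n_genome):
    """P(X >= k_sample) if n_sample proteins were drawn at random, without
    replacement, from a genome in which k_genome proteins carry the category."""
    total = comb(n_genome, n_sample)
    upper = min(n_sample, k_genome)
    favourable = sum(
        comb(k_genome, k) * comb(n_genome - k_genome, n_sample - k)
        for k in range(k_sample, upper + 1)
    )
    return favourable / total


# Hypothetical counts: a category found in 600 of 6280 genome proteins
# and in 40 of the 181 proteins with abundance measurements.
ratio = fold_enrichment(40, 181, 600, 6280)
p_value = hypergeom_upper_tail(40, 181, 600, 6280)
print(f"fold enrichment = {ratio:.2f}, P(X >= 40) = {p_value:.2e}")
```

A ratio well above 1 together with a small tail probability would indicate that the measured subset over-represents the category relative to a random draw from the genome.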
 
Our results will certainly be more complete and definitive when larger proteomics datasets become
available, which we anticipate will happen soon (Smith 2000). However, we believe that the
essential formalism and approach that we have developed will remain relevant for future
datasets.
 
Although the translatome data we used in our study are small in comparison to the information on
the genome and transcriptome, many protein features in both the translatome and the transcriptome
are dominated by the very highly expressed proteins (towards which the 2-DE experiments are biased).
Under this circumstance, it is often sufficient to look at this smaller number of dominating proteins
to approximately characterize the whole population. This is similar in spirit to the development of
the Codon Adaptation Index for yeast (Sharp & Li 1987). While based on only 24 highly
expressed proteins, it has proven to be robust in predicting expression levels for the entire genome.
In contrast, the experimental bias in the selection of proteins with particular biophysical properties
is of greater concern.
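
To make the Codon Adaptation Index analogy concrete, the sketch below shows how an index derived from a small reference set can score any other gene. It follows the general definition of the index: each codon receives a relative adaptiveness equal to its count in the reference set divided by the count of the most used synonymous codon, and a gene's score is the geometric mean of these values over its codons. The function names and the toy reference sequence are hypothetical, the 0.5 pseudocount for unused codons is a common convention assumed here, and the standard genetic code is assumed. This is not the 24-protein yeast reference set itself.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon: its reference count over that of the most used synonym."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    weights = {}
    # Stop codons and the single-codon amino acids (Met, Trp) carry no information.
    for aa in set(AMINO) - set("*MW"):
        family = [c for c, a in CODON_TO_AA.items() if a == aa]
        # Assumed convention: unused codons get a small pseudocount to avoid log(0).
        adjusted = {c: counts[c] or pseudocount for c in family}
        best = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights over the scorable codons of seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Toy reference set (hypothetical): alanine is encoded by GCT three times, GCC once.
toy_weights = relative_adaptiveness(["GCTGCTGCTGCC"])
print(cai("GCTGCT", toy_weights))  # uses only the preferred codon
print(cai("GCTGCC", toy_weights))  # mixes in a less preferred codon, so scores lower
```

The weights are learned once from the small reference set and then applied to every gene, which is why a few dominant, highly expressed proteins can stand in for the whole population.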
 
Future Directions    
Besides repeating our computations as new data are released, we also hope to expand this analysis
to other organisms. While we have so far limited our study to yeast gene expression, expression
experiments exist for other potential model organisms. Moreover, we have also limited ourselves
to Gene Chip experiments, but it may be worthwhile to analyze cDNA microarray data sets
(DeRisi et al. 1997; Cho et al. 1998; Winzeler et al. 1999). We can use these sizeable microarray
data sets to study changes in protein features over time.
 
Supplementary Material 
Supplementary material is available at http://genecensus.org/expression/translatome