Greenbaum et al
To assess the statistical significance of the amino acid enrichments, we performed
a control analysis on a randomized dataset (Figure 3D). We randomly permuted the expression
values of the ORFs 1000 times and recomputed the enrichments each time. This allowed us to compute
null distributions for the amino acid enrichments and, by integrating these, one-sided p-values
indicating the significance of the observed enrichments.
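
The permutation control can be illustrated with a minimal sketch. It assumes, since the text here does not spell it out, that the enrichment of an amino acid is the relative difference between its expression-weighted fraction and its plain genome fraction. The function names, the toy inputs, and the add-one correction in the p-value are illustrative choices, not the procedure actually used for Figure 3D.

```python
import random


def weighted_fraction(counts, lengths, weights):
    """Fraction of one amino acid when each ORF is weighted by its expression."""
    num = sum(w * c for w, c in zip(weights, counts))
    den = sum(w * n for w, n in zip(weights, lengths))
    return num / den


def enrichment(counts, lengths, weights):
    """Relative difference between weighted and genome fractions (assumed definition)."""
    genome = sum(counts) / sum(lengths)
    return (weighted_fraction(counts, lengths, weights) - genome) / genome


def permutation_p_value(counts, lengths, expression, n_perm=1000, seed=0):
    """One-sided (upper-tail) p-value from shuffling expression values across ORFs."""
    rng = random.Random(seed)
    observed = enrichment(counts, lengths, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between ORF and expression level
        if enrichment(counts, lengths, shuffled) >= observed:
            hits += 1
    # Add-one correction (a common convention, assumed here) avoids p = 0.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: counts of one amino acid in eight hypothetical ORFs of length 100.
    counts = [10, 1, 1, 1, 1, 1, 1, 1]
    lengths = [100] * 8
    expression = [100, 1, 1, 1, 1, 1, 1, 1]
    print(permutation_p_value(counts, lengths, expression))
```

For a depletion rather than an enrichment, the comparison would use the lower tail (`<=`) instead.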
Biomass Enrichment
A corollary to the amino acid enrichments is the average biomass of the transcriptome and
translatome populations, which we show in Figure 3C. We found that the average
molecular weight of a protein in both populations was lower than in the genome
population. These preliminary observations suggest that the cell prefers energetically less
expensive proteins among those that are highly transcribed or translated. However, we also found that
the average molecular weight per amino acid differed much less between the transcriptome and the
translatome on the one hand and the genome on the other (though it was still slightly lower).
This finding indicates that the lower molecular weights in the translatome and transcriptome
populations relative to the genome population are predominantly due to greater expression of
shorter proteins rather than the incorporation of smaller amino acids.
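
The distinction between weight per protein and weight per amino acid can be made concrete with a small sketch. It assumes each ORF comes with a precomputed molecular weight and length, and that population averages are expression-weighted. The function names and the toy numbers are hypothetical and are not taken from Figure 3C.

```python
def weighted_mean(values, weights):
    """Mean of `values`, with each entry weighted by the matching weight."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def biomass_summary(mol_weights, lengths, weights):
    """Return (average weight per protein, average weight per amino acid)."""
    per_protein = weighted_mean(mol_weights, weights)
    total_mass = sum(w * m for w, m in zip(weights, mol_weights))
    total_residues = sum(w * n for w, n in zip(weights, lengths))
    return per_protein, total_mass / total_residues


if __name__ == "__main__":
    # Toy data: three hypothetical proteins, all 110 Da per residue.
    mol_weights = [11000, 33000, 55000]
    lengths = [100, 300, 500]
    genome = biomass_summary(mol_weights, lengths, [1, 1, 1])
    # The shortest protein is given the highest (made-up) expression level.
    expressed = biomass_summary(mol_weights, lengths, [10, 1, 1])
    print("genome:   ", genome)
    print("expressed:", expressed)
```

In this toy case the per-protein average drops when the short protein dominates expression, while the per-residue average stays the same, which is the pattern expected when the effect comes from protein length rather than amino acid size.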
Secondary Structure Composition
We also used our methodology to study the enrichment of secondary-structural features. Secondary-
structure annotation was derived from structure prediction applied uniformly to all the ORFs in the
yeast genome, as described in Table 1. As shown in Figure 4A, all three populations (genome,
transcriptome, and translatome) had a fairly similar composition of secondary structures -- sheets,
helices, and coils. The differences between populations were marginal and based on only a small
subset of genes. They do, though, point to a possible trend of depletion of random coils relative to
alpha helices and beta sheets in the transcriptome and translatome.
We also found that transmembrane (TM) proteins were significantly depleted in the transcriptome (see
website). To identify TM proteins, we used the GES hydrophobicity scale as
described previously (see caption to Table 1; Gerstein 1998). These results are consistent with our
previous analyses (Jansen & Gerstein 2000). This analysis could not be extended to the
translatome because the 181 genes in the protein abundance data set (GProt) do not contain any
membrane proteins, which are difficult to detect by gel electrophoresis (Molloy 2000).
Subcellular Localization
Subcellular localization generalizes the transmembrane protein analysis. We examined
the enrichment of proteins associated with the various subcellular compartments, shown in
Figure 4C. For clarity, we divided the cell into five distinct subcellular compartments, as described
in Table 1. We found that, in comparison to the genome, both the transcriptome and translatome are
enriched in cytoplasmic proteins. This is true whether we make our comparisons in relation to the
relatively large reference mRNA expression set or the smaller reference protein abundance set. As
Figure 4C shows, the 2D gel experiments are clearly biased towards proteins from the cytoplasm.
However, even within the biased subset GProt, transcription and translation lead to a still higher fraction of
cytoplasmic proteins in the translatome.
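
A compartment-level comparison of this kind can be sketched as follows. The sketch assumes each gene carries a single compartment label and that a population's composition is the expression-weighted share of each label. The compartment names and expression values are toy placeholders, not the five compartments of Table 1 or any measured data.

```python
from collections import defaultdict


def composition(categories, weights):
    """Weighted share of each category; the shares sum to 1."""
    totals = defaultdict(float)
    for cat, w in zip(categories, weights):
        totals[cat] += w
    grand_total = sum(totals.values())
    return {cat: t / grand_total for cat, t in totals.items()}


def relative_enrichment(categories, weights):
    """Relative change of each category's share versus the unweighted reference set."""
    reference = composition(categories, [1.0] * len(categories))
    population = composition(categories, weights)
    return {
        cat: (population.get(cat, 0.0) - reference[cat]) / reference[cat]
        for cat in reference
    }


if __name__ == "__main__":
    # Toy data: four hypothetical genes with made-up mRNA levels.
    labels = ["cytoplasm", "cytoplasm", "nucleus", "membrane"]
    mrna = [8, 6, 4, 2]
    for cat, value in relative_enrichment(labels, mrna).items():
        print(f"{cat}: {value:+.2f}")
```

Passing only the genes of a smaller reference set (for example, those with protein abundance measurements) makes the comparison relative to that subset rather than to the whole genome.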