
Greenbaum et al 11

To measure the statistical significance of the amino acid enrichment results, we performed a control analysis on a randomized dataset (Figure 3D). We randomly permuted the expression values of the ORFs 1000 times and recomputed the enrichments each time. This gave us a distribution for each amino acid enrichment; integrating these distributions yielded one-sided p-values indicating the significance of the observed enrichments.

Biomass Enrichment

A corollary to the amino acid enrichments is the average biomass of the transcriptome and translatome populations, shown in Figure 3C. We found that the average molecular weight of a protein was lower in both populations than in the genome population. These preliminary observations suggest that the cell prefers less energetically expensive proteins for those that are highly transcribed or translated. However, the average molecular weight per amino acid differed much less between the transcriptome and translatome on the one hand and the genome on the other (though it was still slightly lower). This finding indicates that the lower molecular weights in the translatome and transcriptome populations relative to the genome population are predominantly due to greater expression of shorter proteins rather than the incorporation of smaller amino acids.

Secondary Structure Composition

We also used our methodology to study the enrichment of secondary-structural features. Secondary-structure annotation was derived from structure prediction applied uniformly to all the ORFs in the yeast genome, as described in Table 1. As shown in Figure 4A, all three populations (genome, transcriptome, and translatome) had a fairly similar composition of secondary structures (sheets, helices, and coils). The differences between populations were marginal and based on only a small subset of genes.
They do, though, point to a possible trend: a depletion of random coils relative to alpha helices and beta sheets in the transcriptome and translatome.

We also found that transmembrane proteins were significantly depleted in the transcriptome (see website). To identify transmembrane (TM) proteins, we used the GES hydrophobicity scale as described previously (see caption to Table 1; Gerstein 1998). These results are consistent with our previous analyses (Jansen & Gerstein 2000). This analysis could not be extended to the translatome because the 181 genes in the protein abundance data set (G_Prot) do not contain any membrane proteins, which are difficult to detect by gel electrophoresis (Molloy 2000).

Subcellular Localization

A generalization of the transmembrane protein analysis is subcellular localization. We examined the enrichment of proteins associated with the various subcellular compartments; the results are shown in Figure 4C. For clarity, we divided the cell into five distinct subcellular compartments, as described in Table 1. We found that, in comparison to the genome, both the transcriptome and translatome are enriched in cytoplasmic proteins. This holds whether we make our comparisons relative to the relatively large reference mRNA expression set or the smaller reference protein abundance set. As Figure 4C shows, the 2D gel experiments are clearly biased towards proteins from the cytoplasm. However, even within the biased subset G_Prot, transcription and translation lead to a still higher fraction of cytoplasmic proteins in the translatome.
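
The randomization control used for the amino acid enrichments (permuting the expression values across ORFs 1000 times and recomputing the enrichment) can be illustrated with a minimal sketch. It is not the code used in this work. It assumes that the enrichment of a feature is the relative difference between its expression-weighted mean and its unweighted (genome) mean, and it approximates the integration of the null distribution by counting permutations at least as extreme as the observed value. The function names, the add-one correction, and the fixed random seed are all hypothetical choices made for illustration.

```python
import random


def weighted_mean(feature, weights):
    """Mean of a per-ORF feature value, weighted by expression level."""
    return sum(f * w for f, w in zip(feature, weights)) / sum(weights)


def enrichment(feature, expression):
    """Relative difference between the expression-weighted mean and the
    unweighted (genome) mean. Assumed definition, for illustration only."""
    genome_mean = sum(feature) / len(feature)
    return (weighted_mean(feature, expression) - genome_mean) / genome_mean


def permutation_p_value(feature, expression, n_perm=1000, seed=0):
    """Return the observed enrichment and a one-sided empirical p-value.

    The expression values are shuffled across ORFs n_perm times. The p-value
    is the share of shuffles whose enrichment is at least as extreme as the
    observed one, in the direction of the observed sign.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = enrichment(feature, expression)
    shuffled = list(expression)  # copy, so the caller's list is not modified
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_value = enrichment(feature, shuffled)
        if observed >= 0 and null_value >= observed:
            hits += 1
        elif observed < 0 and null_value <= observed:
            hits += 1
    # Add-one correction (an assumption): keeps the estimate above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `feature` would hold one value per ORF, for example the fraction of a given amino acid in that ORF, and `expression` the corresponding mRNA or protein abundance. With 1000 permutations, the smallest p-value this estimate can report is 1/1001.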