distributions of random enrichments that can then be compared against the observed enrichments. In the plot, the gray bars represent the observed enrichments already shown in figure 3a. On top of the gray bars we show standard boxplots of the enrichment distributions based on 1000 random permutations. (The middle line represents the distribution median. The upper and lower sides of the box coincide with the upper and lower quartiles. Outliers are shown as dots and are defined as data points that lie outside the range of the whiskers, the length of which is 1.5 times the interquartile distance.) Based on the random distributions, we can compute one-sided P-values for the observed enrichments. Amino acids for which the P-values are less than 10^-3 are shown in bold font.

Figure 4, Breakdown of the Transcriptome and Translatome in terms of Broad Categories relating to Structure, Localization, and Function

All of the subfigures are analogous to the schematic illustration in figure 1.

Part A represents the composition of secondary structure in the different populations. In general, the secondary structure compositions appear to be relatively stable across the different populations. The most notable change from genome to translatome is perhaps the depletion of coils -- that is, relatively unordered structures compared to the more structured helices and sheets -- by about 4%.

Part B represents the distribution of subcellular localizations associated with proteins in the various populations. We used standardized localizations developed earlier (Drawid & Gerstein 2000), which, in turn, were derived from the MIPS, YPD, and Swiss-Prot databases (Bairoch & Apweiler 2000; Costanzo et al. 2000; Mewes et al. 2000). The subcellular localization has been experimentally determined for fewer than half of the yeast proteins, so our analysis applies only to this subset.
The most notable difference between genome, transcriptome, and translatome is the strong enrichment of cytoplasmic proteins. This is in agreement with our previous observations (Drawid et al. 2000). It also explains to some degree the observations for the functional classes in part C. For example, the functional group "energy" is dominated by the highly expressed glycolytic proteins found in the cytoplasm. The depletion of the functional group "transcription" makes sense in light of the strong depletion of nuclear proteins. We have argued before (Drawid et al. 2000) that the number of proteins in a particular subcellular compartment may be roughly related to the size of the compartment. For instance, membrane proteins occupy the relatively small "two-dimensional" space in lipid bilayers. We also performed a separate, independent calculation for a more comprehensive list of transmembrane segments, which were predicted computationally (see caption of Table 1). This calculation largely confirms the result (data not shown).

Part C shows the division of ORFs into different functional categories (according to the MIPS classification) in the various populations. Only the largest functional categories of the top level of the MIPS classification are shown. The group "Other" contains the smaller top-level categories lumped together. This "Other" group is different from the group "Unclassified," which contains genes without any functional description. One complication is that many genes have multiple functional classifications, so they may be counted in more than one category (this explains why the group "Unclassified" has a size of only 28% for the genome population although the