
Greenbaum et al.

Figure 1. Schematic overview of the analysis. On the left side, we outline the terms we use to describe the process of gene expression. The coding section of the genome is transcribed into a population of mRNA transcripts called the "transcriptome". The transcripts in turn are translated into a population of proteins; we use the term "translatome" for this protein population rather than the alternative "proteome" because the latter term may be confounded with the protein complement of the genome (which is not necessarily associated with a quantitative abundance level). The matrix in the middle schematically shows an analysis of the three stages of expression. In general, we define a "population" as a set of genes, each associated with a corresponding expression or abundance level (a "weight"). In the matrix, each row represents a weight and each column a gene set. In particular, because the protein abundance set is a significantly smaller subset of the genome, we differentiate between the mRNA reference expression set (G_mRNA = G_Gen), which essentially covers the complete genome, and the reference protein abundance set (G_Prot), which contains the proteins in data sets 2-DE #1 and 2-DE #2 (see Table 1). By definition, this subset contains only proteins that can be identified by 2-D gel electrophoresis and is therefore biased in this sense. Through a comparison of the right and left sides of this figure, the enrichment figures throughout this paper show the effect of the experimental biases of 2-D gels on the data set. Each pie chart represents the composition of a particular protein feature F (for instance, an amino acid composition) in a population (represented by the symbol m). We can further look at the "enrichment" of this feature in one population relative to another (represented by the symbol D; see section "Methods" for an explanation of the formalism).
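As an illustration of the composition and enrichment notions above, the following is a minimal sketch only. The "Methods" section is not reproduced here, so the formulas are assumptions: the composition of a feature is taken to be the weight-averaged feature value over the gene set, and the enrichment is taken to be the relative difference between two such averages. The function names `weighted_composition` and `enrichment` and the gene identifiers in the usage note are hypothetical.

```python
from typing import Mapping


def weighted_composition(feature: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weight-averaged value of a per-gene feature over a population.

    ASSUMPTION: the composition is a weighted mean; the paper's own
    formalism is defined in its Methods section and may differ.
    """
    # Only genes that have both a weight and a feature value contribute.
    genes = [g for g in weights if g in feature]
    total = sum(weights[g] for g in genes)
    if total <= 0:
        raise ValueError("population has no positive total weight")
    return sum(weights[g] * feature[g] for g in genes) / total


def enrichment(feature: Mapping[str, float],
               reference: Mapping[str, float],
               other: Mapping[str, float]) -> float:
    """Relative change of the composition in `other` versus `reference`.

    ASSUMPTION: enrichment is (other - reference) / reference.
    """
    base = weighted_composition(feature, reference)
    if base == 0:
        raise ValueError("reference composition is zero")
    return (weighted_composition(feature, other) - base) / base
```

For example, with a hypothetical feature `{"geneA": 0.10, "geneB": 0.05, "geneC": 0.08}`, giving every gene a weight of 1 corresponds to the genome population, while using mRNA copy numbers as weights corresponds to the transcriptome; `enrichment(feature, genome_weights, mrna_weights)` is then positive if highly expressed genes carry more of the feature.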
For simplification, we neglect the effects of post-transcriptional and post-translational modifications that might alter the features of proteins (such modifications affect the expression levels, but this is largely accounted for by the measurements). In this study, we analyze protein features as they are represented in the genome.

Figure 2. mRNA expression levels vs. protein abundance levels. Part A of this figure shows the reference protein abundance levels plotted against the mRNA reference expression levels on a log-log scale; this plot is similar to the one reported earlier by Futcher et al. (1999). The trend line is described by the equation y = 5.20x^0.61, where y represents the protein abundance level (in units of 10³ copies/cell) and x the mRNA expression level (in units of copies/cell). The dashed lines indicate a distance of 1.85 standard deviations (in the log scale) from the trend line. The outliers beyond the dashed lines are listed in Part B. For each of these outlier ORFs, we show a description of its function and its MIPS categories (the numbers are defined in Figure 4C). With one exception, all outliers are associated with cellular organization (MIPS category 30). Outliers with a high protein abundance relative to the level expected from their mRNA expression are dominated by the alcohol and G3P dehydrogenases. Translation-related proteins are prominent in the group of those proteins with low protein