NIDA Overall Progress

Development of PARE

To date, the Gerstein lab has directed its efforts towards comparing mRNA expression and protein abundance levels. Datasets have been obtained from several sources, including published datasets from the literature, public databanks such as the NIH Gene Expression Omnibus (GEO), and collaborations with other Yale NIDA investigators (in particular, the Nairn lab).

Experimentally, mRNA expression levels can be measured more rapidly than protein abundance can be measured directly; ideally, identifying the factors that influence the correlation between the two would allow protein abundance to be predicted from mRNA expression. In the absence of competing processes, protein abundance would be expected to vary directly with mRNA expression; i.e., translation would follow first-order kinetics. However, comparisons of genome-wide datasets have shown only low correlations between the two types of measurements. We are currently focusing on identifying biologically relevant subsets for which the correlation is particularly high or low, in order to determine factors that can improve a quantitative predictive model of the correlation.
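The first-order expectation can be made concrete with a minimal sketch. The model below is an illustrative assumption, not a model stated in this report: M_i and P_i denote the mRNA and protein levels of gene i, and k_s and k_d are hypothetical synthesis and degradation rate constants introduced only for this example.

```latex
% Illustrative assumption: synthesis is first order in mRNA,
% and protein loss is first order in protein.
\frac{dP_i}{dt} = k_s M_i - k_d P_i
% At steady state (dP_i/dt = 0):
P_i = \frac{k_s}{k_d}\, M_i
\quad\Longrightarrow\quad
\log P_i = \log M_i + \log\frac{k_s}{k_d}
```

Under these assumptions, a log-log plot of protein abundance against mRNA expression would be a line of slope 1. Gene-specific variation in the rate constants is one possible source of the scatter that lowers the observed genome-wide correlation.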

We developed a web-based tool, PARE, for automated comparison of mRNA expression levels with protein abundance measurements. This tool, located at http://proteomics.gersteinlab.org, can be used to correlate currently available datasets (selected from menus by the user) or to analyze user-uploaded datasets.

Specifically, once the mRNA expression and protein abundance datasets have been selected or uploaded, PARE generates a plot of mRNA expression versus protein abundance as output to the web browser (plotted either directly or in log-log format). The user can choose whether the plot contains all of the data for the selected datasets or only a focused subset: available subsets include GO categories (according to biological process, molecular function, or cellular component) and MIPS-defined complexes. Using these focused categories, we have identified subsets with substantially higher or lower correlations between mRNA expression levels and protein abundance than the full datasets show. We also highlight proteins that deviate significantly from the mRNA-protein correlation. More detailed experimental study of these outliers (which are labeled on the plots) should give clues to the processes that cause deviation from the expected first-order relation. As this information is obtained, a feedback loop between experiment and model can be established to update the correlation model.
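The kind of analysis described above can be sketched in a few lines of Python. This is not PARE's implementation, which is not described in this report; the function names, the use of Pearson correlation on log10-transformed values, the minimum subset size, and the residual cutoff for outliers are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for constant data
    return sxy / math.sqrt(sxx * syy)


def log_pairs(mrna, protein):
    """Map gene -> (log10 mRNA, log10 protein) for genes present in both
    datasets with strictly positive values (required for the log)."""
    return {
        g: (math.log10(mrna[g]), math.log10(protein[g]))
        for g in mrna
        if g in protein and mrna[g] > 0 and protein[g] > 0
    }


def subset_correlations(pairs, categories, min_size=3):
    """Correlation within each category (e.g. a GO term or a complex).
    Categories with fewer than min_size measured genes are skipped."""
    result = {}
    for name, genes in categories.items():
        points = [pairs[g] for g in genes if g in pairs]
        if len(points) >= min_size:
            xs, ys = zip(*points)
            result[name] = pearson(xs, ys)
    return result


def flag_outliers(pairs, cutoff=2.0):
    """Genes whose residual from a least-squares line in log-log space
    exceeds cutoff times the residual standard deviation (assumed rule)."""
    genes = sorted(pairs)
    n = len(genes)
    if n < 3:
        return []
    xs = [pairs[g][0] for g in genes]
    ys = [pairs[g][1] for g in genes]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return []
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    if sd == 0:
        return []
    return [g for g, r in zip(genes, residuals) if abs(r) > cutoff * sd]
```

Given two dictionaries mapping gene identifiers to measured values and a dictionary mapping category names to gene lists, `subset_correlations(log_pairs(mrna, protein), categories)` returns one correlation per category, and `flag_outliers` returns candidate genes for follow-up study. Working in log space is a design choice for this sketch, since expression and abundance values typically span several orders of magnitude.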