The Bioinformatics and Biostatistics Core (BBC) at the Yale/NIDA Neuroproteomics Research Center aims at meeting the computational, statistical, and database challenge (including data quality, management, integration, analysis, and archiving) posed by the Center’s ongoing neuroproteomics research. As a key bioinformatics participant, the Gerstein lab will interact synergistically with the other Core members to achieve the following specific aims:
1. Expanding the research directed at more accurately predicting
protein levels
from mRNA expression data to the analysis of relative changes in mRNA
and protein
expression;
2. Further develop our approaches to identify classes of proteins whose
mRNA
level and protein expression show high degrees of correlation (Greenbaum
et
al, 2001, 2003), and apply these approaches to existing mRNA expression
data
sets from other Neuroproteomics Center members (e.g., Dr. Ron Duman) to
identify
the proteins that are most likely to be significantly differentially
regulated
and target those among these proteins suitable for directed MS/MS and/or
protein
microarray analysis;
3. Routinely consult with other members of the Neuroproteomics Center to
assist
in interpreting the biological significance of the results of protein
expression
studies and their correlation with mRNA expression data.
Background and Significance
Even with the significant developments in the technologies used to
quantify
protein abundance over the past few years, protein identification and
quantification
still lags very far behind the high-throughput experimental techniques
used
to interrogate the mRNA expression levels of 25,000 or more genes and
ESTs.
In hope of determining protein abundance levels from the more copious
and technically
easier mRNA experiments, researchers have tried to find correlations
between
mRNA expression data and the limited protein abundance data. To date,
there
have been only a handful of efforts dedicated to this study, most
notably in
human cancers and yeast cells; for the most part, they have reported
only minimal
and/or limited correlations. One of the earliest analyses of correlation
looked
at only 19 proteins in the human liver. Anderson and Seilhamer (1997)
found
a somewhat positive correlation of 0.48. Another limited analysis, of
only three
genes MMP-2, MMP-9 and TIMP-1 in human prostate cancers, showed no
significant
relationship (Lichtinghagen, 2002). An additional cancer study (Chen,
2002)
showed a significant correlation in only a small subset of the proteins
studied.
Conversely, Orntoft et al (2002) found highly significant correlations
in human
carcinomas when looking at changes in mRNA and protein expression
levels.
Many of the present efforts at correlating mRNA and protein expression
have
been conducted in yeast as well. Using two-dimensional electrophoresis
techniques,
Gygi et al. (1999) found that even similar mRNA expression levels could
be accompanied
by a wide range (up to 20-fold difference) of protein abundance levels,
and
vice versa. These results contrast with those of Futcher et al (1999),
who found
relatively high correlations (r = 0.76) after transforming the data to
normal
distributions. In a previous analysis (Greenbaum, 2002), we merged the
data
from both of these datasets (referred to as 2DE-1 (Gygi, 1999) and 2DE-2
(Futcher,
1999)), comparing the resulting new larger protein abundance set
('merged data-set
1') with a comprehensive mRNA expression dataset. The mRNA expression
reference
set was constructed through iteratively combining, in a non-trivial
fashion,
three sets that used Affymetrix chips and a SAGE dataset (Greenbaum,
2002).
Using these reference datasets, we were able to do an all-against-all
comparison
of mRNA and protein expression levels, in addition to a number of
analyses comparing
protein and mRNA expression using smaller, but broad categories
(Greenbaum,
2001; Greenbaum, 2002).
Given the difficult, laborious, and limiting nature of two-dimensional
electrophoresis
analysis (Arthur, 2003), many of the newer protein abundance
determinations
have been done using MudPit and derivative technologies. Washburn et al.
(2001)
used MudPit to analyze and detect 1,484 arbitrary proteins: they were
able to
detect a somewhat random sampling of proteins independent of abundance,
localization,
size or hydrophobicity (we refer to this dataset as MudPit-1). In a
further
experiment the authors, comparing expression ratios for both proteins
and mRNA
levels, found that although they could not find correlations for
individual
loci, they could find overall correlations when looking at pathways and
complexes
of proteins that functioned together (Washburn, 2003). Peng et al.
(2003) analyzed
1,504 yeast proteins with a false-positive rate - misidentification of a
protein
- of less than 1% (we refer to this dataset as MudPit-2). In their
analysis,
they contrasted their methodology with that of Washburn et al (2001)
with which
there was significant overlap of proteins.
Although the literature is ambiguous in terms of whether or not mRNA
expression
and protein abundance can be correlated, we believe that our newer
methodologies
described in Greenbaum et al (2003) and in Preliminary Results provide a
context
for finding correlations. One of the main limitations in finding
correlations
between mRNA and protein data has been the significant degree of error
inherent
to the experimental process of determining both mRNA and protein
concentrations
on a global scale. However, we have found that by examining smaller
homogenous
subpopulations of genes, as defined by information such as function
(Mewes,
2002), subcellular localization (Drawid, 2000) or secondary structure,
we minimize
the noise resulting from experimental error. Using these smaller groups
of genes,
we have been able to find significantly higher correlations than are
found in
a global “all against all” comparison. Further examination
of which
functional classifications allow for very high correlations, along with
incorporation
of data associated with mRNA and protein turnover rates, will allow us
to create
a more rigorous methodology that may allow more accurate extrapolation
of protein
from mRNA abundance levels. Because of the paucity of published human
protein
and mRNA expression data on neurological (as well as on other tissues),
our
initial work has been carried out on yeast. Hence, one of the goals of
the Bioinformatics
Core will be to extend a similar type of analysis to the protein
expression
data that will be obtained by the proposed Protein and Lipid Separation
and
Profiling Core of the Neuroproteomics Research Center. In this regard we
are
fortunate that significant mRNA expression data has already been
obtained by
some members of the Center (e.g., Ron Duman). If we could confidently
predict
(at least some classes of) protein from mRNA expression, the resulting
(relatively
fewer) proteins of interest could then be targeted for directed MS/MS
and/or
protein microarray analysis.
Preliminary Results
We have made considerable progress on trying to predict gene and
protein expression
levels, based on various proteomic features. In particular, we have
published
three papers relating mRNA abundance and gene expression levels
(Greenbaum et
al, 2001, 2003 Lian et al 2002), and integrated this with a variety of
protein
features, looking at the degree to which mRNA abundance and gene
expression
levels were different with respect to different protein features. More
specifically,
we have proposed a methodology that creates reference data sets
(Greenbaum et
al., 2002) removing the biases of individual data sets. Additionally,
instead
of comparing individual genes, we compared broad categories of genomic
information,
finding significant trends in the underlying data. These included an
overall
weak correlation between mRNA and protein levels, although there were
many outlying
genes, which are also of interest given the degree to which their mRNA
and protein
values differ. We also found consistent enrichments of amino acids and
depletions
of random secondary structures in both forms of data relative to the
coded DNA.
We then went on from there to develop a model that tried to predict
levels of
gene expression and protein abundance from various features of the
protein sequence
(Jansen et al., 2003), in particular, from calculating the CAI (Codon
Adaptation
Index) from various compositional biases. We employed a statistical
approach
where we fit a number of simple models to the observed gene expression
and protein
abundance data, first trying to re-parameterize the classic CAI method
that
was developed ~15 years ago and then trying to do a simple linear
regression,
just a straight fit on the observed codon frequencies. We found,
actually, that
we could not improve that much on the classic CAI, though we could
improve slightly
by using the modern genomic data. We have made available over the web
and in
our paper (Jansen et al., 2003) our new models and parameters that
investigators
can use to better predict protein expression and CAI.
Expanding upon our previous merged protein and mRNA expression dataset,
in our
Greenbaum et al (2003) study we constructed a new merged dataset (merged
data
set-2) using two published 2D electrophoresis and two MudPit datasets.
Succinctly
(more information is available on our website at [http://bioinfo.mbb.yale.edu/expression/mrna-v-protein/]),
we transformed each of the protein-abundance datasets into more
quantitative
data by fitting each protein dataset individually onto the reference
mRNA expression
dataset. Each of the new, fitted datasets was then inversely transformed
back
into protein space. These derived protein datasets were then combined
into a
larger reference dataset; when we had more than one abundance value for
an open
reading frame (ORF), we chose the value from the dataset according to a
prescribed
quality ranking. The resulting set contained protein abundance
information for
approximately 2,000 ORFs. Using the resulting data we could compare mRNA
expression
and protein abundance globally as well as looking at smaller, broad
categories,
such as function or localization (see Figure 1b, 1c in Greenbaum et al
(2003).
In particular, we show that some localization categories - for example,
the
nucleolus - have significantly higher correlations than the global
correlation.
Other localizations seem to present less of a correlation between mRNA
and protein
data, for example, the mitochondria - possibly reflecting the
heterogeneous
nature and function of the latter organelle. In terms of MIPS functional
categories,
we show that although some categories, such as cell rescue, show a lower
correlation
than the whole merged set, other functional categories, such as cell
cycle,
show a significant increase in correlation. Logically, this increased
correlation
reflects the co-regulated nature of the proteins in this functional
category.
There are presumably at least three reasons for the poor correlations
generally
reported in the literature between the level of mRNA and the level of
protein,
and these may not be mutually exclusive. First, there are many
complicated and
varied post-transcriptional mechanisms involved in turning mRNA into
protein
that are not yet sufficiently well defined to be able to compute protein
concentrations
from mRNA; second, proteins may differ substantially in their in vivo
half lives;
and/or third, there is a significant amount of error and noise in both
protein
and mRNA experiments that limit our ability to get a clear picture.
Examining the first option - that there are a number of complex steps
between
transcription and translation - we looked at correlations between mRNA
and protein
abundance for those ORFs that had varied or steady levels of mRNA
expression
over the course of the cell cycle. To normalize for the varied degrees
of expression
for different ORFs, we took the standard deviation divided by the
average expression
level as representative of the variation of each ORF over the course of
the
yeast cell cycle (Fig. 1, below). Broadly speaking, the cell can control
the
levels of protein at the transcriptional level and/or at the
translational level.
Logically, we would assume that those ORFs that show a large degree of
variation
in their expression are controlled at the transcriptional level - the
variability
of the mRNA expression is indicative of the cell controlling mRNA
expression
at different points of the cell cycle to achieve the resulting and
desired protein
levels. Thus we would expect, and we found, a high degree of correlation
(r
= 0.89) between the reference mRNA and protein levels for these
particular ORFs;
the cell has already put significant energy into dictating the final
level of
protein through tightly controlling the mRNA expression, and we assume
that
there would then be minimal control at the protein level. In contrast,
those
genes that show minimal variation in their mRNA expression throughout
the cell
cycle are more likely to have little or no correlation with the final
protein
level; the cell would be controlling these ORFs at the translational
and/or
post-translational level, with the mRNA levels being somewhat
independent of
the final protein concentration. And indeed, we found only minimal
correlation
between protein and mRNA expression for these ORFs (r = 0.2).
Furthermore, we
found that those ORFs that have higher than average levels of ribosomal
occupancy
- that is that a large percentage of their cellular mRNA concentration
is associated
with ribosomes (being translated) - have well correlated mRNA and
protein expression
levels (Fig. 1). These cases probably represent a situation wherein the
cell,
having significantly controlled the mRNA expression to produce a
specific level
of protein, will probably not also employ mechanisms to control the
translation.
Alternatively, those proteins that have very low occupancy rates have
uncorrelated
mRNA and protein expression; thus, given that the cell has not tightly
controlled
the mRNA expression for this ORF, it will dictate the resulting protein
levels
through rigorous controls of its translation (that is, through tight
limits
on occupancy) (Arava, 2003).
A
second option for a general lack of correlation between mRNA and
protein abundance
may be that proteins have very different half-lives as the result of
varied
protein synthesis and degradation. Protein turnover can vary
significantly
depending on a number of different conditions (Glickman, 2002); the
cell can
control the rates of degradation or synthesis for a given protein, and
there
is significant heterogeneity even within proteins that have similar
functions
(Pratt, 2002). Recent efforts have been made to computationally
measure these
rates (Lian, 2002).
Simplistically, it can be presumed that the change in a protein's
concentration
over time will be equal to the rate of translation minus the rate of
degradation.
By analogy to concepts in chemical kinetics, we can approximate this
equation:
dP(i,t)/dt = SE(i,t) - DP(i,t), where P is protein abundance i at time
t,
E is the mRNA expression level of protein P, S is a general rate of
protein
synthesis per mRNA, and D is a general rate of protein degradation per
protein
(Gerner, 2002). Additionally there are some experimental methods that
can
also be used to measure turnover and the translational control of
protein
levels (e.g., Serikawa, 2003). Given the degenerate nature of the
genetic
code, there are many synonymous codons (codons that translate into the
same
amino acid). As the cell is biased in its usage of synonymous codons -
that
is, the usage of a subset of codons results in a higher level of mRNA
expression,
possibly as a result of differing cellular tRNA levels - the CAI can
be used
to predict the expression of a gene (Sharp, 1987). We recently
calculated
new parameters for this model, with some improvement in predictive
strength
(Jansen, 2003). It is thought that the CAI will correlate differently
with
mRNA levels than with protein abundance levels due, in part, to
protein turnover
rates (Coghlan, 2000). Ranking the ORFs in terms of their CAI value,
we found
that although those ORFs that ranked the highest in terms of CAI did
not show
a very strong correlation between mRNA and protein levels, they
nevertheless
showed a significantly higher correlation than ORFs that were ranked
as having
the lower CAI values (r = 0.48 versus 0.02). The low correlations
reflect
the fact that the CAI will correlate differently for protein and mRNA
values
because of the additional cellular controls on protein translation,
namely
the effect of protein turnover rates. Nevertheless, the sizable
difference
in correlations between the two groups of ORFs with high- and
low-ranking
CAI values (Figure 1) shows there is some relationship between mRNA
and protein
values, possibly indicating that highly expressed genes tend to result
in
a more correlated level of protein abundance than lower expressed
ones.
Although proteomics is still in its infancy, given the pace of
technological
advancement in protein quantification, mRNA expression analysis and
noise
reduction, more comprehensive correlation studies will soon be
feasible. This
will allow for more robust analyses of the relationship between mRNA
expression
and protein abundance values. Indeed, we are in the process of
continuing
this line of research by careful examination and correlation of the
mRNA expression
data already obtained by R. Duman and other Yale neurobiologists in
the proposed
Center of Excellence in Neuroproteomics. One obvious goal of these
studies
will be to be able to reach the point where we can more accurately
extrapolate
mRNA to protein expression data and thereby, for instance, guide the
selection
of antibodies to be spotted onto microarrays and those proteins that
will
be targeted by directed MS/MS-based technologies to permit the
independent
measurement of selected proteins of high potential interest.
Literature Cited
Anderson L, Seilhamer J (1997) A comparison of selected mRNA and
protein abundances
in human liver. Electrophoresis 18: 533-537.
Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D (2003)
Genome-wide
analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc
Natl
Acad Sci USA 100: 3889-3894.
Arthur, JM (2003) Proteomics : Curr Opin Nephrol Hypertens. 12(4):
423-30.
Bader, G.D. and Hogue, C.W. (2000). BINDA data specification for storing
and
describing biomolecular interactions, molecular complexes and pathways.
Bioinformatics
16: 465-477.
Chen G, Gharib TG, Huang CC, Taylor JM, Misek DE, Kardia SL, Giordano
TJ, Iannettoni
MD, Orringer MB, Hanash SM, et al. (2002) Discordant protein and mRNA
expression
in lung adenocarcinomas. Mol Cell Proteomics 1:304-313.
Chothia, C. and Gerstein, M. (1997) Protein evolution. How far can
sequences
diverge? Nature 385 : 579, 581.
Coghlan A, Wolfe KH (2000) Relationship of codon bias to mRNA
concentration
and protein length in Saccharomyces cerevisiae.Yeast 16:1131-1145.
Day, D. A. and M. F. Tuite (1998). Post-transcriptional gene regulatory
mechanisms
in eukaryotes:an overview. J Endocrinol 157(3): 361-71.
Drawid A, Gerstein M. (2000) A Bayesian system integrating expression
data with
sequence patterns for localizing proteins: comprehensive application to
the
yeast genome. J Mol Biol. 301(4):1059-75.
Futcher B, Latter GI, Monardo P, McLaughlin CS, Garrels JI (1999) A
sampling
of the yeast proteome. Mol Cell Biol 19:7357-7368.
Gerner C, Vejda S, Gelbmann D, Bayer E, Gotzmann J, Schulte-Hermann R,
Mikulits
W (2002) Concomitant determination of absolute values of cellular
protein amounts,
synthesis rates, and turnover rates by quantitative proteome profiling.
Mol
Cell Proteomics 1:528-537.
Gerstein, M. (1998) Measurement of the effectiveness of transitive
sequence
comparison, through a third 'intermediate' sequence. Bioinformatics 14 :
707-14.
Gerstein, M. and Jansen, R. (2000). The current excitement in
bioinformatics-analysis
of whole- genome expression data: how does it relate to protein
structure and
function?
Curr Opin Struct Biol 10 : 574-84.
Glickman MH, Ciechanover A (2002) The ubiquitin-proteasome proteolytic
pathway:
destruction for the sake of construction. Physiol Rev 82:373-428.
Greenbaum, D.,Colangelo, C., Williams, K., and Gerstein, M. (2003)
Comparing
protein abundance and mRNA expression levels on a genomic scale. Genome
Biology,
4, 117.1-117.8.
Greenbaum D, Jansen R, Gerstein M (2002) Analysis of mRNA expression and
protein
abundance data: an approach for the comparison of the enrichment of
features
in the cellular population of proteins and transcripts. Bioinformatics
18:585-596.
Greenbaum D, Luscombe NM, Jansen R, Qian J, Gerstein M (2001)
Interrelating
different types of genomic data, from proteome to secretome: 'oming in
on function.
Genome Res 11:1463-1468.
Gygi, SP, Corthals, GL, Zhang, Y, Rochon, Y & Aebersold, R (2000)
Evaluation
of two-dimensional gel electrophoresis-based proteome analysis
technology. Proc
Natl Acad Sci USA, 97, 9390-5.
Gygi, SP, Rochon, Y, Franza, BR & Aebersold, R (1999). Correlation
between
protein and mRNA abundance in yeast. Mol Cell Biol 19,1720-30.
Hegyi H, Gerstein M. (2001) Annotation transfer for genomics: measuring
functional
divergence in multi-domain proteins. Genome Res.11(10):1632-40.
Holstege, F. C., E. G. Jennings, et al. (1998). Dissecting the
regulatory circuitry
of a eukaryotic genome. Cell 95(5): 717-728.
Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki,
Y.(2001).
A comprehensive two-hybrid analysis to explore the yeast protein
interactome.
Proc. Natl. Acad. Sci. 98: 4569-4574.
Jacobs Anderson JS, Parker R. Computational identification of cis-acting
elements
affecting post transcriptional control of gene expression in
Saccharomyces cerevisiae.
Nucleic Acids Res. 2000 Apr 1;28(7):1604-17.
Jackson, R. J. and M. Wickens (1997). Translational controls impinging
on the
5'-untranslated region and initiation factor proteins. Curr Opin Genet
Dev 7(2):
233-41.
Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A,
Snyder M,
Greenblatt JF, Gerstein M. (2003) A bayesian networks approach for
predicting
protein-protein interactions from genomic data. Science.
302(5644):449-53.
Jansen R, Bussemaker HJ, Gerstein M (2003) Revisiting the codon
adaptation index
from a whole-genome perspective: analyzing the relationship between gene
expression
and codon occurrence in yeast using a variety of models. Nucleic Acids
Res 31:2242-2251.
Jansen R, Greenbaum D, Gerstein M (2002) Relating whole-genome
expression data
with protein-protein interactions. Genome Res 12:37-46.
Jansen R, Lan N, Qian J, Gerstein M. (2002) Integration of genomic
datasets
to predict protein complexes in yeast J Struct Funct Genomics.
2002b;2(2):71-81.
Jelinsky, S. A. and L. D. Samson (1999). Global response of
Saccharomyces cerevisiae
to an alkylating agent. Proc Natl Acad Sci U S A 96(4): 1486-91.
Lian Z, Kluger Y, Greenbaum DS, Tuck D, Gerstein M, Berliner N, Weissman
SM,
Newburger PE (2002) Genomic and proteomic analysis of the myeloid
differentiation
program: global analysis of gene expression during induced
differentiation in
the MPRO cell line. Blood 100:3209-3220.
Lindahl, L. and A. Hinnebusch (1992). Diversity of mechanisms in the
regulation
of translation in prokaryotes and lower eukaryotes. Curr Opin Genet Dev
2(5):
720-6.
McCarthy, J. E. (1998). Posttranscriptional control of gene expression
in yeast.
Microbiol Mol Biol Rev 62(4): 1492-553.
Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M,
Morgenstern
B, Munsterkotter M, Rudd S, Weil B. (2002) MIPS: a database for genomes
and
protein sequences. Nucleic Acids Res. 30(1):31-4.
Morris, D. R. and A. P. Geballe (2000). Upstream open reading frames as
regulators
of mRNA translation. Mol Cell Biol 20(23): 8635-42.
Naylor, G. J. and Gerstein, M. (2000) Measuring shifts in function and
evolutionary
opportunity using variability profiles: A case study of the globins. J
Mol Evol
51: 223-33.
Orntoft TF, Thykjaer T, Waldman FM, Wolf H, Celis JE (2002) Genome-wide
study
of gene copy numbers, transcripts, and protein levels in pairs of
non-invasive
and invasive human transitional cell carcinomas. Mol Cell Proteomics 1:
37-45.
Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP (2003) Evaluation of
multidimensional
chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for
large-scale
protein analysis: the yeast proteome. J Proteome Res 2: 43-50.
Pratt JM, Petty J, Riba-Garcia I, Robertson DH, Gaskell SJ, Oliver SG,
Beynon
RJ (2002) Dynamics of protein turnover, a missing dimension in
proteomics. Mol
Cell Proteomics 1:579-591.
Qian J, Lin J, Luscombe NM, Yu H, Gerstein M. (2003) Prediction of
regulatory
networks: genome-wide identification of transcription factor targets
from gene
expression data. Bioinformatics.19(15):1917-26.
Roth, F. P., J. D. Hughes, et al. (1998). Finding DNA regulatory motifs
within
unaligned noncoding sequences clustered by whole-genome mRNA
quantitation. Nat
Biotechnology 16(10): 939-45.
Serikawa KA, Xu XL, MacKay VL, Law GL, Zong Q, Zhao LP, Bumgarner R,
Morris
DR (2003) The transcriptome and its translation during recovery from
cell cycle
arrest in Saccharomyces cerevisiae. Mol Cell Proteomics 2: 191-204.
Sharp PM, Li WH (1987) The codon adaptation index --- a measure of
directional
synonymous codon usage bias, and its potential applications. Nucleic
Acids Res
15:1281- 1295.
Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight,
J.R.,
Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P. (2000). A
comprehensive
analysis of protein-protein interactions in Saccharomyces cerevisiae.
Nature
403: 623-627.
Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE Jr,
Hieter
P, Vogelstein B, Kinzler KW (1997). Characterization of the yeast
transcriptome.
Cell 88(2):243-51.
Vilela, C., B. Linz, et al. (1998). The yeast transcription factor genes
YAP1
and YAP2 are subject to differential control at the levels of both
translation
and mRNA stability. Nucleic AcidsRes 26(5): 1150-9.
Vilela, C., C. V. Ramirez, et al. (1999). Post-termination ribosome
interactions
with the 5'UTR modulate yeast mRNA stability. Embo J 18(11):
3139-52.
Washburn MP, Wolters D, Yates 3rd JR (2001) Large-scale analysis of the
yeast
proteome by multidimensional protein identification technology. Nat
Biotechnol
19:242-247.
Washburn MP, Koller A, Oshiro G, Ulaszek RR, Plouffe D, Deciu C,
Winzeler E,
Yates 3rd JR (2003) Protein pathway and complex clustering of correlated
mRNA
and protein expression analyses in Saccharomyces cerevisiae. Proc Natl
Acad
Sci USA 100:3107-3112.
Wilson CA, Kreychman J, Gerstein M. (2000) Assessing annotation transfer
for
genomics: quantifying the relations between protein sequence, structure
and
function through traditional and probabilistic scores. J Mol Biol.
297(1):233-49.
Xenarios, I., Rice, D.W., Salwinski, L., Baron, M.K., Marcotte, E.M.,
and Eisenberg,
D. (2000). DIP: The database of interacting proteins. Nucleic Acids Res.
28:
289-291.