Background

Many fundamental cellular processes involve protein networks, and comprehensively identifying these networks is important for systematically defining protein function (Eisenberg et al., 2000; Lan et al., 2002, 2003). Complex networks are also used to describe the structure of a wide range of systems, including the internet, power grids, ecological food webs and scientific collaborations. Despite the seemingly huge differences among these systems, they have been shown to share common features of network topology (Albert and Barabasi, 2001; Albert et al., 1999, 2000; Amaral et al., 2000; Barabasi and Albert, 1999; Huberman and Adamic, 1999; Jeong et al., 2001; Watts and Strogatz, 1998; Gavin et al., 2006; Krogan et al., 2006). Thus, networks provide a framework for describing biology in a universal language understandable to a broad audience (Girvan and Newman, 2002).

Large-scale experiments have produced a great variety of genome-wide information related to protein networks, especially in the yeast Saccharomyces cerevisiae. There are datasets of explicit protein-protein interactions (Gavin et al., 2002; Ho et al., 2002; Ito et al., 2000; Uetz et al., 2000), experimentally derived regulatory relationships (Lee et al., 2002), manually curated interaction databases such as MIPS, BIND, and DIP (Bader et al., 2003; Mewes et al., 2002) and systems for automatically finding interactions in the literature (Friedman et al., 2001). In addition to the experimentally derived interaction networks, there are also predicted interactions (Valencia and Pazos, 2002). The most common methods for predicting protein-protein interactions are based on "guilt-by-association": two proteins are more likely to interact if they share several correlated genomic features. Examples of these genomic features are gene expression profiles (DeRisi et al., 1997), phylogenetic profiles (Pellegrini et al., 1999), essentiality (Winzeler et al., 1999), localization (Kumar et al., 2002), and gene neighborhood (Tamames et al., 1997). Comparative genomics also provides an efficient way of mapping genome-wide interactions between different organisms (Walhout et al., 2000; Yu et al., 2004).

Summary of Some Past Results

Predicting Protein Networks

Correlating Interactions with Complexes and Genomic Features

We have developed methods to assess protein-protein interactions and regulatory relationships, correlating them with the structures of known complexes and with function. In particular, Jansen et al. (2002a) developed a method for examining how expression levels and their fluctuations correlate with known interactions. This allowed us to find significant differences in expression correlations between transient and permanent complexes. Edwards et al. (2002) compared known complexes to the interactions from the database. Finally, Lan et al. (2002, 2003) looked at the relationship between functional categories and interactions, showing that interactions could be used to systematically circumscribe and define function.

Predicting Protein Networks from Individual Genomic Features

We have developed methods for predicting regulatory relationships and protein-protein interactions from individual types of genomic data. Jansen et al. (2002) and Qian et al. (2001b) looked at the degree to which expression correlations could predict interactions and found that a subset of known interactions could be predicted with high confidence. In addition, Qian et al. (2001b) looked at new types of expression correlations, namely those with a time-shifted or inverted relationship. Finally, we developed an approach based on support vector machines to predict the targets of a transcription factor by finding relatively subtle relationships between their expression profiles (Qian et al., 2003).
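To make the notion of simultaneous, time-shifted and inverted expression relationships concrete, the following is a minimal sketch and not the published procedure. It assumes two expression profiles sampled at the same time points and simply scans a few shifts with a plain Pearson correlation; the function names and the `max_shift` parameter are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    # A constant profile has no defined correlation; report 0.0 in that case.
    return cov / denom if denom > 0 else 0.0


def best_relationship(x, y, max_shift=2):
    """Return (label, shift, r) for the strongest correlation found.

    A positive shift means profile y lags profile x by that many time points.
    max_shift is a hypothetical parameter chosen for illustration.
    """
    n = len(x)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            xs, ys = x[:n - shift], y[shift:]
        else:
            xs, ys = x[-shift:], y[:n + shift]
        r = pearson(xs, ys)
        if best is None or abs(r) > abs(best[2]):
            if r < 0:
                label = "inverted"
            elif shift != 0:
                label = "time-shifted"
            else:
                label = "simultaneous"
            best = (label, shift, r)
    return best
```

In this sketch a strongly negative correlation is labeled inverted regardless of shift, which is a simplification; strictly periodic profiles can make a shift and an inversion indistinguishable.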

Data Integration of Multiple Features to Improve Prediction

We have developed methods for combining various genomic features to produce an interaction prediction that is stronger than any prediction from an individual feature. This is important both for known protein-protein interaction data sets, which suffer from a great degree of noise, and for genomic features such as expression correlation, which are only weakly predictive of interactions. Our first analyses used simple combinations of features (Edwards et al., 2002; Jansen et al., 2002b; Gerstein et al., 2002). We then developed more sophisticated Bayesian-network approaches that combine features in a way that optimizes their predictive value (Jansen et al., 2003b). In Lu et al. (2005), we examined how this result scaled with the number of features. In Xia et al. (2006), we extended the approach to membrane proteins.
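One way to see why combining weak features can yield a strong prediction is the simplest Bayesian form, sketched below under a naive (conditionally independent) assumption: the posterior odds of interaction equal the prior odds multiplied by the likelihood ratio of each observed feature value. This is an illustration only; the function names, the binned feature values and the numbers in the usage note are hypothetical and are not taken from the cited studies.

```python
from collections import Counter
from math import prod


def likelihood_ratios(positives, negatives):
    """Estimate L(v) = P(v | interacting) / P(v | non-interacting).

    positives and negatives are lists of one binned feature value per
    gold-standard protein pair. Only values seen in both sets get a ratio.
    """
    pos, neg = Counter(positives), Counter(negatives)
    ratios = {}
    for value in pos.keys() & neg.keys():
        p_pos = pos[value] / len(positives)
        p_neg = neg[value] / len(negatives)
        ratios[value] = p_pos / p_neg
    return ratios


def posterior_odds(prior_odds, ratios):
    """Naive Bayes combination: prior odds times the product of the ratios."""
    return prior_odds * prod(ratios)
```

For example, with hypothetical prior odds of 0.01, two features with likelihood ratios of 10 and 5 give posterior odds of about 0.5, a fifty-fold increase that neither feature achieves alone.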

Analysis of the Network Structure

Analysis of the Global Structure of Networks and Comparison of Networks

We have carried out a number of studies looking at the overall statistics of gene networks, finding that a number of them have a power-law distribution very similar to that found for the occurrence of gene families (Luscombe et al., 2002; Xia et al., 2004; Yu et al., 2007). In terms of smaller-scale structures, Yu et al. (2006a) analyzed regulatory networks in yeast and showed that they have a pyramid-shaped hierarchical structure, similar in some sense to governmental "org-charts", with a small number of master transcription factors on top. Finally, Yu et al. (2006b) showed how defective cliques within networks could be completed, defining large complexes and potentially predicting more interactions.
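As a rough illustration of what a power-law degree distribution means in practice, the sketch below counts how many nodes have each degree in an edge list and fits a straight line to the counts on log-log axes. A least-squares fit of this kind is a crude estimator and is an assumption of this example, not the method of the cited studies; the function names are hypothetical.

```python
from collections import Counter
from math import log


def degree_counts(edges):
    """Map each degree k to the number of nodes having exactly k edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def fit_power_law_exponent(counts):
    """Return gamma such that count(k) is roughly proportional to k ** -gamma.

    Uses the least-squares slope of log(count) against log(k), so it needs
    at least two distinct degrees.
    """
    xs = [log(k) for k in counts]
    ys = [log(counts[k]) for k in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx
```

A network in which a few hubs carry most of the edges gives a roughly linear log-log plot, and the fitted exponent summarizes how quickly the number of nodes falls off with degree.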

We interrelated regulatory and expression networks and found that genes targeted by the same transcription factor tend to have correlated expression (Yu et al., 2003). In Yu et al. (2006c), we compared many different types of networks, defining composite hubs and motifs.

Mapping Networks between Organisms

In Yu et al. (2004), we showed how one could compare networks between organisms and use this comparison to help in network prediction. In particular, we showed how interologs could be transferred between organisms as a function of sequence similarity. We also defined a related concept for transferring regulatory relationships ("regulog"), and we established a database of mapped relationships (interolog.gersteinlab.org).
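The idea of transferring interologs as a function of sequence similarity can be sketched as follows. The example assumes an ortholog table that records a sequence identity for each mapping, scores a transferred pair by the geometric mean of the two identities, and keeps pairs above a cutoff. The scoring rule, the 0.8 cutoff, the function name and the protein identifiers are all hypothetical choices made for illustration.

```python
from math import sqrt


def transfer_interologs(interactions, orthologs, min_joint_identity=0.8):
    """Map interactions from a source organism onto a target organism.

    interactions: iterable of (protein_a, protein_b) pairs in the source.
    orthologs: dict from source protein to a list of
               (target_protein, fractional_identity) tuples.
    Returns a dict from sorted target pairs to their best joint identity.
    """
    predicted = {}
    for a, b in interactions:
        for target_a, identity_a in orthologs.get(a, []):
            for target_b, identity_b in orthologs.get(b, []):
                # Joint score: geometric mean of the two identities (assumed).
                joint = sqrt(identity_a * identity_b)
                if joint >= min_joint_identity:
                    pair = tuple(sorted((target_a, target_b)))
                    predicted[pair] = max(joint, predicted.get(pair, 0.0))
    return predicted
```

Requiring both partners to be well conserved reflects the intuition that an interaction is more likely to be preserved when both proteins have close counterparts in the target organism.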

Analysis of the Dynamics of Networks

We examined the dynamics of the regulatory system in yeast on a genomic scale by integrating gene expression data for five cellular conditions with known transcriptional regulatory relationships (Luscombe et al., 2004). To rigorously compare these condition-specific subnetworks, we developed SANDY (Statistical Analysis of Network Dynamics). We found that these subnetworks exhibit vastly different topologies on both a local and a global level, and we uncovered two separate groups of cellular states. Moreover, we showed that different sets of transcription factors become key regulatory hubs at different times, portraying a network that shifts its weight between different foci to bring about distinct cellular states. In a subsequent analysis (Yu et al., 2006c), we analyzed the expression relationships in small network motifs, showing that many of those in metabolic pathways have a time-shifted quality.

3D Structural Analysis of Protein Interaction Networks

While there has been considerable interest in protein interaction networks and their role in cell function, most studies thus far have neglected the biophysical properties of the proteins involved. We have pioneered the use of 3D protein structures for analysis of protein networks (Kim et al., 2006). This approach gave us a unique perspective on protein networks and showed that many network properties previously thought to relate to biological features were actually more reflective of biophysical quantities.

References

Akerley, B. J., E. J. Rubin, A. Camilli, D. J. Lampe, H. M. Robertson, and J. J. Mekalanos. (1998). Systematic identification of essential genes by in vitro mariner mutagenesis. Proc Natl Acad Sci USA, 95:8927-32.

Albert, R., H. Jeong and A. L. Barabasi (1999). Diameter of the World-Wide Web. Nature 401: 130-131.

Albert, R., H. Jeong and A. L. Barabasi (2000). Error and attack tolerance of complex networks. Nature 406: 378-382.

Albert, R. and A. L. Barabasi (2001). Statistical Mechanics of Complex Networks. arXiv:cond-mat/0106096: 1-53.

Alexandrov, V., Gerstein, M (2001). Calculating populations of subcellular compartments using density matrix formalism. International Journal of Quantum Chemistry 85: 693-696.

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25:3389-402.

Amaral, L. A., A. Scala, M. Barthelemy and H. E. Stanley (2000). Classes of small-world networks. Proc Natl Acad Sci USA 97: 11149-52.

Arigoni, F., F. Talabot, M. Peitsch, M. D. Edgerton, E. Meldrum, E. Allet, R. Fish, T. Jamotte, M. L. Curchod, and H. Loferer. (1998). A genome-based approach for the identification of essential bacterial genes. Nature Biotechnology, 16:851-6.

Arkin, I, Brunger, A & Engelman, D. (1997) Are there dominant membrane protein families with a given number of helices? Proteins 28: 465-466.

Bader, G. D., D. Betel and C. W. Hogue (2003). BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 31: 248-50.

Bailey, T. L., and W. S. Noble. (2003). Searching for statistically significant regulatory modules. Bioinformatics 19 Suppl 2:II16-II25

Barabasi, A. L. and R. Albert (1999). Emergence of Scaling in Random Networks. Science 286: 509-512.

Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR. (2004) Nucleic Acids Res 32 D138-D141

Berger A, Della Pietra S, Della Pietra V. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics (22-1) March 1996.

Bertone, P., Gerstein, M (2001). Integrative data mining: the new direction in bioinformatics. IEEE Eng Med Biol Mag 20: 33-40.

Bertone, P., Y Kluger, N Lan, D Zheng, D Christendat, A Yee, A M Edwards, C H Arrowsmith, G T Montelione, Gerstein, M (2001). SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics. Nucleic Acids Res 29: 2884-98.

Bochner, B. R. (2003). New technologies to assess genotype-phenotype relationships. Nature Reviews Genetics, 4:309-14.

Cheung KH, Liu Y, Kumar K, Snyder M, Gerstein, M, Miller P. (2001) An XML Application for Genomic Data Interoperation. IEEE International Symposium on Bio-Informatics and Biomedical Engineering (BIBE), pp. 97-103

Christendat, D., A. Yee, A. Dharamsi, Y. Kluger, M. Gerstein, C. H. Arrowsmith, and A. M. Edwards. (2000). Structural proteomics: prospects for high throughput sample preparation. Prog Biophys Mol Biol, 73:339-45

Cowart, L. A., Y. Okamoto, F. R. Pinto, J. L. Gandy, J. S. Almeida, and Y. A. Hannun. (2003). Roles for sphingolipid biosynthesis in mediation of specific programs of the heat stress response determined through gene expression profiling. J Biol Chem, 278:30328-38

Csank, C., M. C. Costanzo, J. Hirschman, P. Hodges, J. E. Kranz, M. Mangan, K. O'Neill, L. S. Robertson, M. S. Skrzypek, J. Brooks, and J. I. Garrels. (2002). Three yeast proteome databases: YPD, PombePD, and CalPD (MycoPathPD). Methods in Enzymology, 350:347-73.

DeRisi, J. L., V. R. Iyer and P. O. Brown (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278: 680-6.

Deutschbauer, A. M., R. M. Williams, A. M. Chu, and R. W. Davis. (2002). Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci USA, 99:15530-5.

Drawid, A. and Gerstein, M (2000). A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome. J Mol Biol 301: 1059-75.

Drawid, A., R Jansen, Gerstein, M (2000). Genome-wide analysis relating expression level with protein subcellular localization. Trends Genet 16: 426-30.

Edwards, A.M., B. Kus, R. Jansen, D. Greenbaum, J. Greenblatt, and Gerstein, M. (2002). Bridging structural biology and genomics: assessing protein interaction data with known complexes. Trends in Genetics 18: 529-536.

Eisenberg, D., E. M. Marcotte, I. Xenarios and T. O. Yeates (2000). Protein function in the post-genomic era. Nature 405: 823-6.

Entian, K. D., T. Schuster, J. H. Hegemann, D. Becher, H. Feldmann, U. Guldener, R. Gotz, M. Hansen, C. P. Hollenberg, G. Jansen, W. Kramer, S. Klein, P. Kotter, J. Kricke, H. Launhardt, G. Mannhaupt, A. Maierl, P. Meyer, W. Mewes, T. Munder, R. K. Niedenthal, M. Ramezani Rad, A. Rohmer, A. Romer, and A. Hinnen. (1999). Functional analysis of 150 deletion mutants in Saccharomyces cerevisiae by a systematic approach. Molecular & General Genetics 262:683-702.

Erdos, P. and A. Renyi (1959). On random graphs I. Publ. Math. (Debrecen) 6: 290-297.

Fraser, H. B., A. E. Hirsh, L. M. Steinmetz, C. Scharfe, and M. W. Feldman. (2002). Evolutionary rate in the protein interaction network. Science 296:750-2.

Friedman, C., P. Kra, H. Yu, M. Krauthammer and A. Rzhetsky (2001). GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 17 Suppl 1: S74-82.

Gasch, A. P., P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, and P. O. Brown. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell, 11:4241-57.

Gasch, A. P., and M. Werner-Washburne. (2002). The genomics of yeast responses to environmental stress and starvation. Funct Integr Genomics, 2:181-92.

Gavin, A.C., M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, and G. Superti-Furga. (2002). Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415: 141-147.

Gerstein, M. (1998). Patterns of Protein-Fold Usage in Eight Microbial Genomes: A Comprehensive Structural Census. Proteins 33: 518-534.

Gerstein, M., and R. Jansen. (2000). The current excitement in bioinformatics-analysis of whole-genome expression data: how does it relate to protein structure and function? Curr Opin Struct Biol 10:574-84.

Gerstein, M., N. Lan, R. Jansen (2002). Proteomics. Integrating interactomes. Science 295: 284-7.

Girvan, M., and M. E. Newman. (2002). Community structure in social and biological networks. Proc Natl Acad Sci U S A 99:7821-6.

Goh, C.S., N. Lan, N. Echols, S.M. Douglas, D. Milburn, P. Bertone, R. Xiao, L.C. Ma, D. Zheng, Z. Wunderlich, T. Acton, G.T. Montelione, Gerstein, M (2003). SPINE 2: a system for collaborative structural proteomics within a federated database framework. Nucleic Acids Res 31: 2833-8.

Goh, C. S., N. Lan, S. M. Douglas, B. Wu, N. Echols, A. Smith, D. Milburn, G. T. Montelione, H. Zhao, and M. Gerstein. (2004). Mining the Structural Genomics Pipeline: Identification of Protein Properties that Affect High-throughput Experimental Analysis. J Mol Biol 336:115-30.

Greenbaum, D., C. Colangelo, K. Williams, Gerstein, M (2003). Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biol 4: 117.

Greenbaum, D., N. M. Luscombe, R. Jansen, J. Qian and Gerstein, M (2001). Interrelating different types of genomic data, from proteome to secretome: 'oming in on function. Genome Res 11: 1463-8.

Greenbaum, D., R. Jansen and Gerstein, M (2002). Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts. Bioinformatics 18: 585-96.

Guelzim, N., S. Bottani, P. Bourgine, and F. Kepes. (2002). Topological and causal structure of the yeast transcriptional regulatory network. Nature Genetics 31:60-3.

Hampsey, M. (1997). A review of phenotypes in Saccharomyces cerevisiae. Yeast 13:1099-133.

Harrison, P. M., and M. Gerstein. (2003). A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes. Genome Biol 4:R40.

Harrison, PM, Gerstein, M (2002). Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 318: 1155-74.

Hartwell, L. H., J. J. Hopfield, S. Leibler and A. W. Murray (1999). From molecular to modular cell biology. Nature 402: C47-52.

Hegyi, H., and M. Gerstein. (1999). The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J Mol Biol 288:147-64.

Hegyi, H. and Gerstein, M. (2001). Annotation transfer for genomics: measuring functional divergence in multi-domain proteins. Genome Research 11: 1632-1640.

Hirsh, A. E., and H. B. Fraser. (2001). Protein dispensability and rate of evolution. Nature 411:1046-9.

Ho, Y., A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A.R. Willems, H. Sassi, P.A. Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B.D. Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T. Pawson, M.F. Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys, and M. Tyers. (2002). Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180-183.

Holstege, F. C. P., Jennings, E. G., Wyrick, J. J., Lee, T. I., Hengartner, C. J., Green, M. R., Golub, T. R., Lander, E. S. & Young, R. A. (1998). Dissecting the regulatory circuitry of a eukaryotic genome. Cell, 95: 717-728.

Horak, C.E., N.M. Luscombe, J. Qian, P. Bertone, S. Piccirrillo, Gerstein, M, and M. Snyder. (2002). Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae. Genes Dev. 16: 3017-3033.

Huberman, B. A. and L. A. Adamic (1999). Growth dynamics of the World-Wide Web. Nature 401: 131.

Hughes, J. D., P. W. Estep, S. Tavazoie, and G. M. Church. (2000). Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol, 296:1205-14.

Ito, T., K. Tashiro, S. Muta, R. Ozawa, T. Chiba, M. Nishizawa, K. Yamamoto, S. Kuhara, and Y. Sakaki. (2000). Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad of Sci USA 97: 1143-1147.

Jansen, R., D. Greenbaum, Gerstein, M (2002). Relating whole-genome expression data with protein-protein interactions. Genome Res 12: 37-46.

Jansen, R, Gerstein, M (2000). Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res 28: 1481-8.

Jansen, R, Yu, H, Greenbaum, D, Kluger, Y, Krogan, N, Chung, S, Snyder, M, Greenblatt, J, Gerstein, M (2003). A Bayesian networks approach to predict protein complexes from genomic data. Science 302: 449-453.

Jansen, R., N. Lan, J. Qian, and Gerstein, M. (2002a). Integration of genomic datasets to predict protein complexes in yeast. Journal of Structural and Functional Genomics 2: 71-81.

Jensen, F V, Bayesian Networks and Decision Graphs (Springer, New York, 2001).

Jeong, H., S. P. Mason, A. L. Barabasi and Z. N. Oltvai (2001). Lethality and centrality in protein networks. Nature 411: 41-2.

Kanehisa, M. (2002). The KEGG database. Novartis Found Symp, 247:91-101; discussion 101-3, 119-28, 244-52.

Kanehisa, M., S. Goto, S. Kawashima, Y. Okuno, and M. Hattori. (2004). The KEGG resource for deciphering the genome. Nucleic Acids Res 32:D277-80.

Kluger, Y., R. Basri, J. T. Chang, and M. Gerstein. (2003). Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res 13:703-16.

Koonin EV, Wolf YI, Karev GP. (2002). The structure of the protein universe and genome evolution. Nature 420:218-23.

Kumar, A. and M. Snyder. (2002). Protein complexes take the bait. Nature 415: 123-124.

Kumar, A., S. Agarwal, J.A. Heyman, S. Matson, M. Heidtman, S. Piccirillo, L. Umansky, A. Drawid, R. Jansen, Y. Liu, K.H. Cheung, P. Miller, Gerstein, M, G.S. Roeder, and M. Snyder. (2002). Subcellular localization of the yeast proteome. Genes & Development 16: 707-719.

Lan, N, GT Montelione, Gerstein, M (2003). Ontologies for proteomics: towards a systematic definition of structure and function that scales to the genome level. Curr Opin Chem Biol 7: 44-54.

Lan, N., R. Jansen, and Gerstein, M. (2002). Toward a Systematic Definition of Protein Function That Scales to the Genome Level: Defining Function in Terms of Interactions. Proceedings of the IEEE 90: 1848-1858.

Lee, T.I., N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, G.K. Gerber, N.M. Hannett, C.T. Harbison, C.M. Thompson, I. Simon, J. Zeitlinger, E.G. Jennings, H.L. Murray, D.B. Gordon, B. Ren, J.J. Wyrick, J.B. Tagne, T.L. Volkert, E. Fraenkel, D.K. Gifford, and R.A. Young. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298: 799-804.

Lin, J, J Qian, D Greenbaum, P Bertone, R Das, N Echols, A Senes, B Stenger, Gerstein, M (2002). GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing. Nucleic Acids Res 30: 4574-82.

Lin, J., and M. Gerstein. (2000). Whole-genome trees based on the occurrence of folds and orthologs: implications for comparing genomes on different levels. Genome Res 10:808-18.

Liu, Y., D. M. Engelman, and M. Gerstein. (2002). Genomic analysis of membrane protein families: abundance and conserved motifs. Genome Biol 3:research0054.

Lu H, L. Lu, J. Skolnick (2003a). Development of unified statistical potentials describing protein-protein interactions. Biophys J 84: 1895-901.

Luscombe, NM, J Qian, Z Zhang, T Johnson, Gerstein, M (2002). The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties. Genome Biol 3: RESEARCH0040.

Luscombe NM, Royce TE, Bertone P, Echols N, Horak CE, Chang JT, Snyder M, Gerstein M (2003) ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res 31 3477-3482.

Marcotte, E.M., M. Pellegrini, M.J. Thompson, T.O. Yeates, and D. Eisenberg. (1999). A combined algorithm for genome-wide prediction of protein function. Nature 402: 83-86.

Martone, R., G. Euskirchen, P. Bertone, S. Hartman, T.E. Royce, N. M. Luscombe, J. L. Rinn, F. K. Nelson, P. Miller, Gerstein, M, S. Weissman, and M. Snyder. (2003) Distribution of NF-kappa B-binding sites across human chromosome 22. Proc Natl Acad Sci USA (in press)

Mewes, H. W., D. Frishman, U. Guldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M. Munsterkotter, S. Rudd and B. Weil (2002). MIPS: a database for genomes and protein sequences. Nucleic Acids Res 30: 31-4.

Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM (2004) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 31: 315-318.

Molina L, Belanche L, Nebot A. Feature Selection Algorithms: A Survey and Experimental Evaluation. (2002) IEEE International Conference on Data Mining (ICDM'02)

Naylor, G. J., and Gerstein M. (2000). Measuring shifts in function and evolutionary opportunity using variability profiles: a case study of the globins. J Mol Evol, 51:223-33.

Overbeek, R., N. Larsen, G. D. Pusch, M. D'Souza, E. Selkov, Jr., N. Kyrpides, M. Fonstein, N. Maltsev, and E. Selkov. (2000). WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res, 28:123-5.

Pal, C., B. Papp, and L. D. Hurst. (2003). Genomic function: Rate of evolution and gene dispensability. Nature 421:496-7; discussion 497-8.

Pearl, J, Probabilistic reasoning in intelligent systems (1988) (Morgan Kaufmann, San Mateo).

Pearson, W.R. and D.J. Lipman. (1988). Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85: 2444-2448.

Pellegrini, M., E.M. Marcotte, M.J. Thompson, D. Eisenberg, and T.O. Yeates. (1999). Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 96: 4285-4288.

Qian J, Stenger B, Wilson CA, Lin J, Jansen R, Teichmann SA, Park J, Krebs WG, Yu H, Alexandrov V, Echols N, Gerstein, M (2001a). PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information. Nucleic Acids Res 29: 1750-64.

Qian, J., M. Dolled-Filhart, J. Lin, H. Yu, Gerstein, M (2001b). Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. J Mol Biol 314: 1053-66.

Qian, J., J. Lin, N.M. Luscombe, H. Yu, Gerstein, M. (2003). Predictions of regulatory networks: genome-wide identification of transcription factor targets from gene expression data. Bioinformatics 19: 1917-1926.

Qian, J., Luscombe NM, and Gerstein M. (2001). Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model. J Mol Biol 313:673-81.

Rieger, K. J., M. El-Alama, G. Stein, C. Bradshaw, P. P. Slonimski, and K. Maundrell. (1999). Chemotyping of yeast mutants using robotics. Yeast 15:973-86.

Ross-Macdonald, P., P. S. Coelho, T. Roemer, S. Agarwal, A. Kumar, R. Jansen, K. H. Cheung, A. Sheehan, D. Symoniatis, L. Umansky, M. Heidtman, F. K. Nelson, H. Iwasaki, K. Hager, M. Gerstein, P. Miller, G. S. Roeder, and M. Snyder. (1999). Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature 402:413-8.

Rzhetsky A, Gomez SM (2002). Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 17:988-996.

Sakumoto, N., I. Matsuoka, Y. Mukai, N. Ogawa, Y. Kaneko, and S. Harashima. (2002). A series of double disruptants for protein phosphatase genes in Saccharomyces cerevisiae and their phenotypic analysis. Yeast 19:587-99.

Smith, V., K. N. Chou, D. Lashkari, D. Botstein, and P. O. Brown. (1996). Functional analysis of the genes of yeast chromosome V by genetic footprinting. Science 274:2069-74.

Steinmetz, L. M., C. Scharfe, A. M. Deutschbauer, D. Mokranjac, Z. S. Herman, T. Jones, A. M. Chu, G. Giaever, H. Prokisch, P. J. Oefner, and R. W. Davis. (2002). Systematic screen for human disease genes in yeast. Nature Genetics 31:400-4.

Roven, C., and H. J. Bussemaker. (2003). REDUCE: An online tool for inferring cis-regulatory elements and transcriptional module activities from microarray data. Nucleic Acids Res, 31:3487-90.

Tamames, J., G. Casari, C. Ouzounis, and A. Valencia. (1997). Conserved clusters of functionally related genes in two bacterial genomes. J Mol Evolution 44: 66-73.

Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29: 22-8.

Tatusov, R.L., E.V. Koonin, and D.J. Lipman. (1997). A genomic perspective on protein families. Science 278: 631-637.

Thanassi, J. A., S. L. Hartman-Neumann, T. J. Dougherty, B. A. Dougherty, and M. J. Pucci. (2002). Identification of 113 conserved essential genes using a high-throughput gene disruption system in Streptococcus pneumoniae. Nucleic Acids Res 30:3152-62.

Thatcher, J. W., J. M. Shaw, and W. J. Dickinson. (1998) Marginal fitness contributions of nonessential genes in yeast. Proc Natl Acad Sci USA 95:253-7.

Tong, A. H., M. Evangelista, A. B. Parsons, H. Xu, G. D. Bader, N. Page, M. Robinson, S. Raghibizadeh, C. W. Hogue, H. Bussey, B. Andrews, M. Tyers, and C. Boone. (2001). Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294:2364-8.

True, H. L., and S. L. Lindquist. (2000). A yeast prion provides a mechanism for genetic variation and phenotypic diversity. Nature 407:477-83.

Uetz, P., L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, and J.M. Rothberg. (2000). A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403: 623-627.

Valencia, A. and F. Pazos. (2002). Computational methods for the prediction of protein interactions. Curr Opin Struct Biol 12: 368-373.

von Mering, C., R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork. (2002). Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417: 399-403.

Walhout, A.J., R. Sordella, X. Lu, J.L. Hartley, G.F. Temple, M.A. Brasch, N. Thierry-Mieg, and M. Vidal. (2000). Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 287: 116-122.

Warringer, J., and A. Blomberg. (2003). Automated screening in environmental arrays allows analysis of quantitative phenotypic profiles in Saccharomyces cerevisiae. Yeast 20:53-67.

Watts, D. J. and S. H. Strogatz (1998). Collective dynamics of 'small-world' networks. Nature 393(6684): 440-2.

Wilson, C.A., J. Kreychman, and Gerstein, M. (2000). Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol 297: 233-249.

Wingender, E., X. Chen, E. Fricke, R. Geffers, R. Hehl, I. Liebich, M. Krull, V. Matys, H. Michael, R. Ohnhauser, M. Pruss, F. Schacherer, S. Thiele, and S. Urbach. (2001). The TRANSFAC system on gene expression regulation. Nucleic Acids Res 29:281-3.

Winzeler, E. A., D. D. Shoemaker, A. Astromoff, H. Liang, K. Anderson, B. Andre, R. Bangham, R. Benito, J. D. Boeke, H. Bussey, A. M. Chu, C. Connelly, K. Davis, F. Dietrich, S. W. Dow, M. El Bakkoury, F. Foury, S. H. Friend, E. Gentalen, G. Giaever, J. H. Hegemann, T. Jones, M. Laub, H. Liao, and R. W. Davis (1999). Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285:901-6.

Xenarios, I., L. Salwinski, X. J. Duan, P. Higney, S. M. Kim and D. Eisenberg (2002). DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30: 303-5.

Xia, Y. and M. Levitt (2000). Extracting knowledge-based energy functions from protein structures by error rate minimization: Comparison of methods using lattice model. J Chem Phys 113: 9318-9330.

Xia Y, H Yu, R Jansen, M Seringhaus, S Baxter, D Greenbaum, H Zhao, Gerstein M. (in press) Analyzing Cellular Biochemistry in Terms of Molecular Networks Annual Review of Biochemistry.

Yang, L., Z. Gu, and W.-H. Li. (2003) Rate of Protein Evolution Versus Fitness Effect of Gene Deletion. Mol. Biol. Evol. 20:772-774.

Yu, H., N. M. Luscombe, J. Qian and Gerstein, M. (2003). Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends Genet 19: 422-7.

Zewail, A., M. W. Xie, Y. Xing, L. Lin, P. F. Zhang, W. Zou, J. P. Saxe, and J. Huang. (2003). Novel functions of the phosphatidylinositol metabolic pathway discovered by a chemical genomics screen with wortmannin. Proc Natl Acad Sci USA, 100:3345-50.