Most fundamental cellular processes are not carried out by isolated molecules. Instead, they result from a series of coordinated activities mediated by complex interactions among genes, proteins and various small molecules. Complex interplay and emergent properties are not unique to cells; they are shared by a wide variety of complex systems, including the Internet, the electrical grid, and social networks. In describing these and other complex systems, network concepts have emerged as a universal language. The Gerstein lab has been involved in network research for some time, focusing on how network ideas can improve our understanding of biology in the post-genomic era.
In our early work, we developed methods for predicting networks from individual genomic features (Jansen et al. 2002a; Qian et al. 2001; Yu et al. 2003). Later, we combined different biological datasets to increase the power of our network prediction algorithms (Edwards et al. 2002; Gerstein et al. 2002; Jansen et al. 2002b; Jansen et al. 2003; Lu et al. 2005; Xia et al. 2006) and developed new machine learning techniques (Yip and Gerstein 2009). In a sense, this work culminated in the third DREAM competition in 2008 (DREAM plays a role for systems biology similar to the one CASP plays for protein structure prediction), where we finished first in the in silico network prediction challenge. We have also participated in many experimental network determination projects (Borneman et al. 2007; Krogan et al. 2006; Li et al. 2004; Ptacek et al. 2005), and we have built many web tools for network analysis, including TopNet (Yu et al. 2004b), tYNA (Yip et al. 2006), and PubNet (Douglas et al. 2005).
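The dataset-integration step can be illustrated with a naive Bayes combination of likelihood ratios, one standard form of Bayesian evidence integration. The sketch below is a minimal illustration under stated assumptions, not the lab's published pipeline: it assumes each genomic feature has been binned, that counts from gold-standard positive and negative protein pairs are available, and that the features are conditionally independent. The function names, the prior odds and all counts are hypothetical.

```python
from math import prod


def likelihood_ratio(pos_hits, pos_total, neg_hits, neg_total):
    """L = P(feature bin | interacting) / P(feature bin | not interacting),
    estimated from gold-standard positive and negative pair counts."""
    return (pos_hits / pos_total) / (neg_hits / neg_total)


def posterior_odds(prior_odds, ratios):
    """Naive Bayes update: assuming conditionally independent features,
    O_post = O_prior * L_1 * L_2 * ... * L_n."""
    return prior_odds * prod(ratios)


def odds_to_probability(odds):
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)


# Hypothetical example: one protein pair scored on two binned features.
lr_coexpression = likelihood_ratio(30, 100, 1_000, 100_000)   # about 30
lr_essentiality = likelihood_ratio(50, 100, 20_000, 100_000)  # about 2.5
odds = posterior_odds(1 / 600, [lr_coexpression, lr_essentiality])
print(f"posterior odds {odds:.3f}, probability {odds_to_probability(odds):.3f}")
```

With these made-up counts the posterior odds are (1/600) × 30 × 2.5 = 0.125, a probability of about 0.11; a pair would be called an interaction only if it cleared some chosen odds cutoff. The product form is valid only under the independence assumption; correlated features would have to be modelled jointly.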
In network science, hubs are nodes with many more connections than average; in biological networks, hubs tend to be essential. We have done numerous studies correlating "hubbiness" with forms of essentiality (Yu et al. 2006a; Yu et al. 2006b). We have found that, apart from hubs, bottlenecks (nodes with high betweenness, i.e., nodes through which many shortest paths pass) are also important and correlate with gene essentiality (Yu et al. 2007). We recently performed an initial comparison of social and biological networks and showed that the yeast regulatory network and corporate and governmental management structures are both pyramidal: a few global regulators or leaders sit at the top, above highly connected layers of middle management and a bottom tier of regulators or managers whose role is to implement specific plans (Fig. 1A) (Yu and Gerstein 2006). Both structures are dominated by information-flow bottlenecks in the middle layers.
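Hubs and bottlenecks are both centrality notions that are straightforward to compute. As a hedged sketch, the code below takes hubs to be high-degree nodes and bottlenecks to be high-betweenness nodes, computing betweenness with Brandes' algorithm for unweighted, undirected graphs. The top-20% cutoff, the toy graph and the helper names are assumptions chosen for illustration.

```python
import math
from collections import deque


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for unweighted graphs: for each node, the number of
    shortest paths between other node pairs that pass through it (shared
    fractionally when several shortest paths exist)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def top_fraction(scores, fraction=0.2):
    """Nodes scoring at or above the k-th highest score, k = ceil(fraction * n)."""
    k = max(1, math.ceil(fraction * len(scores)))
    cutoff = sorted(scores.values(), reverse=True)[k - 1]
    return {v for v, s in scores.items() if s >= cutoff}


# Toy network: two triangles joined through a single linker node "b".
edges = [("x1", "x2"), ("x1", "x3"), ("x2", "x3"),
         ("y1", "y2"), ("y1", "y3"), ("y2", "y3"),
         ("x1", "b"), ("b", "y1")]
adj = build_adj(edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
hubs = top_fraction(degree)
bottlenecks = top_fraction(betweenness(adj))
print("hubs:", sorted(hubs), "bottlenecks:", sorted(bottlenecks))
```

In this toy graph the linker "b" has one of the lowest degrees but the highest betweenness, so it is flagged as a bottleneck without being a hub, while x1 and y1 are both. That separation between the two measures is why bottlenecks can carry information that degree alone misses.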
Defining Functional Modules
In simple topological analysis, a module is a set of nodes that are densely connected to one another, and modularity describes the degree to which a network can be divided into such modules (Girvan and Newman 2002; Yu et al. 2006a). Often, however, modules are characterized not merely by the density of their connections but by the shared function of most of their nodes. We have used several approaches to find modularity in molecular networks. By mapping gene expression data onto the yeast regulatory network, we found that different subnetworks are active under different conditions (Luscombe et al. 2004). We also developed a method to extract metabolic modules from metagenomic data, identifying which pathways are expressed under different environmental conditions (Fig. 1B) (Gianoulis et al. 2009). Finally, we developed a way to find nearly complete cliques (fully connected modules missing only a few edges, or "defective cliques") in interaction networks, whose missing edges can be used to predict new interactions (Yu et al. 2006a).
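The idea of completing nearly complete cliques can be sketched with a deliberately simplified test. This is an assumption for illustration, not the published algorithm of Yu et al. (2006a): two unlinked proteins u and v are flagged when their shared neighbours already form a clique of at least `min_core` nodes. The core plus u and the core plus v are then two cliques that differ by one member each, and adding the u-v edge would merge them into a single clique. The function names and the toy network are hypothetical.

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_clique(nodes, adj):
    """True if every pair of the given nodes is directly connected."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))


def complete_defective_cliques(adj, min_core=2):
    """Predict missing edges (u, v): u and v are not linked, but all of their
    common neighbours are mutually linked and there are at least min_core of
    them, so adding u-v would turn core + {u, v} into a full clique."""
    predicted = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue
        core = adj[u] & adj[v]
        if len(core) >= min_core and is_clique(core, adj):
            predicted.append((u, v))
    return predicted


# Toy network: proteins 1-4 are all linked except for the 1-4 pair;
# protein 5 hangs off protein 2.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (2, 5)]
print(complete_defective_cliques(build_adj(edges)))  # [(1, 4)]
```

Requiring the entire common neighbourhood to be a clique is stricter than necessary; a fuller version would search for a large clique within that neighbourhood, at a higher computational cost.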
We have also explored the evolution of networks, studying the conservation and variability of different parts of the network. We defined "interologs" and "regulogs", conserved protein-protein and protein-DNA relationships that allow interactions and regulatory relationships to be transferred between organisms, and showed how to compare interaction networks across organisms (Yu et al. 2004a). We pioneered the use of 3D molecular structures for the analysis of protein networks (Kim et al. 2006; Kim et al. 2008). This work showed that much of the debate on the evolutionary conservation of hubs could be resolved by focusing on the number of structural interfaces a protein has rather than on its number of partners. It also suggested different models of network evolution through gene duplication, depending on whether or not a newly created protein connects to a pre-existing structural interface. Finally, we showed that proteins under positive selection are found at the periphery of the network and of the cell, suggesting how human variation is arranged with respect to the interactome (Fig. 1C) (Kim et al. 2007).
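Interolog and regulog transfer amounts to mapping each known relationship in one organism through homology onto another. A minimal sketch, assuming ortholog assignments are already given as a plain dictionary (how homology is scored and thresholded in practice is left out), might look like the following; the function name and gene names are hypothetical. Protein-protein interactions are treated as unordered pairs, whereas regulatory (protein-DNA) relationships keep their regulator-to-target direction.

```python
def transfer(relations, orthologs, directed=False):
    """Map relations (a, b) from organism 1 onto organism 2.

    relations: iterable of (a, b) pairs known in organism 1.
    orthologs: dict from an organism-1 gene to the set of its organism-2
               orthologs (an assumed, precomputed input).
    directed:  False for protein-protein interactions (interologs),
               True for regulator -> target links (regulogs).
    """
    predicted = set()
    for a, b in relations:
        for a2 in orthologs.get(a, ()):
            for b2 in orthologs.get(b, ()):
                # Undirected pairs are stored in sorted order to avoid duplicates.
                pair = (a2, b2) if directed else tuple(sorted((a2, b2)))
                predicted.add(pair)
    return predicted


# Hypothetical genes: organism-1 interactions and an ortholog map.
ppi = [("yA", "yB"), ("yB", "yC")]  # yC has no ortholog, so its link is dropped
orthologs = {"yA": {"wA"}, "yB": {"wB1", "wB2"}, "tf1": {"wTF"}, "g1": {"wG"}}
print(sorted(transfer(ppi, orthologs)))
print(transfer([("tf1", "g1")], orthologs, directed=True))
```

A single organism-1 relationship can yield several predictions when a gene has multiple orthologs (as yB does here). The transferred pairs are hypotheses to be weighed against the strength of the homology, not confirmed interactions.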
Figure 1. A) Comparison of management (left) and transcriptional regulatory (right) hierarchies (Yu and Gerstein 2006). B) Metabolic network modules active in a specific environment (Gianoulis et al. 2009). C) Positive selection occurs on the periphery of the human protein interaction network (Kim et al. 2007).
References
Borneman, A.R., Gianoulis, T.A., Zhang, Z.D., Yu, H., Rozowsky, J., Seringhaus, M.R., Wang, L.Y., Gerstein, M. & Snyder, M. Divergence of Transcription Factor Binding Sites Across Related Yeast Species. Science 317, 815-819 (2007).
Douglas, S.M., Montelione, G.T. & Gerstein, M. PubNet: a flexible system for visualizing literature derived networks. Genome Biol 6, 10 (2005).
Edwards, A.M., Kus, B., Jansen, R., Greenbaum, D., Greenblatt, J. & Gerstein, M. Bridging structural biology and genomics: assessing protein interaction data with known complexes. Trends Genet 18, 529-536 (2002).
Gerstein, M., Lan, N. & Jansen, R. Proteomics - Integrating interactomes. Science 295, 284-287 (2002).
Gianoulis, T.A. et al. Quantifying environmental adaptation of metabolic pathways in metagenomics. Proc Natl Acad Sci USA 106, 1374-1379 (2009).
Girvan, M. & Newman, M.E.J. Community structure in social and biological networks. Proc Natl Acad Sci USA 99, 7821-7826 (2002).
Jansen, R., Greenbaum, D. & Gerstein, M. Relating whole-genome expression data with protein-protein interactions. Genome Res 12, 37-46 (2002a).
Jansen, R., Lan, N., Qian, J. & Gerstein, M. Integration of genomic datasets to predict protein complexes in yeast. J Struct Funct Genomics 2, 71-81 (2002b).
Jansen, R. et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302, 449-453 (2003).
Kim, P.M., Lu, L.J., Xia, Y. & Gerstein, M.B. Relating three-dimensional structures to protein networks provides evolutionary insights. Science 314, 1938-1941 (2006).
Kim, P.M., Korbel, J.O. & Gerstein, M.B. Positive selection at the protein network periphery: Evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci USA 104, 20274-20279 (2007).
Kim, P.M., Sboner, A., Xia, Y. & Gerstein, M. The role of disorder in interaction networks: a structural analysis. Mol Syst Biol 4, 7 (2008).
Krogan, N.J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637-643 (2006).
Li, S.M. et al. A map of the interactome network of the metazoan C. elegans. Science 303, 540-543 (2004).
Lu, L.J., Xia, Y., Paccanaro, A., Yu, H.Y. & Gerstein, M. Assessing the limits of genomic data integration for predicting protein networks. Genome Res 15, 945-953 (2005).
Luscombe, N.M., Babu, M.M., Yu, H.Y., Snyder, M., Teichmann, S.A. & Gerstein, M. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431, 308-312 (2004).
Ptacek, J. et al. Global analysis of protein phosphorylation in yeast. Nature 438, 679-684 (2005).
Qian, J., Dolled-Filhart, M., Lin, J., Yu, H.Y. & Gerstein, M. Beyond synexpression relationships: Local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. J Mol Biol 314, 1053-1066 (2001).
Xia, Y., Lu, L.J. & Gerstein, M. Integrated prediction of the helical membrane protein interactome in yeast. J Mol Biol 357, 339-349 (2006).
Yip, K.Y., Yu, H.Y., Kim, P.M., Schultz, M. & Gerstein, M. The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks. Bioinformatics 22, 2968-2970 (2006).
Yip, K.Y. & Gerstein, M. Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions. Bioinformatics 25, 243-250 (2009).
Yu, H.Y., Luscombe, N.M., Qian, J. & Gerstein, M. Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends Genet 19, 422-427 (2003).
Yu, H.Y. et al. Annotation transfer between genomes: Protein-protein interologs and protein-DNA regulogs. Genome Res 14, 1107-1118 (2004a).
Yu, H.Y., Zhu, X.W., Greenbaum, D., Karro, J. & Gerstein, M. TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics. Nucleic Acids Res 32, 328-337 (2004b).
Yu, H.Y. & Gerstein, M. Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci USA 103, 14724-14731 (2006).
Yu, H.Y., Paccanaro, A., Trifonov, V. & Gerstein, M. Predicting interactions in protein networks by completing defective cliques. Bioinformatics 22, 823-829 (2006a).
Yu, H.Y., Xia, Y., Trifonov, V. & Gerstein, M. Design principles of molecular networks revealed by global comparisons and composite motifs. Genome Biol 7, 11 (2006b).
Yu, H.Y., Kim, P.M., Sprecher, E., Trifonov, V. & Gerstein, M. The importance of bottlenecks in protein networks: Correlation with gene essentiality and expression dynamics. PLoS Comput Biol 3, 713-720 (2007).