Many fundamental cellular processes involve protein networks, and comprehensively identifying them is important to systematically defining protein function (Eisenberg et al., 2000; Lan et al., 2002, 2003). Complex networks are also used to describe the structure of a number of wide-ranging systems including the internet, power grids, the ecological food web and scientific collaborations. Despite the seemingly huge differences among these systems, it has been shown that they all share common features in terms of network topology (Albert and Barabasi, 2001; Albert et al., 1999, 2000; Amaral et al., 2000; Barabasi and Albert, 1999; Huberman and Adamic, 1999; Jeong et al., 2001; Watts and Strogatz, 1998; Gavin et al., 2006; Krogan et al., 2006). Thus, networks provide a framework for describing biology in a universal language understandable to a broad audience (Girvan and Newman, 2002).

Large-scale experiments have created a great variety of genome-wide information related to protein networks, especially in the yeast Saccharomyces cerevisiae. There are datasets of explicit protein-protein interactions (Gavin et al., 2002; Ho et al., 2002; Ito et al., 2000; Uetz et al., 2000), experimentally derived regulatory relationships (Lee et al., 2002), manually curated interactions such as MIPS, BIND, and DIP (Bader et al., 2003; Mewes et al., 2002) and systems for automatically finding interactions in the literature (Friedman et al., 2001). In addition to the experimentally derived interaction networks, there are also predicted interactions (Valencia and Pazos, 2002). The most common methods used in predicting protein-protein interactions are based on "guilt-by-association": two proteins are more likely to interact if they share several correlated genomic features. Examples of these genomic features are gene expression profiles (DeRisi et al., 1997), phylogenetic profiles (Pellegrini et al., 1999), essentiality (Winzeler et al., 1999), localization (Kumar et al., 2002), and gene neighborhood (Tamames et al., 1997). Comparative genomics also provides an efficient way to map genome-wide interactions between different organisms (Walhout et al., 2000; Yu et al., 2004).

Summary of Some Past Results

Predicting Protein Networks

Correlating Interactions with Complexes and Genomic Features

We have developed methods to assess protein-protein interactions and regulatory relationships, correlating them with structures of known complexes and with function. In particular, Jansen et al. (2002a) developed a method for relating expression levels and their fluctuations to known interactions. This allowed us to find significant differences in expression correlations between transient and permanent complexes. Edwards et al. (2002) compared known complexes to the interactions from the database. Finally, Lan et al. (2002, 2003) looked at the relationship between functional categories and interactions, showing that interactions could be used to systematically circumscribe and define function.

Predicting Protein Networks from Individual Genomic Features

We have developed methods for predicting regulatory relationships and protein-protein interactions from individual types of genomic data. Jansen et al. (2002) and Qian et al. (2001) looked at the degree to which expression correlations could predict interactions and found that a subset of known interactions could be predicted with high confidence. In addition, Qian et al. (2001) looked at new types of expression correlations, namely those with a time-shifted or inverted relationship. Finally, we developed an approach based on support vector machines to predict the targets of a transcription factor by finding relatively subtle relationships between their expression profiles (Qian et al., 2003).
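As a rough illustration of the three kinds of expression relationship mentioned above (simultaneous, inverted, and time-shifted), the following minimal sketch scores a pair of expression profiles with a Pearson correlation at several lags. It is not the procedure of Qian et al. (2001); the function names, the maximum lag, and the minimum overlap are assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def strongest_relationship(x, y, max_shift=2, min_points=3):
    """Return (kind, shift, r) for the lag with the largest |r|.

    x[t] is compared with y[t + shift], so a positive shift means that
    profile y follows profile x. max_shift and min_points are
    illustrative assumptions, not values taken from the source.
    """
    best = ("simultaneous", 0, 0.0)
    for shift in range(max_shift + 1):
        overlap = len(x) - shift
        if overlap < min_points:
            break  # too few overlapping time points to correlate
        r = pearson(x[:overlap], y[shift:])
        if abs(r) > abs(best[2]):
            if r < 0:
                kind = "inverted"
            elif shift > 0:
                kind = "time-shifted"
            else:
                kind = "simultaneous"
            best = (kind, shift, r)
    return best


# Hypothetical profiles: gene_b repeats gene_a one time point later.
gene_a = [0, 1, 2, 1, 0, 1, 2, 1]
gene_b = [1, 0, 1, 2, 1, 0, 1, 2]
print(strongest_relationship(gene_a, gene_b))
```

Scanning only non-negative lags keeps the sketch short; a symmetric scan would call the function a second time with the two profiles swapped.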

Data Integration of Multiple Features to Improve Prediction

We have developed methods for combining various genomic features that produce an interaction prediction stronger than that of any individual feature. This is important both for known protein-protein interaction data sets, which suffer from a great degree of noise, and for genomic features such as expression correlation, which are only weakly predictive of interactions. Our first analyses used simple combinations of features (Edwards et al., 2002; Jansen et al., 2002b; Gerstein et al., 2002). Then we moved on to developing more sophisticated Bayesian-network approaches that combine features in a way that optimizes their predictive value (Jansen et al., 2003b). In Lu et al. (2005), we saw how this result scaled with the number of features. In Xia et al. (2006), we extended it to membrane proteins.
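One simple way to see why combining weak features can give a stronger prediction is the naive Bayes special case, in which features are assumed to be conditionally independent and their likelihood ratios multiply. The sketch below illustrates that simplification only. It is not the network structure used in the cited papers, and the function names, counts, and prior odds are hypothetical.

```python
def likelihood_ratio(pos_with_value, pos_total, neg_with_value, neg_total):
    """Likelihood ratio of one feature value, estimated from gold standards.

    L = P(value | interacting) / P(value | non-interacting), where the
    counts come from a positive and a negative gold-standard set.
    """
    if neg_with_value == 0 or pos_total == 0 or neg_total == 0:
        raise ValueError("counts too small to estimate a likelihood ratio")
    return (pos_with_value / pos_total) / (neg_with_value / neg_total)


def posterior_odds(prior_odds, likelihood_ratios):
    """Combine independent features: posterior odds = prior odds * product of L."""
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds


# Hypothetical numbers: three features that are each only modestly
# informative, combined for a single protein pair.
prior = 1 / 600  # assumed prior odds of interaction, for illustration
ratios = [10.0, 30.0, 4.0]
odds = posterior_odds(prior, ratios)
print(odds, "-> predicted to interact" if odds > 1 else "-> not predicted")
```

In this toy example no single feature lifts the odds above 1, but the product of the three does, which is the sense in which the integrated prediction is stronger than each individual feature.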

Analysis of the Network Structure

Analysis of the Global Structure of Networks and Comparison of Networks

We have carried out a number of studies looking at the overall statistics of gene networks, finding that a number of them follow a power-law distribution very similar to that found for the occurrence of gene families (Luscombe et al., 2002; Xia et al., 2004; Yu et al., 2007). In terms of smaller-scale structures, Yu et al. (2006a) analyzed regulatory networks in yeast and showed that they have a pyramid-shaped hierarchical structure, similar in some sense to governmental "org-charts", with a small number of master transcription factors on top. Finally, Yu et al. (2006b) showed how defective cliques within networks could be completed, defining large complexes and potentially predicting more interactions.
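To make the power-law statement concrete, the sketch below tabulates the degree distribution of an undirected network and estimates an exponent from a straight-line fit on log-log axes. This is a crude estimator chosen for brevity, not the analysis used in the cited studies; the function names and the toy edge list are assumptions.

```python
from collections import Counter
from math import log


def degree_distribution(edges):
    """Map each degree k to the number of nodes that have degree k."""
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return dict(Counter(degree.values()))


def fit_power_law_exponent(distribution):
    """Estimate gamma in N(k) ~ k**(-gamma) by least squares on log-log axes."""
    points = [(log(k), log(count)) for k, count in distribution.items()]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx  # the slope is negative for a decaying distribution


# Hypothetical toy network: one hub connected to four other proteins.
edges = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("hub", "p4")]
dist = degree_distribution(edges)
print(dist, fit_power_law_exponent(dist))
```

On real data the tail of the distribution is sparse and noisy, so binning or a maximum-likelihood estimate would usually be preferred over this direct fit.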

We interrelated regulatory and expression networks and found that genes targeted by the same transcription factor tend to have correlated expression (Yu et al., 2003). In Yu et al. (2006c), we compared many different types of networks, defining composite hubs and motifs.

Mapping Networks between Organisms

In Yu et al. (2004), we showed how one could compare networks between organisms and use this comparison to help in network prediction. In particular, we showed how interologs could be transferred between organisms as a function of sequence similarity. We also defined a related concept for the transfer of regulatory relationships ("regulog"), and we established a database of mapped relationships (
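The idea of transferring interologs as a function of sequence similarity can be sketched as follows: an interaction between two proteins in a source organism is mapped onto their orthologs in a target organism only when the orthologs are similar enough. The data layout, the use of a geometric mean of percent identities as a joint score, and the threshold of 80 are illustrative assumptions, not a restatement of the published procedure.

```python
from math import sqrt


def transfer_interologs(source_interactions, orthologs, min_joint_identity=80.0):
    """Map interactions from a source organism onto a target organism.

    source_interactions: iterable of (protein_a, protein_b) pairs.
    orthologs: dict mapping a source protein to a list of
        (target_protein, percent_identity) tuples.
    Returns a dict mapping each predicted target pair to its best joint score.
    """
    predicted = {}
    for a, b in source_interactions:
        for target_a, identity_a in orthologs.get(a, []):
            for target_b, identity_b in orthologs.get(b, []):
                # Assumed joint score: geometric mean of the two identities.
                joint = sqrt(identity_a * identity_b)
                if joint >= min_joint_identity:
                    pair = tuple(sorted((target_a, target_b)))
                    predicted[pair] = max(predicted.get(pair, 0.0), joint)
    return predicted


# Hypothetical example: A-B is transferred, A-C is not (C's ortholog is too distant).
interactions = [("A", "B"), ("A", "C")]
ortholog_map = {"A": [("a", 90.0)], "B": [("b", 90.0)], "C": [("c", 40.0)]}
print(transfer_interologs(interactions, ortholog_map))
```

Requiring a joint score, rather than thresholding each protein separately, means that one weak ortholog assignment is enough to block the transfer of an interaction.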

Analysis of the Dynamics of Networks

We examined the dynamics of the regulatory system in yeast on a genomic scale by integrating gene expression data for five cellular conditions with known transcriptional regulatory relationships (Luscombe et al., 2004). To rigorously compare these condition-specific subnetworks we developed SANDY (Statistical Analysis of Network Dynamics). We found that these subnetworks exhibit vastly different topologies at both the local and global levels, and we uncovered two separate groups of cellular states. Moreover, we showed that different sets of transcription factors become key regulatory hubs at different times, portraying a network that shifts its weight between different foci to bring about distinct cellular states. In a subsequent analysis (Yu et al., 2006c), we analyzed the expression relationships in small network motifs, showing that many of those in metabolic pathways have a time-shifted quality.

3D Structural Analysis of Protein Interaction Networks

While there has been considerable interest in protein interaction networks and their role in cell function, most studies, thus far, have neglected the biophysical properties of the proteins involved. We have pioneered the use of 3D protein structures for analysis of protein networks (Kim et al., 2006). This approach gave us a unique perspective on protein networks and showed that many network properties previously thought to relate to biological features were actually more reflective of biophysical quantities.


