Networks: A common language for complex systems

The formalism of networks provides a common language for understanding a wide range of complex systems. Network science has progressed rapidly in the past decade as large network datasets have become available in a variety of fields. Our knowledge of protein interaction networks, gene-regulatory networks, neural networks, social networks, and the Internet has expanded greatly because of technological advances that have made collecting network data easier.

An irony of the current post-genomic era is that we are drowning in more information than the human mind can comprehend, while the algorithms that infer patterns from the data remain starved for more. Biological network inference problems are under-determined, admitting an infinite family of solutions, because the available data is of lower dimension than the inference problem. A recent analysis of protein interactions in the yeast model organism S. cerevisiae estimated that only ~20% of the predicted interactions are known to date.

A key finding of network science has been that in many classes of real networks, connectivity is not evenly distributed. Highly connected hub nodes play a central role in network function. We and others have shown that the "hubbiness" of particular parts of biological networks is associated with how essential and conserved they are. Furthermore, we have shown that in many biological networks, bottleneck nodes, which lie on the paths between many other nodes, may be even more central and important than hubs.
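The distinction between hubs and bottlenecks can be made concrete with a small sketch. It assumes that "hubbiness" is measured by degree and that a bottleneck is read as a node with high betweenness centrality (the number of shortest paths between other node pairs that pass through it). The toy graph, the node names, and the helper functions are hypothetical illustrations, not the data or code behind the results described above.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj):
    """Number of neighbors per node; hubs are the high-degree nodes."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network: two triangles joined through node "d".
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("d", "e"),
         ("e", "f"), ("e", "g"), ("f", "g")]
adj = build_adjacency(edges)
deg = degree(adj)
bc = betweenness(adj)
bottleneck = max(bc, key=bc.get)
```

In this toy graph, node "d" has only two neighbors, so it is not a hub, yet every path between the two triangles runs through it, which gives it the highest betweenness. That is the sense in which a bottleneck can matter more than its degree suggests.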



In recent work, we have tried to arrange directed regulatory networks into hierarchies and identify the "master regulators" that sit at the top of these structures. We have found that biological regulatory networks and management organization charts for corporations and governments have similar hierarchical structures. Both biological regulatory and management hierarchies are dominated by bottlenecks in the middle layers. Because middle management participates in both decision making and implementation, it plays a vital role. This fact helps explain the apparent paradox between influence and essentiality: top management has great influence over the direction of the company, but an organization with a bad leader can survive mismanagement. On the other hand, when no decisions, either good or bad, get implemented, the entity cannot continue to function. As the saying goes, losing a good system administrator can be worse than replacing an incompetent CEO.
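One simple way to arrange a directed network into layers is sketched here. It rests on assumptions that are not taken from the work described above: the network is acyclic, edges point from regulator to target, a node that regulates nothing sits at level 0, and every other node sits one level above its highest target. The function names and the toy edge list are hypothetical. Real regulatory networks contain loops, which would have to be collapsed or removed first; this sketch simply raises an error if it meets one.

```python
def hierarchy_levels(edges):
    """Assign each node a level in a directed acyclic network.

    edges: iterable of (regulator, target) pairs.
    Level 0 = nodes with no targets; otherwise 1 + max level of targets.
    """
    targets = {}
    nodes = set()
    for reg, tgt in edges:
        targets.setdefault(reg, set()).add(tgt)
        nodes.update((reg, tgt))

    level = {}
    in_progress = set()  # nodes on the current recursion path

    def visit(node):
        if node in level:
            return level[node]
        if node in in_progress:
            raise ValueError("cycle detected; collapse loops first")
        in_progress.add(node)
        below = [visit(t) for t in targets.get(node, ())]
        level[node] = 1 + max(below, default=-1)
        in_progress.discard(node)
        return level[node]

    for n in nodes:
        visit(n)
    return level

def master_regulators(edges):
    """Nodes that regulate others but are regulated by none."""
    regulators = {reg for reg, _ in edges}
    regulated = {tgt for _, tgt in edges}
    return regulators - regulated

# Hypothetical three-layer toy network.
edges = [("A", "B"), ("A", "C"),
         ("B", "D"), ("B", "E"),
         ("C", "E"), ("C", "F")]
levels = hierarchy_levels(edges)
top = master_regulators(edges)
```

Here "A" lands on the top layer as the only master regulator, "B" and "C" form the middle layer, and "D", "E", and "F" sit at the bottom. Combining these levels with a centrality measure would be one way to check whether the bottlenecks concentrate in the middle layers.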


A functional module is a set of nodes that together perform a specific function. Knowing the function of a network invariably aids in understanding its structure. However, the converse is often not true: it is a challenge to infer the real function of a network merely from knowledge of its structure. Breaking a complex biological network into functional modules is a powerful tool in systems biology for inferring function from structure. Using gene expression data, we identified the active modules in the yeast regulatory network under different environmental and temporal conditions. We found that modules active in intricate cellular programs, such as the cell cycle, where regulation involves decision-making, have more feedback and feed-forward loops. In contrast, those active in direct responses to the environment (e.g., responding to DNA damage) consist of simple linear pathways implementing specific output functions.
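The two loop types mentioned above can be counted directly in a small directed network. The sketch below assumes a feed-forward loop is the three-node pattern X -> Y, Y -> Z, X -> Z, and takes the three-node cycle X -> Y -> Z -> X as the simplest feedback loop; longer cycles and two-node mutual regulation are ignored. The function name and the example edges are hypothetical, and the brute-force scan over node triples is only practical for small modules.

```python
from itertools import permutations

def count_loops(edges):
    """Count three-node feed-forward loops and three-node feedback cycles.

    edges: iterable of (source, target) pairs in a directed network.
    Returns (feed_forward_count, feedback_count).
    """
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    feed_forward = 0
    cycle_hits = 0
    for x, y, z in permutations(nodes, 3):
        if (x, y) in edge_set and (y, z) in edge_set:
            if (x, z) in edge_set:   # X also regulates Z directly
                feed_forward += 1
            if (z, x) in edge_set:   # Z feeds back onto X
                cycle_hits += 1
    # Each three-node cycle is found once per rotation of (x, y, z).
    return feed_forward, cycle_hits // 3

# Hypothetical toy module: one feed-forward loop and one feedback cycle.
edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("P", "Q"), ("Q", "R"), ("R", "P")]
ffl_count, feedback_count = count_loops(edges)
```

Comparing such counts between modules, ideally against randomized networks with the same degree sequence, is one way to ask whether a module is enriched for loops or is closer to a simple linear pathway.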


More recently, using metagenomics data, we have developed a method to extract metabolic modules characteristic of different types of environments. While organisms have to be evolvable in order to adapt to and survive ecological changes, the details of the link between metabolic versatility and the environment remain to be explored.

Mapping Conservation and Evolution onto Networks

Mapping evolutionary signatures onto networks is extremely helpful in understanding their design and function. By mapping known molecular structures onto the set of interacting proteins in yeast – the interactome – we identified two classes of hub proteins and showed that their difference could resolve a long-standing debate about the evolutionary rate of hub proteins. We found that the number of structural interfaces, rather than the absolute number of partners, was a better correlate of evolutionary rate. The number of interfaces in a hub also suggests different models of network growth depending on whether or not a newly duplicated gene docks onto an existing interface.

More recently, in looking at how fast hub proteins evolve, we found that in the human interactome, proteins at the periphery are under positive selection, i.e., they evolve faster than those at the center. This highlights a clear correspondence between network organization and cellular organization. Proteins at the network periphery are also at the cell periphery, so selection pressure to adapt to changing signals from the external environment can explain their increased evolutionary rate.