Simulating Water and the Molecules of Life
Computer modeling reveals how water affects the structure and dynamics of biological molecules, yielding clues to their function
by Mark Gerstein and Michael Levitt
(Scientific American, Nov. 1998, pg. 100-105)
In most places in the world, water is cheap, if not free. But during the summer of 1986, one of us (Levitt) spent half a million dollars on an amount of water that would scarcely wet the point of a sharp pin. The money was not to buy a vanishingly small amount of water. Rather, it was to pay for about two weeks of time on a gigantic, state-of-the-art supercomputer required to create a model of how the water affected the structure and movement of a particular protein.
The protein was bovine pancreatic trypsin inhibitor (BPTI), which is found in the pancreases of cattle. BPTI is a favorite subject of computer modelers simply because it is relatively small, and therefore easier to study than most other proteins. It had been modeled before, but only "in vacuo," as if in a vacuum--without any other molecules interacting with it. No one had visualized BPTI as it really exists in a living cell, with thousands of water molecules surrounding it.
The half million dollars turned out to be well spent. Not only did Levitt and colleague Ruth Sharon find the previous, in vacuo model of BPTI to be a poor predictor of how the protein looked and behaved in the real world, but the discovery also helped pave the way for computational chemists to simulate the structures of other biological molecules in their native, watery environs.
Today, given the great advances in computing technology, we can model proteins such as BPTI and their associated water molecules on a desktop computer in a couple of days, using roughly 80 cents of electricity. Scientists have now simulated the aqueous ("in water") structures of more than 50 proteins and nucleic acids, such as DNA and RNA.
Why is understanding the effects of water on the shapes of biological molecules so important? Principally, because a molecule's structure yields clues to how it functions, helping scientists to decipher the intricate biochemical interactions that add up to life. On a more practical level, understanding structure is crucial for researchers designing new drugs to block or enhance various biochemical pathways.
The Water Within
In order to understand how water affects the structures of biological molecules, however, we must first appreciate the distinctive properties of water itself. These properties principally stem from water's unique hydrogen-bonded structure and the way this structure allows water to "manage" the charges on other dissolved molecules.
A single water molecule (H2O) has an essentially tetrahedral ("four-sided") geometry with an oxygen at the center, hydrogens at two vertices, and clouds of negative charge at the other two vertices. As the clouds of negative charge are not usually drawn, water is conventionally depicted with a "V-like" shape. Each side of the V (an oxygen-hydrogen bond) is almost 10^-8 centimeters long, and the angle between the two sides is close to 105 degrees (slightly less than the roughly 109.5 degrees of a perfect tetrahedron).
Because of the details of the electron structure around the water molecule, the oxygen is more electronegative than the hydrogens. This means that while overall the water molecule is electrically neutral, charge is distributed across it in a polar fashion, with slightly more negative charge on the oxygen and counterbalancing positive charges on the hydrogens.
Because of this "polarity," interactions between the positively charged hydrogen of one molecule and the negatively charged oxygen of another are particularly favorable. These are called hydrogen bonds (see illustration on page tk). Reflecting its essentially tetrahedral geometry, each molecule in liquid water often forms four hydrogen bonds: two between its hydrogens and the oxygen atoms of two other water molecules and two between its oxygen atom and the hydrogens of two other water molecules. However, the detailed structure of liquid water, unlike that of regular crystalline ice, is quite random and irregular, so the actual number of hydrogen bonds per water molecule can vary quite a bit, usually ranging from 3 to 6. (The average value is 4.5.) The necessity of maintaining a tetrahedral, hydrogen-bonded structure gives water a very "open" and loosely packed structure in comparison to most other liquids, such as oil or liquid nitrogen. In fact, one explanation for the unusual properties of water arises from its need to satisfy two sets of conflicting requirements: making a tetrahedral network of hydrogen bonds and packing together as closely as possible so as to eliminate empty spaces in the liquid.
To construct a computer model of pure water, we need to take into account two different types of forces: intra-molecular and inter-molecular. The interactions within a water molecule (intra-molecular) are modeled in terms of the short-range, spring-like forces created by the chemical bonds between each molecule's hydrogens and oxygen, and the interactions between water molecules (inter-molecular) are modeled in terms of long-range, electrical forces. The intra-molecular spring-like forces restrain the lengths of the bonds between the oxygen of each water molecule and its hydrogens--and the angle formed between each of these bonds--to certain set values. These forces behave like springs in the sense that the more an outside force distorts the bonds, the more the bonds resist the force.
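The spring-like behavior described above is commonly written as a harmonic energy term: the energy grows with the square of the distortion away from a set value. The following is a minimal sketch in Python, not the authors' actual program. The function names and all four parameter values are assumptions chosen for illustration (the bond length and angle are close to measured values for an isolated water molecule; the spring constants are simply plausible magnitudes).

```python
import math

# Assumed, illustrative parameters (not taken from the article).
BOND_LENGTH_EQ = 0.96              # preferred O-H bond length, angstroms
BOND_K = 500.0                     # bond spring constant, kcal/mol/angstrom^2
ANGLE_EQ = math.radians(104.5)     # preferred H-O-H angle, radians
ANGLE_K = 60.0                     # angle spring constant, kcal/mol/radian^2


def angle_between(a, center, b):
    """Angle (radians) formed at `center` by the points a and b."""
    v1 = [x - c for x, c in zip(a, center)]
    v2 = [x - c for x, c in zip(b, center)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_value = dot / (math.dist(a, center) * math.dist(b, center))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_value)))


def intramolecular_energy(oxygen, hydrogen1, hydrogen2):
    """Harmonic bond-stretch plus angle-bend energy of one water molecule."""
    bond_energy = sum(
        0.5 * BOND_K * (math.dist(oxygen, h) - BOND_LENGTH_EQ) ** 2
        for h in (hydrogen1, hydrogen2)
    )
    bend = angle_between(hydrogen1, oxygen, hydrogen2) - ANGLE_EQ
    return bond_energy + 0.5 * ANGLE_K * bend ** 2
```

Because the energy depends on the square of the distortion, the restoring force grows in proportion to the distortion itself, which is exactly the behavior of a spring.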
The long-range forces between water molecules behave oppositely from the spring-like, intra-molecular forces: they decrease in magnitude with increasing distance. Fundamentally, the long-range forces arise from the natural attraction between opposite charges and the natural repulsion between similar charges. They give rise to the specific hydrogen bonds, discussed above, as well as to van der Waals forces that generally bring the atoms together.
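These long-range terms are often sketched as a Coulomb energy between partial charges plus a Lennard-Jones term standing in for the van der Waals interaction. The Python sketch below assumes that common form; the article does not specify the functional form used. The partial charges and Lennard-Jones parameters in the usage comments are hypothetical values for illustration only. The Coulomb constant is the standard conversion factor for charges in electron units and distances in angstroms.

```python
import math

COULOMB_CONST = 332.06   # kcal * angstrom / (mol * e^2)

# Assumed, illustrative Lennard-Jones parameters for the oxygen-oxygen pair.
LJ_EPSILON = 0.15        # well depth, kcal/mol
LJ_SIGMA = 3.2           # distance at which the term crosses zero, angstroms


def coulomb_energy(q1, q2, r):
    """Electrostatic energy of two point charges; falls off as 1/r."""
    return COULOMB_CONST * q1 * q2 / r


def lennard_jones_energy(r, epsilon, sigma):
    """Steep repulsion at short range, weak attraction farther out."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def water_pair_energy(mol_a, mol_b):
    """Interaction energy of two water molecules.

    Each molecule is a list of (charge, (x, y, z)) entries with the
    oxygen listed first.
    """
    energy = 0.0
    for charge_a, pos_a in mol_a:
        for charge_b, pos_b in mol_b:
            energy += coulomb_energy(charge_a, charge_b, math.dist(pos_a, pos_b))
    r_oo = math.dist(mol_a[0][1], mol_b[0][1])
    return energy + lennard_jones_energy(r_oo, LJ_EPSILON, LJ_SIGMA)


# Usage with hypothetical partial charges (oxygen -0.8, each hydrogen +0.4):
# water = [(-0.8, (0.0, 0.0, 0.0)), (0.4, (0.96, 0.0, 0.0)), (0.4, (-0.24, 0.93, 0.0))]
# neighbor = [(q, (x + 2.8, y, z)) for q, (x, y, z) in water]
# water_pair_energy(water, neighbor)
```

Unlike the spring terms, both contributions shrink toward zero as the molecules move apart.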
In the late 1960s at Bell Laboratories, Aneesur Rahman and Frank Stillinger pioneered computer simulation of water molecules. They simulated the motion of 216 water molecules in a rectangular box. In their five-picosecond simulation--the longest possible using the computing technology of the time--Rahman and Stillinger found that the complicated behavior of water was a direct consequence of the simple energy terms operating between the many hundreds of water molecules. The simulation was able to quantitatively reproduce many of the bulk properties of water, such as its average structure, rate of diffusion and heat of vaporization.
The first computer simulation of a protein in vacuo was performed by Martin Karplus and his colleagues in 1977. They determined that BPTI moves a great deal at room temperature, leading to a revision of the rigid view of molecules provided by static crystallographic structures. In fact, this motion was a little too large, mainly because of the omission of the surrounding water molecules.
Simulating Life
The importance of water in living processes, however, derives not only from its ability to form hydrogen bonds with other water molecules, but from its capacity to interact with various types of biological molecules as well. Because of its polar nature, water readily interacts with other polar and charged molecules, such as acids, salts, and sugars. It thus has the ability to dissolve these molecules, which are consequently called hydrophilic ("water-loving"). In contrast, water does not interact well with non-polar molecules, such as fats and oils, giving rise to the commonplace observation that oil and water do not mix. Non-polar molecules are consequently termed hydrophobic ("water-fearing").
The essential difference between hydrophobicity and hydrophilicity is involved in the formation of much biological structure. Biological macromolecules, such as proteins and DNA, contain both hydrophobic and hydrophilic parts arranged in long chains. The formation of structure involves the folding of the chain into a more compact arrangement so that hydrophilic groups are exposed on the surface where they interact with water and hydrophobic groups are buried in the interior where they clump together. In particular, two chains of DNA are usually found in a double helix. This structure has the effect of burying the hydrophobic atoms of the base pairs and exposing the charged phosphate backbone. Similarly, a single protein chain usually folds into an irregular, ball-like structure. Again this structure has the effect of burying the hydrophobic "sidechain" groups at the center of the ball and exposing hydrophilic ones on the outside. In 1959 Walter Kauzmann first proposed that such a hydrophobic effect was crucial for protein folding, and the role of hydrophobicity in protein folding is still a subject of great interest, with many questions remaining unanswered. Biological membranes and micelles are also formed by a similar process but on a much larger scale, involving the collective orientation of many lipid molecules to point their hydrophilic "heads" toward the water and their hydrophobic "tails" away.
There are three types of waters that must be considered when building a computer model of a biological macromolecule in aqueous solution: the "ordered waters" that immediately surround and strongly interact with the macromolecule, the "bulk waters" beyond, and any waters that may be buried within the macromolecule (see illustration on page tk). A single cell contains many billions of water molecules. Almost all of the space not occupied by the atoms of biological molecules is filled with water. Human cells are, in fact, mostly water, and overall the human body is roughly 60 percent water by weight.
How do we model all of these waters, together with the individual atoms of a biological molecule? In simple terms, we first describe the basic interactions between all of the atoms and then let the system evolve according to the laws of Newtonian physics. Such a simulation requires two basic ingredients: a way to describe the interactions within and among water and biological molecules--the intra- and intermolecular forces discussed previously--and a procedure for charting their movement through time, which is called molecular dynamics.
Molecular dynamics produces a sequence of configurations very much like "frames" in a movie. Each atom moves through time in a series of discrete steps, called timesteps. Essentially, the new position of an atom is its old position plus the distance it traveled during a given timestep. If no forces acted on an atom, the distance it traveled would simply be a function of its velocity at the previous position, because distance equals speed multiplied by time. However, during the course of a timestep, the forces exerted by other atoms cause the atom to accelerate, which in turn changes its velocity. If the forces are constant during the timestep, Newton's laws dictate that the change in velocity is proportional to the force, so we can calculate an updated velocity. We then use this updated velocity to calculate the new position of the atom. In the strongly interacting atoms of a liquid, atoms cannot move very far, and it is necessary to use a very short timestep, one femtosecond (10^-15 second). During this period of time, a water molecule moves only 1/500th of its molecular diameter.
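The update just described can be written in a few lines. The Python sketch below follows the wording of the paragraph literally: the force is treated as constant over the timestep, the velocity is updated first, and the updated velocity is then used to move the atom. This scheme is usually called semi-implicit Euler integration; production programs typically use closely related schemes such as Verlet or leapfrog. The function name `md_step` and the callback `force_fn` are hypothetical names introduced for illustration.

```python
def md_step(positions, velocities, masses, force_fn, dt):
    """Advance every atom by one timestep.

    positions, velocities: one [x, y, z] list per atom
    masses: one number per atom
    force_fn: function mapping positions to one [fx, fy, fz] list per atom
    dt: timestep length, in units consistent with the other arguments
    """
    forces = force_fn(positions)

    # Newton's second law: acceleration = force / mass, so the velocity
    # changes by (force / mass) * dt when the force is held constant.
    new_velocities = [
        [v + (f / m) * dt for v, f in zip(vel, frc)]
        for vel, frc, m in zip(velocities, forces, masses)
    ]

    # New position = old position + updated velocity * dt.
    new_positions = [
        [x + v * dt for x, v in zip(pos, vel)]
        for pos, vel in zip(positions, new_velocities)
    ]
    return new_positions, new_velocities


def run(positions, velocities, masses, force_fn, dt, n_steps):
    """Repeat md_step and collect every configuration as a movie 'frame'."""
    frames = [positions]
    for _ in range(n_steps):
        positions, velocities = md_step(positions, velocities, masses, force_fn, dt)
        frames.append(positions)
    return frames
```

In a real simulation `force_fn` would sum the spring-like and long-range terms over all atoms, and that evaluation, repeated at every timestep, accounts for most of the computing time.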
In capsule, one can visualize the molecular-dynamics calculation as follows: Imagine you are moving about in a crowded room at a party. Each time you take a step, you look around the room, find an open place nearby, close your eyes, and step to the open place. While you are closing your eyes and taking your step, everyone else in the room is doing the same thing. Thus, only if everyone takes small steps, can you be sure there will be no embarrassing collisions.
In a long simulation, calculating each timestep for all the atoms in a biological molecule together with its ordered waters yields an enormous amount of data. A small protein in water, for instance, produces half a million sets of Cartesian coordinates in a nanosecond, describing the positions of about 10,000 atoms (see box on page tk). The "movie" generated by such a simulation is very detailed. We can see every water molecule rotating, shifting and vibrating over millions of frames.
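To give a rough sense of scale, the figures quoted above can be turned into a storage estimate. The short Python sketch below assumes that each coordinate is stored as a four-byte single-precision number; that storage format is an assumption for illustration, not something the simulation requires.

```python
def trajectory_bytes(n_frames, n_atoms, bytes_per_coordinate=4):
    """Raw size of a trajectory: three Cartesian coordinates per atom per frame."""
    return n_frames * n_atoms * 3 * bytes_per_coordinate


# Half a million frames of about 10,000 atoms each, as quoted in the text.
size = trajectory_bytes(500_000, 10_000)
print(size / 1e9, "gigabytes")   # prints: 60.0 gigabytes
```

In practice, simulations often save only a fraction of the frames to keep the output manageable.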
To illustrate how computer simulation can depict the way water affects molecular dynamics, let us consider two simple organic molecules, isobutene and urea, which have similar shapes but very different properties. Isobutene is a Y-shaped, nonpolar (and therefore hydrophobic) molecule whose backbone consists of four carbon atoms, two of which are linked by a double bond ((CH3)2C=CH2). It is generated in oil refineries. Urea has the same Y-shaped structure as isobutene, but with the latter molecule's double-bonded carbon atoms replaced by a carbonyl group and its other two carbon atoms replaced by amino groups ((NH2)2C=O). It results from protein metabolism and is excreted in urine. Unlike isobutene, urea is a strongly polar molecule that is hydrophilic.
When we carry out molecular dynamics simulations on both isobutene and urea, we see that water behaves very differently with the two molecules (see illustration on page tk). Water molecules directly interact with urea, forming hydrogen bonds with urea's oxygen and hydrogen atoms as well as with one another. In contrast, water molecules turn away from isobutene and form hydrogen bonds only among themselves, creating a cage of ordered waters surrounding the molecule.
Visualizing how water molecules interact with such simple molecules helps us to understand the behavior of water with more complex biological molecules such as proteins and nucleic acids.
Water is integral to the structure of DNA. Dehydrating the normal form of DNA, called the "B" form, yields a so-called "A" form whose double helix is more squat and fat (see illustration on page tk). This is evident in simulations of water around DNA, where one can see how the water molecules are able to interact with almost all parts of the relatively open double-helical structure.
In contrast, water is not able to penetrate as deeply into protein structure. So analysis of protein-water simulations has naturally focussed on the protein surface, which forms a unique interface (see illustration on page tk). Because of water's tetrahedral geometry, it is much less tightly packed than the protein interior, which has a close-fitting structure resembling that of crystalline solids. The "meshing" at the protein surface of the differently arranged lattices for water and protein gives rise to much interesting geometry.
This is particularly true at deep grooves on the protein surface. Hydrogen-bonded water molecules have difficulty fitting into these clefts and are easily displaced. This suggests why the active sites of enzymes are often in clefts: a small molecule of complementary shape, such as a ligand, can bind more tightly within these spaces and readily expel the water molecules. Moreover, in water simulations one often finds that the arrangement of water molecules in an empty active site mimics the geometry and structure of the actual binding ligand, a fact that is sometimes used in drug design.
Living in the Real World
How closely do these simulations resemble reality? Unfortunately, we cannot answer this question definitively because there is no experimental technique that can provide as much detailed information about individual water molecules and their interactions as computer modeling. What we can do is to compare various aggregated and averaged values derived from simulations to experimental results.
One of the most important experimental approaches that can be used to verify water simulation is the scattering of neutrons and X-rays. In a neutron scattering experiment, we direct a beam of neutrons at a small sample and record how these are scattered. Each space between the molecules in the sample acts as a tiny slit, yielding a characteristic diffraction pattern. By analyzing these patterns, we can determine the spacing between the molecules. When we compare neutron scattering results with computer simulations, we find good agreement in terms of the average distance between hydrogens and oxygens, that is, in terms of the average structure of water. Recently, Axel Brünger and Bill Weis have extended diffraction techniques (this time using X-rays) to allow one to visualize the detailed density and position of water molecules around proteins without as much averaging. This allows us to compare experiment to simulation in much greater detail.
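On the simulation side, the averaged quantity that is compared with scattering data starts from a simple tally of how often each oxygen-hydrogen separation occurs. The Python sketch below shows only that first step, for a single frame. The function name is hypothetical, and a full analysis would also average over many frames, handle the boundaries of the simulation box and normalize the counts by the density of the liquid.

```python
import math


def pair_distance_histogram(oxygens, hydrogens, bin_width, r_max):
    """Count oxygen-hydrogen pairs by separation.

    oxygens, hydrogens: lists of (x, y, z) positions from one frame
    bin_width, r_max: histogram resolution and cutoff, same length unit
    Returns a list in which entry i counts the pairs whose separation
    lies between i * bin_width and (i + 1) * bin_width.
    """
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for o in oxygens:
        for h in hydrogens:
            r = math.dist(o, h)
            if r < r_max:
                # min() guards against rounding at the very last bin edge.
                counts[min(int(r / bin_width), n_bins - 1)] += 1
    return counts
```

Peaks in such a histogram mark the preferred spacings, which is the kind of information the diffraction pattern also encodes.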
Diffraction experiments tell us about spatial structure. Information about the dynamical behavior of water is provided by the diffusion constant. This quantity is directly measured by how fast a drop of red dye spreads through a beaker of water. To relate diffusion to simulation, one merely determines the average rate, over many timesteps, at which a simulated water molecule increases its distance from its initial position.
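One standard way to make this concrete is the Einstein relation, which states that in three dimensions the mean squared displacement of a diffusing molecule grows as six times the diffusion constant times the elapsed time. The Python sketch below assumes a trajectory stored as a list of frames, each holding one position per water molecule. The function names are hypothetical, and the positions are assumed not to have been wrapped back into the simulation box.

```python
def mean_squared_displacement(trajectory):
    """Average squared distance from the starting position, frame by frame.

    trajectory[frame][molecule] is an (x, y, z) position.
    """
    start = trajectory[0]
    msd = []
    for frame in trajectory:
        total = sum(
            sum((c - c0) ** 2 for c, c0 in zip(pos, pos0))
            for pos, pos0 in zip(frame, start)
        )
        msd.append(total / len(start))
    return msd


def diffusion_constant(trajectory, dt):
    """Estimate D from the Einstein relation, MSD = 6 * D * t.

    dt is the time between stored frames. Only the final frame is used
    here; a careful analysis would fit a line to the whole MSD curve.
    """
    msd = mean_squared_displacement(trajectory)
    elapsed = dt * (len(trajectory) - 1)
    return msd[-1] / (6.0 * elapsed)
```

The resulting value can be compared directly with the diffusion constant measured for real water.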
When scientists publish models of biological molecules in journals, they usually draw their models in bright colors and place them against a blank, black background. We now know that the background in which these molecules exist--water--is just as important as they are.
The Authors
MARK GERSTEIN and MICHAEL LEVITT have collaborated on studies of how water affects the structure and function of biological molecules since 1993, when Gerstein became a postdoctoral fellow in Levitt's laboratory in the Department of Structural Biology--which Levitt chairs--at Stanford University. Levitt obtained his Ph.D. in 1971 from the University of Cambridge. He has held academic positions at a variety of institutions, including the Laboratory of Molecular Biology in Cambridge, the Salk Institute for Biological Studies in San Diego and the Weizmann Institute of Science in Rehovot. He has also consulted widely for the pharmaceutical industry and founded the Molecular Applications Group. Gerstein is an assistant professor at Yale University. He completed his Ph.D. at Cambridge University in 1993.
Further Reading
ELECTRONIC ARCHIVES WITH MOVIES AND PICTURES RELEVANT TO WATER SIMULATION. Available from the authors' web sites, http://bioinfo.mbb.yale.edu and http://hyper.stanford.edu
PACKING AT THE PROTEIN-WATER INTERFACE. M. Gerstein and C. Chothia in Proceedings of the National Academy of Sciences USA, Vol. 93, pages 10167-10172; 1996.
ACCURATE SIMULATION OF PROTEIN DYNAMICS IN SOLUTION. M. Levitt and R. Sharon in Proceedings of the National Academy of Sciences USA, Vol. 85, pages 7557-7561; October 1988.
DIRECT OBSERVATION OF PROTEIN SOLVATION AND DISCRETE DISORDER WITH EXPERIMENTAL CRYSTALLOGRAPHIC PHASES. F. T. Burling, W. I. Weis, K. M. Flaherty and A. T. Brünger in Science, Vol. 271, pages 72-77; 1996.