a. Specific Aims

The aims for this year were:

* Our first aim was to develop methods to inventory all the TM-proteins in the recently sequenced microbial genomes.

* Our second aim was to look at protein-protein interactions among helical membrane proteins from a database perspective. We wanted to put the TM-helix oligomerization motifs found in the genetic screens by the Beckwith and Engelman groups (e.g. GXXXG) into a context by comparing them to the helix-helix interfaces in the database of known structures -- both of the many soluble proteins and the few TM ones.

* Our final aim was to integrate into a comprehensive database the information on the occurrence and interaction of membrane proteins generated in the first two aims with further information, e.g. related to expression.

b. Studies and Results

* Senes et al. JMB (2000). In a collaborative project with the Engelman lab, we were able to comprehensively survey the occurrence of residue pairs and triplets in transmembrane helices. This study highlighted the commonness of the GXXXG motif, which had been previously identified in the experimental screens. This work has now been published.

* Jansen & Gerstein, NAR (2000). Using our integrated database system, we were able to connect the prediction of transmembrane helices in yeast with a number of datasets giving measurements of whole genome expression levels. We had the notable result that membrane proteins are expressed at a considerably lower level than soluble proteins -- by ~22% -- and that certain broad groups of membrane proteins are expressed more highly than others -- e.g. 4-TMs are expressed at a higher level than 2-TMs.
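To make the kind of comparison behind the expression result concrete, the following is a minimal sketch of computing the relative difference in mean expression between predicted membrane and soluble proteins. The function name, the use of a simple mean, and the toy gene names and values are all illustrative assumptions; they are not the published procedure or real measurements.

```python
from statistics import mean


def relative_expression_difference(expression, is_membrane):
    """Return (mean_membrane - mean_soluble) / mean_soluble.

    expression:  dict mapping gene name -> expression level
    is_membrane: dict mapping gene name -> True if predicted to be a TM protein
                 (genes missing from this dict are treated as soluble)
    """
    membrane = [lvl for gene, lvl in expression.items() if is_membrane.get(gene, False)]
    soluble = [lvl for gene, lvl in expression.items() if not is_membrane.get(gene, False)]
    if not membrane or not soluble:
        raise ValueError("need at least one gene in each class")
    soluble_mean = mean(soluble)
    return (mean(membrane) - soluble_mean) / soluble_mean


# Hypothetical toy data, for illustration only.
toy_expression = {"geneA": 10.0, "geneB": 6.0, "geneC": 8.0, "geneD": 4.0}
toy_membrane = {"geneB": True, "geneD": True}
toy_difference = relative_expression_difference(toy_expression, toy_membrane)
```

A negative return value means the membrane class is expressed at a lower average level than the soluble class. The same function could be applied per TM-helix count (e.g. 2-TM versus 4-TM proteins) by changing how `is_membrane` is filled in.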

c. Significance

We believe our analysis linking protein structure to gene-expression levels was a new type of study that can potentially highlight the overall structural characteristics of highly expressed proteins.

Our statistical results about the commonness of particular motifs in membrane proteins may be very useful in membrane protein design -- i.e. in designing dimerization motifs.
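As an illustration of the kind of statistic involved, the sketch below counts residue pairs at a fixed sequence separation in a set of TM-helix sequences and compares the observed count with a naive expectation from single-residue frequencies. This is a simplified odds ratio chosen for illustration, not the statistical treatment used in the published analysis, and the function names and example sequences are hypothetical.

```python
from collections import Counter


def pair_counts(helices, separation):
    """Count ordered residue pairs (seq[i], seq[i + separation]) over all helices."""
    counts = Counter()
    for seq in helices:
        for i in range(len(seq) - separation):
            counts[(seq[i], seq[i + separation])] += 1
    return counts


def odds_ratio(helices, first, second, separation):
    """Observed / expected count for the pair (first, second) at a given separation.

    The expectation assumes the two positions are independent and uses
    residue frequencies pooled over all helices (a deliberately naive model).
    """
    observed = pair_counts(helices, separation)
    total_pairs = sum(observed.values())
    if total_pairs == 0:
        raise ValueError("no residue pairs at this separation")
    residue_counts = Counter("".join(helices))
    total_residues = sum(residue_counts.values())
    expected = (
        total_pairs
        * (residue_counts[first] / total_residues)
        * (residue_counts[second] / total_residues)
    )
    if expected == 0:
        return float("nan")
    return observed[(first, second)] / expected


# Hypothetical sequences; separation 4 corresponds to the GXXXG spacing.
toy_helices = ["GAAAGAAAG"]
gg_at_4 = pair_counts(toy_helices, 4)[("G", "G")]
```

A ratio well above 1 for `("G", "G")` at separation 4 would indicate that the GXXXG spacing is over-represented relative to this naive background; a design effort could rank candidate motifs by such ratios.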

d. Plans

We plan to continue with the project as outlined in the original proposal. In particular, we hope to implement a more sophisticated transmembrane identification program and to extend the analysis of residue pair occurrences to all the recently sequenced genomes -- looking, for instance, for particular pairs that are common in one genome but not in another. We are very enthusiastic about the integration with the expression data. We hope to use these data as a novel way to predict membrane localization and protein subcellular localization.
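For reference, a simple baseline for transmembrane identification is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6; the choice of this scale, window, and cutoff is an assumption made for illustration and is not a description of our current or planned program. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_tm_segments(seq, window=19, threshold=1.6):
    """Return half-open (start, end) index ranges covered by hydrophobic windows.

    A window is 'hydrophobic' if its mean hydropathy is >= threshold.
    Overlapping or adjacent hydrophobic windows are merged into one range,
    so a reported range can extend somewhat beyond the true helix ends.
    Unknown residue codes contribute 0.0 to the window mean.
    """
    segments = []
    for start in range(len(seq) - window + 1):
        mean_hydropathy = (
            sum(KYTE_DOOLITTLE.get(res, 0.0) for res in seq[start:start + window]) / window
        )
        if mean_hydropathy >= threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the previous range
            else:
                segments.append((start, end))
    return segments


# Hypothetical sequence: a 19-residue leucine stretch flanked by aspartates.
toy_sequence = "D" * 10 + "L" * 19 + "D" * 10
toy_segments = predict_tm_segments(toy_sequence)
```

A more sophisticated program would refine the segment boundaries and add further evidence, such as charge distribution on the flanking loops, rather than relying on a single averaged scale.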

We will continue our work on building an integrated project database.

e. Publications

R Jansen & M Gerstein (2000). “Analysis of the Yeast Transcriptome with Broad Structural and Functional Categories: Characterizing Highly Expressed Proteins,” Nuc. Acids Res. 28:1481-8.

A Senes, M Gerstein & D Engelman (2000). “Statistical analysis of amino acid patterns in transmembrane helices: the GXXXG motif occurs frequently and in association with β-branched residues in neighboring position,” J. Mol. Biol. 296:921-36.

f. Project-generated Resources

From our website, http://bioinfo.mbb.yale.edu, we make available:
- all of our transmembrane helix identifications
- statistics on the occurrence of pairs and triplets of amino acids in TM helices