* Genomic surveys of membrane proteins and membrane protein motifs: Liu et al.,
Genome Biology (2002) and Zhang et al., JMB (2002). [Related to Aim 1]
During this year, we completed two surveys of membrane proteins and membrane
protein motifs. In the first, we grouped membrane proteins into families and
looked at their relative abundance in a number of different genomes. We also
looked at the abundance of a number of different motifs -- in particular, GXXXG.
In the second paper, we extended our motif work, looking at the occurrence
of protein motifs not only in known proteins but also in the intergenic regions
of genomes. We called the motifs found in these regions "pseudomotifs" (in the
spirit of pseudogenes). We compared four eukaryotic genomes in terms of their
occurrence of these motifs.
* Helix Packing Calculations: Tsai & Gerstein, Bioinformatics (2002) [Related
to Aim 2]
This year we continued our work on 3D helix packing calculations. We performed
a detailed sensitivity analysis of our calculations and created a database of relevant
parameters, which is available through the web. This sensitivity analysis was
invaluable in helping us understand which parameters were most useful in our
packing calculations.
* Helix Interaction Motifs: Schneider et al., FEBS (2002) [Related to Aim 2]
We did a composition analysis of membrane proteins, focusing on the implications
of composition for helix-helix interactions.
* Development of Integrative Database Systems: Lin et al., NAR (2002) and Mateos
et al., Genome Research (2002) [Related to Aim 3]
This year we set up a new integrated resource, GeneCensus.org, which follows
on from last year's system, PartsList.org. GeneCensus takes a more sequence-based and
less structure-based view of genome comparisons, focusing on expression data, pathway
activities, and protein interactions. It has an extensive section devoted to
the occurrence of transmembrane helix interaction motifs in different genomes.
In a second paper, we collaborated with IBM researchers on developing a neural
network system for integrating many different types of genomic information.
This system was used to predict various aspects of protein function
and protein class.
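As an illustration of the kind of motif counting described in the first item above, the following is a minimal sketch in Python of tallying GXXXG occurrences in a set of protein sequences. It is not the procedure used in the papers: the function names (count_motif, motif_frequency) and the per-1000-residue normalization are assumptions made for this example, and the sketch scans whole sequences rather than predicted transmembrane helices.

```python
import re

# GXXXG: glycine, any three residues, glycine.
# The lookahead makes overlapping occurrences (e.g. GAAAGAAAG) count separately.
GXXXG = re.compile(r"(?=G[A-Z]{3}G)")


def count_motif(sequence, pattern=GXXXG):
    """Count (possibly overlapping) motif occurrences in one sequence."""
    return sum(1 for _ in pattern.finditer(sequence.upper()))


def motif_frequency(sequences, pattern=GXXXG):
    """Return motif occurrences per 1000 residues over a list of sequences."""
    total_residues = sum(len(s) for s in sequences)
    if total_residues == 0:
        return 0.0
    total_hits = sum(count_motif(s, pattern) for s in sequences)
    return 1000.0 * total_hits / total_residues
```

A raw frequency of this kind says nothing by itself about overrepresentation; that requires comparison against an expected frequency (for example, one derived from residue composition), which this sketch does not attempt.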
c. Significance
* Our survey of membrane proteins highlights the importance and overrepresentation of specific motifs.
* Our various parameter sets for packing calculations and detailed sensitivity analysis will be generally useful for helix packing calculations.
* Our GeneCensus website represents new ways of organizing "poly-genomic" information.
d. Plans
Next year, we plan to continue with the project as outlined in the original proposal. In particular:
* Ursula Lehnert will continue to work on the project, carrying on the packing calculations in collaboration with Neil Voss, Julian Graham, and Nat Echols (individuals not funded by the project). We expect to complete a significant amount of membrane helix packing and dynamics calculations.
* We plan to continue our work on database integration. We hope to combine the PartsList and GeneCensus systems using a smart-linking technology.
* We plan to continue with our inventory of membrane proteins, looking in more detail at the occurrence of particular domain combinations.
* Finally, we plan to continue surveying the occurrence of membrane proteins and membrane-associated proteins in intergenic regions, looking at the fossil history of these proteins. In particular, we are keen to survey the pseudogenes associated with cytochrome c.
e. Publications
Calculations of protein volumes: sensitivity analysis and parameter database.
J Tsai, M Gerstein (2002) Bioinformatics 18: 985-95.
Thermostability of membrane protein helix-helix interaction elucidated by statistical
analysis.
D Schneider, Y Liu, M Gerstein, DM Engelman (2002) FEBS Lett 532: 231-6.
Genomic analysis of membrane protein families: abundance and conserved motifs.
Y Liu, DM Engelman, M Gerstein (2002) Genome Biol 3: research0054.
Digging deep for ancient relics: a survey of protein motifs in the intergenic
sequences of four eukaryotic genomes.
ZL Zhang, PM Harrison, M Gerstein (2002) J Mol Biol 323: 811-22.
GeneCensus: genome comparisons in terms of metabolic pathway activity and protein
family sharing.
J Lin, J Qian, D Greenbaum, P Bertone, R Das, N Echols, A Senes, B Stenger,
M Gerstein (2002) Nucleic Acids Res 30: 4574-82.
Systematic learning of gene functional classes from DNA array expression data
by using multilayer perceptrons.
A Mateos, J Dopazo, R Jansen, Y Tu, M Gerstein, G Stolovitzky (2002) Genome
Res 12: 1703-15.
f. Project-generated Resources
This past year, we created a number of new web resources for this grant, in particular GeneCensus.org. This adds to http://bioinfo.mbb.yale.edu and http://www.partslist.org. These sites make available:
- all of our transmembrane helix identifications and surveys of helix interaction
motifs
- the expression level of membrane proteins with known structure
- the parameters from our packing calculations
We also have a page devoted exclusively to this grant: http://thorin.csb.yale.edu/papers/grant/mem.