The lab's research has progressed in all of the areas described in last year's report. Broadly, we have developed a number of methods for designing
microarrays (tiling arrays) to elucidate the relevance of intergenic
regions and thus refine the annotation of the human genome. In addition, we have used these analyses to further
our understanding of pseudogenes, which comprise a substantial part of
the intergenic regions. As an
extension of microarray expression data analysis and ChIP-chip experiments, we
have developed new methods to integrate the data generated from varied
experiments and so obtain a more network-oriented view of molecules in the cell. We
have also critically evaluated the power of genomic feature integration for
predicting protein-protein interactions, and thus the function of proteins, on a
genomic scale.
Here
I am including a brief summary of the most exciting research developments in my
laboratory over the last year.
We are part of the ENCODE
Consortium, whose collective aim is to identify the functional elements of DNA
in 1% of the human genome. To this end, we are developing chip-based
transcriptome analyses and carrying out detailed analyses of the ENCODE
region, which consists of about 30 Mb of human genome sequence selected on
several criteria. We are also carefully mapping both processed and
unprocessed pseudogenes in the ENCODE region in order to annotate
it thoroughly. The project is described in the Consortium paper published in Science in 2004.
On a broader level, we have
identified transcriptionally active regions across the human genome using
tiling arrays representing the sense and antisense strands of its entire
non-repetitive sequence. This work was published in Science (Bertone et al., 2004). In addition to known
protein-coding regions, we found about 10,500 novel transcribed sequences. These
results provide a draft expression map for the entire human genome.
Interestingly, a large number of these sequences lie in intergenic regions and,
in addition, many of them are conserved among several mammalian
species. This suggests that such regions of intergenic DNA have specific
cellular roles, perhaps of regulatory or structural importance.
We have recently published a
detailed analysis of pseudogenes on chromosome 22, including evidence that some
are transcribed (Zheng et al., 2005). We integrated pseudogene
annotations published by various research groups into a unified list of 525
pseudogenes on chromosome 22.
We then combined this list with the global transcription analyses of
the human genome described in the previous paragraph. About one fifth of the pseudogenes on
chromosome 22 appear to be transcribed, again hinting at hitherto unknown functions for
pseudogenes.
Furthermore, we have
computationally assessed processed pseudogenes throughout the genome
for evidence of transcription. Based on intersection with the global tiling
data and expressed sequence tags, we find that about 5% of processed
pseudogenes are indeed transcribed. This work was published
recently in Nucleic Acids Research
(Harrison et al., 2005). While pseudogenes have been widely regarded as
non-functional, these two latest studies suggest that, at least in some cases,
this may not hold true, which makes this an exciting time for the study of pseudogenes.
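The intersection step described above can be illustrated with a minimal sketch. It assumes that pseudogene annotations and transcription evidence (tiling-array hits or ESTs) are both available as simple genomic intervals; the function names and the toy coordinates are hypothetical and are not taken from the published analysis.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def transcribed_fraction(pseudogenes: List[Interval],
                         evidence: List[Interval]) -> float:
    """Fraction of pseudogenes that overlap at least one evidence interval."""
    if not pseudogenes:
        return 0.0
    hits = sum(1 for p in pseudogenes
               if any(overlaps(p, e) for e in evidence))
    return hits / len(pseudogenes)


# Toy, made-up coordinates used only to exercise the functions.
pseudogenes = [("chr22", 100, 200), ("chr22", 500, 650),
               ("chr22", 900, 1000), ("chr1", 100, 200)]
evidence = [("chr22", 150, 180), ("chr22", 640, 700), ("chr1", 300, 400)]

fraction = transcribed_fraction(pseudogenes, evidence)
print(f"{fraction:.2f} of the toy pseudogenes overlap transcription evidence")
```

The pairwise scan is quadratic and is meant only to show the logic; a genome-scale comparison would normally sort the intervals or use an interval index.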
Molecular network analysis is
at the core of current functional genomics research. Integrating several different kinds of high-throughput
data, such as mRNA expression profiles, ChIP-chip data, and protein-protein
interaction data, is essential to understanding biological processes, since all of them
are interconnected. We have made
significant progress in developing a methodology for network analysis. Below we
briefly summarize the results of two important papers in this regard. We have
also written an extensive review on biological networks and their analysis (Xia et
al., 2004).
Transcriptional regulatory
networks play a central role in directing gene expression changes in response
to internal and external stimuli. Biological networks exhibit complex dynamic
behavior, thereby enabling cells to react to varied conditions. We examined the
dynamics of the regulatory system in yeast on a genomic scale by integrating
gene expression data for five cellular conditions with known transcriptional
regulatory relationships (Luscombe et al., 2004). This work was published in Nature. By integrating condition-independent regulatory relationships with
condition-specific data (e.g. microarray data from different conditions), we
developed a trace-back algorithm to uncover subnetworks that are active under
specific conditions. To rigorously compare these condition-specific subnetworks,
we developed SANDY (Statistical Analysis of Network Dynamics). We show that
different sets of transcription factors become key regulatory hubs at different
times, portraying a network that shifts its weight between different foci to
bring about distinct cellular states. As highly connected transcription factors
have a tendency to be lethal when removed from the system, the transient nature
of hubs has implications for the condition-dependent lethality of these
transcription factors.
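To make the idea of tracing back to condition-specific subnetworks concrete, here is a small sketch. It is an illustrative simplification rather than the published trace-back algorithm or SANDY: it assumes the static network is a list of (transcription factor, target) pairs, that each condition supplies a set of genes with changed expression and a set of transcription factors considered present, and that a hub is simply the factor with the most active targets. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (transcription factor, target gene)


def active_subnetwork(edges: Iterable[Edge],
                      changed_genes: Set[str],
                      present_tfs: Set[str]) -> List[Edge]:
    """Walk backwards from changed genes to the regulators present in a condition."""
    regulators: Dict[str, Set[str]] = defaultdict(set)
    for tf, target in edges:
        regulators[target].add(tf)

    active: Set[Edge] = set()
    visited: Set[str] = set()
    frontier = list(changed_genes)
    while frontier:
        gene = frontier.pop()
        if gene in visited:
            continue
        visited.add(gene)
        for tf in regulators.get(gene, ()):
            if tf in present_tfs:
                active.add((tf, gene))
                frontier.append(tf)  # keep tracing towards upstream regulators
    return sorted(active)


def top_hub(active_edges: Iterable[Edge]) -> str:
    """Return the factor with the most targets in an active subnetwork."""
    counts: Dict[str, int] = defaultdict(int)
    for tf, _ in active_edges:
        counts[tf] += 1
    return max(sorted(counts), key=lambda tf: counts[tf])


# Toy static network and two made-up conditions.
static_edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_B", "g2"),
                ("TF_B", "g3"), ("TF_C", "TF_A"), ("TF_C", "g4")]

condition_1 = active_subnetwork(static_edges, {"g1", "g2"}, {"TF_A", "TF_C"})
condition_2 = active_subnetwork(static_edges, {"g2", "g3"}, {"TF_B"})

print(condition_1, top_hub(condition_1))
print(condition_2, top_hub(condition_2))
```

In this toy case the hub shifts from TF_A in the first condition to TF_B in the second, which is the kind of change in network focus described above.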
We have quantitatively
evaluated the importance of non-essential genes based on protein-protein
interaction networks and analysis of their topological characteristics
(Yu et al., 2004). Non-essential genes make small but significant
contributions to the fitness of the cell, yet these effects may not be large
enough to be detected by conventional methods. We have evaluated "marginal
essentiality", which we define as a quantitative measure of a non-essential
gene's importance to a cell. We calculated marginal essentiality values for
yeast genes using the results from a diverse set of four large-scale functional
genomics experiments examining different aspects of a protein's impact on cell
fitness. Marginally essential genes tend to occupy network hubs just as
essential genes do. Marginally essential genes tend to have more
interaction partners than non-essential genes. Network analyses of this kind
enable one to identify backup genes, i.e. genes that are not by themselves essential for
the survival of an organism but whose loss is lethal in the absence of another
gene. If extended to pathogens, such
studies could be useful in identifying drug targets in these microbes.
A drug intended for a specific target protein may fail
because of the presence of a backup gene. Targeting this backup gene could result
in a better antibiotic against the invading pathogen.
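As a rough illustration of how a marginal essentiality score and network connectivity could be computed, consider the sketch below. The normalization used here (each assay divided by its maximum, then averaged over the assays in which a gene was measured) is an assumption made for the example and may differ from the published definition; the gene names, scores, and interactions are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def marginal_essentiality(
        scores: Dict[str, List[Optional[float]]]) -> Dict[str, float]:
    """Average each gene's per-assay scores after scaling each assay by its maximum.

    A value of None marks an assay in which the gene was not measured.
    """
    if not scores:
        return {}
    n_assays = len(next(iter(scores.values())))
    maxima = []
    for j in range(n_assays):
        observed = [row[j] for row in scores.values() if row[j] is not None]
        maxima.append(max(observed, default=0.0))

    result: Dict[str, float] = {}
    for gene, row in scores.items():
        scaled = [v / maxima[j] for j, v in enumerate(row)
                  if v is not None and maxima[j] > 0]
        if scaled:  # skip genes with no usable measurement
            result[gene] = sum(scaled) / len(scaled)
    return result


def degrees(interactions: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count interaction partners per gene in an undirected interaction list."""
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)
    return {gene: len(p) for gene, p in partners.items()}


# Made-up scores from four hypothetical fitness assays.
assay_scores = {
    "geneA": [2.0, 4.0, None, 1.0],
    "geneB": [1.0, 2.0, 3.0, 0.5],
    "geneC": [0.0, 1.0, 0.0, 0.0],
}
toy_interactions = [("geneA", "geneB"), ("geneA", "geneC"),
                    ("geneA", "geneD"), ("geneB", "geneD")]

m_values = marginal_essentiality(assay_scores)
degree = degrees(toy_interactions)
for gene in sorted(m_values, key=m_values.get, reverse=True):
    print(gene, round(m_values[gene], 4), degree.get(gene, 0))
```

With real data, one would then ask whether genes with higher scores also tend to have higher degree, for example by comparing the degree distributions of high-scoring and low-scoring genes.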
Our collaborations with experimentalists are continuing
through a number of centers, as they were last year.
NBC. We are involved with the NBC, the Northeast
Biodefense Center, which involves Columbia, Yale, and a number of other
northeastern universities. The Gerstein lab contribution is part of the
informatics core.
CEGS.
Our pseudogene research is undertaken as part of an NIH Center of
Excellence in Genomic Sciences (CEGS), which is focused on constructing large
human microarrays. In particular,
our pseudogene work forms a valuable backdrop to experimental work aimed at
accurately identifying genes and annotating genomes, as well as probing the
sequence characteristics of intergenic regions.
NESG.
As was the case last year, we have performed a variety of computational
analyses designed to interface with experimental structural genomics. This is the direct experimental
complement of the computational analysis proposed in the grant. Our efforts are part of the North East
Structural Genomics Consortium (NESG). Continuing last year's efforts, we have
designed approaches to pick targets prospectively for subsequent structural
analysis, followed by retrospective data mining on the results.
Mass-Spec. Finally, our work relating mRNA abundance
and levels of gene expression has become an integral part of the NHLBI/Yale
Proteomics Center, which focuses on proteomics and mass-spec. As part of the work associated with
this center, we have been collaborating with Ken Williams, Director of the Keck
Foundation Biotechnology Resource Laboratory at Yale, on relating mRNA
expression and protein abundance.
One of the exciting
developments resulting from my Keck funding has been the collaborations that
have sprouted with other Keck Distinguished Young Scholars from several
different classes. These
collaborations have essentially developed through interactions promoted by the
Keck symposium. Specifically, these collaborations are:
Kevin White, Yale
University (2003 awardee).
We are working with him on identifying primers in the human genome for
microarrays, and we are also interacting a bit on the construction of genetic
networks. We recently had a collaborative paper published on microarray
analysis (Gilad et al., 2005).
John Moran, University
of Michigan (2000 awardee). We
have decided to co-organize a FASEB meeting on retro-elements and pseudogenes
in the eukaryotic genomes. I will chair a session at this meeting in June of
2005.
Judith Frydman,
Stanford University (1999 awardee). Our collaboration focuses on determining
whether yeast proteins that are substrates for the chaperonin TRiC have
proteomic properties that differ from those of yeast proteins in
general. We have found some
interesting indications that certain folds, such as the WD40, tend to be
preferred by the chaperonin, and we are investigating other properties such as
function and contact order. We
currently have one paper on this topic about to be submitted.
Below are listed all of my papers published during the
past two years. Reprints are enclosed for those highlighted "{keck reprint}". I have made a
website of all relevant Keck Foundation publications and URLs: http://bioinfo.mbb.yale.edu/papers/grant/keck.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005). "Integrated pseudogene annotation for human chromosome 22: evidence for transcription." J Mol Biol 349: 27-45. {keck reprint}
P Bertone, M Gerstein, M Snyder (2005). "Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery." Chromosome Res 13: 259-74.
Y Gilad, SA Rifkin, P Bertone, M Gerstein, KP White (2005). "Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles." Genome Res 15: 674-80. {keck reprint}
PM Harrison, D Zheng, Z Zhang, N Carriero, M Gerstein (2005). "Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability." Nucleic Acids Res 33: 2374-83.
D Huber, D Boyd, Y Xia, MH Olma, M Gerstein, J Beckwith (2005). "Use of Thioredoxin as a Reporter To Identify a Subset of Escherichia coli Signal Sequences That Promote Signal Recognition Particle-Dependent Translocation." J Bacteriol 187: 2983-91.
TB Acton, KC Gunsalus, R Xiao, LC Ma, J Aramini, MC Baran, YW Chiang, T Climent, B Cooper, NG Denissova, SM Douglas, JK Everett, CK Ho, D Macapagal, PK Rajan, R Shastry, LY Shih, GV Swapna, M Wilson, M Wu, M Gerstein, M Inouye, JF Hunt, GT Montelione (2005). "Robotic cloning and protein production platform of the northeast structural genomics consortium." Methods Enzymol 394: 210-43.
S Balasubramanian, Y Xia, E Freinkman, M Gerstein (2005). "Sequence variation in G-protein-coupled receptors: analysis of single nucleotide polymorphisms." Nucleic Acids Res 33: 1710-21. {keck reprint}
H Grosshans, T Johnson, KL Reinert, M Gerstein, FJ Slack (2005). "The temporal patterning microRNA let-7 regulates several transcription factors at the larval to adult transition in C. elegans." Dev Cell 8: 321-30.
V Alexandrov, U Lehnert, N Echols, D Milburn, D Engelman, M Gerstein (2005). "Normal modes for predicting protein motions: a comprehensive database assessment and associated Web tool." Protein Sci 14: 633-43.
NR Voss, M Gerstein (2005). "Calculation of standard atomic volumes for RNA and comparison with proteins: RNA is packed more tightly." J Mol Biol 346: 477-92.
D Greenbaum, A Smith, M Gerstein (2005). "Impediments to database interoperation: legal issues and security concerns." Nucleic Acids Res 33: D3-4.
N Carriero, MV Osier, KH Cheung, PL Miller, M Gerstein, H Zhao, B Wu, S Rifkin, J Chang, H Zhang, K White, K Williams, M Schultz (2005). "A high productivity/low maintenance approach to high-performance computation for biomedicine: four case studies." J Am Med Inform Assoc 12: 90-8.
ENCODE Project Consortium (2004). "The ENCODE (ENCyclopedia Of DNA Elements) Project." Science 306: 636-40.
U Lehnert, Y Xia, TE Royce, CS Goh, Y Liu, A Senes, H Yu, Z Zhang, DM Engelman, M Gerstein (2004). "Computational analysis of membrane proteins: genomic occurrence, structure prediction and helix interactions." Quarterly Reviews of Biophysics 37: 1-26.
KH Cheung, D Pan, A Smith, M Seringhaus, SM Douglas, M Gerstein (2004). "An XML-Based Approach to Integrating Heterogeneous Yeast Genome Data." 2004 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS); pp 236-242.
EJ White, O Emanuelsson, D Scalzo, T Royce, S Kosak, EJ Oakeley, S Weissman, M Gerstein, M Groudine, M Snyder, D Schübeler (2004). "DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states." Proc Natl Acad Sci U S A 101: 17771-6.
P Berman, P Bertone, B Dasgupta, M Gerstein, MY Kao, M Snyder (2004). "Fast optimal genome tiling with applications to microarray design and homology search." J Comput Biol 11: 766-85.
P Bertone, V Stolc, TE Royce, JS Rozowsky, AE Urban, X Zhu, JL Rinn, W Tongprasit, M Samanta, S Weissman, M Gerstein, M Snyder (2004). "Global identification of human transcribed sequences with genome tiling arrays." Science 306: 2242-6. {keck reprint}
N Lin, B Wu, R Jansen, M Gerstein, H Zhao (2004). "Information assessment on predicting protein-protein interactions." BMC Bioinformatics 5: 154.
DA Hall, H Zhu, X Zhu, T Royce, M Gerstein, M Snyder (2004). "Regulation of gene expression by a metabolic enzyme." Science 306: 482-4.
A Kumar, M Seringhaus, MC Biery, RJ Sarnovsky, L Umansky, S Piccirillo, M Heidtman, KH Cheung, CJ Dobry, MB Gerstein, NL Craig, M Snyder (2004). "Large-scale mutagenesis of the yeast genome using a Tn7-derived multipurpose transposon." Genome Res 14: 1975-86.
R Jansen, M Gerstein (2004). "Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction." Curr Opin Microbiol 7: 535-45.
NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004). "Genomic analysis of regulatory network dynamics reveals large topological changes." Nature 431: 308-12. {keck reprint}
Y Liu, PM Harrison, V Kunin, M Gerstein (2004). "Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes." Genome Biol 5: R64.
Z Zhang, M Gerstein (2004). "Large-scale analysis of pseudogenes in the human genome." Curr Opin Genet Dev 14: 328-35.
Z Wunderlich, TB Acton, J Liu, G Kornhaber, J Everett, P Carter, N Lan, N Echols, M Gerstein, B Rost, GT Montelione (2004). "The protein target list of the Northeast Structural Genomics Consortium." Proteins 56: 181-7.
MM Babu, NM Luscombe, L Aravind, M Gerstein, SA Teichmann (2004). "Structure and evolution of transcriptional regulatory networks." Curr Opin Struct Biol 14: 283-91.
Y Xia, H Yu, R Jansen, M Seringhaus, S Baxter, D Greenbaum, H Zhao, M Gerstein (2004). "Analyzing cellular biochemistry in terms of molecular networks." Annu Rev Biochem 73: 1051-87. {keck reprint}
JL Rinn, JS Rozowsky, IJ Laurenzi, PH Petersen, K Zou, W Zhong, M Gerstein, M Snyder (2004). "Major molecular differences between mammalian sexes are involved in drug metabolism and renal function." Dev Cell 6: 791-800.
Research Scientists
Joel Rozowsky (Assoc.)
Suganthi Balasubramanian (Assoc.)
Valery Trifonov (partial commitment)
Nick Carriero (partial commitment)

Systems People
Mihali Felipe
Michael Wilson

Post-doctoral Associates & Fellows
Anne Counterman, NIH fellowship
Olof Emanuelsson, Wallenberg Foundation
Chern-Sing Goh, NIH fellowship
Philip Kim
Long (Jason) Lu
Alberto Paccanaro
Alexander Karpikov
Rajkumar Sasidharan
Yu (Brandon) Xia, Jane Coffin Childs
Deyou Zheng

PhD Students
Paul Bertone (MCDB)
Samuel Flores (Physics)
Tara Gianoulis (CBB)
Thomas Royce (CBB)
Michael Seringhaus (MB&B), NSERC
Andrew Smith (CS)
Haiyuan Yu (MB&B)

Medical Students
Yuen-Jong Liu

Undergrads
Brian Wayda
David Lu
Chantelle Southerland
Michael Schwartz
Scott Zhu
Stephen Kriss

Assistant
Joann DelVecchio

Genomics & Bioinformatics (MB&B 452a/752a)