During the past academic year (2002 to 2003), the Gerstein lab has participated in the Yale CEGS in a number of ways:
1. We have developed useful tools for microarray analysis.
2. We have developed methods for assigning genes and pseudogenes in higher eukaryotic genomes, particularly the human genome.
3. We have applied these microarray tools and our gene and pseudogene identifications in collaborations with the Snyder and Weissman laboratories.
We developed the following tools and approaches for the CEGS center. First, Luscombe et al. (2003) developed ExpressYourself, a generic web-based microarray processing platform that we have used extensively in the center. Kluger et al. (2003) developed a spectral biclustering method to co-cluster genes and particular tissues or conditions. Qian et al. (2003) developed a way of measuring and quantifying a common spatial artifact in microarray experiments that appears as discoloration across the array. We are currently trying to implement this discoloration-detection method in the ExpressYourself platform. Finally, Berman et al. (2002) developed a method for tiling whole chromosomes, that is, breaking them up into small pieces and removing repeats. This publication is preliminary; a more extensive one will soon follow.
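The chromosome tiling step described above (breaking a sequence into small pieces and removing repeats) can be illustrated with a short sketch. This is a simplified illustration only, not the algorithm from the publication: it assumes repeats have already been soft-masked as lowercase letters, it cuts fixed-length, non-overlapping tiles, and the function name `tile_sequence` and its parameters are hypothetical.

```python
from typing import List, Tuple


def tile_sequence(seq: str, tile_len: int) -> List[Tuple[int, str]]:
    """Split a soft-masked sequence into non-overlapping, repeat-free tiles.

    Assumption: repeats are marked in lowercase (soft-masking) and all
    other bases are uppercase. Returns (start_offset, tile) pairs.
    """
    if tile_len <= 0:
        raise ValueError("tile_len must be positive")
    tiles: List[Tuple[int, str]] = []
    run_start = None  # start of the current unmasked run, if any
    # A lowercase sentinel is appended so the final run is flushed.
    for i, base in enumerate(seq + "n"):
        if base.isupper():
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # Cut the finished unmasked run into full-length tiles;
            # leftover bases shorter than tile_len are dropped.
            for s in range(run_start, i - tile_len + 1, tile_len):
                tiles.append((s, seq[s:s + tile_len]))
            run_start = None
    return tiles


if __name__ == "__main__":
    # Toy input: the lowercase "acgt" stands in for a masked repeat.
    for start, tile in tile_sequence("ACGTACGTacgtGGCC", 4):
        print(start, tile)
```

Fixed-length tiles keep the sketch short; a real design would also need to weigh tile length, overlap, and how much sequence is lost at the edges of masked regions.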
Zhang & Gerstein assigned pseudogenes across the entire human draft genome sequence. In two publications (Zhang et al., 2002; Zhang & Gerstein, 2003), we focused on several large families of these pseudogenes, annotating them carefully. The largest pseudogene family is that of the ribosomal proteins; related to it are the pseudogenes of the mitochondrial ribosomal proteins. Additional publications by Harrison & Gerstein (2002) and Harrison et al. (2002) surveyed general issues in gene identification throughout eukaryotes, and how gene identification is closely coupled to pseudogene identification and protein family identification (the Snyder & Gerstein (2003) review also touches on this). Finally, Harrison et al. (2003) investigated assigning pseudogenes to the fly genome. This was done as part of our effort to generalize our approaches to a number of eukaryotic organisms.
During the past year, we coupled our microarray analysis tools with our gene and pseudogene assignments to aid specific experimental collaborations associated with the CEGS Center. The main one is the Rinn et al. (2003) paper, which examines the transcriptional activity of chromosome 22. We also participated in collaborations with Dr. Weissman looking at the myeloid differentiation program and the gene expression patterns associated with it.
Identification and correction of spurious spatial correlations in microarray data
J Qian, Y Kluger, H Yu, M Gerstein. BioTechniques (in press).
Z Zhang, M Gerstein (2003) Genomics 81: 468-80.
Genomics. Defining genes in the genomics era.
M Snyder, M Gerstein (2003) Science 300: 258-60.
Spectral biclustering of microarray data: coclustering genes and conditions.
Y Kluger, R Basri, JT Chang, M Gerstein (2003) Genome Res 13: 703-16.
The transcriptional activity of human Chromosome 22.
JL Rinn, G Euskirchen, P Bertone, R Martone, NM Luscombe, S Hartman, PM Harrison, FK Nelson, P Miller, M Gerstein, S Weissman, M Snyder (2003) Genes Dev 17: 529-40.
Identification of pseudogenes in the Drosophila melanogaster genome.
PM Harrison, D Milburn, Z Zhang, P Bertone, M Gerstein (2003) Nucleic Acids Res 31: 1033-7.
ExpressYourself: a modular platform for processing and visualizing microarray data.
NM Luscombe, TE Royce, P Bertone, N Echols, CE Horak, JT Chang, M Snyder, M Gerstein (2003) Nucleic Acids Res 31: 3477-82.
Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements.
Z Zhang, M Gerstein (2003) J Biol 2: 11.
Studying genomes through the aeons: protein families, pseudogenes and proteome evolution.
PM Harrison, M Gerstein (2002) J Mol Biol 318: 1155-74.
A question of size: the eukaryotic proteome and the problems in defining it.
PM Harrison, A Kumar, N Lang, M Snyder, M Gerstein (2002) Nucleic Acids Res 30: 1083-90.
Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae.
CE Horak, NM Luscombe, J Qian, P Bertone, S Piccirillo, M Gerstein, M Snyder (2002) Genes Dev 16: 3017-33.
YMD: a microarray database for large-scale gene expression analysis.
KH Cheung, K White, J Hager, M Gerstein, V Reinke, K Nelson, P Masiar, R Srivastava, Y Li, J Li, H Zhao, J Li, DB Allison, M Snyder, P Miller, K Williams (2002) Proc AMIA Symp 140-4.
Z Lian, Y Kluger, DS Greenbaum, D Tuck, M Gerstein, N Berliner, SM Weissman, PE Newburger (2002) Blood 100: 3209-20.
Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome.
Z Zhang, P Harrison, M Gerstein (2002) Genome Res 12: 1466-82.
Fast optimal genome tiling with applications to microarray design and homology search.
P Berman, P Bertone, B DasGupta, M Gerstein, M-Y Kao, M Snyder (2002) Proceedings of the 2nd International Workshop on Algorithms in Bioinformatics. Springer-Verlag LNCS 2452: 419-33.