6. Progress Report Summary - Biomedical Informatics Core
1.0 A tool to rapidly characterize pathogen genomes in a high-throughput pipeline
PI: Mark Gerstein (Yale)
Milestones:
Predicting phenotypes and pathogenicity from known genomes
Gerstein and Lussier have invested significant time in building a prediction system that links Clusters of Orthologous Groups (COGs) to phenotypes (GIDEON database). This work also illustrates that the informatics work products increasingly emerge from intertwined collaborations among the inter-institutional research groups.
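The report does not describe how the COG-phenotype prediction system works. As one minimal illustration, assuming each genome is represented by the set of COG identifiers it carries and each reference genome has a single phenotype label (for instance, one taken from GIDEON), a nearest-neighbour classifier could vote among the reference genomes with the most similar COG profiles. The helper names `jaccard` and `predict_phenotype`, the choice of Jaccard similarity, and the toy profiles and labels are all hypothetical.

```python
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two COG sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_phenotype(query: set[str],
                      reference: dict[str, tuple[set[str], str]],
                      k: int = 3) -> str:
    """Majority phenotype among the k reference genomes whose COG profiles
    are most similar to the query profile (hypothetical scheme)."""
    ranked = sorted(reference.values(),
                    key=lambda entry: jaccard(query, entry[0]),
                    reverse=True)
    votes = Counter(phenotype for _, phenotype in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy profiles: placeholder COG identifiers and made-up phenotype labels.
reference_genomes = {
    "genome_A": ({"COG0001", "COG0002", "COG0003"}, "pathogenic"),
    "genome_B": ({"COG0001", "COG0002", "COG0004"}, "pathogenic"),
    "genome_C": ({"COG0005", "COG0006", "COG0007"}, "non-pathogenic"),
    "genome_D": ({"COG0005", "COG0006", "COG0008"}, "non-pathogenic"),
}

print(predict_phenotype({"COG0001", "COG0002", "COG0009"}, reference_genomes, k=2))
```

Nearest-neighbour voting is used here only because it needs no training step; a real system would more likely fit a model and evaluate it with cross-validation.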
Predicting Essential Genes in S. cerevisiae
We have integrated over fifteen genome-scale characteristics in S. cerevisiae and generated a modest predictive framework. Within three months, we aim to have a robust predictive system capable of correctly identifying essential genes 80% of the time, based solely on genomic and sequence data. This system will then be applied to bacterial systems such as E. coli and to pathogenic eukaryotic microbes such as C. albicans. Within six months, we aim to have a preliminary predictive system running on these two platforms.
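One common way to combine several genome-scale characteristics into an essential-gene predictor is a logistic-regression classifier over a per-gene feature vector. The sketch below is not the group's model: it assumes two hypothetical, pre-standardized features per gene (for example, interaction-network degree and cross-species conservation) and made-up labels, whereas the actual framework integrates over fifteen characteristics. An accuracy target such as 80% would be measured on held-out genes; this toy example reports training accuracy only.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: list[float], x: list[float]) -> float:
    """Linear score: bias w[0] plus weighted features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi  # d(log-loss)/d(score)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x, threshold=0.5) -> int:
    """1 = predicted essential, 0 = predicted non-essential."""
    return int(sigmoid(score(w, x)) >= threshold)


def accuracy(w, X, y) -> float:
    """Fraction of genes whose essentiality label is predicted correctly."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Hypothetical standardized features per gene: [network degree, conservation].
X_train = [[2.0, 1.5], [1.5, 2.0], [1.8, 1.2], [1.2, 1.7],
           [-1.0, -1.5], [-1.5, -0.8], [-0.7, -1.2], [-1.3, -1.0]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = essential (made-up labels)

weights = train_logistic(X_train, y_train)
print(accuracy(weights, X_train, y_train))
```

Logistic regression is shown because its weights are directly interpretable as the contribution of each characteristic; other classifiers could be substituted without changing the overall workflow.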
Predicting Pathogen Pseudogenes
· Identification and characterization of pseudogenes in bacterial pathogens and other prokaryotes on a genomic scale.
· A broad range of bacterial genomes, including four agents on the CDC’s Category A, B and C lists (Escherichia coli O157:H7, Vibrio cholerae, Brucella melitensis and Yersinia pestis) and many other pathogens, was selected together with archaeal genomes. A comprehensive method has been developed to perform genome-wide identification and analysis of pseudogenes.
· A total of about 7000 pseudogenes have been identified in the 64 genomes studied. Pseudogenes account for at least 1 to 5% of all gene-like sequences in prokaryote genomes. The pseudogenes have been classified by functional category. Although many pseudogenes arise from large, diverse protein families (for example, the ABC transporters), notable numbers are associated with specific families that are not widespread. These include the cytochrome P450 and PPE families (PF00067 and PF00823), as well as other families that have a direct role in DNA transposition.
· We also showed that a large fraction of prokaryote pseudogenes arose from failed horizontal transfer events. In particular, pseudogenes are more than twice as likely as genes to have the anomalous codon usage associated with horizontal transfer. Moreover, we found a significant difference in the number of horizontally transferred pseudogenes between the pathogenic (O157:H7) and non-pathogenic (K-12) strains of Escherichia coli.
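Anomalous codon usage can be flagged by comparing each sequence's codon-usage profile with a genome-wide reference profile. The sketch below uses total-variation distance and an arbitrary cutoff of 0.5; both are illustrative assumptions, not the method used in the study, and the sequences are toy placeholders. The ratio of flagged fractions (pseudogenes vs. genes) is the kind of quantity behind a statement such as "more than twice as likely".

```python
from collections import Counter


def codon_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each in-frame codon; a trailing partial codon is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def codon_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Total-variation distance between two codon-usage profiles (0 = identical, 1 = disjoint)."""
    codons = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in codons)


def anomalous_fraction(seqs: list[str], reference: dict[str, float],
                       threshold: float = 0.5) -> float:
    """Fraction of sequences whose codon usage departs from the reference by more than threshold."""
    flagged = sum(codon_distance(codon_frequencies(s), reference) > threshold
                  for s in seqs)
    return flagged / len(seqs)


# Toy placeholder sequences (not real genes or pseudogenes).
genes = ["AAAGAAAAAGAA", "GAAAAAGAAAAA", "AAAAAAGAAGAA", "CCCGGGCCCGGG"]
pseudogenes = ["CCCGGGCCCGGG", "GGGCCCGGGCCC", "AAAGAAAAAGAA", "CCCCCCGGGGGG"]

# Reference profile built from the toy genes, standing in for a genome-wide profile.
reference = codon_frequencies("".join(genes))

gene_rate = anomalous_fraction(genes, reference)
pseudo_rate = anomalous_fraction(pseudogenes, reference)
print(gene_rate, pseudo_rate, pseudo_rate / gene_rate)
```

Real analyses would use full-length sequences, a threshold calibrated against the genome's own gene distribution, and a significance test before comparing strains such as O157:H7 and K-12.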
Publications:
· Liu Y, Harrison PM, Kunin V, Gerstein M. Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes. Genome Biol 2004; 5: R64.
· Yu H, Greenbaum D, Xin Lu H, Zhu X, Gerstein M. Genomic analysis of essentiality within protein networks. Trends Genet 2004; 20: 227-31. Review.