RECENT YALE CEGS TOOLS AND DATASETS AVAILABLE ON THE WEB (17-Sep-08)

II) Specific datasets

a) SVs

Sequence data were deposited in the Short Read Archive (SRA), variants in the Database of Genomic Variants, and expression data in GEO. Accessions are listed at: http://sv.gersteinlab.org/index_files/data.html


Accession   Data type                       Sample(s)
GSE9002     Array-CGH (microarray)*         NA15510 vs. NA18505
SRA000197   PEM paired-ends (DNA)           NA15510
SRA000198   PEM paired-ends (DNA)           NA15510
SRA000199   PEM paired-ends (DNA)           NA18505
SRA000200   PEM paired-ends (DNA)           NA18505
SRA000201   PEM paired-ends (DNA)           NA18505
SRA000202   PEM paired-ends (DNA)           NA18505
SRA000203   PEM paired-ends (DNA)           NA18505
SRA000204   Amplicon pool sequences (DNA)   NA15510
SRA000205   Amplicon pool sequences (DNA)   NA18505
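The accession table can also be handled programmatically. The sketch below is a hypothetical helper, not something distributed with the data: it transcribes the rows above and groups accessions by sample, listing the comparative Array-CGH entry under both samples. The names SV_ACCESSIONS and accessions_by_sample are illustrative only.

```python
from collections import defaultdict

# Rows transcribed from the accession table: (accession, data type, sample(s)).
SV_ACCESSIONS = [
    ("GSE9002", "Array-CGH (microarray)", "NA15510 vs. NA18505"),
    ("SRA000197", "PEM paired-ends (DNA)", "NA15510"),
    ("SRA000198", "PEM paired-ends (DNA)", "NA15510"),
    ("SRA000199", "PEM paired-ends (DNA)", "NA18505"),
    ("SRA000200", "PEM paired-ends (DNA)", "NA18505"),
    ("SRA000201", "PEM paired-ends (DNA)", "NA18505"),
    ("SRA000202", "PEM paired-ends (DNA)", "NA18505"),
    ("SRA000203", "PEM paired-ends (DNA)", "NA18505"),
    ("SRA000204", "Amplicon pool sequences (DNA)", "NA15510"),
    ("SRA000205", "Amplicon pool sequences (DNA)", "NA18505"),
]


def accessions_by_sample(rows):
    """Group (accession, data type) pairs by sample.

    Comparative entries such as "NA15510 vs. NA18505" are listed under
    each sample they involve.
    """
    groups = defaultdict(list)
    for accession, data_type, samples in rows:
        for sample in samples.split(" vs. "):
            groups[sample.strip()].append((accession, data_type))
    return dict(groups)
```

For example, accessions_by_sample(SV_ACCESSIONS)["NA15510"] returns the Array-CGH comparison, the two PEM runs and the amplicon pool for that individual.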

SV data were loaded into our breakpoint database, BreakDB, located at http://sv.gersteinlab.org/breakdb. To access the Korbel et al. data, for example, either download a text file containing all the SVs by selecting Korbel Release 1, or view individual breakpoint events within BreakDB. To view breakpoint events, select View by Source, then Korbel et al. (2007). Each breakpoint event is listed with information such as its location, event type, flanking sequences and a suggested mechanism.
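The column layout of the BreakDB text download is not described here, so the following is only a sketch. It assumes a tab-separated file with a header row and hypothetical columns chrom, start, end and event_type, and tallies breakpoint events by type; the field names must be adjusted to match the real release file.

```python
import csv
import io
from collections import Counter


def count_event_types(text, type_field="event_type"):
    """Count SV events by type in a tab-separated export.

    ASSUMPTION: the file has a header row and a column named by
    `type_field`; the real BreakDB release may use different names.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return Counter(row[type_field] for row in reader)
```

To use it on a downloaded file, read the file into a string (or pass an open file object to csv.DictReader directly) and inspect the resulting Counter.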


b) RNA Sequencing


Data were deposited in the NCBI GEO. Three types of files exist:


1. raw sequence reads and quality scores

2. processed data (exonic, junction and polyA reads)

3. a meta file describing the experiment

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11209

Ours was the first file of this type sent to GEO.


In addition, the Science website hosts a supplementary file with all of the new annotations and expression levels (Figure S4).


Our website additionally provides:

1. a GBrowse track for the new annotations;

2. a GBrowse track for novel transcribed regions;

3. a table of ORFs with heterogeneous polyA sites;

4. the list of introns confirmed by RNA-Seq.

http://www.yale.edu/snyder/Naga2008sup.html


Software:

The core of the software is maintained at:

https://sourceforge.net/projects/nxgview/


SourceForge is a widely used host for open-source software. The project there contains the version of the software used for our published yeast data.

IV) Informatics Tools and Websites:

RNA-Seq is described above. Other tools and websites are:

JO Korbel, AE Urban, F Grubert, J Du, TE Royce, P Starr, G Zhong, BS Emanuel, SM Weissman, M Snyder, MB Gerstein (2007). Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. PNAS 104: 10110-5.

[TOOL] WEBSITE: http://tiling.mbb.yale.edu/BreakPtr/index.html

TE Royce, JS Rozowsky, MB Gerstein (2007) Assessing the need for sequence-based normalization in tiling microarray experiments. Bioinformatics 23: 988-97.

[TOOL] WEBSITE: http://tiling.gersteinlab.org/sequence_effects
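As a rough illustration of what sequence-based normalization means, the sketch below regresses log probe intensity on probe GC count by ordinary least squares and keeps the residuals as the normalized signal. This GC-only model is a deliberately simplified stand-in chosen for illustration, not the model assessed by Royce et al.; gc_count and gc_normalize are hypothetical names.

```python
def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    return sum(1 for base in seq.upper() if base in "GC")


def gc_normalize(probes, log_intensities):
    """Remove a linear GC-content trend from log intensities.

    Simplified illustration only: fits log_intensity ~ a + b * GC count
    and returns the residuals.
    """
    if not probes:
        return []
    x = [gc_count(p) for p in probes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(log_intensities) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_intensities))
    slope = sxy / sxx if sxx else 0.0  # no GC variation means no trend to remove
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, log_intensities)]
```

A real sequence model would typically use richer features (base identity by position, for instance); the linear GC fit only shows the general shape of the correction.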

H Yu, K Nguyen, T Royce, J Qian, K Nelson, M Snyder, M Gerstein (2007). Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool. Nucleic Acids Res 35: e8.

[TOOL] WEBSITE: http://bioinfo.mbb.yale.edu/ExpressYourself (COP submodule)

JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M Gerstein (2007). Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res 35: D55-60.

[TOOL] WEBSITE: http://pseudogene.org


KY Yip, P Patel, PM Kim, DM Engelman, D McDermott, M Gerstein (2008). An integrated system for studying residue coevolution in proteins. Bioinformatics 24: 290-2.

[TOOL] WEBSITE: http://coevolution.gersteinlab.org/coevolution
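One widely used signal of residue coevolution is the mutual information (MI) between two columns of a multiple sequence alignment. The sketch below computes it with natural logarithms. It is a generic example, not necessarily one of the scores implemented in the coevolution system, and column_mi is a hypothetical name.

```python
import math
from collections import Counter


def column_mi(col_a, col_b):
    """Mutual information (in nats) between two aligned columns.

    Each column is a sequence of residues, one per aligned sequence,
    so col_a[k] and col_b[k] come from the same sequence k.
    """
    n = len(col_a)
    counts_a = Counter(col_a)
    counts_b = Counter(col_b)
    counts_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in counts_ab.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((counts_a[a] / n) * (counts_b[b] / n)))
    return mi
```

Perfectly covarying columns such as "AACC" and "DDEE" give log 2, while the independent pair "AACC" and "DEDE" gives 0. In practice, raw MI is usually corrected for gaps, sequence redundancy and background signal.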


TE Royce, NJ Carriero, MB Gerstein (2007). An efficient pseudomedian filter for tiling microarrays. BMC Bioinformatics 8: 186.

[TOOL] WEBSITE: http://tiling.gersteinlab.org/pseudomedian
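The pseudomedian (Hodges-Lehmann estimator) of a sample is the median of all pairwise Walsh averages (x_i + x_j)/2 with i <= j. The naive sketch below applies it in a sliding window. It does not reproduce the efficient algorithm from the paper, and defining the window by probe index rather than by genomic distance is an assumption made for simplicity.

```python
from statistics import median


def pseudomedian(values):
    """Hodges-Lehmann estimator: median of all Walsh averages, i <= j."""
    walsh = [
        (values[i] + values[j]) / 2
        for i in range(len(values))
        for j in range(i, len(values))
    ]
    return median(walsh)


def pseudomedian_filter(signal, half_width):
    """Naive sliding-window pseudomedian over probe indices.

    Each output position uses the window [i - half_width, i + half_width],
    truncated at the ends of the signal. Cost is quadratic in window size.
    """
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half_width)
        hi = min(len(signal), i + half_width + 1)
        smoothed.append(pseudomedian(signal[lo:hi]))
    return smoothed
```

For example, the pseudomedian of 1, 2, 3, 100 is 2.75, whereas the mean is 26.5. This shows why the estimator resists outlying probes while still averaging more than a plain median does.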


ZD Zhang, J Rozowsky, M Snyder, J Chang, M Gerstein (2008). Modeling ChIP sequencing in silico with applications. PLoS Comput Biol 4: e1000158.

[TOOL] WEBSITE: http://www.gersteinlab.org/proj/chip-seq-simu
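To convey the general idea of simulating ChIP-seq in silico, the toy sketch below does two things. It draws fragments whose centres are normally distributed around hypothetical binding sites, and it adds uniformly placed background fragments. For each fragment, it records one read at the fragment's 5' end on a randomly chosen strand. All function and parameter names and defaults are hypothetical; this is not the simulation model of Zhang et al.

```python
import random


def simulate_chip_reads(genome_length, sites, reads_per_site, n_background,
                        fragment_length=200, spread=30.0, seed=0):
    """Toy ChIP-seq simulation returning (5' read position, strand) pairs.

    Hypothetical model: site-bound fragment centres ~ Normal(site, spread);
    background fragment centres ~ Uniform(0, genome_length). Requires
    genome_length >= fragment_length.
    """
    rng = random.Random(seed)
    reads = []

    def add_fragment(center):
        # Place the fragment and clamp it inside the genome.
        start = int(round(center - fragment_length / 2))
        start = min(max(start, 0), genome_length - fragment_length)
        if rng.random() < 0.5:
            reads.append((start, "+"))                        # forward-strand 5' end
        else:
            reads.append((start + fragment_length - 1, "-"))  # reverse-strand 5' end

    for site in sites:
        for _ in range(reads_per_site):
            add_fragment(rng.gauss(site, spread))
    for _ in range(n_background):
        add_fragment(rng.uniform(0, genome_length))
    return reads
```

Plus-strand reads pile up just upstream of each site and minus-strand reads just downstream. This strand-shifted pattern is what peak callers exploit, and the background reads set the noise floor.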