Supplementary web site for incRNA
Zhi John Lu*, Kevin Y. Yip*, Guilin Wang, Chong Shou, LaDeana W. Hillier, Ekta Khurana, Ashish Agarwal, Raymond Auerbach, Joel Rozowsky, Chao Cheng, Masaomi Kato, David M. Miller, Frank Slack, Michael Snyder, Robert H. Waterston, Valerie Reinke and Mark Gerstein,
Prediction and Characterization of Non-coding RNAs in C. elegans by Integrating Conservation, Secondary Structure and High Throughput Sequencing and Array Data
Genome Research (Published in Advance December 22, 2010, doi:10.1101/gr.110189.110)
Datasets
Prediction results (Supplementary Files)
Training: cross-validation set; prediction: independent validation set
Training: gold-standard set; prediction: full dataset
Supplementary Files (Master tables of the high-confidence and medium-confidence ncRNA candidates)
Supplementary File 1 (Full set): Prediction scores, structural features, sequence features and expression values for candidate ncRNA fragments/bins (10,994 bins and 7,237 fragments, in Microsoft Excel format)
Supplementary File 2 (Intergenic portion): Intergenic candidate ncRNA loci/fragments targeted by Pol II and transcription factors across different developmental stages (1,678 bins and 1,223 loci, in Microsoft Excel format)
Prediction software (Java source code and compiled classes)
This software applies a set of supervised learning methods to precomputed features of genomic regions and selects the best method according to an unbiased procedure. It does not compute the features itself, because feature generation relies on third-party software for tasks such as sequence alignment and RNA structure prediction. The readme.txt file inside the packages describes the dataset file format and provides a script for running the program.
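The selection procedure actually used is documented in the package's readme.txt. The sketch below only illustrates one common way to pick among supervised methods without bias. Each candidate learner is scored by k-fold cross-validation on the training data alone, and the highest-scoring learner is chosen before any held-out data is touched. Everything here is hypothetical and not taken from the incRNA code: the class and method names, the two toy learners (a majority-label baseline and a nearest-centroid classifier), the fold assignment (example i goes to fold i % k), and the toy data.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, NOT the incRNA package's API: choose among candidate
 * supervised learners by k-fold cross-validation accuracy computed on the
 * training data only, so the choice never sees the data to be predicted.
 */
public class ModelSelectionSketch {

    /** A trained model maps a feature vector to a 0/1 label. */
    interface Model { int predict(double[] x); }

    /** A learner builds a Model from feature vectors and 0/1 labels. */
    interface Learner { Model train(List<double[]> xs, List<Integer> ys); }

    /** Hypothetical baseline learner: always predicts the majority label (ties -> 1). */
    static Model majority(List<double[]> xs, List<Integer> ys) {
        int ones = 0;
        for (int y : ys) ones += y;
        final int label = (2 * ones >= ys.size()) ? 1 : 0;
        return x -> label;
    }

    /** Hypothetical learner: assign the label of the nearer class centroid. */
    static Model nearestCentroid(List<double[]> xs, List<Integer> ys) {
        int d = xs.get(0).length;
        double[][] centroid = new double[2][d];
        int[] count = new int[2];
        for (int i = 0; i < xs.size(); i++) {
            int y = ys.get(i);
            count[y]++;
            for (int j = 0; j < d; j++) centroid[y][j] += xs.get(i)[j];
        }
        if (count[0] == 0) return x -> 1; // only positives in training data
        if (count[1] == 0) return x -> 0; // only negatives in training data
        for (int c = 0; c < 2; c++)
            for (int j = 0; j < d; j++) centroid[c][j] /= count[c];
        // Ties (equal distance) go to label 0.
        return x -> dist2(x, centroid[1]) < dist2(x, centroid[0]) ? 1 : 0;
    }

    /** Squared Euclidean distance. */
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    /** Pooled accuracy over all held-out examples; example i is in fold i % k (needs 2 <= k <= n). */
    static double cvAccuracy(Learner learner, List<double[]> xs, List<Integer> ys, int k) {
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<double[]> trX = new ArrayList<>(), teX = new ArrayList<>();
            List<Integer> trY = new ArrayList<>(), teY = new ArrayList<>();
            for (int i = 0; i < xs.size(); i++) {
                if (i % k == fold) { teX.add(xs.get(i)); teY.add(ys.get(i)); }
                else { trX.add(xs.get(i)); trY.add(ys.get(i)); }
            }
            Model m = learner.train(trX, trY); // trained without the held-out fold
            for (int i = 0; i < teX.size(); i++)
                if (m.predict(teX.get(i)) == teY.get(i)) correct++;
        }
        return (double) correct / xs.size();
    }

    /** Index of the learner with the highest CV accuracy (the first one wins ties). */
    static int selectBest(List<Learner> learners, List<double[]> xs, List<Integer> ys, int k) {
        int best = 0;
        double bestAcc = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < learners.size(); i++) {
            double acc = cvAccuracy(learners.get(i), xs, ys, k);
            if (acc > bestAcc) { bestAcc = acc; best = i; }
        }
        return best;
    }

    /** Toy features: one informative value (0..19) plus a constant column. */
    static List<double[]> toyFeatures() {
        List<double[]> xs = new ArrayList<>();
        for (int i = 0; i < 20; i++) xs.add(new double[] {i, 1.0});
        return xs;
    }

    /** Toy labels: 1 when the informative value is >= 10. */
    static List<Integer> toyLabels() {
        List<Integer> ys = new ArrayList<>();
        for (int i = 0; i < 20; i++) ys.add(i >= 10 ? 1 : 0);
        return ys;
    }

    public static void main(String[] args) {
        List<Learner> learners = List.of(ModelSelectionSketch::majority, ModelSelectionSketch::nearestCentroid);
        int best = selectBest(learners, toyFeatures(), toyLabels(), 5);
        System.out.println("selected learner index: " + best);
    }
}
```

With the toy data and k = 5, the majority baseline scores 0.5 and the nearest-centroid learner scores 0.95, so the program prints `selected learner index: 1`. In a real pipeline the chosen learner would then be retrained on the whole training set and applied once to the independent set. Reporting accuracy on that independent set, rather than the cross-validation score used for selection, keeps the performance estimate free of selection bias.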