Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants
Jiang Du, Robert D. Bjornson, Zhengdong D. Zhang, Yong Kong, Michael Snyder and Mark B. Gerstein (2009) PLoS Comput Biol, 5(7), e1000432.

Because an important goal of personal genomics is to reduce the total cost of re-sequencing an individual to an affordable level, it is worth considering how best to integrate the many available sequencing technologies. Here we build a simulation toolbox that helps optimize the combination of technologies used for comparative genome re-sequencing, in particular for reconstructing large structural variants (SVs).
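To make the cost trade-off concrete, the short Python sketch below converts a budget into a read count and a fold coverage for each technology in a mix. It assumes a simple linear cost model (cost proportional to the number of sequenced bases). The `Technology` class, the function names and the example prices are hypothetical illustrations, not part of ReSeqSim.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Technology:
    # Hypothetical description of one sequencing technology; the fields
    # loosely mirror ReadLength and CostPerBp in the reseq_sim grammar.
    name: str
    read_length: int     # bases per read
    cost_per_bp: float   # cost units per sequenced base (assumed linear model)


def read_count(tech: Technology, budget: float) -> int:
    """Whole reads affordable with `budget`, rounded down."""
    return int(budget / (tech.cost_per_bp * tech.read_length))


def fold_coverage(tech: Technology, budget: float, genome_length: int) -> float:
    """Average depth bought by `budget` (ignores rounding to whole reads)."""
    return budget / (tech.cost_per_bp * genome_length)


def coverage_of_mix(techs: List[Technology], budgets: List[float],
                    genome_length: int) -> Dict[str, float]:
    """Coverage contributed by each technology for one split of the budget."""
    return {t.name: fold_coverage(t, b, genome_length)
            for t, b in zip(techs, budgets)}


if __name__ == "__main__":
    short = Technology("short_reads", 36, 0.25)   # placeholder numbers
    long_ = Technology("long_reads", 800, 1.0)    # placeholder numbers
    print(coverage_of_mix([short, long_], [6000.0, 4000.0], genome_length=10000))
    print(read_count(short, 6000.0), read_count(long_, 4000.0))
```

Under this linear model, coverage depends only on the price per base; read length and insert size matter through how well the resulting reads reconstruct an SV, which is what a full simulation has to evaluate.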

Re-sequencing Simulation Toolbox (ReSeqSim)

To keep pace with the rapid development of experimental technologies in personal genomics, our simulation framework is modular: new technologies can be incorporated, and the parameters of existing ones can be adjusted. Because the approach relies on the general concept of mapability data, it can easily be applied to any representative SV for a similar analysis. We expect that more experimental technologies will be incorporated into this sequencing/assembly simulation in the future, and that the results of such simulations will provide practical guidelines for experimental design, achieving optimal assembly performance at relatively low cost. To that end, we have made the simulation framework available as a general toolbox that can be used directly or easily extended.
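As an illustration of mapability, one common definition marks a position as uniquely mapable for read length k when the k-mer starting there occurs exactly once in the reference. The Python sketch below implements that toy definition on a single string. It is an assumption for illustration only: it ignores mismatches and the reverse-complement strand, and it is not necessarily how the mapability data used by ReSeqSim were produced. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def unique_kmer_flags(seq: str, k: int) -> List[int]:
    """For each start position i, 1 if seq[i:i+k] occurs exactly once in seq.

    Toy definition: exact matches on the forward strand only.
    """
    if k <= 0 or k > len(seq):
        return []
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [1 if counts[kmer] == 1 else 0 for kmer in kmers]


def mapable_fraction(seq: str, k: int) -> float:
    """Fraction of read start positions that are uniquely mapable."""
    flags = unique_kmer_flags(seq, k)
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    ref = "ACGTACGTTT"  # toy reference containing a repeated ACGT
    for k in (4, 8):
        print(k, unique_kmer_flags(ref, k), round(mapable_fraction(ref, k), 3))
```

In this toy reference, the repeated ACGT makes two start positions ambiguous at k = 4, while every start position is unique at k = 8, which is the sense in which longer reads raise mapability.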


reseq_sim-v0.9.tgz (including case study data)


Command-line syntax, represented in BNF:

reseq_sim CommonOptions SeqParams
CommonOptions ::= UnitCost OverlapThres InsertionInfoPath MM2RefPathPrefix MM2TargetPathPrefix
SeqParams ::= SeqParam | SeqParams SeqParam
SeqParam ::= SingleReads | PairedEndReads | null
SingleReads ::= SingleReadType ReadLength CostPerBp TotalCost
PairedEndReads ::= PairedEndReadType ReadLength CostPerBp InsertSizeMean InsertSizeSD CostPerBp TotalCost
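As a reading aid for the grammar, here is a minimal Python sketch that splits an argument list into the five CommonOptions fields followed by zero or more SeqParam groups (three fields after a single-read type, six after a paired-end type, as the grammar is written). The grammar does not list the accepted type tokens, so the caller supplies them; the tokens "SR" and "PE", the file names in the example, and all class and function names are hypothetical. Numeric fields are assumed to be plain decimal numbers.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple


@dataclass
class CommonOptions:
    unit_cost: float
    overlap_thres: float              # assumed to be numeric
    insertion_info_path: str
    mm2_ref_path_prefix: str
    mm2_target_path_prefix: str


@dataclass
class SeqParam:
    read_type: str
    read_length: int
    cost_per_bp: float
    total_cost: float
    # The fields below are filled only for paired-end reads.
    insert_size_mean: Optional[float] = None
    insert_size_sd: Optional[float] = None
    # The grammar lists CostPerBp twice for paired-end reads; the second
    # value is stored separately because its meaning is not spelled out.
    second_cost_per_bp: Optional[float] = None


def parse_reseq_args(
    args: Sequence[str], single_types: Set[str], paired_types: Set[str]
) -> Tuple[CommonOptions, List[SeqParam]]:
    """Parse the arguments that follow the program name (hypothetical helper)."""
    if len(args) < 5:
        raise ValueError("expected the 5 CommonOptions fields")
    common = CommonOptions(float(args[0]), float(args[1]), args[2], args[3], args[4])

    params: List[SeqParam] = []
    i = 5
    while i < len(args):  # zero groups is allowed (the null alternative)
        read_type = args[i]
        if read_type in single_types:
            group = args[i + 1:i + 4]
            if len(group) != 3:
                raise ValueError(f"truncated single-read group at token {i}")
            length, cost, total = group
            params.append(SeqParam(read_type, int(length), float(cost), float(total)))
            i += 4
        elif read_type in paired_types:
            group = args[i + 1:i + 7]
            if len(group) != 6:
                raise ValueError(f"truncated paired-end group at token {i}")
            length, cost, mean, sd, cost2, total = group
            params.append(SeqParam(read_type, int(length), float(cost), float(total),
                                   float(mean), float(sd), float(cost2)))
            i += 7
        else:
            raise ValueError(f"unknown read type {read_type!r} at token {i}")
    return common, params


if __name__ == "__main__":
    argv = ["1.0", "50", "insertions.txt", "ref_prefix", "target_prefix",
            "SR", "36", "0.5", "100",
            "PE", "36", "0.5", "200", "20", "0.5", "300"]
    common, params = parse_reseq_args(argv, {"SR"}, {"PE"})
    print(common)
    for p in params:
        print(p)
```

Dispatching on the type token rather than on token counts keeps parsing unambiguous: a run of mixed groups cannot be segmented by length alone (for example, 11 tokens could be single-then-paired or paired-then-single).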

See "make test" for an example.


Boost Software License 1.0

BSL permits the creation of derivative works for commercial or non-commercial use with no legal requirement to release your source code. Also, BSL does not require reproduction of copyright messages for object code redistribution, and if you distribute your own code along with some Boost code, the Boost license applies only to the Boost code (and modified versions thereof); you are free to license your own code under any terms you like.

Supplementary material

Supplementary documents
Supplementary figures

Simulation case study data
Mapability data for novel insertions (10 kb, 5 kb and 2 kb)

Jiang Du
Last modified: Thu May 10 2009