bam2pbam

There are two versions of the bam2pbam protocol: (1) for BAM files aligned to a reference genome, and (2) for BAM files aligned to a reference transcriptome.

Bam Files Aligned to Reference Genome

The main bash script for converting BAM files to pbam is bam2pbam.sh.
This code requires:
(1) python3 (required for the compression algorithm to work)
(2) samtools
(3) createDiff.py
(4) compress.py
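
The requirements above can be checked before a run. The following is a minimal sketch, not part of the repository: the function name check_prereqs is hypothetical, and it assumes createDiff.py and compress.py live in the directory passed as the first argument (the current directory by default).

```shell
# Sketch: report which prerequisites of bam2pbam.sh are missing.
# Assumption: createDiff.py and compress.py sit in the directory given as $1.
check_prereqs() {
    script_dir=${1:-.}
    missing=""
    # Command-line tools must be on PATH.
    for tool in python3 samtools; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Helper scripts must exist as regular files.
    for helper in createDiff.py compress.py; do
        [ -f "$script_dir/$helper" ] || missing="$missing $helper"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all prerequisites found"
}
```

For example, check_prereqs . prints the missing items and returns a non-zero status if anything is absent.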

README

User arguments
(1) read length
(2) paired-end or single-end sequencing (PE, SE)
(3) bam file
(4) name of the temporary file (deleted at the end of the run)
(5) type of cleaning, i.e. which kind of variants to mask. Options: all, mismatch, indel, split

  
example usage: sh bam2pbam.sh 100 PE file.bam tmp all
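
Because the five arguments are positional, a small wrapper can catch mistakes before the conversion starts. This is an illustrative sketch only: validate_args is a hypothetical helper, not something bam2pbam.sh provides, and the checks are inferred from the argument list above.

```shell
# Sketch: validate the five positional arguments listed above.
validate_args() {
    if [ "$#" -ne 5 ]; then
        echo "expected 5 arguments: read_length PE|SE bam tmp_name cleaning_type" >&2
        return 1
    fi
    # (1) read length: digits only
    case $1 in
        ''|*[!0-9]*) echo "read length must be a whole number: $1" >&2; return 1 ;;
    esac
    # (2) sequencing layout
    case $2 in
        PE|SE) ;;
        *) echo "layout must be PE or SE: $2" >&2; return 1 ;;
    esac
    # (3) input BAM file must exist
    [ -f "$3" ] || { echo "bam file not found: $3" >&2; return 1; }
    # (5) type of cleaning
    case $5 in
        all|mismatch|indel|split) ;;
        *) echo "cleaning type must be all, mismatch, indel, or split: $5" >&2; return 1 ;;
    esac
}
```

Possible use: validate_args 100 PE file.bam tmp all && sh bam2pbam.sh 100 PE file.bam tmp all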

Bam Files Aligned to Reference Transcriptome

The bash script for converting BAM files produced by mapping RNA-Seq data to a transcriptome into pbam is createptransbam.sh.
This code requires:
(1) samtools
(2) ptransbam.sh

README

For now, the necessary file locations are hardcoded in createptransbam.sh and ptransbam.sh.
Please make sure to change the bam and rsemfa variables in both createptransbam.sh and ptransbam.sh before running.

  
ref = < reference transcriptome >
bam = < name of the bam file >

We used the RSEM-created reference transcriptome from the ENCODE data portal.
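
To find the hardcoded locations mentioned above, the assignment lines can be listed with grep. This is a sketch under an assumption: it presumes the scripts set the variables with plain shell assignments at the start of a line (for example bam=...), which has not been checked against the scripts. The function name list_hardcoded_vars is hypothetical, and ref is included only because the README shows it alongside bam.

```shell
# Sketch: print file name, line number, and text of each hardcoded assignment.
# Assumption: variables are set as plain assignments, e.g. bam=/path/to/file.bam
list_hardcoded_vars() {
    grep -nE '^[[:space:]]*(bam|rsemfa|ref)=' "$@"
}
```

Possible use: list_hardcoded_vars createptransbam.sh ptransbam.sh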