GATK4 mitochondria SNPs and INDELs pipeline

This is an HPC workflow for calling SNPs and INDELs in the mitochondrial genome from matched tumor-normal whole-genome sequencing (WGS) data of 25 childhood acute lymphoblastic leukemia cases. The workflow is largely based on GATK’s best-practice WDL for mitochondrial variant calling; see GATK’s documentation for more details.

Step 0: Download GATK’s public reference databases
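A minimal sketch of fetching the references with `gsutil`. It assumes GATK's hg38 chrM resources sit under the bucket path below (`CHRM_BUCKET` is an assumption; check GATK's documentation for the current location) and only builds and prints the command, so it can be pasted into an HPC job script.

```python
import shlex

# Assumption: GATK's public hg38 chrM reference resources live under this
# Google Cloud bucket path. Verify it against GATK's documentation.
CHRM_BUCKET = "gs://gcp-public-data--broad-references/hg38/v0/chrM"


def download_command(bucket, dest_dir):
    """Return a gsutil command that recursively copies a bucket folder."""
    # -m runs the transfer in parallel; cp -r copies the folder and its contents
    return ["gsutil", "-m", "cp", "-r", bucket, dest_dir]


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it
    print(shlex.join(download_command(CHRM_BUCKET, "refs/")))
```

Building the command as a list keeps each argument intact if paths contain spaces; `shlex.join` only matters when printing it for a shell.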

Step 1: Subset a whole-genome BAM to mitochondrial reads and remove alignment information
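One way to express Step 1 is two GATK commands: `PrintReads` restricted to chrM, then `RevertSam` to strip alignment information so the reads can be re-aligned. This is a sketch, not this pipeline's exact invocation: the contig name `chrM`, the read filters (which, to our understanding, GATK's mitochondria WDL also applies) and the file names in the usage block are assumptions.

```python
import shlex


def subset_to_chrm(ref_fasta, wgs_bam, out_bam, contig="chrM"):
    """PrintReads restricted to the mitochondrial contig."""
    return [
        "gatk", "PrintReads",
        "-R", ref_fasta,
        "-I", wgs_bam,
        "-L", contig,
        # Assumed filters: keep reads whose mate is on the same contig or
        # unmapped, and drop reads that are unmapped along with their mate
        "--read-filter", "MateOnSameContigOrNoMappedMateReadFilter",
        "--read-filter", "MateUnmappedAndUnmappedReadFilter",
        "-O", out_bam,
    ]


def revert_to_unmapped(in_bam, out_bam):
    """RevertSam removes alignment information and sorts reads by name."""
    return [
        "gatk", "RevertSam",
        "--INPUT", in_bam,
        "--OUTPUT", out_bam,
        # queryname order keeps mates adjacent for re-alignment
        "--SORT_ORDER", "queryname",
        "--VALIDATION_STRINGENCY", "LENIENT",
    ]


if __name__ == "__main__":
    # Hypothetical file names for one normal sample
    for cmd in (
        subset_to_chrm("Homo_sapiens_assembly38.fasta", "normal.bam", "normal.chrM.bam"),
        revert_to_unmapped("normal.chrM.bam", "normal.chrM.unmapped.bam"),
    ):
        print(shlex.join(cmd))
```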

Step 2: Align the unmapped BAM, mark duplicates, collect coverage metrics, and call SNPs and INDELs from the normal BAMs
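The post-alignment part of Step 2 can be sketched as three commands: `MarkDuplicates`, `CollectWgsMetrics` for coverage, and `Mutect2` in mitochondria mode on the normal sample. The alignment itself (bwa mem in GATK's WDL) is left out here. The coverage cap of 100000 and all file names are assumptions, not values taken from this pipeline.

```python
import shlex


def mark_duplicates(aligned_bam, out_bam, metrics):
    """Picard MarkDuplicates run through the gatk wrapper."""
    return ["gatk", "MarkDuplicates",
            "--INPUT", aligned_bam,
            "--OUTPUT", out_bam,
            "--METRICS_FILE", metrics]


def collect_coverage(bam, ref_fasta, out_metrics, coverage_cap=100000):
    """CollectWgsMetrics with a raised coverage cap.

    chrM is sequenced far deeper than the nuclear genome, so the default cap
    would truncate the depth histogram; 100000 is an assumed value.
    """
    return ["gatk", "CollectWgsMetrics",
            "--INPUT", bam,
            "--REFERENCE_SEQUENCE", ref_fasta,
            "--OUTPUT", out_metrics,
            "--COVERAGE_CAP", str(coverage_cap)]


def call_mito_variants(ref_fasta, bam, out_vcf, contig="chrM"):
    """Mutect2 in mitochondria mode on a single (normal) sample."""
    return ["gatk", "Mutect2",
            "-R", ref_fasta,
            "-I", bam,
            "-L", contig,
            "--mitochondria-mode",
            "-O", out_vcf]


if __name__ == "__main__":
    ref = "Homo_sapiens_assembly38.chrM.fasta"  # hypothetical path
    steps = [
        mark_duplicates("normal.aligned.bam", "normal.md.bam", "normal.md.metrics"),
        collect_coverage("normal.md.bam", ref, "normal.wgs_metrics.txt"),
        call_mito_variants(ref, "normal.md.bam", "normal.chrM.vcf.gz"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```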

Step 3: Call SNPs and INDELs from the tumor BAMs
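This README does not say whether the matched normal is passed to Mutect2 when calling the tumor. The sketch below assumes it is, using Mutect2's tumor-normal form (`-I` twice plus `-normal` with the normal's sample name) on chrM in mitochondria mode. The sample name and file names are hypothetical.

```python
import shlex


def call_tumor_vs_normal(ref_fasta, tumor_bam, normal_bam, normal_sample,
                         out_vcf, contig="chrM"):
    """Mutect2 with a matched normal, restricted to the mitochondrial contig.

    normal_sample must match the SM tag in the normal BAM's read group.
    """
    return ["gatk", "Mutect2",
            "-R", ref_fasta,
            "-I", tumor_bam,
            "-I", normal_bam,
            "-normal", normal_sample,
            "-L", contig,
            "--mitochondria-mode",
            "-O", out_vcf]


if __name__ == "__main__":
    # Hypothetical inputs for one tumor-normal pair
    cmd = call_tumor_vs_normal("Homo_sapiens_assembly38.chrM.fasta",
                               "tumor.chrM.md.bam", "normal.chrM.md.bam",
                               "PT01_normal", "tumor.chrM.vcf.gz")
    print(shlex.join(cmd))
```

For tumor-only calling, drop the second `-I` and the `-normal` argument.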

Step 4: Estimate contamination levels in the mitochondrial data
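To our understanding, GATK's WDL estimates mitochondrial contamination with a separate haplogroup-based tool (Haplocheck) and then passes the value to `FilterMutectCalls`. The sketch below covers only that second half: the contamination fraction is treated as an input, and the range check and file names are illustrative assumptions.

```python
import shlex


def filter_with_contamination(ref_fasta, raw_vcf, out_vcf, contamination):
    """FilterMutectCalls in mitochondria mode with a fixed contamination estimate.

    `contamination` is a fraction computed elsewhere; it is not estimated here.
    """
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be a fraction in [0, 1)")
    return ["gatk", "FilterMutectCalls",
            "-R", ref_fasta,
            "-V", raw_vcf,
            # Mutect2 writes its stats file next to the VCF with a .stats suffix
            "--stats", raw_vcf + ".stats",
            "--mitochondria-mode",
            "--contamination-estimate", str(contamination),
            "-O", out_vcf]


if __name__ == "__main__":
    # Hypothetical estimate and file names
    cmd = filter_with_contamination("Homo_sapiens_assembly38.chrM.fasta",
                                    "tumor.chrM.vcf.gz",
                                    "tumor.chrM.filtered.vcf.gz", 0.02)
    print(shlex.join(cmd))
```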