3.6. read_utils.py - utilities that manipulate bam and fastq files
Utilities for working with sequence reads, such as converting formats and fixing mate pairs.
usage: read_utils.py subcommand
- Sub-commands:
- purge_unmated
Use mergeShuffledFastqSeqs to purge unmated reads, and put corresponding reads in the same order. Corresponding sequences must have sequence identifiers of the form SEQID/1 and SEQID/2.
usage: read_utils.py purge_unmated [-h] [--regex REGEX] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq1 inFastq2 outFastq1 outFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--regex=^@(\S+)/[1|2]$ Perl regular expression to parse paired read IDs (default: ^@(\S+)/[1|2]$).
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
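The pairing rule can be illustrated with a short sketch. It is not the tool's implementation (which delegates to mergeShuffledFastqSeqs): it assumes FASTQ records are already parsed into in-memory 4-tuples and that output order follows the first file, and the helpers `read_id` and `purge_unmated` are hypothetical. The regular expression is the documented `--regex` default.

```python
import re
from typing import Dict, List, Tuple

# Documented --regex default; group 1 captures the ID shared by both mates.
PAIR_ID = re.compile(r"^@(\S+)/[1|2]$")

# Hypothetical in-memory record: (header, sequence, '+' line, qualities).
Record = Tuple[str, str, str, str]


def read_id(header: str) -> str:
    """Return the shared SEQID from a header such as '@SEQID/1'."""
    match = PAIR_ID.match(header)
    if match is None:
        raise ValueError(f"header does not match pair regex: {header!r}")
    return match.group(1)


def purge_unmated(reads1: List[Record],
                  reads2: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Drop reads without a mate; emit mates in the same (file 1) order."""
    mates2: Dict[str, Record] = {read_id(rec[0]): rec for rec in reads2}
    out1: List[Record] = []
    out2: List[Record] = []
    for rec in reads1:
        mate = mates2.get(read_id(rec[0]))
        if mate is not None:  # keep only IDs present in both inputs
            out1.append(rec)
            out2.append(mate)
    return out1, out2


# Usage: only read "b" has both mates, so it is the only pair kept.
r1 = [("@a/1", "ACGT", "+", "IIII"), ("@b/1", "GGGG", "+", "IIII")]
r2 = [("@c/2", "TTTT", "+", "IIII"), ("@b/2", "CCCC", "+", "IIII")]
out1, out2 = purge_unmated(r1, r2)
```

Note that the character class `[1|2]` in the default pattern also accepts a literal `|`; `[12]` is the stricter choice if you supply your own `--regex`.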
- fastq_to_fasta
Convert from fastq format to fasta format. Warning: output reads might be split onto multiple lines.
usage: read_utils.py fastq_to_fasta [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq outFasta
- Positional arguments:
inFastq Input fastq file.
outFasta Output fasta file.
- Options:
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
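The warning about multi-line output refers to sequence wrapping, which FASTA permits. A minimal sketch of the conversion, assuming plain 4-line FASTQ records held in memory; the `width` parameter and `fastq_to_fasta` helper are hypothetical illustrations, not options of this subcommand.

```python
from typing import List


def fastq_to_fasta(lines: List[str], width: int = 0) -> List[str]:
    """Convert 4-line FASTQ records to FASTA lines.

    If width > 0, each sequence is wrapped onto lines of at most `width`
    characters. Quality lines are discarded.
    """
    if len(lines) % 4 != 0:
        raise ValueError("expected FASTQ input made of 4-line records")
    fasta: List[str] = []
    for i in range(0, len(lines), 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        fasta.append(">" + header[1:])  # '@' becomes '>'
        if width > 0:
            fasta.extend(seq[j:j + width] for j in range(0, len(seq), width))
        else:
            fasta.append(seq)
    return fasta


record = ["@read1", "ACGTACGTAC", "+", "IIIIIIIIII"]
print(fastq_to_fasta(record, width=4))  # ['>read1', 'ACGT', 'ACGT', 'AC']
```

Code that consumes the output should therefore join continuation lines up to the next `>` rather than assume one sequence line per record.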
- index_fasta_samtools
Index a reference genome for Samtools.
usage: read_utils.py index_fasta_samtools [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFasta
- Positional arguments:
inFasta Reference genome, FASTA format.
- Options:
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
- index_fasta_picard
Create an index file for a reference genome suitable for Picard/GATK.
usage: read_utils.py index_fasta_picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFasta
- Positional arguments:
inFasta Input reference genome, FASTA format.
- Options:
--JVMmemory=512m JVM virtual memory size (default: 512m).
--picardOptions=[] Optional arguments to Picard’s CreateSequenceDictionary, OPTIONNAME=value ...
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- mkdup_picard
Mark or remove duplicate reads from BAM file.
usage: read_utils.py mkdup_picard [-h] [--outMetrics OUTMETRICS] [--remove] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBams [inBams ...] outBam
- Positional arguments:
inBams Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--outMetrics Output metrics file. Default is to dump to a temp file.
--remove=False Instead of marking duplicates, remove them entirely (default: False).
--JVMmemory=2g JVM virtual memory size (default: 2g).
--picardOptions=[] Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- revert_bam_picard
Revert a BAM file to raw reads.
usage: read_utils.py revert_bam_picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g).
--picardOptions=[] Optional arguments to Picard’s RevertSam, OPTIONNAME=value ...
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- picard
Generic Picard runner.
usage: read_utils.py picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] command
- Positional arguments:
command Picard command to run.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g).
--picardOptions=[] Optional arguments to Picard, OPTIONNAME=value ...
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
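The generic runner forwards `OPTIONNAME=value` tokens to whichever Picard tool is named. A sketch of building the argument list from Python, assuming `read_utils.py` is executable and on `PATH`; the helpers `picard_argv` and `run_picard` are hypothetical, and the Picard tool and option names in the usage line are only examples. Because `--picardOptions` takes a variable number of values, the sketch places it last so it cannot swallow the positional `command`.

```python
import subprocess
from typing import Dict, List, Optional


def picard_argv(command: str, options: Dict[str, str],
                jvm_memory: Optional[str] = None) -> List[str]:
    """Build argv for `read_utils.py picard` using only the documented flags."""
    argv = ["read_utils.py", "picard", command]
    if jvm_memory is not None:
        argv += ["--JVMmemory", jvm_memory]
    if options:
        # Variable-length option list goes last so it cannot absorb positionals.
        argv.append("--picardOptions")
        argv += [f"{name}={value}" for name, value in options.items()]
    return argv


def run_picard(command: str, options: Dict[str, str],
               jvm_memory: Optional[str] = None) -> None:
    """Run the command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(picard_argv(command, options, jvm_memory), check=True)


argv = picard_argv("ValidateSamFile", {"INPUT": "in.bam", "MODE": "SUMMARY"},
                   jvm_memory="4g")
```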
- sort_bam
Sort a BAM file.
usage: read_utils.py sort_bam [-h] [--index] [--md5] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam {unsorted,queryname,coordinate}
- Positional arguments:
inBam Input BAM file.
outBam Output BAM file, sorted.
sortOrder How to sort the reads.
Possible choices: unsorted, queryname, coordinate
- Options:
--index=False Index outBam (default: False).
--md5=False Compute an MD5 checksum of outBam (default: False).
--JVMmemory=2g JVM virtual memory size (default: 2g).
--picardOptions=[] Optional arguments to Picard’s SortSam, OPTIONNAME=value ...
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- merge_bams
Merge multiple BAM files into one.
usage: read_utils.py merge_bams [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBams [inBams ...] outBam
- Positional arguments:
inBams Input BAM files.
outBam Output BAM file.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g).
--picardOptions=[] Optional arguments to Picard’s MergeSamFiles, OPTIONNAME=value ...
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_bam
Filter a BAM file by read name.
usage: read_utils.py filter_bam [-h] [--exclude] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam readList outBam
- Positional arguments:
inBam Input BAM file.
readList Input file of read IDs.
outBam Output BAM file.
- Options:
--exclude=False If specified, readList is a list of reads to remove from the input. By default, readList is treated as an inclusion list (all reads not named in it are removed).
--JVMmemory=4g JVM virtual memory size (default: 4g).
--picardOptions=[] Optional arguments to Picard’s FilterSamReads, OPTIONNAME=value ...
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
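The inclusion/exclusion semantics of `--exclude` can be modeled on read names alone. This sketch is only illustrative (the subcommand delegates to Picard’s FilterSamReads, and `filter_by_name` is a hypothetical helper). In this model, because mates share a read name, listing one name keeps or removes both mates.

```python
from typing import Iterable, List, Set


def filter_by_name(read_names: Iterable[str], read_list: Set[str],
                   exclude: bool = False) -> List[str]:
    """Keep names in read_list (default) or drop them (exclude=True)."""
    if exclude:
        return [name for name in read_names if name not in read_list]
    return [name for name in read_names if name in read_list]


names = ["r1", "r2", "r3", "r1"]  # "r1" appears twice: one record per mate
print(filter_by_name(names, {"r1"}))                # ['r1', 'r1']
print(filter_by_name(names, {"r1"}, exclude=True))  # ['r2', 'r3']
```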
- fastq_to_bam
Convert a pair of fastq paired-end read files and optional text header to a single bam file.
usage: read_utils.py fastq_to_bam [-h] (--sampleName SAMPLENAME | --header HEADER) [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq1 inFastq2 outBam
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outBam Output BAM file.
- Options:
Exactly one of --sampleName or --header is required.
--sampleName Sample name to insert into the read group header.
--header Optional text file containing header.
--JVMmemory=2g JVM virtual memory size (default: 2g).
--picardOptions=[] Optional arguments to Picard’s FastqToSam, OPTIONNAME=value ... Note that header-related options will be overwritten by HEADER if present.
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- split_bam
Split BAM file equally into several output BAM files.
usage: read_utils.py split_bam [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBams [outBams ...]
- Positional arguments:
inBam Input BAM file.
outBams Output BAM files.
- Options:
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
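One way to split "equally" is to give each output a contiguous chunk whose size differs from the others by at most one record. The sketch below shows that arithmetic on an in-memory sequence; it is an assumption about what "equally" means, not a description of how split_bam assigns reads, and the hypothetical `split_evenly` makes no attempt to keep mates in the same output or to copy the BAM header into each part.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(records: Sequence[T], n_parts: int) -> List[List[T]]:
    """Split records into n_parts contiguous chunks differing in size by <= 1."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(records), n_parts)
    parts: List[List[T]] = []
    start = 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        parts.append(list(records[start:start + size]))
        start += size
    return parts


print([len(p) for p in split_evenly(range(10), 3)])  # [4, 3, 3]
```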
- reheader_bam
Copy a BAM file (inBam to outBam) while renaming elements of the BAM header. The mapping file specifies (key, old value, new value) mappings, one per line. For example:
    LB lib1 lib_one
    SM sample1 Sample_1
    SM sample2 Sample_2
    SM sample3 Sample_3
    CN broad BI
usage: read_utils.py reheader_bam [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam rgMap outBam
- Positional arguments:
inBam Input reads, BAM format.
rgMap Tabular file containing three columns: field, old, new.
outBam Output reads, BAM format.
- Options:
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
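A sketch of how such a mapping could be applied to SAM header text. It assumes tab-separated columns and that the keys in the example (LB, SM, CN) are tags on `@RG` lines; `load_rg_map` and `rename_rg_line` are hypothetical helpers, and the real subcommand may handle other header record types as well.

```python
import csv
import io
from typing import Dict, Tuple

RgMap = Dict[Tuple[str, str], str]  # (field, old value) -> new value


def load_rg_map(text: str) -> RgMap:
    """Parse tab-separated `field old new` rows into a lookup table."""
    mapping: RgMap = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        field, old, new = row
        mapping[(field, old)] = new
    return mapping


def rename_rg_line(line: str, mapping: RgMap) -> str:
    """Rewrite TAG:value pairs on an @RG header line; leave other lines alone."""
    if not line.startswith("@RG\t"):
        return line
    record_type, *tags = line.split("\t")
    renamed = []
    for tag in tags:
        key, _, value = tag.partition(":")
        renamed.append(f"{key}:{mapping.get((key, value), value)}")
    return "\t".join([record_type] + renamed)


rg_map = load_rg_map("LB\tlib1\tlib_one\nSM\tsample1\tSample_1\nCN\tbroad\tBI\n")
print(rename_rg_line("@RG\tID:A\tSM:sample1\tLB:lib1\tCN:broad", rg_map))
```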
- reheader_bams
Copy BAM files while renaming elements of the BAM header. The mapping file specifies (key, old value, new value) mappings, one per line; FN rows map each input BAM file to its output file. For example:
    LB lib1 lib_one
    SM sample1 Sample_1
    SM sample2 Sample_2
    SM sample3 Sample_3
    CN broad BI
    FN in1.bam out1.bam
    FN in2.bam out2.bam
usage: read_utils.py reheader_bams [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] rgMap
- Positional arguments:
rgMap Tabular file containing three columns: field, old, new.
- Options:
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
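In the multi-file variant, the FN rows in the example name input/output file pairs, while the other rows are header renames as in reheader_bam. A sketch of separating the two kinds of row, assuming tab-separated columns; `parse_rg_map` is a hypothetical helper, and whether the tool applies every rename to every listed file is an assumption here.

```python
from typing import Dict, List, Tuple


def parse_rg_map(text: str) -> Tuple[Dict[Tuple[str, str], str], List[Tuple[str, str]]]:
    """Separate header renames from FN (input file -> output file) rows."""
    renames: Dict[Tuple[str, str], str] = {}
    files: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        field, old, new = line.split("\t")
        if field == "FN":
            files.append((old, new))  # which BAM to read and where to write it
        else:
            renames[(field, old)] = new
    return renames, files


renames, files = parse_rg_map(
    "SM\tsample1\tSample_1\nFN\tin1.bam\tout1.bam\nFN\tin2.bam\tout2.bam\n"
)
```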
- rmdup_cdhit_bam
Remove duplicate reads from BAM file using cd-hit-dup.
usage: read_utils.py rmdup_cdhit_bam [-h] [--JVMmemory JVM_MEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--JVMmemory=4g JVM virtual memory size (default: 4g).
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- rmdup_mvicuna_bam
Remove duplicate reads from a BAM file using M-Vicuna. The primary advantage of this approach over Picard’s MarkDuplicates tool is that Picard requires input reads to be aligned to a reference, whereas M-Vicuna can operate on unaligned reads.
usage: read_utils.py rmdup_mvicuna_bam [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--JVMmemory=4g JVM virtual memory size (default: 4g).
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- rmdup_prinseq_fastq
Run prinseq-lite’s duplicate removal operation on paired-end reads. Also removes reads with more than one N.
usage: read_utils.py rmdup_prinseq_fastq [-h] [--includeUnmated] [--unpairedOutFastq1 UNPAIREDOUTFASTQ1] [--unpairedOutFastq2 UNPAIREDOUTFASTQ2] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq1 inFastq2 outFastq1 outFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--includeUnmated=False Include unmated reads in the main output fastq files (default: False).
--unpairedOutFastq1 File name of output unpaired reads from the 1st end of paired-end reads (independent of --includeUnmated).
--unpairedOutFastq2 File name of output unpaired reads from the 2nd end of paired-end reads (independent of --includeUnmated).
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
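A simplified model of the two filters described above: exact-duplicate removal on whole pairs and rejection of reads with more than one N. prinseq-lite supports several definitions of "duplicate", and this sketch assumes the exact-match one. It also drops a whole pair when either mate fails the N filter, whereas the real tool can leave an unmated survivor, which is what --unpairedOutFastq1 and --unpairedOutFastq2 capture. Function names are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (read 1 sequence, read 2 sequence)


def has_too_many_ns(seq: str, max_ns: int = 1) -> bool:
    """True if the read contains more than `max_ns` N bases."""
    return seq.upper().count("N") > max_ns


def dedup_pairs(pairs: List[Pair]) -> List[Pair]:
    """Drop pairs with an N-heavy mate, then keep the first copy of each exact duplicate."""
    seen: Set[Pair] = set()
    kept: List[Pair] = []
    for pair in pairs:
        if any(has_too_many_ns(seq) for seq in pair):
            continue
        if pair in seen:
            continue
        seen.add(pair)
        kept.append(pair)
    return kept


pairs = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ANNT", "CCCC"), ("ANGT", "CCCC")]
print(dedup_pairs(pairs))  # [('ACGT', 'TTGA'), ('ANGT', 'CCCC')]
```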
- filter_bam_mapped_only
Use samtools to reduce a BAM file to only reads that are aligned (-F 4), have a non-zero mapping quality (-q 1), and are not marked as PCR/optical duplicates (-F 1024).
usage: read_utils.py filter_bam_mapped_only [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam
- Positional arguments:
inBam Input aligned reads, BAM format.
outBam Output sorted, indexed reads, filtered to aligned-only, BAM format.
- Options:
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
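The three criteria are simple tests on each record's FLAG and MAPQ fields: the two -F masks combine into one bitmask (4 | 1024 = 1028), and -q 1 keeps records with MAPQ of at least 1. The sketch below expresses this for a single record; it illustrates the filter logic only (the subcommand itself calls samtools), and `keep_mapped_only` is a hypothetical name.

```python
SAM_FLAG_UNMAPPED = 0x4     # samtools view -F 4
SAM_FLAG_DUPLICATE = 0x400  # samtools view -F 1024


def keep_mapped_only(flag: int, mapq: int, min_mapq: int = 1) -> bool:
    """Mirror -F 4, -F 1024 and -q 1: aligned, not a duplicate, MAPQ >= min_mapq."""
    excluded = SAM_FLAG_UNMAPPED | SAM_FLAG_DUPLICATE  # 1028
    return (flag & excluded) == 0 and mapq >= min_mapq


print(keep_mapped_only(99, 60))    # True: paired, mapped
print(keep_mapped_only(4, 0))      # False: unmapped
print(keep_mapped_only(1123, 60))  # False: 99 + 1024, marked duplicate
print(keep_mapped_only(99, 0))     # False: MAPQ 0
```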
- novoalign
Align reads with Novoalign. Sort and index BAM output.
usage: read_utils.py novoalign [-h] [--options OPTIONS] [--min_qual MIN_QUAL] [--JVMmemory JVMMEMORY] [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refFasta outBam
- Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Novoindex.
outBam Output reads, BAM format (aligned).
- Options:
--options="-r Random" Novoalign options (default: "-r Random").
--min_qual=0 Filter outBam to a minimum mapping quality (default: 0).
--JVMmemory=2g JVM virtual memory size (default: 2g).
--NOVOALIGN_LICENSE_PATH A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable.
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- novoindex
usage: read_utils.py novoindex [-h] [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] refFasta
- Positional arguments:
refFasta Reference genome, FASTA format.
- Options:
--NOVOALIGN_LICENSE_PATH A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable.
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
- gatk_ug
Call genotypes using the GATK UnifiedGenotyper.
usage: read_utils.py gatk_ug [-h] [--options OPTIONS] [--JVMmemory JVMMEMORY] [--GATK_PATH GATK_PATH] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refFasta outVcf
- Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outVcf Output calls in VCF format. If this filename ends with .gz, GATK will BGZIP-compress the output and produce a Tabix index file as well.
- Options:
--options="--min_base_quality_score 15 -ploidy 4" UnifiedGenotyper options (default: "--min_base_quality_score 15 -ploidy 4").
--JVMmemory=2g JVM virtual memory size (default: 2g).
--GATK_PATH A path containing the GATK jar file. This overrides the GATK_ENV environment variable or the GATK conda package.
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- gatk_realign
Local realignment of BAM files with GATK IndelRealigner.
usage: read_utils.py gatk_realign [-h] [--JVMmemory JVMMEMORY] [--GATK_PATH GATK_PATH] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] [--threads THREADS] inBam refFasta outBam
- Positional arguments:
inBam Input reads, BAM format, aligned to refFasta.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outBam Realigned reads.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g).
--GATK_PATH A path containing the GATK jar file. This overrides the GATK_ENV environment variable or the GATK conda package.
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
--threads=1 Number of threads (default: 1).
- align_and_fix
Take reads, align them to the reference with Novoalign (default) or BWA, optionally mark duplicates with Picard, realign indels with GATK, and optionally filter the final file to mapped, non-duplicate reads.
usage: read_utils.py align_and_fix [-h] [--outBamAll OUTBAMALL] [--outBamFiltered OUTBAMFILTERED] [--aligner_options ALIGNER_OPTIONS] [--aligner {novoalign,bwa}] [--JVMmemory JVMMEMORY] [--threads THREADS] [--skipMarkDupes] [--GATK_PATH GATK_PATH] [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refFasta
- Positional arguments:
inBam Input unaligned reads, BAM format.
refFasta Reference genome, FASTA format; will be indexed by Picard and Novoalign.
- Options:
--outBamAll Aligned, sorted, and indexed reads. Unmapped and duplicate reads are retained. By default, duplicate reads are marked; if --skipMarkDupes is specified, duplicate reads are included in the output without being marked.
--outBamFiltered Aligned, sorted, and indexed reads. Unmapped reads are removed from this file, as are any marked duplicate reads. Note that if --skipMarkDupes is provided, duplicates will not be marked and will be included in the output.
--aligner_options Aligner options (default for novoalign: “-r Random”; for bwa: “-T 30”).
--aligner=novoalign Aligner (default: novoalign).
Possible choices: novoalign, bwa
--JVMmemory=4g JVM virtual memory size (default: 4g).
--threads=1 Number of threads (default: 1).
--skipMarkDupes=False If specified, duplicate reads will not be marked in the resulting output file.
--GATK_PATH A path containing the GATK jar file. This overrides the GATK_ENV environment variable or the GATK conda package.
--NOVOALIGN_LICENSE_PATH A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable.
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
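How --aligner, --skipMarkDupes and --outBamFiltered shape the pipeline can be summarized as a small function returning the step sequence from the description of align_and_fix. This is a hypothetical model of the control flow, not the tool's code; `align_and_fix_steps` is an invented name, and sorting and indexing, which both outputs also receive, are omitted. One consequence it makes visible: with --skipMarkDupes, the final filter has no duplicate marks to act on, so duplicates stay in the filtered output.

```python
from typing import List, Optional


def align_and_fix_steps(aligner: str = "novoalign",
                        skip_mark_dupes: bool = False,
                        out_bam_filtered: Optional[str] = None) -> List[str]:
    """Return the processing steps implied by the documented options."""
    if aligner not in ("novoalign", "bwa"):
        raise ValueError(f"unsupported aligner: {aligner}")
    steps = [f"align with {aligner}"]
    if not skip_mark_dupes:
        steps.append("mark duplicates (Picard)")
    steps.append("realign indels (GATK)")
    if out_bam_filtered is not None:
        # Only removes duplicates that were actually marked above.
        steps.append("drop unmapped and marked-duplicate reads")
    return steps


print(align_and_fix_steps(aligner="bwa", skip_mark_dupes=True,
                          out_bam_filtered="filtered.bam"))
```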
- bwamem_idxstats
Take reads, align to reference with BWA-MEM and perform samtools idxstats.
usage: read_utils.py bwamem_idxstats [-h] [--outBam OUTBAM] [--outStats OUTSTATS] [--minScoreToFilter MIN_SCORE_TO_FILTER] [--alignerOptions ALIGNER_OPTIONS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refFasta
- Positional arguments:
inBam Input unaligned reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard and Novoalign.
- Options:
--outBam Output aligned, indexed BAM file.
--outStats Output idxstats file.
--minScoreToFilter Filter bwa alignments using this value as the minimum allowed alignment score. Specifically, sum the alignment scores across all alignments for each query (including reads in a pair, supplementary and secondary alignments), and then include in the output only queries whose summed alignment score is at least this value. This is only applied when the aligner is ‘bwa’. Filtering on a summed alignment score is sensible for reads in a pair and for supplementary alignments, but may not be reasonable if bwa outputs secondary alignments (i.e., if ‘-a’ is in the aligner options). (default: not set, i.e., do not filter bwa’s output)
--alignerOptions bwa options (default: bwa defaults).
--loglevel=DEBUG Verbosity of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
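The summed-score rule for --minScoreToFilter can be written down directly. In this sketch each alignment is reduced to a (query name, score) pair; in practice the score would come from the AS tag that BWA-MEM writes, but reading BAM records is left out to keep the example self-contained, and `queries_passing` is a hypothetical helper. The sketch also makes the caveat concrete: every secondary alignment adds to the total, which is why summing is questionable when `-a` is used.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def queries_passing(alignments: Iterable[Tuple[str, int]], min_score: int) -> Set[str]:
    """Sum alignment scores per query name; keep queries whose total >= min_score.

    `alignments` holds (query name, alignment score) for every alignment,
    including both mates and any supplementary or secondary records.
    """
    totals: Dict[str, int] = defaultdict(int)
    for qname, score in alignments:
        totals[qname] += score
    return {qname for qname, total in totals.items() if total >= min_score}


alns = [("q1", 40), ("q1", 35),   # both mates aligned: total 75
        ("q2", 50),               # single alignment: total 50
        ("q3", 20), ("q3", 10)]   # weak pair: total 30
print(sorted(queries_passing(alns, 60)))  # ['q1']
```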