3.5. read_utils.py - utilities that manipulate bam and fastq files

Utilities for working with sequence reads, such as converting formats and fixing mate pairs.

usage: read_utils.py subcommand

Sub-commands:

purge_unmated

Use mergeShuffledFastqSeqs to purge unmated reads, and put corresponding reads in the same order. Corresponding sequences must have sequence identifiers of the form SEQID/1 and SEQID/2.

usage: read_utils.py purge_unmated [-h] [--regex REGEX]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   inFastq1 inFastq2 outFastq1 outFastq2

Positional arguments:

`inFastq1`	Input fastq file; 1st end of paired-end reads.
`inFastq2`	Input fastq file; 2nd end of paired-end reads.
`outFastq1`	Output fastq file; 1st end of paired-end reads.
`outFastq2`	Output fastq file; 2nd end of paired-end reads.

Options:

`--regex=^@(\S+)/[1|2]$`
	Perl regular expression to parse paired read IDs (default: ^@(\S+)/[1|2]$)
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
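
The pairing rule behind purge_unmated can be illustrated with a short sketch. This is not the mergeShuffledFastqSeqs implementation: the helper names `read_ids` and `mated_ids` are hypothetical, the headers are held in memory, and only the default `--regex` pattern is assumed.

```python
import re

# Default pattern from --regex. The capture group is SEQID; the character
# class [1|2] matches '1', '2', or a literal '|'.
PAIR_ID = re.compile(r'^@(\S+)/[1|2]$')


def read_ids(header_lines):
    """Return the SEQIDs, in order, of header lines that match the pattern."""
    ids = []
    for line in header_lines:
        match = PAIR_ID.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def mated_ids(headers1, headers2):
    """Return SEQIDs present in both ends, in the order of the first file."""
    second = set(read_ids(headers2))
    return [seq_id for seq_id in read_ids(headers1) if seq_id in second]


# Hypothetical headers: r2 and r4 have no mate and would be purged.
headers1 = ['@r1/1', '@r2/1', '@r3/1']
headers2 = ['@r3/2', '@r1/2', '@r4/2']
print(mated_ids(headers1, headers2))
```

Writing both output files in the order returned by `mated_ids` would put corresponding reads at the same position in each file.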

fastq_to_fasta

Convert from fastq format to fasta format. Warning: output reads might be split onto multiple lines.

usage: read_utils.py fastq_to_fasta [-h]
                                    [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                    [--version] [--tmp_dir TMP_DIR]
                                    [--tmp_dirKeep]
                                    inFastq outFasta

Positional arguments:

`inFastq`	Input fastq file.
`outFasta`	Output fasta file.

Options:

`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
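
As a rough illustration of the conversion, the sketch below maps FASTQ records to FASTA records. It is not the tool's own code: the function name `fastq_to_fasta_lines` is hypothetical, each FASTQ record is assumed to occupy exactly four lines, and sequences are written on a single line (unlike the tool, whose output reads might be split onto multiple lines).

```python
def fastq_to_fasta_lines(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines (4 lines per record)."""
    lines = [line.rstrip('\n') for line in fastq_lines]
    # Step through complete 4-line records: header, sequence, '+', qualities.
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith('@'):
            raise ValueError('expected FASTQ header, got: %r' % header)
        # FASTA keeps the read name and sequence and drops the qualities.
        yield '>' + header[1:]
        yield seq


record = ['@r1\n', 'ACGT\n', '+\n', 'IIII\n']
print('\n'.join(fastq_to_fasta_lines(record)))
```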

index_fasta_samtools

Index a reference genome for Samtools.

usage: read_utils.py index_fasta_samtools [-h]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version]
                                          inFasta

Positional arguments:

`inFasta`	Reference genome, FASTA format.

Options:

`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit

index_fasta_picard

Create an index file for a reference genome suitable for Picard/GATK.

usage: read_utils.py index_fasta_picard [-h] [--JVMmemory JVMMEMORY]
                                        [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                        [--version] [--tmp_dir TMP_DIR]
                                        [--tmp_dirKeep]
                                        inFasta

Positional arguments:

`inFasta`	Input reference genome, FASTA format.

Options:

`--JVMmemory=512m`
	JVM virtual memory size (default: 512m)
`--picardOptions=[]`
	Optional arguments to Picard’s CreateSequenceDictionary, OPTIONNAME=value ...
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

mkdup_picard

Mark or remove duplicate reads from BAM file.

usage: read_utils.py mkdup_picard [-h] [--outMetrics OUTMETRICS] [--remove]
                                  [--JVMmemory JVMMEMORY]
                                  [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inBams [inBams ...] outBam

Positional arguments:

`inBams`	Input reads, BAM format.
`outBam`	Output reads, BAM format.

Options:

`--outMetrics`	Output metrics file. Default is to dump to a temp file.
`--remove=False`	Instead of marking duplicates, remove them entirely (default: False)
`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--picardOptions=[]`
	Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

revert_bam_picard

Revert BAM to raw reads.

usage: read_utils.py revert_bam_picard [-h] [--JVMmemory JVMMEMORY]
                                       [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmp_dir TMP_DIR]
                                       [--tmp_dirKeep]
                                       inBam outBam

Positional arguments:

`inBam`	Input reads, BAM format.
`outBam`	Output reads, BAM format.

Options:

`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--picardOptions=[]`
	Optional arguments to Picard’s RevertSam, OPTIONNAME=value ...
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

picard

Generic Picard runner.

usage: read_utils.py picard [-h] [--JVMmemory JVMMEMORY]
                            [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                            command

Positional arguments:

`command`	Picard command.

Options:

`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--picardOptions=[]`
	Optional arguments to Picard, OPTIONNAME=value ...
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

sort_bam

Sort BAM file.

usage: read_utils.py sort_bam [-h] [--index] [--md5] [--JVMmemory JVMMEMORY]
                              [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                              inBam outBam {unsorted,queryname,coordinate}

Positional arguments:

`inBam`	Input bam file.
`outBam`	Output bam file, sorted.
`sortOrder`	How to sort the reads. Possible choices: unsorted, queryname, coordinate

Options:

`--index=False`	Index outBam (default: False)
`--md5=False`	MD5 checksum outBam (default: False)
`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--picardOptions=[]`
	Optional arguments to Picard’s SortSam, OPTIONNAME=value ...
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

merge_bams

Merge multiple BAMs into one.

usage: read_utils.py merge_bams [-h] [--JVMmemory JVMMEMORY]
                                [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmp_dir TMP_DIR]
                                [--tmp_dirKeep]
                                inBams [inBams ...] outBam

Positional arguments:

`inBams`	Input bam files.
`outBam`	Output bam file.

Options:

`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--picardOptions=[]`
	Optional arguments to Picard’s MergeSamFiles, OPTIONNAME=value ...
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

filter_bam

Filter BAM file by read name.

usage: read_utils.py filter_bam [-h] [--exclude] [--JVMmemory JVMMEMORY]
                                [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmp_dir TMP_DIR]
                                [--tmp_dirKeep]
                                inBam readList outBam

Positional arguments:

`inBam`	Input bam file.
`readList`	Input file of read IDs.
`outBam`	Output bam file.

Options:

`--exclude=False`
	If specified, readList is a list of reads to remove from input. Default behavior is to treat readList as an inclusion list (all unnamed reads are removed).
`--JVMmemory=4g`	JVM virtual memory size (default: 4g)
`--picardOptions=[]`
	Optional arguments to Picard’s FilterSamReads, OPTIONNAME=value ...
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
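
The inclusion and exclusion semantics of readList can be sketched on plain read names. This only illustrates the documented behaviour and does not call Picard; the function name `filter_read_names` is hypothetical.

```python
def filter_read_names(read_names, read_list, exclude=False):
    """Return the read names that survive filtering against read_list."""
    listed = set(read_list)
    if exclude:
        # --exclude: readList names the reads to remove.
        return [name for name in read_names if name not in listed]
    # Default: readList is an inclusion list; unnamed reads are removed.
    return [name for name in read_names if name in listed]


reads = ['r1', 'r2', 'r3']
print(filter_read_names(reads, ['r2']))
print(filter_read_names(reads, ['r2'], exclude=True))
```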

bam_to_fastq

Convert a bam file to a pair of fastq paired-end read files and optional text header.

usage: read_utils.py bam_to_fastq [-h] [--outHeader OUTHEADER]
                                  [--JVMmemory JVMMEMORY]
                                  [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inBam outFastq1 outFastq2

Positional arguments:

`inBam`	Input bam file.
`outFastq1`	Output fastq file; 1st end of paired-end reads.
`outFastq2`	Output fastq file; 2nd end of paired-end reads.

Options:

`--outHeader`	Optional text file name that will receive bam header.
`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--picardOptions=[]`
	Optional arguments to Picard’s SamToFastq, OPTIONNAME=value ...
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

fastq_to_bam

Convert a pair of fastq paired-end read files and optional text header to a single bam file.

usage: read_utils.py fastq_to_bam [-h]
                                  (--sampleName SAMPLENAME | --header HEADER)
                                  [--JVMmemory JVMMEMORY]
                                  [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inFastq1 inFastq2 outBam

Positional arguments:

`inFastq1`	Input fastq file; 1st end of paired-end reads.
`inFastq2`	Input fastq file; 2nd end of paired-end reads.
`outBam`	Output bam file.

Options:

`--sampleName`	Sample name to insert into the read group header.
`--header`	Optional text file containing header.
`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--picardOptions=[]`
	Optional arguments to Picard’s FastqToSam, OPTIONNAME=value ... Note that header-related options will be overwritten by HEADER if present.
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

split_reads

Split fasta or fastq file into chunks of maxReads reads or into numChunks chunks named outPrefix01, outPrefix02, etc. If both maxReads and numChunks are None, use defaultMaxReads. The number of characters in file names after outPrefix is indexLen; if not specified, use defaultIndexLen.

usage: read_utils.py split_reads [-h]
                                 [--maxReads MAXREADS | --numChunks NUMCHUNKS]
                                 [--indexLen INDEXLEN]
                                 [--format {fastq,fasta}]
                                 [--outSuffix OUTSUFFIX]
                                 inFileName outPrefix

Positional arguments:

`inFileName`	Input fastq or fasta file.
`outPrefix`	Output files will be named ${outPrefix}01${outSuffix}, ${outPrefix}02${outSuffix}...

Options:

`--maxReads`	Maximum number of reads per chunk (default 1000 if neither maxReads nor numChunks is specified).
`--numChunks`	Number of output files, if maxReads is not specified.
`--indexLen=2`	Number of characters to append to outPrefix for each output file (default: 2). Number of files must not exceed 10^INDEXLEN.
`--format=fastq`	Format of the input file (default: fastq). Possible choices: fastq, fasta
`--outSuffix=`	Output filename suffix (e.g. .fastq or .fastq.gz). A suffix ending in .gz will cause the output file to be gzip compressed. Default is no suffix.
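
The output naming scheme can be sketched as follows. This is an illustration under stated assumptions rather than the tool's code: the function name `chunk_file_names` is hypothetical, and the defaults mirror the documented ones (1000 reads per chunk, index length 2, no suffix).

```python
import math


def chunk_file_names(n_reads, out_prefix, max_reads=1000, index_len=2,
                     out_suffix=''):
    """Return the chunk file names needed to hold n_reads reads."""
    n_chunks = math.ceil(n_reads / max_reads)
    # Chunk indices start at 1 and are zero-padded to index_len characters.
    return [out_prefix + str(i).zfill(index_len) + out_suffix
            for i in range(1, n_chunks + 1)]


print(chunk_file_names(2500, 'reads.', out_suffix='.fastq'))
```

With 2500 reads and the default chunk size, three files are needed; the last one holds the remaining 500 reads.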

split_bam

Split BAM file equally into several output BAM files.

usage: read_utils.py split_bam [-h]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                               inBam outBams [outBams ...]

Positional arguments:

`inBam`	Input BAM file.
`outBams`	Output BAM files

Options:

`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

reheader_bam

Copy a BAM file (inBam to outBam) while renaming elements of the BAM header. The mapping file lists (key, old value, new value) triples, one per line. For example:

    LB lib1 lib_one
    SM sample1 Sample_1
    SM sample2 Sample_2
    SM sample3 Sample_3
    CN broad BI

usage: read_utils.py reheader_bam [-h]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inBam rgMap outBam

Positional arguments:

`inBam`	Input reads, BAM format.
`rgMap`	Tabular file containing three columns: field, old, new.
`outBam`	Output reads, BAM format.

Options:

`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
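
One way to picture the (field, old, new) mapping is as a nested dictionary applied to read group header lines. The sketch below is an assumption-laden illustration, not the tool's implementation: `parse_rg_map` and `rename_header_line` are hypothetical names, columns are assumed to be whitespace-separated, and only `@RG` lines are rewritten.

```python
def parse_rg_map(lines):
    """Parse rows of (field, old, new) into {field: {old: new}}."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError('expected 3 columns, got: %r' % line)
        key, old, new = fields
        mapping.setdefault(key, {})[old] = new
    return mapping


def rename_header_line(line, mapping):
    """Apply the mapping to one tab-separated @RG header line."""
    parts = line.rstrip('\n').split('\t')
    if parts[0] != '@RG':
        return '\t'.join(parts)
    out = [parts[0]]
    for part in parts[1:]:
        tag, _, value = part.partition(':')
        # Replace the value only if (tag, value) appears in the mapping.
        value = mapping.get(tag, {}).get(value, value)
        out.append(tag + ':' + value)
    return '\t'.join(out)


rg_map = parse_rg_map(['LB lib1 lib_one', 'SM sample1 Sample_1', 'CN broad BI'])
print(rename_header_line('@RG\tID:x\tSM:sample1\tLB:lib1', rg_map))
```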

reheader_bams

Copy BAM files while renaming elements of the BAM header. The mapping file lists (key, old value, new value) triples, one per line. For example:

    LB lib1 lib_one
    SM sample1 Sample_1
    SM sample2 Sample_2
    SM sample3 Sample_3
    CN broad BI
    FN in1.bam out1.bam
    FN in2.bam out2.bam

usage: read_utils.py reheader_bams [-h]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   rgMap

Positional arguments:

`rgMap`	Tabular file containing three columns: field, old, new.

Options:

`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

rmdup_mvicuna_bam

Remove duplicate reads from BAM file using M-Vicuna. The primary advantage of this approach over Picard’s MarkDuplicates tool is that Picard requires input reads to be aligned to a reference, whereas M-Vicuna can operate on unaligned reads.

usage: read_utils.py rmdup_mvicuna_bam [-h] [--JVMmemory JVMMEMORY]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmp_dir TMP_DIR]
                                       [--tmp_dirKeep]
                                       inBam outBam

Positional arguments:

`inBam`	Input reads, BAM format.
`outBam`	Output reads, BAM format.

Options:

`--JVMmemory=4g`	JVM virtual memory size (default: 4g)
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

dup_remove_mvicuna

Run mvicuna’s duplicate removal operation on paired-end reads.

usage: read_utils.py dup_remove_mvicuna [-h]
                                        [--unpairedOutFastq UNPAIREDOUTFASTQ]
                                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                        [--version] [--tmp_dir TMP_DIR]
                                        [--tmp_dirKeep]
                                        inFastq1 inFastq2 pairedOutFastq1
                                        pairedOutFastq2

Positional arguments:

`inFastq1`	Input fastq file; 1st end of paired-end reads.
`inFastq2`	Input fastq file; 2nd end of paired-end reads.
`pairedOutFastq1`
	Output fastq file; 1st end of paired-end reads.
`pairedOutFastq2`
	Output fastq file; 2nd end of paired-end reads.

Options:

`--unpairedOutFastq`
	File name of output unpaired reads
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

rmdup_prinseq_fastq

Run prinseq-lite’s duplicate removal operation on paired-end reads. Also removes reads with more than one N.

usage: read_utils.py rmdup_prinseq_fastq [-h]
                                         [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                         [--version] [--tmp_dir TMP_DIR]
                                         [--tmp_dirKeep]
                                         inFastq1 inFastq2 outFastq1 outFastq2

Positional arguments:

`inFastq1`	Input fastq file; 1st end of paired-end reads.
`inFastq2`	Input fastq file; 2nd end of paired-end reads.
`outFastq1`	Output fastq file; 1st end of paired-end reads.
`outFastq2`	Output fastq file; 2nd end of paired-end reads.

Options:

`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

filter_bam_mapped_only

Use Samtools to reduce a BAM file to only reads that are aligned (-F 4) with a non-zero mapping quality (-q 1) and are not marked as a PCR/optical duplicate (-F 1024).

usage: read_utils.py filter_bam_mapped_only [-h]
                                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                            [--version] [--tmp_dir TMP_DIR]
                                            [--tmp_dirKeep]
                                            inBam outBam

Positional arguments:

`inBam`	Input aligned reads, BAM format.
`outBam`	Output sorted indexed reads, filtered to aligned-only, BAM format.

Options:

`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
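
The filter criteria described for filter_bam_mapped_only can be expressed as a predicate on a read's SAM FLAG and mapping quality. This sketch only restates the documented intent in Python and does not invoke Samtools; the function name `keep_read` is hypothetical.

```python
UNMAPPED = 0x4      # SAM FLAG bit 4: read is unmapped
DUPLICATE = 0x400   # SAM FLAG bit 1024: PCR or optical duplicate


def keep_read(flag, mapq):
    """Return True if a read passes the aligned, non-duplicate, MAPQ >= 1 filter."""
    if flag & (UNMAPPED | DUPLICATE):
        return False
    return mapq >= 1


# Hypothetical (flag, mapq) pairs.
for flag, mapq in [(0, 60), (4, 0), (1024, 60), (0, 0), (99, 30)]:
    print(flag, mapq, keep_read(flag, mapq))
```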

novoalign

Align reads with Novoalign. Sort and index BAM output.

usage: read_utils.py novoalign [-h] [--options OPTIONS] [--min_qual MIN_QUAL]
                               [--JVMmemory JVMMEMORY]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                               inBam refFasta outBam

Positional arguments:

`inBam`	Input reads, BAM format.
`refFasta`	Reference genome, FASTA format, pre-indexed by Novoindex.
`outBam`	Output reads, BAM format (aligned).

Options:

`--options=-r Random`
	Novoalign options (default: -r Random)
`--min_qual=0`	Filter outBam to minimum mapping quality (default: 0)
`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

novoindex

Index a FASTA file (reference genome) for use with Novoalign. The input file name must end in ".fasta". This will create a new ".nix" file in the same directory. If it already exists, it will be deleted and regenerated.

usage: read_utils.py novoindex [-h]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version]
                               refFasta

Positional arguments:

`refFasta`	Reference genome, FASTA format.

Options:

`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit

gatk_ug

Call genotypes using the GATK UnifiedGenotyper.

usage: read_utils.py gatk_ug [-h] [--options OPTIONS] [--JVMmemory JVMMEMORY]
                             [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                             [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                             inBam refFasta outVcf

Positional arguments:

`inBam`	Input reads, BAM format.
`refFasta`	Reference genome, FASTA format, pre-indexed by Picard.
`outVcf`	Output calls in VCF format. If this filename ends with .gz, GATK will BGZIP compress the output and produce a Tabix index file as well.

Options:

`--options=--min_base_quality_score 15 -ploidy 4`
	UnifiedGenotyper options (default: --min_base_quality_score 15 -ploidy 4)
`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

gatk_realign

Local realignment of BAM files with GATK IndelRealigner.

usage: read_utils.py gatk_realign [-h] [--JVMmemory JVMMEMORY]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep] [--threads THREADS]
                                  inBam refFasta outBam

Positional arguments:

`inBam`	Input reads, BAM format, aligned to refFasta.
`refFasta`	Reference genome, FASTA format, pre-indexed by Picard.
`outBam`	Realigned reads.

Options:

`--JVMmemory=2g`	JVM virtual memory size (default: 2g)
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
`--threads=1`	Number of threads (default: 1)

align_and_fix

Take reads, align to reference with Novoalign, mark duplicates with Picard, realign indels with GATK, and optionally filter final file to mapped/non-dupe reads.

usage: read_utils.py align_and_fix [-h] [--outBamAll OUTBAMALL]
                                   [--outBamFiltered OUTBAMFILTERED]
                                   [--novoalign_options NOVOALIGN_OPTIONS]
                                   [--JVMmemory JVMMEMORY] [--threads THREADS]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   inBam refFasta

Positional arguments:

`inBam`	Input unaligned reads, BAM format.
`refFasta`	Reference genome, FASTA format, pre-indexed by Picard and Novoalign.

Options:

`--outBamAll`	Aligned, sorted, and indexed reads. Unmapped reads are retained and duplicate reads are marked, not removed.
`--outBamFiltered`
	Aligned, sorted, and indexed reads. Unmapped reads and duplicate reads are removed from this file.
`--novoalign_options=-r Random`
	Novoalign options (default: -r Random)
`--JVMmemory=4g`	JVM virtual memory size (default: 4g)
`--threads=1`	Number of threads (default: 1)
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

align_and_count_hits

Take reads, align to reference with Novoalign and return aligned read counts for each reference sequence.

usage: read_utils.py align_and_count_hits [-h] [--includeZeros]
                                          [--JVMmemory JVMMEMORY]
                                          [--threads THREADS]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version] [--tmp_dir TMP_DIR]
                                          [--tmp_dirKeep]
                                          inBam refFasta outCounts

Positional arguments:

`inBam`	Input unaligned reads, BAM format.
`refFasta`	Reference genome, FASTA format, pre-indexed by Picard and Novoalign.
`outCounts`	Output counts file

Options:

`--includeZeros=False`
	Output lines with no hits (default: False)
`--JVMmemory=4g`	JVM virtual memory size (default: 4g)
`--threads=8`	Number of threads (default: 8)
`--loglevel=DEBUG`
	Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmp_dir=/tmp`	Base directory for temp files. [default: /tmp]
`--tmp_dirKeep=False`
	Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.