3.5. read_utils.py - utilities that manipulate bam and fastq files
Utilities for working with sequence reads, such as converting formats and fixing mate pairs.
usage: read_utils.py subcommand
- Sub-commands:
- purge_unmated
Use mergeShuffledFastqSeqs to purge unmated reads, and put corresponding reads in the same order. Corresponding sequences must have sequence identifiers of the form SEQID/1 and SEQID/2.
usage: read_utils.py purge_unmated [-h] [--regex REGEX] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq1 inFastq2 outFastq1 outFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--regex=^@(\S+)/[1|2]$ Perl regular expression to parse paired read IDs (default: ^@(\S+)/[1|2]$)
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
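The default --regex can be tried out directly in Python, whose re module accepts this pattern unchanged. The sketch below is not the script's implementation (which delegates to mergeShuffledFastqSeqs); the helper names read_key and mated_ids are hypothetical and only illustrate how the pattern reduces SEQID/1 and SEQID/2 to a shared key.

```python
import re

# Default pattern from --regex; group 1 captures the shared SEQID.
PAIRED_ID = re.compile(r"^@(\S+)/[1|2]$")


def read_key(header_line):
    """Return the SEQID of a FASTQ header line, or None if it does not match."""
    match = PAIRED_ID.match(header_line.strip())
    return match.group(1) if match else None


def mated_ids(headers1, headers2):
    """Return SEQIDs present in both ends, in the order of the first end.

    Hypothetical helper: an in-memory illustration only, not how the
    real tool processes large files.
    """
    keys2 = {read_key(h) for h in headers2} - {None}
    return [k for k in map(read_key, headers1) if k in keys2]
```

Headers that do not end in /1 or /2 yield None and are therefore treated as unmated in this sketch.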
- fastq_to_fasta
Convert from fastq format to fasta format. Warning: output reads might be split onto multiple lines.
usage: read_utils.py fastq_to_fasta [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq outFasta
- Positional arguments:
inFastq Input fastq file.
outFasta Output fasta file.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
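For orientation, the conversion itself amounts to keeping the header and sequence of each record and dropping the quality lines. The following is a minimal sketch, assuming strict four-line FASTQ records; the function name fastq_to_fasta_lines is hypothetical and this is not the code the subcommand runs. Unlike the real output, it never wraps a sequence across lines.

```python
def fastq_to_fasta_lines(fastq_lines):
    """Yield FASTA lines for each four-line FASTQ record.

    Assumption: every record is exactly four lines
    (header, sequence, '+', qualities).
    """
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of record: %r" % header)
        # '@' marks a FASTQ header, '>' a FASTA header.
        yield ">" + header[1:]
        yield sequence
```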
- index_fasta_samtools
Index a reference genome for Samtools.
usage: read_utils.py index_fasta_samtools [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFasta
- Positional arguments:
inFasta Reference genome, FASTA format.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
- index_fasta_picard
Create an index file for a reference genome suitable for Picard/GATK.
usage: read_utils.py index_fasta_picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFasta
- Positional arguments:
inFasta Input reference genome, FASTA format.
- Options:
--JVMmemory=512m JVM virtual memory size (default: 512m)
--picardOptions=[] Optional arguments to Picard’s CreateSequenceDictionary, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- mkdup_picard
Mark or remove duplicate reads from BAM file.
usage: read_utils.py mkdup_picard [-h] [--outMetrics OUTMETRICS] [--remove] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBams [inBams ...] outBam
- Positional arguments:
inBams Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--outMetrics Output metrics file. Default is to dump to a temp file.
--remove=False Instead of marking duplicates, remove them entirely (default: False)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- revert_bam_picard
Revert BAM to raw reads.
usage: read_utils.py revert_bam_picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s RevertSam, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- picard
Generic Picard runner.
usage: read_utils.py picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] command
- Positional arguments:
command picard command
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
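The --picardOptions values all share the OPTIONNAME=value shape. As a rough illustration of that shape (how read_utils.py handles these strings internally is not stated here), the hypothetical helper below splits a list of such strings into a dictionary. VALIDATION_STRINGENCY and CREATE_INDEX are standard Picard options used only as sample input.

```python
def parse_picard_options(options):
    """Split OPTIONNAME=value strings into a dict.

    Hypothetical helper for illustration; not part of read_utils.py.
    """
    parsed = {}
    for item in options:
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError("expected OPTIONNAME=value, got %r" % item)
        parsed[name] = value
    return parsed


example = parse_picard_options(
    ["VALIDATION_STRINGENCY=LENIENT", "CREATE_INDEX=true"]
)
```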
- sort_bam
Sort BAM file.
usage: read_utils.py sort_bam [-h] [--index] [--md5] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam {unsorted,queryname,coordinate}
- Positional arguments:
inBam Input bam file.
outBam Output bam file, sorted.
sortOrder How to sort the reads. Possible choices: unsorted, queryname, coordinate
- Options:
--index=False Index outBam (default: False)
--md5=False MD5 checksum outBam (default: False)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s SortSam, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- merge_bams
Merge multiple BAMs into one.
usage: read_utils.py merge_bams [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBams [inBams ...] outBam
- Positional arguments:
inBams Input bam files.
outBam Output bam file.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s MergeSamFiles, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_bam
Filter BAM file by read name.
usage: read_utils.py filter_bam [-h] [--exclude] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam readList outBam
- Positional arguments:
inBam Input bam file.
readList Input file of read IDs.
outBam Output bam file.
- Options:
--exclude=False If specified, readList is a list of reads to remove from input. Default behavior is to treat readList as an inclusion list (all unnamed reads are removed).
--JVMmemory=4g JVM virtual memory size (default: 4g)
--picardOptions=[] Optional arguments to Picard’s FilterSamReads, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- bam_to_fastq
Convert a bam file to a pair of fastq paired-end read files and optional text header.
usage: read_utils.py bam_to_fastq [-h] [--outHeader OUTHEADER] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outFastq1 outFastq2
- Positional arguments:
inBam Input bam file.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--outHeader Optional text file name that will receive bam header.
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s SamToFastq, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- fastq_to_bam
Convert a pair of fastq paired-end read files and optional text header to a single bam file.
usage: read_utils.py fastq_to_bam [-h] (--sampleName SAMPLENAME | --header HEADER) [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq1 inFastq2 outBam
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outBam Output bam file.
- Options:
--sampleName Sample name to insert into the read group header.
--header Optional text file containing header.
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s FastqToSam, OPTIONNAME=value ... Note that header-related options will be overwritten by HEADER if present.
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
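The parenthesised (--sampleName SAMPLENAME | --header HEADER) in the usage line is how argparse renders a required, mutually exclusive group: exactly one of the two must be given. The sketch below reproduces only that part of the interface with the standard argparse module. It is an illustrative reconstruction, not the script's actual parser, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Reconstruct the sampleName/header choice of fastq_to_bam (sketch)."""
    parser = argparse.ArgumentParser(prog="read_utils.py fastq_to_bam")
    parser.add_argument("inFastq1")
    parser.add_argument("inFastq2")
    parser.add_argument("outBam")
    # required=True: one of the two options must be present;
    # mutual exclusion: supplying both is rejected with a usage error.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--sampleName")
    group.add_argument("--header")
    return parser


args = build_parser().parse_args(
    ["--sampleName", "S1", "r1.fastq", "r2.fastq", "out.bam"]
)
```

Passing both options, or neither, makes argparse exit with a usage error.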
- split_reads
Split fasta or fastq file into chunks of maxReads reads or into numChunks chunks named outPrefix01, outPrefix02, etc. If both maxReads and numChunks are None, use defaultMaxReads. The number of characters in file names after outPrefix is indexLen; if not specified, use defaultIndexLen.
usage: read_utils.py split_reads [-h] [--maxReads MAXREADS | --numChunks NUMCHUNKS] [--indexLen INDEXLEN] [--format {fastq,fasta}] [--outSuffix OUTSUFFIX] inFileName outPrefix
- Positional arguments:
inFileName Input fastq or fasta file.
outPrefix Output files will be named ${outPrefix}01${outSuffix}, ${outPrefix}02${outSuffix}...
- Options:
--maxReads Maximum number of reads per chunk (default 1000 if neither maxReads nor numChunks is specified).
--numChunks Number of output files, if maxReads is not specified.
--indexLen=2 Number of characters to append to outPrefix for each output file (default 2). Number of files must not exceed 10^INDEXLEN.
--format=fastq Input fastq or fasta file (default: fastq). Possible choices: fastq, fasta
--outSuffix= Output filename suffix (e.g. .fastq or .fastq.gz). A suffix ending in .gz will cause the output file to be gzip compressed. Default is no suffix.
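The naming and chunk-count rules can be written out in a few lines. This is a sketch under stated assumptions, not the subcommand's code: the function names are hypothetical, numbering is assumed to start at 01 as in the outPrefix01 example, and the check is slightly stricter than the stated 10^INDEXLEN limit so that the largest index always fits in indexLen digits.

```python
def num_chunks_for(total_reads, max_reads=1000):
    """Number of chunks needed when each holds at most max_reads reads."""
    return -(-total_reads // max_reads)  # ceiling division


def chunk_names(out_prefix, num_chunks, index_len=2, out_suffix=""):
    """Return output names such as out01.fastq, out02.fastq, ..."""
    if num_chunks >= 10 ** index_len:
        raise ValueError("too many chunks for indexLen=%d" % index_len)
    return [
        f"{out_prefix}{i:0{index_len}d}{out_suffix}"
        for i in range(1, num_chunks + 1)
    ]
```

For example, 2500 reads with the default maxReads of 1000 need three chunks.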
- split_bam
Split BAM file equally into several output BAM files.
usage: read_utils.py split_bam [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBams [outBams ...]
- Positional arguments:
inBam Input BAM file.
outBams Output BAM files
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- reheader_bam
Copy a BAM file (inBam to outBam) while renaming elements of the BAM header. The mapping file specifies the (key, old value, new value) mappings to apply. For example:
LB lib1 lib_one
SM sample1 Sample_1
SM sample2 Sample_2
SM sample3 Sample_3
CN broad BI
usage: read_utils.py reheader_bam [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam rgMap outBam
- Positional arguments:
inBam Input reads, BAM format.
rgMap Tabular file containing three columns: field, old, new.
outBam Output reads, BAM format.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
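To make the three-column mapping concrete, the sketch below loads (field, old, new) rows and applies them to a single SAM @RG header line. It rests on assumptions not stated in this reference: columns are whitespace-separated, and renaming is a per-tag lookup on read-group lines. Both function names are hypothetical.

```python
def load_rg_map(lines):
    """Parse 'field old new' rows into a {(field, old): new} dict."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError("expected 3 columns, got %r" % line)
        field, old, new = fields
        mapping[(field, old)] = new
    return mapping


def rename_header_line(header_line, mapping):
    """Rewrite TAG:value pairs of one tab-separated header line."""
    tokens = header_line.rstrip("\n").split("\t")
    renamed = [tokens[0]]  # record type such as '@RG' is kept as-is
    for token in tokens[1:]:
        tag, _, value = token.partition(":")
        renamed.append(tag + ":" + mapping.get((tag, value), value))
    return "\t".join(renamed)
```

Tags without a matching (field, old) row, such as ID in a typical read group, pass through unchanged.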
- reheader_bams
Copy BAM files while renaming elements of the BAM header. The mapping file specifies the (key, old value, new value) mappings to apply. For example:
LB lib1 lib_one
SM sample1 Sample_1
SM sample2 Sample_2
SM sample3 Sample_3
CN broad BI
FN in1.bam out1.bam
FN in2.bam out2.bam
usage: read_utils.py reheader_bams [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] rgMap
- Positional arguments:
rgMap Tabular file containing three columns: field, old, new.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- rmdup_mvicuna_bam
Remove duplicate reads from BAM file using M-Vicuna. The primary advantage to this approach over Picard’s MarkDuplicates tool is that Picard requires that input reads are aligned to a reference, and M-Vicuna can operate on unaligned reads.
usage: read_utils.py rmdup_mvicuna_bam [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- dup_remove_mvicuna
Run mvicuna’s duplicate removal operation on paired-end reads.
usage: read_utils.py dup_remove_mvicuna [-h] [--unpairedOutFastq UNPAIREDOUTFASTQ] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq1 inFastq2 pairedOutFastq1 pairedOutFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
pairedOutFastq1 Output fastq file; 1st end of paired-end reads.
pairedOutFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--unpairedOutFastq File name of output unpaired reads
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- rmdup_prinseq_fastq
Run prinseq-lite’s duplicate removal operation on paired-end reads. Also removes reads with more than one N.
usage: read_utils.py rmdup_prinseq_fastq [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inFastq1 inFastq2 outFastq1 outFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_bam_mapped_only
Use Samtools to reduce a BAM file to only reads that are aligned (-F 4), have a non-zero mapping quality (-q 1), and are not marked as a PCR/optical duplicate (-F 1024).
usage: read_utils.py filter_bam_mapped_only [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam outBam
- Positional arguments:
inBam Input aligned reads, BAM format.
outBam Output sorted indexed reads, filtered to aligned-only, BAM format.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
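The three Samtools filters named in the description correspond to simple tests on each read's FLAG and MAPQ fields. The sketch below expresses the same predicate in Python for a single read. The function name keep_read is hypothetical, and the real subcommand applies these filters through Samtools rather than in Python.

```python
UNMAPPED = 0x4     # FLAG bit excluded by -F 4
DUPLICATE = 0x400  # FLAG bit excluded by -F 1024
MIN_MAPQ = 1       # minimum mapping quality required by -q 1


def keep_read(flag, mapq):
    """Return True if a read would pass -F 4 -F 1024 -q 1."""
    if flag & UNMAPPED:
        return False
    if flag & DUPLICATE:
        return False
    return mapq >= MIN_MAPQ
```

A mapped read with mapping quality 0 is dropped by the -q 1 condition even though neither FLAG bit is set.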
- novoalign
Align reads with Novoalign. Sort and index BAM output.
usage: read_utils.py novoalign [-h] [--options OPTIONS] [--min_qual MIN_QUAL] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refFasta outBam
- Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Novoindex.
outBam Output reads, BAM format (aligned).
- Options:
--options=-r Random Novoalign options (default: -r Random)
--min_qual=0 Filter outBam to minimum mapping quality (default: 0)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- novoindex
Index a FASTA file (reference genome) for use with Novoalign. The input file name must end in “.fasta”. This will create a new “.nix” file in the same directory. If it already exists, it will be deleted and regenerated.
usage: read_utils.py novoindex [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] refFasta
- Positional arguments:
refFasta Reference genome, FASTA format.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
- gatk_ug
Call genotypes using the GATK UnifiedGenotyper.
usage: read_utils.py gatk_ug [-h] [--options OPTIONS] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refFasta outVcf
- Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outVcf Output calls in VCF format. If this filename ends with .gz, GATK will BGZIP compress the output and produce a Tabix index file as well.
- Options:
--options=--min_base_quality_score 15 -ploidy 4 UnifiedGenotyper options (default: --min_base_quality_score 15 -ploidy 4)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- gatk_realign
Local realignment of BAM files with GATK IndelRealigner.
usage: read_utils.py gatk_realign [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] [--threads THREADS] inBam refFasta outBam
- Positional arguments:
inBam Input reads, BAM format, aligned to refFasta.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outBam Realigned reads.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
--threads=1 Number of threads (default: 1)
- align_and_fix
Take reads, align to reference with Novoalign, mark duplicates with Picard, realign indels with GATK, and optionally filter final file to mapped/non-dupe reads.
usage: read_utils.py align_and_fix [-h] [--outBamAll OUTBAMALL] [--outBamFiltered OUTBAMFILTERED] [--novoalign_options NOVOALIGN_OPTIONS] [--JVMmemory JVMMEMORY] [--threads THREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refFasta
- Positional arguments:
inBam Input unaligned reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard and Novoalign.
- Options:
--outBamAll Aligned, sorted, and indexed reads. Unmapped reads are retained and duplicate reads are marked, not removed.
--outBamFiltered Aligned, sorted, and indexed reads. Unmapped reads and duplicate reads are removed from this file.
--novoalign_options=-r Random Novoalign options (default: -r Random)
--JVMmemory=4g JVM virtual memory size (default: 4g)
--threads=1 Number of threads (default: 1)
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
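When driving align_and_fix from another Python program, it can help to assemble the argument list explicitly. The helper below is a hypothetical convenience wrapper, not part of read_utils.py. It uses only the flags listed above, and all file names in the example are placeholders.

```python
def align_and_fix_argv(in_bam, ref_fasta, out_bam_all=None,
                       out_bam_filtered=None, threads=1, jvm_memory="4g"):
    """Build the argument list for `read_utils.py align_and_fix`."""
    argv = ["read_utils.py", "align_and_fix"]
    # Either output is optional; include only those requested.
    if out_bam_all is not None:
        argv += ["--outBamAll", out_bam_all]
    if out_bam_filtered is not None:
        argv += ["--outBamFiltered", out_bam_filtered]
    argv += ["--threads", str(threads), "--JVMmemory", jvm_memory]
    argv += [in_bam, ref_fasta]
    return argv


argv = align_and_fix_argv("reads.bam", "ref.fasta",
                          out_bam_filtered="filtered.bam", threads=4)
```

The resulting list can be handed to subprocess.run(argv, check=True) in an environment where read_utils.py is on the PATH.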
- align_and_count_hits
Take reads, align to reference with Novoalign and return aligned read counts for each reference sequence.
usage: read_utils.py align_and_count_hits [-h] [--includeZeros] [--JVMmemory JVMMEMORY] [--threads THREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refFasta outCounts
- Positional arguments:
inBam Input unaligned reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard and Novoalign.
outCounts Output counts file
- Options:
--includeZeros=False Output lines with no hits (default: False)
--JVMmemory=4g JVM virtual memory size (default: 4g)
--threads=8 Number of threads (default: 8)
--loglevel=DEBUG Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
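The effect of --includeZeros is easiest to see on a toy tally. The sketch below counts aligned reads per reference sequence from an in-memory list and optionally keeps references with no hits. It is illustrative only: the function name is hypothetical, and neither the real counting method nor the layout of the outCounts file is described in this reference.

```python
from collections import Counter


def count_hits(reference_names, aligned_refs, include_zeros=False):
    """Return (reference, count) rows in reference order.

    aligned_refs holds, for each aligned read, the name of the
    reference sequence it aligned to.
    """
    counts = Counter(aligned_refs)
    rows = []
    for name in reference_names:
        # Counter returns 0 for names it has never seen.
        if counts[name] or include_zeros:
            rows.append((name, counts[name]))
    return rows
```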