3.7. illumina.py - for raw Illumina outputs
Utilities for demultiplexing Illumina data.
usage: illumina.py subcommand
- Sub-commands:
- illumina_demux
Demultiplex Illumina runs and produce BAM files, one per sample. Wraps Picard's ExtractIlluminaBarcodes and IlluminaBasecallsToSam while handling the various required input formats. Can also read Illumina BCL directories, either unpacked or packed as tar.gz archives. TO DO: read BCL or tar.gz BCL directories from S3 / object store.
usage: illumina.py illumina_demux [-h] [--outMetrics OUTMETRICS] [--sampleSheet SAMPLESHEET] [--flowcell FLOWCELL] [--read_structure READ_STRUCTURE] [--max_mismatches MAX_MISMATCHES] [--minimum_base_quality MINIMUM_BASE_QUALITY] [--min_mismatch_delta MIN_MISMATCH_DELTA] [--max_no_calls MAX_NO_CALLS] [--minimum_quality MINIMUM_QUALITY] [--compress_outputs COMPRESS_OUTPUTS] [--sequencing_center SEQUENCING_CENTER] [--adapters_to_check [ADAPTERS_TO_CHECK [ADAPTERS_TO_CHECK ...]]] [--platform PLATFORM] [--max_reads_in_ram_per_tile MAX_READS_IN_RAM_PER_TILE] [--max_records_in_ram MAX_RECORDS_IN_RAM] [--num_processors NUM_PROCESSORS] [--apply_eamss_filter APPLY_EAMSS_FILTER] [--force_gc FORCE_GC] [--first_tile FIRST_TILE] [--tile_limit TILE_LIMIT] [--include_non_pf_reads INCLUDE_NON_PF_READS] [--run_start_date RUN_START_DATE] [--read_group_id READ_GROUP_ID] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inDir lane outDir
- Positional arguments:
  inDir  Illumina BCL directory (or tar.gz of BCL directory).
  lane  Lane number.
  outDir  Output directory for BAM files.
- Options:
  --outMetrics  Output ExtractIlluminaBarcodes metrics file. Default is to dump to a temp file.
  --sampleSheet  Override SampleSheet. Input tab or CSV file with a header and four named columns: barcode_name, library_name, barcode_sequence_1, barcode_sequence_2. Default is to look for a SampleSheet.csv in the inDir.
  --flowcell  Override flowcell ID (default: read from RunInfo.xml).
  --read_structure  Override read structure (default: read from RunInfo.xml).
  --max_mismatches  Picard ExtractIlluminaBarcodes MAX_MISMATCHES (default: 0)
  --minimum_base_quality  Picard ExtractIlluminaBarcodes MINIMUM_BASE_QUALITY (default: 25)
  --min_mismatch_delta  Picard ExtractIlluminaBarcodes MIN_MISMATCH_DELTA
  --max_no_calls  Picard ExtractIlluminaBarcodes MAX_NO_CALLS
  --minimum_quality  Picard ExtractIlluminaBarcodes MINIMUM_QUALITY
  --compress_outputs  Picard ExtractIlluminaBarcodes COMPRESS_OUTPUTS
  --sequencing_center  Picard IlluminaBasecallsToSam SEQUENCING_CENTER
  --adapters_to_check  Picard IlluminaBasecallsToSam ADAPTERS_TO_CHECK (default: ('PAIRED_END', 'NEXTERA_V1', 'NEXTERA_V2'))
  --platform  Picard IlluminaBasecallsToSam PLATFORM
  --max_reads_in_ram_per_tile  Picard IlluminaBasecallsToSam MAX_READS_IN_RAM_PER_TILE (default: 100000)
  --max_records_in_ram  Picard IlluminaBasecallsToSam MAX_RECORDS_IN_RAM (default: 100000)
  --num_processors  Picard IlluminaBasecallsToSam NUM_PROCESSORS (default: 4)
  --apply_eamss_filter  Picard IlluminaBasecallsToSam APPLY_EAMSS_FILTER
  --force_gc  Picard IlluminaBasecallsToSam FORCE_GC (default: False)
  --first_tile  Picard IlluminaBasecallsToSam FIRST_TILE
  --tile_limit  Picard IlluminaBasecallsToSam TILE_LIMIT
  --include_non_pf_reads  Picard IlluminaBasecallsToSam INCLUDE_NON_PF_READS (default: False)
  --run_start_date  Picard IlluminaBasecallsToSam RUN_START_DATE
  --read_group_id  Picard IlluminaBasecallsToSam READ_GROUP_ID
  --JVMmemory  JVM virtual memory size (default: 54g)
  --loglevel  Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  Show program's version number and exit.
  --tmpDir  Base directory for temp files (default: /tmp).
  --tmpDirKeep  Keep the tmpDir if an exception occurs while running (default: False). Default is to delete all temp files at the end, even if there's a failure.
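As an illustration of the `--sampleSheet` override described above, the following is a minimal sketch that writes a tab-delimited sheet with the four named columns and assembles an `illumina_demux` argument list. The helper names (`write_sample_sheet`, `build_demux_cmd`), the sample rows, and all paths are hypothetical; only the flags and positional order come from the usage text. Nothing is executed; the command is only built and printed.

```python
import csv
import os
import tempfile

# Column names taken from the --sampleSheet option description.
SHEET_COLUMNS = [
    "barcode_name",
    "library_name",
    "barcode_sequence_1",
    "barcode_sequence_2",
]


def write_sample_sheet(path, rows):
    """Write a tab-delimited sample sheet with a header row (hypothetical helper)."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=SHEET_COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def build_demux_cmd(in_dir, lane, out_dir, sample_sheet=None, max_mismatches=None):
    """Assemble an argv list for illumina_demux; the command is not run here."""
    cmd = ["illumina.py", "illumina_demux"]
    if sample_sheet is not None:
        cmd += ["--sampleSheet", sample_sheet]
    if max_mismatches is not None:
        cmd += ["--max_mismatches", str(max_mismatches)]
    # Positional arguments follow the documented order: inDir lane outDir.
    cmd += [in_dir, str(lane), out_dir]
    return cmd


# Hypothetical samples and paths, for illustration only.
rows = [
    {"barcode_name": "bc01", "library_name": "sampleA.l1",
     "barcode_sequence_1": "ACGTACGT", "barcode_sequence_2": "TTGCAAGC"},
    {"barcode_name": "bc02", "library_name": "sampleB.l1",
     "barcode_sequence_1": "GGATCCAA", "barcode_sequence_2": "CATGCATG"},
]
sheet_path = os.path.join(tempfile.mkdtemp(), "samples.tsv")
write_sample_sheet(sheet_path, rows)

cmd = build_demux_cmd("/data/run_dir", 1, "/data/demux_out",
                      sample_sheet=sheet_path, max_mismatches=1)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where `illumina.py` is on the PATH.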
- miseq_fastq_to_bam
Convert fastq read files to a single BAM file. Fastq file names must conform to the patterns emitted by MiSeq machines. Sample metadata must be provided in a SampleSheet.csv that corresponds to the fastq filename. Specifically, the _S##_ index in the fastq file name is used to find the corresponding row in the SampleSheet.
usage: illumina.py miseq_fastq_to_bam [-h] [--inFastq2 INFASTQ2] [--runInfo RUNINFO] [--sequencing_center SEQUENCING_CENTER] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] outBam sampleSheet inFastq1
- Positional arguments:
  outBam  Output BAM file.
  sampleSheet  Input SampleSheet.csv file.
  inFastq1  Input fastq file; 1st end of paired-end reads if paired.
- Options:
  --inFastq2  Input fastq file; 2nd end of paired-end reads.
  --runInfo  Input RunInfo.xml file.
  --sequencing_center  Name of your sequencing center (default is the sequencing machine ID from the RunInfo.xml).
  --JVMmemory  JVM virtual memory size (default: 2g)
  --loglevel  Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  Show program's version number and exit.
  --tmpDir  Base directory for temp files (default: /tmp).
  --tmpDirKeep  Keep the tmpDir if an exception occurs while running (default: False). Default is to delete all temp files at the end, even if there's a failure.
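The `_S##_` lookup described for miseq_fastq_to_bam can be sketched roughly as follows. This is not the tool's own code: the filename pattern, the helper names, the `Sample_ID` column, and the assumption that `S##` is a 1-based row number in the sample sheet are illustrative choices based on the usual MiSeq naming convention.

```python
import os
import re

# Assumed filename shape: <sample>_S<number>_L<lane>_R<read>_001.fastq[.gz]
MISEQ_FASTQ = re.compile(
    r"^(?P<name>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq(\.gz)?$"
)


def sample_index(fastq_path):
    """Extract the integer after _S from a MiSeq-style fastq file name."""
    match = MISEQ_FASTQ.match(os.path.basename(fastq_path))
    if match is None:
        raise ValueError("not a MiSeq-style fastq name: %s" % fastq_path)
    return int(match.group("index"))


def lookup_row(sample_rows, fastq_path):
    """Return the sample sheet row for a fastq file, treating S## as 1-based."""
    idx = sample_index(fastq_path)
    if not 1 <= idx <= len(sample_rows):
        raise IndexError("sample index %d is outside the sample sheet" % idx)
    return sample_rows[idx - 1]


# Hypothetical sample sheet rows, already parsed into dictionaries.
sheet_rows = [{"Sample_ID": "sampleA"}, {"Sample_ID": "sampleB"}]
row = lookup_row(sheet_rows, "/runs/fastq/sampleB_S2_L001_R1_001.fastq.gz")
print(row["Sample_ID"])
```

Raising an error on a non-matching name, rather than guessing, mirrors the requirement that file names conform to the MiSeq pattern.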
- extract_fc_metadata
Extract RunInfo.xml and SampleSheet.csv from the provided Illumina directory.
usage: illumina.py extract_fc_metadata [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] flowcell outRunInfo outSampleSheet
- Positional arguments:
  flowcell  Illumina directory (possibly a tarball).
  outRunInfo  Output RunInfo.xml file.
  outSampleSheet  Output SampleSheet.csv file.
- Options:
  --loglevel  Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  Show program's version number and exit.
  --tmpDir  Base directory for temp files (default: /tmp).
  --tmpDirKeep  Keep the tmpDir if an exception occurs while running (default: False). Default is to delete all temp files at the end, even if there's a failure.
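To make the idea behind extract_fc_metadata concrete, here is a minimal sketch of pulling RunInfo.xml and SampleSheet.csv out of a run tarball with Python's standard `tarfile` module. It is an assumption-laden illustration, not the tool's implementation: the helper names, the rule of matching files by base name anywhere in the archive, and the demo archive contents are all hypothetical.

```python
import io
import os
import tarfile

WANTED = ("RunInfo.xml", "SampleSheet.csv")


def read_fc_metadata(tar_bytes):
    """Return {basename: bytes} for the first RunInfo.xml and SampleSheet.csv found."""
    found = {}
    # "r:*" lets tarfile detect the compression (gzip, bz2, none) on its own.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            base = os.path.basename(member.name)
            if member.isfile() and base in WANTED and base not in found:
                found[base] = tar.extractfile(member).read()
    return found


def make_demo_tarball():
    """Build a tiny in-memory tar.gz standing in for a real run directory."""
    files = (
        ("run/RunInfo.xml", b"<RunInfo/>"),
        ("run/SampleSheet.csv", b"[Data]\n"),
        ("run/Data/placeholder.txt", b"x"),
    )
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


metadata = read_fc_metadata(make_demo_tarball())
print(sorted(metadata))
```

Reading members in a single pass avoids unpacking the much larger BCL data when only the two metadata files are needed.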