Package picard.sam
Class MergeBamAlignment
java.lang.Object
picard.cmdline.CommandLineProgram
picard.sam.MergeBamAlignment
Summary
A command-line tool for merging BAM/SAM alignment info from a third-party aligner with the data in an unmapped BAM file, producing a third BAM file that has alignment data (from the aligner) and all the remaining data from the unmapped BAM. Quick note: this is not a tool for taking multiple SAM files and creating a bigger file by merging them. For that use case, see MergeSamFiles.
Details
Many alignment tools (still!) require fastq format input. The unmapped BAM may contain useful information that will be lost in the conversion to fastq (metadata like sample alias, library, barcodes, etc., and read-level tags). This tool takes an unaligned BAM with metadata, and the aligned BAM produced by calling SamToFastq and then passing the result to an aligner/mapper. It produces a new SAM file that includes all aligned and unaligned reads
and also carries forward additional read attributes from the unmapped BAM (attributes that are otherwise lost in the
process of converting to fastq). The resulting file will be valid for use by Picard and GATK tools.
The output may be coordinate-sorted, in which case the NM, MD, and UQ tags will be calculated and populated, or
queryname-sorted, in which case these tags will not be calculated or populated.
Usage example:
java -jar picard.jar MergeBamAlignment \
      ALIGNED=aligned.bam \
      UNMAPPED=unmapped.bam \
      O=merge_alignments.bam \
      R=reference_sequence.fasta
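The same invocation can be assembled from Java, for example when the tool is launched from a pipeline driver. The following is only a sketch: the class `MergeBamAlignmentArgs` and its methods are hypothetical helper names, not part of Picard, and the file names are the placeholders from the usage example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Picard) that assembles the arguments
// shown in the usage example, in the same KEY=VALUE style.
final class MergeBamAlignmentArgs {

    /** Builds the four tool arguments used in the usage example. */
    static List<String> build(String aligned, String unmapped, String output, String reference) {
        List<String> args = new ArrayList<>();
        args.add("ALIGNED=" + aligned);
        args.add("UNMAPPED=" + unmapped);
        args.add("O=" + output);
        args.add("R=" + reference);
        return args;
    }

    /** Prepends the launcher so the list could be handed to a ProcessBuilder. */
    static List<String> commandLine(String picardJar, List<String> toolArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", picardJar, "MergeBamAlignment"));
        cmd.addAll(toolArgs);
        return cmd;
    }

    public static void main(String[] argv) {
        List<String> toolArgs = build("aligned.bam", "unmapped.bam",
                "merge_alignments.bam", "reference_sequence.fasta");
        // Print the command instead of running it; this sketch does not assume
        // that picard.jar is present on the machine.
        System.out.println(String.join(" ", commandLine("picard.jar", toolArgs)));
    }
}
```

With Picard on the classpath, the tool arguments alone could presumably be passed to the inherited `instanceMain` method instead of spawning a process; that path is not exercised here.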
Caveats
This tool has been in development for a long time, and many arguments have been added to it over the years. You may be particularly interested in the following (partial) list:
- CLIP_ADAPTERS -- Whether to (soft-)clip the ends of the reads that are identified as belonging to adapters.
- IS_BISULFITE_SEQUENCE -- Whether the sequencing originated from bisulfite sequencing, in which case NM will be calculated differently.
- ALIGNER_PROPER_PAIR_FLAGS -- Whether to trust the "Proper pair" flag set by the aligner. If the aligner cannot be trusted to set it, set this to false and the tool will set the flag itself based on orientation and distance between pairs.
- ADD_MATE_CIGAR -- Whether to use this opportunity to add the MC tag to each read.
- UNMAP_CONTAMINANT_READS (and MIN_UNCLIPPED_BASES) -- Whether to identify extremely short alignments (with clipping on both sides) as cross-species contamination and unmap the reads.
-
Field Summary
Fields
Modifier and Type    Field    Description
boolean
boolean
boolean
boolean
List<htsjdk.samtools.SamPairUtil.PairOrientation>
boolean
boolean
boolean
Deprecated.
int
int
Deprecated.
final PGTagArgumentCollection
picard.sam.MergeBamAlignment.PrimaryAlignmentStrategy
int
int
htsjdk.samtools.SAMFileHeader.SortOrder
boolean
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, SYNTAX_TRANSITION_URL, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
-
Constructor Summary
Constructors -
Method Summary
Methods inherited from class picard.cmdline.CommandLineProgram
checkRInstallation, getCommandLine, getCommandLineParser, getCommandLineParserForArgs, getDefaultHeaders, getFaqLink, getMetricsFile, getPGRecord, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, setDefaultHeaders, useLegacyParser
-
Field Details
-
pgTagArgumentCollection
-
UNMAPPED_BAM
@Argument(shortName="UNMAPPED", doc="Original SAM or BAM file of unmapped reads, which must be in queryname order. Reads MUST be unmapped.") public File UNMAPPED_BAM -
ALIGNED_BAM
-
READ1_ALIGNED_BAM
-
READ2_ALIGNED_BAM
-
OUTPUT
-
PROGRAM_RECORD_ID
@Argument(shortName="PG", doc="The program group ID of the aligner (if not supplied by the aligned file).", optional=true) public String PROGRAM_RECORD_ID -
PROGRAM_GROUP_VERSION
@Argument(shortName="PG_VERSION", doc="The version of the program group (if not supplied by the aligned file).", optional=true) public String PROGRAM_GROUP_VERSION -
PROGRAM_GROUP_COMMAND_LINE
@Argument(shortName="PG_COMMAND", doc="The command line of the program group (if not supplied by the aligned file).", optional=true) public String PROGRAM_GROUP_COMMAND_LINE -
PROGRAM_GROUP_NAME
@Argument(shortName="PG_NAME", doc="The name of the program group (if not supplied by the aligned file).", optional=true) public String PROGRAM_GROUP_NAME -
PAIRED_RUN
@Deprecated @Argument(doc="DEPRECATED. This argument is ignored and will be removed.", shortName="PE", optional=true) public Boolean PAIRED_RUN Deprecated. -
JUMP_SIZE
@Deprecated @Argument(doc="The expected jump size (required if this is a jumping library). Deprecated. Use EXPECTED_ORIENTATIONS instead", shortName="JUMP", mutex="EXPECTED_ORIENTATIONS", optional=true) public Integer JUMP_SIZE Deprecated. -
CLIP_ADAPTERS
@Argument(doc="Whether to clip adapters where identified.") public boolean CLIP_ADAPTERS -
IS_BISULFITE_SEQUENCE
@Argument(doc="Whether the lane is bisulfite sequence (used when calculating the NM tag).") public boolean IS_BISULFITE_SEQUENCE -
ALIGNED_READS_ONLY
@Argument(doc="Whether to output only aligned reads. ") public boolean ALIGNED_READS_ONLY -
MAX_INSERTIONS_OR_DELETIONS
@Argument(doc="The maximum number of insertions or deletions permitted for an alignment to be included. Alignments with more than this many insertions or deletions will be ignored. Set to -1 to allow any number of insertions or deletions.", shortName="MAX_GAPS") public int MAX_INSERTIONS_OR_DELETIONS -
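The gap filter described above can be sketched on a plain CIGAR string. This is an illustration, not Picard's implementation: the class `GapFilterSketch` and its methods are hypothetical, and it assumes that each I or D operator counts as one insertion or deletion regardless of its length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Picard code) of a MAX_INSERTIONS_OR_DELETIONS-style filter.
final class GapFilterSketch {
    // One CIGAR element: a length followed by a SAM operator character.
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Counts I and D operators (events, not bases) in a CIGAR string. */
    static int countGaps(String cigar) {
        int gaps = 0;
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'I' || op == 'D') {
                gaps++;
            }
        }
        return gaps;
    }

    /** Returns true if the alignment has more gaps than allowed; -1 disables the filter. */
    static boolean ignoreAlignment(String cigar, int maxGaps) {
        if (maxGaps == -1) {
            return false;
        }
        return countGaps(cigar) > maxGaps;
    }

    public static void main(String[] argv) {
        String cigar = "10M1I20M3D70M";
        System.out.println(cigar + " has " + countGaps(cigar) + " gap events");
        System.out.println("ignored with limit 1: " + ignoreAlignment(cigar, 1));
    }
}
```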
ATTRIBUTES_TO_RETAIN
-
ATTRIBUTES_TO_REMOVE
-
ATTRIBUTES_TO_REVERSE
-
ATTRIBUTES_TO_REVERSE_COMPLEMENT
-
READ1_TRIM
@Argument(shortName="R1_TRIM", doc="The number of bases trimmed from the beginning of read 1 prior to alignment") public int READ1_TRIM -
READ2_TRIM
@Argument(shortName="R2_TRIM", doc="The number of bases trimmed from the beginning of read 2 prior to alignment") public int READ2_TRIM -
EXPECTED_ORIENTATIONS
@Argument(shortName="ORIENTATIONS", doc="The expected orientation of proper read pairs. Replaces JUMP_SIZE", mutex="JUMP_SIZE", optional=true) public List<htsjdk.samtools.SamPairUtil.PairOrientation> EXPECTED_ORIENTATIONS -
ALIGNER_PROPER_PAIR_FLAGS
@Argument(doc="Use the aligner's idea of what a proper pair is rather than computing in this program.") public boolean ALIGNER_PROPER_PAIR_FLAGS -
SORT_ORDER
@Argument(shortName="SO", doc="The order in which the merged reads should be output.") public htsjdk.samtools.SAMFileHeader.SortOrder SORT_ORDER -
PRIMARY_ALIGNMENT_STRATEGY
@Argument(doc="Strategy for selecting primary alignment when the aligner has provided more than one alignment for a pair or fragment, and none are marked as primary, more than one is marked as primary, or the primary alignment is filtered out for some reason. For all strategies, ties are resolved arbitrarily.") public picard.sam.MergeBamAlignment.PrimaryAlignmentStrategy PRIMARY_ALIGNMENT_STRATEGY -
CLIP_OVERLAPPING_READS
@Argument(doc="For paired reads, clip the 3' end of each read if necessary so that it does not extend past the 5' end of its mate. Reads are first soft clipped so that the 3' aligned end of each read does not extend past the 5' aligned end of its mate. If HARD_CLIP_OVERLAPPING_READS is also true, then reads are additionally hard clipped so that the 3' unclipped end of each read does not extend past the 5' unclipped end of its mate. Hard clipped bases and their qualities are stored in the XB and XQ tags, respectively.") public boolean CLIP_OVERLAPPING_READS -
HARD_CLIP_OVERLAPPING_READS
@Argument(doc="If true, hard clipping will be applied to overlapping reads. By default, soft clipping is used.") public boolean HARD_CLIP_OVERLAPPING_READS -
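The soft-clipping rule for overlapping reads amounts to a small piece of coordinate arithmetic. The sketch below is a simplification, not Picard's implementation: `OverlapClipSketch` is a hypothetical name, and it assumes a forward/reverse pair on the same contig, 1-based inclusive alignment coordinates, and an overhang measured in reference positions (converting that to read bases in the presence of indels is left out).

```java
// Hypothetical sketch (not Picard code) of the overhang computation behind
// CLIP_OVERLAPPING_READS for a forward/reverse pair on one contig.
final class OverlapClipSketch {

    /**
     * The forward read's 3' end is its alignment end; the reverse mate's 5' end
     * is the mate's alignment end. Returns how far the forward read runs past it.
     */
    static int forwardOverhang(int forwardEnd, int reverseEnd) {
        return Math.max(0, forwardEnd - reverseEnd);
    }

    /**
     * The reverse read's 3' end is its alignment start; the forward mate's 5' end
     * is the mate's alignment start. Returns how far the reverse read runs past it.
     */
    static int reverseOverhang(int forwardStart, int reverseStart) {
        return Math.max(0, forwardStart - reverseStart);
    }

    public static void main(String[] argv) {
        // Forward read aligned at 100..150, reverse mate aligned at 95..140.
        System.out.println("clip from forward 3' end: " + forwardOverhang(150, 140));
        System.out.println("clip from reverse 3' end: " + reverseOverhang(100, 95));
    }
}
```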
INCLUDE_SECONDARY_ALIGNMENTS
@Argument(doc="If false, do not write secondary alignments to output.") public boolean INCLUDE_SECONDARY_ALIGNMENTS -
ADD_MATE_CIGAR
@Argument(shortName="MC", optional=true, doc="Adds the mate CIGAR tag (MC) if true, does not if false.") public Boolean ADD_MATE_CIGAR -
UNMAP_CONTAMINANT_READS
@Argument(shortName="UNMAP_CONTAM", optional=true, doc="Detect reads originating from foreign organisms (e.g. bacterial DNA in a non-bacterial sample), and unmap + label those reads accordingly.") public boolean UNMAP_CONTAMINANT_READS -
MIN_UNCLIPPED_BASES
@Argument(doc="If UNMAP_CONTAMINANT_READS is set, require this many unclipped bases or else the read will be marked as contaminant.") public int MIN_UNCLIPPED_BASES -
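A rough sketch of the check that UNMAP_CONTAMINANT_READS and MIN_UNCLIPPED_BASES describe: an alignment that is soft clipped on both sides and has too few unclipped bases is treated as a likely contaminant. This is not Picard's implementation. The class `ContaminantSketch` is hypothetical, and it assumes that "unclipped bases" means bases under M, = or X operators and that hard clips are ignored when looking for soft clips at the ends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Picard code) of a contaminant check based on clipping.
final class ContaminantSketch {
    // One CIGAR element: a length followed by a SAM operator character.
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** True if the CIGAR is soft clipped on both ends and has fewer aligned bases than required. */
    static boolean looksLikeContaminant(String cigar, int minUnclippedBases) {
        int alignedBases = 0;
        char first = '\0';
        char last = '\0';
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        while (m.find()) {
            int length = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'H') {
                continue; // assumption: hard clips do not affect the both-ends test
            }
            if (first == '\0') {
                first = op;
            }
            last = op;
            if (op == 'M' || op == '=' || op == 'X') {
                alignedBases += length;
            }
        }
        boolean clippedOnBothEnds = first == 'S' && last == 'S';
        return clippedOnBothEnds && alignedBases < minUnclippedBases;
    }

    public static void main(String[] argv) {
        // The threshold of 32 is an arbitrary value chosen for this illustration.
        System.out.println(looksLikeContaminant("40S20M40S", 32));
        System.out.println(looksLikeContaminant("80M20S", 32));
    }
}
```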
MATCHING_DICTIONARY_TAGS
-
UNMAPPED_READ_STRATEGY
@Argument(doc="How to deal with alignment information in reads that are being unmapped (e.g. due to cross-species contamination). Currently ignored unless UNMAP_CONTAMINANT_READS = true. Note that the DO_NOT_CHANGE strategy will actually reset the cigar and set the mapping quality on unmapped reads since otherwise the result will be an invalid record. To force no change use the DO_NOT_CHANGE_INVALID strategy.", optional=true) public AbstractAlignmentMerger.UnmappingReadStrategy UNMAPPED_READ_STRATEGY
-
-
Constructor Details
-
MergeBamAlignment
public MergeBamAlignment()
-
-
Method Details
-
requiresReference
protected boolean requiresReference()
- Overrides:
requiresReference in class CommandLineProgram
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after the command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
- Specified by:
doWork in class CommandLineProgram
- Returns:
- program exit status.
-
customCommandLineValidation
Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by the command-line parser can be validated.
- Overrides:
customCommandLineValidation in class CommandLineProgram
- Returns:
- null if command line is valid. If command line is invalid, returns an array of error messages to be written to the appropriate place.
-