Package picard.sam

Class FastqToSam


@DocumentedFeature public class FastqToSam extends CommandLineProgram
Converts a FASTQ file to an unaligned BAM or SAM file.

Output read records will contain the original base calls, and the quality scores will be translated according to the base quality score encoding: FastqSanger, FastqSolexa, or FastqIllumina.
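The following minimal Java sketch shows what translating quality scores means for each encoding. It rests on common conventions that are assumptions here: Phred+33 for Sanger/Standard, Phred+64 for Illumina, and Solexa-scaled scores with an ASCII offset of 64 for Solexa, converted to Phred with Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The QUALITY_FORMAT description below words Solexa differently ("phred scaling + 66"). QualityEncoding and QualityDecoding are hypothetical names, not Picard or htsjdk API.

```java
// Illustrative only: these names are hypothetical, not Picard/htsjdk API.
enum QualityEncoding { STANDARD, ILLUMINA, SOLEXA }

final class QualityDecoding {
    private QualityDecoding() {}

    /** Converts one encoded quality character to a Phred-scaled score. */
    static int toPhred(char encoded, QualityEncoding encoding) {
        switch (encoding) {
            case STANDARD: // Sanger / Phred+33
                return encoded - 33;
            case ILLUMINA: // Phred+64
                return encoded - 64;
            case SOLEXA: { // assumed: Solexa-scaled score + 64; the score may be negative
                int solexa = encoded - 64;
                double phred = 10.0 * Math.log10(Math.pow(10.0, solexa / 10.0) + 1.0);
                return (int) Math.round(phred);
            }
            default:
                throw new IllegalArgumentException("Unsupported encoding: " + encoding);
        }
    }

    /** Converts a whole FASTQ quality line, character by character. */
    static int[] toPhred(String qualityLine, QualityEncoding encoding) {
        int[] scores = new int[qualityLine.length()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = toPhred(qualityLine.charAt(i), encoding);
        }
        return scores;
    }
}
```

For example, 'I' under STANDARD and 'h' under ILLUMINA both decode to Phred 40. The Solexa character ';' (Solexa score -5) maps to a Phred score of about 1.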

There are also arguments for providing values for SAM header and read attributes that are not present in FASTQ (e.g. see RG or SM below).

Inputs

One FASTQ file name for single-end data, or two for paired-end sequencing data. These files may be gzip-compressed (when the file name ends with ".gz").
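The gzip rule above can be sketched with the JDK alone. This version decompresses only when the file name ends in ".gz" and does not sniff the file contents. FastqOpener is a hypothetical helper. Picard itself reads FASTQs through htsjdk's FastqReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: opens a FASTQ as text, gunzipping it when the name ends in ".gz".
final class FastqOpener {
    private FastqOpener() {}

    static BufferedReader open(Path fastq) throws IOException {
        InputStream raw = Files.newInputStream(fastq);
        try {
            boolean gzipped = fastq.getFileName().toString().endsWith(".gz");
            InputStream in = gzipped ? new GZIPInputStream(raw) : raw;
            return new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }
}
```

Closing the returned reader also closes the decompressing and file streams beneath it.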

Alternatively, for larger inputs you can provide a collection of FASTQ files indexed by their name (see USE_SEQUENTIAL_FASTQS below for details).
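This sketch shows one way to enumerate such a numbered collection. It assumes the naming scheme documented under USE_SEQUENTIAL_FASTQS (<prefix>_001.<extension>, <prefix>_002.<extension>, ...). It also assumes the scan stops at the first missing or unreadable file. SequentialFastqs.list is a hypothetical name loosely mirroring getSequentialFileList and is not its actual implementation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper modeled on the documented naming scheme; not Picard's code.
final class SequentialFastqs {
    private static final String FIRST_MARKER = "_001.";

    private SequentialFastqs() {}

    /** Returns baseFastq followed by _002, _003, ... until a file is missing or unreadable. */
    static List<Path> list(Path baseFastq) {
        String name = baseFastq.getFileName().toString();
        int marker = name.lastIndexOf(FIRST_MARKER);
        if (marker < 0) {
            throw new IllegalArgumentException("Expected <prefix>_001.<extension>, got: " + name);
        }
        String prefix = name.substring(0, marker);
        String extension = name.substring(marker + FIRST_MARKER.length()); // e.g. "fastq" or "fastq.gz"

        List<Path> files = new ArrayList<>();
        files.add(baseFastq);
        for (int index = 2; ; index++) {
            Path next = baseFastq.resolveSibling(String.format("%s_%03d.%s", prefix, index, extension));
            if (!Files.isReadable(next)) {
                break; // assumed stopping rule: the first gap ends the collection
            }
            files.add(next);
        }
        return files;
    }
}
```

Given RUNNAME_S8_L005_R1_001.fastq, the sketch returns that file followed by _002, _003, and so on for as long as the next file exists.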

By default, this tool will try to guess the base quality score encoding. However, you can specify it explicitly using the QUALITY_FORMAT argument.
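One way to approach guessing is to look at the range of quality characters observed. The following is a simplified range-based heuristic, not the detector Picard or htsjdk actually uses. Its thresholds are assumptions: ASCII 59 for the lowest Solexa score, ASCII 64 for Illumina, and treating data whose maximum is at most 'J' (ASCII 74) as Phred+33. Passing an expected format mimics an explicitly supplied QUALITY_FORMAT that is only sanity-checked. GuessedFormat and QualityFormatGuesser are hypothetical names.

```java
import java.util.List;

// Illustrative names; not the htsjdk FastqQualityFormat enum.
enum GuessedFormat { STANDARD, SOLEXA, ILLUMINA }

final class QualityFormatGuesser {
    private QualityFormatGuesser() {}

    /** Guesses the encoding; a non-null expected format is only checked for compatibility. */
    static GuessedFormat guess(List<String> qualityLines, GuessedFormat expected) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (String line : qualityLines) {
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                min = Math.min(min, c);
                max = Math.max(max, c);
            }
        }
        if (min == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("No quality characters to inspect");
        }
        boolean standardPossible = min >= 33; // '!' = Phred 0 + 33
        boolean solexaPossible = min >= 59;   // ';' = Solexa -5 + 64 (assumed)
        boolean illuminaPossible = min >= 64; // '@' = Phred 0 + 64

        if (expected != null) {
            boolean compatible = (expected == GuessedFormat.STANDARD && standardPossible)
                    || (expected == GuessedFormat.SOLEXA && solexaPossible)
                    || (expected == GuessedFormat.ILLUMINA && illuminaPossible);
            if (!compatible) {
                throw new IllegalStateException("Observed qualities are incompatible with " + expected);
            }
            return expected;
        }
        if (!standardPossible) {
            throw new IllegalStateException("Quality characters below '!' are not valid in any encoding");
        }
        if (!solexaPossible) return GuessedFormat.STANDARD;
        if (!illuminaPossible) return GuessedFormat.SOLEXA;
        // Everything is >= '@': either high-quality Phred+33 data or 64-offset data.
        return max <= 'J' ? GuessedFormat.STANDARD : GuessedFormat.ILLUMINA;
    }
}
```

The last branch is where any range heuristic is weakest. Uniformly high-quality Phred+33 reads and 64-offset reads can overlap, which is one reason to set QUALITY_FORMAT explicitly when the encoding is known.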

Output

A single unaligned BAM or SAM file. By default, the records are sorted by query (read) name.
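Query-name order keeps all records for a read next to each other. The comparator below is a simplified sketch. It assumes plain lexicographic (String.compareTo) name comparison and puts the first-of-pair record before the second. htsjdk's own queryname comparator may compare names and break ties differently. QuerynameSortSketch and its Read type are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified, hypothetical model of query-name ordering for unaligned reads.
final class QuerynameSortSketch {
    static final class Read {
        final String name;
        final boolean secondOfPair;

        Read(String name, boolean secondOfPair) {
            this.name = name;
            this.secondOfPair = secondOfPair;
        }

        @Override
        public String toString() {
            return name + (secondOfPair ? "/2" : "/1");
        }
    }

    /** Orders by read name, then first-of-pair (false) before second-of-pair (true). */
    static final Comparator<Read> QUERYNAME = (a, b) -> {
        int byName = a.name.compareTo(b.name);
        return byName != 0 ? byName : Boolean.compare(a.secondOfPair, b.secondOfPair);
    };

    private QuerynameSortSketch() {}

    static List<Read> sorted(List<Read> reads) {
        List<Read> copy = new ArrayList<>(reads);
        copy.sort(QUERYNAME);
        return copy;
    }
}
```

Sorting b/2, a/1, b/1 with this comparator yields a/1, b/1, b/2.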

Usage examples

Example 1:

Single-end sequencing FASTQ file conversion. All reads are annotated as belonging to the "rg0013" read group that in turn is part of the sample "sample001".

 java -jar picard.jar FastqToSam \
      F1=input_reads.fastq \
      O=unaligned_reads.bam \
      SM=sample001 \
      RG=rg0013
 

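The read-group annotation in Example 1 ends up in the SAM header. Below is a minimal sketch of the corresponding text header lines. It assumes the RG value becomes the @RG ID tag and SM, LB, PL, etc. map to the SAM @RG tags of the same names. It also assumes a header version of 1.6 and query-name sort order. ReadGroupHeaderSketch is hypothetical; Picard builds its header through htsjdk's SAMFileHeader, so its exact output may differ.

```java
import java.util.Map;

// Hypothetical sketch: renders @HD and @RG header lines as SAM text.
final class ReadGroupHeaderSketch {
    private ReadGroupHeaderSketch() {}

    static String header(String readGroupId, String sample, Map<String, String> optionalTags) {
        StringBuilder sb = new StringBuilder();
        sb.append("@HD\tVN:1.6\tSO:queryname\n"); // assumed version and sort order
        sb.append("@RG\tID:").append(readGroupId).append("\tSM:").append(sample);
        // Optional attributes (e.g. LB, PL, PU) are appended only when a value is provided.
        for (Map.Entry<String, String> tag : optionalTags.entrySet()) {
            if (tag.getValue() != null) {
                sb.append('\t').append(tag.getKey()).append(':').append(tag.getValue());
            }
        }
        return sb.append('\n').toString();
    }
}
```

For Example 1, header("rg0013", "sample001", emptyMap) yields the two lines "@HD VN:1.6 SO:queryname" and "@RG ID:rg0013 SM:sample001" (tab-separated).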
Example 2:

Similar to example 1 above, but for paired-end sequencing.

 java -jar picard.jar FastqToSam \
      F1=forward_reads.fastq \
      F2=reverse_reads.fastq \
      O=unaligned_read_pairs.bam \
      SM=sample001 \
      RG=rg0013
 
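For paired-end input, each F1/F2 record pair becomes two unaligned SAM records that share a name. This sketch of that bookkeeping uses the standard SAM flag bits (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, 0x40 first of pair, 0x80 second of pair). It also assumes a trailing "/1" or "/2" is stripped before mate names are compared. PairingSketch is hypothetical and not Picard's doPaired.

```java
// Hypothetical sketch of pairing two FASTQ records into unaligned SAM flags.
final class PairingSketch {
    static final int PAIRED = 0x1;
    static final int UNMAPPED = 0x4;
    static final int MATE_UNMAPPED = 0x8;
    static final int FIRST_OF_PAIR = 0x40;
    static final int SECOND_OF_PAIR = 0x80;

    private PairingSketch() {}

    /** Drops a trailing "/1" or "/2" mate suffix, if present. */
    static String baseName(String readName) {
        if (readName.endsWith("/1") || readName.endsWith("/2")) {
            return readName.substring(0, readName.length() - 2);
        }
        return readName;
    }

    /** Returns {flagForRead1, flagForRead2}, or throws if the mate names disagree. */
    static int[] pairFlags(String name1, String name2) {
        if (!baseName(name1).equals(baseName(name2))) {
            throw new IllegalArgumentException("Mismatched read names: " + name1 + " vs " + name2);
        }
        int common = PAIRED | UNMAPPED | MATE_UNMAPPED;
        return new int[] {common | FIRST_OF_PAIR, common | SECOND_OF_PAIR};
    }

    /** A single-end read is only marked unmapped. */
    static int unpairedFlag() {
        return UNMAPPED;
    }
}
```

With these bits, the two mates of an unaligned pair get flags 77 and 141, and a single-end read gets flag 4.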
  • Field Details

    • FASTQ

      @Argument(shortName="F1", doc="Input fastq file (optionally gzipped) for single end data, or first read in paired end data.") public PicardHtsPath FASTQ
    • FASTQ2

      @Argument(shortName="F2", doc="Input fastq file (optionally gzipped) for the second read of paired end data.", optional=true) public PicardHtsPath FASTQ2
    • USE_SEQUENTIAL_FASTQS

      @Argument(doc="Use sequential fastq files with the suffix <prefix>_###.fastq or <prefix>_###.fastq.gz. The files should be named:\n <prefix>_001.<extension>, <prefix>_002.<extension>, ..., <prefix>_XYZ.<extension>\n The base files should be:\n <prefix>_001.<extension>\n An example would be:\n RUNNAME_S8_L005_R1_001.fastq\n RUNNAME_S8_L005_R1_002.fastq\n RUNNAME_S8_L005_R1_003.fastq\n RUNNAME_S8_L005_R1_004.fastq\nRUNNAME_S8_L005_R1_001.fastq should be provided as FASTQ.", optional=true) public boolean USE_SEQUENTIAL_FASTQS
    • QUALITY_FORMAT

      @Argument(shortName="V", doc="A value describing how the quality values are encoded in the input FASTQ file. Either Solexa (phred scaling + 66), Illumina (phred scaling + 64) or Standard (phred scaling + 33). If this value is not specified, the quality format will be detected automatically.", optional=true) public htsjdk.samtools.util.FastqQualityFormat QUALITY_FORMAT
    • OUTPUT

      @Argument(doc="Output BAM/SAM/CRAM file. ", shortName="O") public File OUTPUT
    • READ_GROUP_NAME

      @Argument(shortName="RG", doc="Read group name") public String READ_GROUP_NAME
    • SAMPLE_NAME

      @Argument(shortName="SM", doc="Sample name to insert into the read group header") public String SAMPLE_NAME
    • LIBRARY_NAME

      @Argument(shortName="LB", doc="The library name to place into the LB attribute in the read group header", optional=true) public String LIBRARY_NAME
    • PLATFORM_UNIT

      @Argument(shortName="PU", doc="The platform unit (often run_barcode.lane) to insert into the read group header", optional=true) public String PLATFORM_UNIT
    • PLATFORM

      @Argument(shortName="PL", doc="The platform type (e.g. ILLUMINA, SOLID) to insert into the read group header", optional=true) public String PLATFORM
    • SEQUENCING_CENTER

      @Argument(shortName="CN", doc="The sequencing center from which the data originated", optional=true) public String SEQUENCING_CENTER
    • PREDICTED_INSERT_SIZE

      @Argument(shortName="PI", doc="Predicted median insert size, to insert into the read group header", optional=true) public Integer PREDICTED_INSERT_SIZE
    • PROGRAM_GROUP

      @Argument(shortName="PG", doc="Program group to insert into the read group header.", optional=true) public String PROGRAM_GROUP
    • PLATFORM_MODEL

      @Argument(shortName="PM", doc="Platform model to insert into the group header (free-form text providing further details of the platform/technology used)", optional=true) public String PLATFORM_MODEL
    • COMMENT

      @Argument(doc="Comment(s) to include in the merged output file\'s header.", optional=true, shortName="CO") public List<String> COMMENT
    • DESCRIPTION

      @Argument(shortName="DS", doc="Inserted into the read group header", optional=true) public String DESCRIPTION
    • RUN_DATE

      @Argument(shortName="DT", doc="Date the run was produced, to insert into the read group header", optional=true) public htsjdk.samtools.util.Iso8601Date RUN_DATE
    • SORT_ORDER

      @Argument(shortName="SO", doc="The sort order for the output BAM/SAM/CRAM file.") public htsjdk.samtools.SAMFileHeader.SortOrder SORT_ORDER
    • MIN_Q

      @Argument(doc="Minimum quality allowed in the input fastq. An exception will be thrown if a quality is less than this value.") public int MIN_Q
    • MAX_Q

      @Argument(doc="Maximum quality allowed in the input fastq. An exception will be thrown if a quality is greater than this value.") public int MAX_Q
    • STRIP_UNPAIRED_MATE_NUMBER

      @Deprecated @Argument(doc="Deprecated (No longer used). If true and this is an unpaired fastq any occurrence of \'/1\' or \'/2\' will be removed from the end of a read name.") public Boolean STRIP_UNPAIRED_MATE_NUMBER
      Deprecated.
    • ALLOW_AND_IGNORE_EMPTY_LINES

      @Argument(doc="Allow (and ignore) empty lines") public Boolean ALLOW_AND_IGNORE_EMPTY_LINES
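The MIN_Q/MAX_Q checks and ALLOW_AND_IGNORE_EMPTY_LINES can be shown together in a small four-line FASTQ reader loop. This sketch assumes Phred+33 qualities. It reports malformed input as IOException and out-of-range qualities as IllegalArgumentException. CheckedFastqParser is a hypothetical helper, not htsjdk's FastqReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: four-line FASTQ records, optional blank-line skipping,
// and a [minQ, maxQ] bound on Phred+33 qualities.
final class CheckedFastqParser {
    static final class Record {
        final String name;
        final String bases;
        final String qualities;

        Record(String name, String bases, String qualities) {
            this.name = name;
            this.bases = bases;
            this.qualities = qualities;
        }
    }

    private CheckedFastqParser() {}

    static List<Record> parse(BufferedReader reader, boolean ignoreEmptyLines, int minQ, int maxQ)
            throws IOException {
        List<Record> records = new ArrayList<>();
        String[] lines = new String[4];
        int filled = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // Note: when skipping, a zero-length read (empty bases line) cannot be parsed.
            if (ignoreEmptyLines && line.isEmpty()) {
                continue;
            }
            lines[filled++] = line;
            if (filled == 4) {
                records.add(toRecord(lines, minQ, maxQ));
                filled = 0;
            }
        }
        if (filled != 0) {
            throw new IOException("Truncated FASTQ record at end of input");
        }
        return records;
    }

    private static Record toRecord(String[] l, int minQ, int maxQ) throws IOException {
        if (!l[0].startsWith("@") || !l[2].startsWith("+")) {
            throw new IOException("Malformed FASTQ record starting with: " + l[0]);
        }
        if (l[1].length() != l[3].length()) {
            throw new IOException("Bases and qualities differ in length for " + l[0]);
        }
        for (int i = 0; i < l[3].length(); i++) {
            int q = l[3].charAt(i) - 33; // Phred+33 assumed
            if (q < minQ || q > maxQ) {
                throw new IllegalArgumentException(String.format(
                        "Quality %d outside [%d, %d] in read %s", q, minQ, maxQ, l[0]));
            }
        }
        return new Record(l[0].substring(1), l[1], l[3]);
    }
}
```

Without skipping, a blank line shifts the four-line framing, so the next record fails the "@" header check instead of being silently misread.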
  • Constructor Details

    • FastqToSam

      public FastqToSam()
  • Method Details

    • determineQualityFormat

      public static htsjdk.samtools.util.FastqQualityFormat determineQualityFormat(htsjdk.samtools.fastq.FastqReader reader1, htsjdk.samtools.fastq.FastqReader reader2, htsjdk.samtools.util.FastqQualityFormat expectedQuality)
      Looks at the FASTQ input(s) and attempts to determine the proper quality format. Closes the reader(s) as a side effect.
      Parameters:
      reader1 - The first FASTQ input
      reader2 - The second FASTQ input, if necessary. To not use this input, set it to null.
      expectedQuality - If provided, will be used for sanity checking. If left null, autodetection will occur.
    • getSequentialFileList

      protected static List<Path> getSequentialFileList(Path baseFastq)
      Get a list of FASTQs that are sequentially numbered based on the first (base) FASTQ. The files should be named: <prefix>_001.<extension>, <prefix>_002.<extension>, ..., <prefix>_XYZ.<extension>. The base file should be: <prefix>_001.<extension>. An example would be: RUNNAME_S8_L005_R1_001.fastq, RUNNAME_S8_L005_R1_002.fastq, RUNNAME_S8_L005_R1_003.fastq, RUNNAME_S8_L005_R1_004.fastq, where `baseFastq` is the first in that list.
    • doWork

      protected int doWork()
      Description copied from class: CommandLineProgram
      Do the work after the command line has been parsed. A RuntimeException may be thrown by this method and is reported appropriately.
      Specified by:
      doWork in class CommandLineProgram
      Returns:
      program exit status.
    • makeItSo

      public void makeItSo(htsjdk.samtools.fastq.FastqReader reader1, htsjdk.samtools.fastq.FastqReader reader2, htsjdk.samtools.SAMFileWriter writer)
      Handles the FastqToSam execution on the FastqReader(s). In some circumstances it might be useful to circumvent the command-line-based instantiation of this class; however, note that there is no hand-holding and there are no guardrails when running in this manner. It is the caller's responsibility to close the reader(s).
      Parameters:
      reader1 - The FastqReader for the first fastq file
      reader2 - The second FastqReader if applicable. Pass in null if only using a single reader
      writer - The SAMFileWriter where the new SAM file is written
    • doUnpaired

      protected int doUnpaired(htsjdk.samtools.fastq.FastqReader freader, htsjdk.samtools.SAMFileWriter writer)
      Creates a simple SAM file from a single fastq file.
    • doPaired

      protected int doPaired(htsjdk.samtools.fastq.FastqReader freader1, htsjdk.samtools.fastq.FastqReader freader2, htsjdk.samtools.SAMFileWriter writer)
      More complicated method that takes two fastq files and builds pairing information in the SAM.
    • createSamFileHeader

      public htsjdk.samtools.SAMFileHeader createSamFileHeader()
      Creates a simple header with the values provided on the command line.
    • customCommandLineValidation

      protected String[] customCommandLineValidation()
      Description copied from class: CommandLineProgram
      Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by the command-line parser can be validated.
      Overrides:
      customCommandLineValidation in class CommandLineProgram
      Returns:
      null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
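The customCommandLineValidation contract is to return null when the arguments are valid and otherwise an array of error messages. The sketch below illustrates that contract. Its specific checks are hypothetical examples built from the documented arguments: a non-negative MIN_Q, MIN_Q not above MAX_Q, and a <prefix>_001.<extension> base name when USE_SEQUENTIAL_FASTQS is set. They are not claimed to be the checks FastqToSam performs. ValidationSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validation following the null-or-messages contract.
final class ValidationSketch {
    private ValidationSketch() {}

    static String[] validate(String fastq, boolean useSequentialFastqs, int minQ, int maxQ) {
        List<String> errors = new ArrayList<>();
        if (minQ < 0) {
            errors.add("MIN_Q must be >= 0, got " + minQ);
        }
        if (minQ > maxQ) {
            errors.add("MIN_Q (" + minQ + ") must not exceed MAX_Q (" + maxQ + ")");
        }
        if (useSequentialFastqs && !fastq.contains("_001.")) {
            errors.add("With USE_SEQUENTIAL_FASTQS, FASTQ must be named <prefix>_001.<extension>: " + fastq);
        }
        // null signals a valid command line; otherwise report every problem found.
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }
}
```

Collecting all messages before returning lets the user fix every problem in one pass rather than one per run.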