Converting between Fastq variants

The Fastq format is the de facto standard for encoding sequence data with accompanying quality scores. Unfortunately, several incompatible variants of this format exist (see Cock et al. (2010) for details). This page contains instructions for converting data between the various “flavors” of the Fastq format.

Seqret

The seqret program (part of the EMBOSS toolkit) is fluent in the various Fastq dialects and can convert between them. The following command will convert a Sanger-variant Fastq file to an Illumina-variant Fastq file.

seqret -sequence YourNGSdataFile.fastq -sformat1 fastq-sanger -outseq YourNewDataFile.fastq -osformat2 fastq-illumina
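
To convert several files at once, the same command can be wrapped in a small shell function. The following is a minimal sketch and not part of EMBOSS: the function name sanger_to_illumina, the SEQRET variable, and the .illumina.fastq output suffix are assumptions made for illustration. It uses only the seqret options shown above.

```shell
# Hypothetical helper (not part of EMBOSS): convert each Sanger-variant
# Fastq file given as an argument to the Illumina 1.3 variant.
# The SEQRET variable is an assumption of this sketch; it lets you
# substitute another command for seqret, e.g. for a dry run.
sanger_to_illumina() {
    for f in "$@"; do
        # Assumed naming scheme: reads.fastq -> reads.illumina.fastq
        out="${f%.fastq}.illumina.fastq"
        ${SEQRET:-seqret} -sequence "$f" -sformat1 fastq-sanger \
            -outseq "$out" -osformat2 fastq-illumina
    done
}

# Dry run: print the commands instead of executing them.
SEQRET="echo seqret" sanger_to_illumina sample1.fastq sample2.fastq
```

Without the SEQRET override, calling sanger_to_illumina sample1.fastq sample2.fastq runs seqret once per file and leaves the input files untouched.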

The seqret program uses the following strings to refer to the various Fastq format variants.

Label           Description
fastq           Fastq short read format ignoring quality scores
fastq-sanger    Fastq short read format with phred quality
fastq-illumina  Fastq Illumina 1.3 short read format
fastq-solexa    Fastq Solexa/Illumina 1.0 short read format
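
The variants differ mainly in how a quality score is mapped to a printable character (see Cock et al. for details): fastq-sanger stores Phred scores with an ASCII offset of 33, fastq-illumina stores Phred scores with an offset of 64, and fastq-solexa also uses an offset of 64 but stores Solexa-scaled scores rather than Phred scores. The sketch below illustrates the offsets only; qual_score is a hypothetical helper written for this page, not a seqret feature.

```shell
# Hypothetical helper: decode one quality character to its numeric score.
# Usage: qual_score CHAR OFFSET
qual_score() {
    # printf '%d' "'X" prints the character code of X (POSIX printf).
    code=$(printf '%d' "'$1")
    echo $((code - $2))
}

qual_score I 33   # fastq-sanger:   'I' is ASCII 73,  score 40
qual_score h 64   # fastq-illumina: 'h' is ASCII 104, score 40
```

The same score of 40 is written as I in a Sanger-variant file and as h in an Illumina 1.3 file, which is why reading a file with the wrong -sformat1 label gives misleading quality values.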

To see a list of all available options for running seqret, type

seqret -h -verbose
cgss15/ngs/fastq-interconvert.txt · Last modified: 2015/01/12 14:07 by standage
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International