Coupling (interleaving) and decoupling paired/unpaired Fastq files

Paired-end short read data typically come in one of two forms: paired files or interleaved files. In paired files, read pairs are split between two files. For example, if the sequence data are stored in files called sample.r1.fq and sample.r2.fq, then the first entry of sample.r1.fq and the first entry of sample.r2.fq form a read pair, the second entries of the two files form another read pair, and so on. In interleaved files, both reads of a pair are stored in the same file, one after the other: the first and second entries form a read pair, the third and fourth entries form a read pair, and so on.
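Because pairing in paired files depends only on record order, both files need to contain the same number of records. The following is a minimal sketch of that check, assuming four-line Fastq records (no wrapped sequences). The demo.* file names and the read names are hypothetical, chosen only for illustration.

```shell
# Hypothetical demo files with two read pairs (not part of the original page).
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n' > demo.r1.fq
printf '@read1/2\nTGCA\n+\nIIII\n@read2/2\nCCAT\n+\nIIII\n' > demo.r2.fq

# Each Fastq record is assumed to occupy exactly four lines.
n1=$(( $(wc -l < demo.r1.fq) / 4 ))
n2=$(( $(wc -l < demo.r2.fq) / 4 ))

if [ "$n1" -eq "$n2" ]; then
    echo "OK: $n1 read pairs"
else
    echo "Mismatch: $n1 vs $n2 records" >&2
fi
```

A matching count does not prove that the reads are in the same order in both files, but a mismatch is a reliable sign that the files should not be interleaved as they are.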

This page includes instructions for converting between these two conventions.

Coupling (interleaving) paired Fastq files

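If you have a pair of Fastq files that you need to interleave, the following script can do this. It is based on the one provided as part of the source code distribution of the Velvet assembler, with error checking added, and it assumes that every record spans exactly four lines.

shuffleSequences_fastq.pl
#!/usr/bin/perl
# Interleave two paired Fastq files, assuming four-line records.
use strict;
use warnings;

die "Usage: $0 <reads1.fq> <reads2.fq> <out.fq>\n" unless @ARGV == 3;
my ($filenameA, $filenameB, $filenameOut) = @ARGV;

open my $FILEA, '<', $filenameA or die "Cannot open $filenameA: $!\n";
open my $FILEB, '<', $filenameB or die "Cannot open $filenameB: $!\n";
open my $OUTFILE, '>', $filenameOut or die "Cannot open $filenameOut: $!\n";

while (defined(my $line = <$FILEA>)) {
        # Copy one four-line record from the first file...
        print $OUTFILE $line;
        for (1 .. 3) {
                $line = <$FILEA>;
                die "Truncated record in $filenameA\n" unless defined $line;
                print $OUTFILE $line;
        }

        # ...followed by its mate from the second file.
        for (1 .. 4) {
                $line = <$FILEB>;
                die "Fewer records in $filenameB than in $filenameA\n" unless defined $line;
                print $OUTFILE $line;
        }
}

# Leftover records in the second file have no mate in the first.
warn "Extra records in $filenameB were ignored\n" unless eof($FILEB);
close $OUTFILE or die "Cannot close $filenameOut: $!\n";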

If your data are stored in Fastq files called sample.r1.fq and sample.r2.fq, the following command will generate an interleaved Fastq file called sample.int.fq.

perl shuffleSequences_fastq.pl sample.r1.fq sample.r2.fq sample.int.fq
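If Perl is not available, the same interleaving can be sketched with standard Unix tools. This is an alternative to the script above, not a replacement for it. It assumes four-line records and no tab characters inside any Fastq line; the demo.* file names are hypothetical.

```shell
# Hypothetical demo files with two read pairs (not part of the original page).
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n' > demo.r1.fq
printf '@read1/2\nTGCA\n+\nIIII\n@read2/2\nCCAT\n+\nIIII\n' > demo.r2.fq

# Collapse each four-line record into one tab-separated line.
paste - - - - < demo.r1.fq > demo.r1.tab
paste - - - - < demo.r2.fq > demo.r2.tab

# Put mates side by side, then restore one Fastq line per output line.
paste demo.r1.tab demo.r2.tab | tr '\t' '\n' > demo.int.fq

# Remove the intermediate files.
rm demo.r1.tab demo.r2.tab
```

Unlike the Perl script, this sketch performs no checks: if the two inputs differ in length, paste pads the shorter one with empty fields and the output will contain blank lines.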

Decoupling interleaved Fastq files

If you have an interleaved Fastq file that you would like to split into paired files, the following script can do this.

fastq-split.pl
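# Reads tab-joined read pairs (eight fields per line) from standard input.
# Writes the first read of each pair to STDOUT and the second to STDERR.
use strict;
 
while(my $line = <STDIN>)
{
  chomp($line);
  my @values = split(/\t/, $line);
  # Fields 0-3 are the first read; fields 4-7 are its mate.
  printf(STDOUT "%s\n%s\n%s\n%s\n", @values[0..3]);
  printf(STDERR "%s\n%s\n%s\n%s\n", @values[4..7]);
}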

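If your data are stored in a Fastq file called sample.int.fq, the following command will generate the paired files sample.r1.fq and sample.r2.fq. The paste command joins each group of eight lines (one read pair) into a single tab-separated line. The script then writes the first read of each pair to standard output and the second to standard error, which the command redirects to the two output files.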

paste - - - - - - - - < sample.int.fq | perl fastq-split.pl > sample.r1.fq 2> sample.r2.fq
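
To see what the pipeline above does at each step, the following sketch reproduces it on a tiny interleaved file, using cut and tr in place of the Perl script. It assumes four-line records and no tab characters inside any Fastq line; the demo.* file names are hypothetical.

```shell
# Hypothetical interleaved demo file with two read pairs.
printf '@read1/1\nACGT\n+\nIIII\n@read1/2\nTGCA\n+\nIIII\n' >  demo.int.fq
printf '@read2/1\nTTGA\n+\nIIII\n@read2/2\nCCAT\n+\nIIII\n' >> demo.int.fq

# Join every eight lines (one read pair) into one tab-separated line.
paste - - - - - - - - < demo.int.fq > demo.pairs.tab

# Fields 1-4 hold the first read of each pair, fields 5-8 the second.
cut -f 1-4 demo.pairs.tab | tr '\t' '\n' > demo.r1.fq
cut -f 5-8 demo.pairs.tab | tr '\t' '\n' > demo.r2.fq

# Show the first header line of each output file.
head -n 1 demo.r1.fq
head -n 1 demo.r2.fq
```

Writing the second read to a regular file rather than to standard error keeps real error messages out of the read data, at the cost of an intermediate file.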

cgss15/ngs/fastq-interleave.txt · Last modified: 2015/01/12 14:08 by standage
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International