Exercise: transcriptome assembly

For our transcriptome assembly exercise, we'll continue using wasp data. In total, we have 12 wasp libraries: 6 biological replicates each from queen and worker. However, for this exercise I have provided only 4 libraries.

These libraries have been reduced in size to enable quicker computation. First, they were digitally normalized to a coverage of approximately 20x. Then 100,000 reads were sampled at random from each library.

With data sets this small, you should be able to compute an assembly fairly quickly. Don't read too much into the resulting assembly, though, since this is not enough data for a meaningful result. Here we're simply trying to get comfortable with the mechanics of running the Trinity assembler.

Data access: iPlant data store

For this exercise, 4 reduced libraries are available. Choose one arbitrarily and download its two read files to Mason with the commands below.

:!: Do not worry about quality control for this exercise. We will address it later in the assignment. :!:

  • 4 libraries
    • 2 queen wasps
      • q1
      • q2
    • 2 worker wasps
      • w1
      • w2
# Set sample to whichever library you would like (q1, q2, w1, or w2)
sample=q1
iget -V /iplant/home/standage/CGS/pdom-${sample}-subsample-1.fq.gz
iget -V /iplant/home/standage/CGS/pdom-${sample}-subsample-2.fq.gz
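
Before assembling, it can be worth confirming that both files arrived intact and contain the same number of reads. The following is a minimal sketch, assuming standard gzip and awk are available on Mason and that the FASTQ files use four lines per record. The names count_reads and check_pair are hypothetical helpers, not part of any course script.

```shell
# Count reads in a gzipped FASTQ file, assuming 4 lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Test both files of a library for gzip corruption, then compare read counts.
# Succeeds only if both files are intact and the counts match.
check_pair() {
    local r1="pdom-$1-subsample-1.fq.gz"
    local r2="pdom-$1-subsample-2.fq.gz"
    gzip -t "$r1" "$r2" || return 1
    local n1 n2
    n1=$(count_reads "$r1")
    n2=$(count_reads "$r2")
    echo "$1: $n1 left reads, $n2 right reads"
    [ "$n1" -eq "$n2" ]
}

# Example, run in the directory holding the downloads:
#   check_pair q1
```

If the two counts differ, the download was probably interrupted and is worth repeating.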

Running Trinity on Mason

The following launch script will run a Trinity assembly on Mason. By requesting a single node with 8 processors and a 4-hour walltime, the job will hopefully not have to wait in the queue too long!

trinity.sh
#!/usr/bin/env bash
 
#PBS -N TransAsmblExercise
#PBS -l nodes=1:ppn=8,walltime=4:00:00,vmem=125gb
#PBS -j oe
#PBS -m abe
#PBS -q shared
#PBS -M your.email@indiana.edu
 
# Load software
module load java
module load bowtie/1.1.1
module load samtools
module load trinityrnaseq/2014-07-17
 
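WD=/N/dc2/scratch/YourUsername/trans-asmbl
# Stop if the working directory is missing rather than running elsewhere
cd "$WD" || exit 1
 
# Replace q1 below if you downloaded a different library
Trinity --seqType fq --JM 100G --CPU 8 --output test-assembly \
        --left  pdom-q1-subsample-1.fq.gz \
        --right pdom-q1-subsample-2.fq.gz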
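
The launch script is submitted to the queue rather than run directly. The following is a minimal sketch, assuming the standard PBS/TORQUE commands qsub and qstat are on your path on Mason. The helper submit_assembly is hypothetical; it only checks that the input files exist before submitting, so the job does not wait in the queue only to fail immediately.

```shell
# Hypothetical helper: refuse to submit unless every listed input file
# exists and is non-empty, then hand the launch script to qsub.
submit_assembly() {
    local script=$1
    shift
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
    qsub "$script"
}

# Example, run from the directory holding the reads and trinity.sh:
#   submit_assembly trinity.sh pdom-q1-subsample-1.fq.gz pdom-q1-subsample-2.fq.gz
#   qstat -u "$USER"    # list your queued and running jobs
```

Because of the -m abe and -M directives in the script, the scheduler should also email you when the job begins, ends, or aborts.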

After a successful assembly, there should be a Trinity.fasta file in the output directory. Take a look at the file, and perhaps compute some basic summary statistics (using the asmbleval.pl script). Again, we won't spend much time interpreting the results because the data set is so artificially small; the goal is just to get a general idea.

If the assembly succeeds, try downloading a second library and assembling the two together, like so.
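
For a quick look at the assembly, a few summary numbers can be computed with standard tools. The following is a rough sketch using awk and sort, not the asmbleval.pl script, whose output may differ. The function name fasta_stats is hypothetical, and N50 is computed here as the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length.

```shell
# Print contig count, total length, longest contig, and N50 for a FASTA file.
fasta_stats() {
    # First pass: emit one length per sequence, joining wrapped lines.
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }
        { len += length($0) }
        END { if (len > 0) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "no sequences found"; exit 1 }
            half = total / 2
            # Walk from longest to shortest until half the total is covered.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "contigs=%d total_bp=%d longest=%d n50=%d\n", NR, total, lens[1], n50
        }'
}

# Example:
#   fasta_stats test-assembly/Trinity.fasta
```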

trinity.sh
#!/usr/bin/env bash
 
#PBS -N TransAsmblExercise2
#PBS -l nodes=1:ppn=8,walltime=4:00:00,vmem=125gb
#PBS -j oe
#PBS -m abe
#PBS -q shared
#PBS -M your.email@indiana.edu
 
# Load software
module load java
module load bowtie/1.1.1
module load samtools
module load trinityrnaseq/2014-07-17
 
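WD=/N/dc2/scratch/YourUsername/trans-asmbl
# Stop if the working directory is missing rather than running elsewhere
cd "$WD" || exit 1
 
# Use a new output directory (the name is arbitrary) so that files from the
# first assembly are not overwritten or reused
Trinity --seqType fq --JM 100G --CPU 8 --output test-assembly-2 \
        --left  pdom-q1-subsample-1.fq.gz pdom-q2-subsample-1.fq.gz \
        --right pdom-q1-subsample-2.fq.gz pdom-q2-subsample-2.fq.gz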
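
If you want to combine a different set of libraries, the file lists can be generated rather than typed by hand. The following is a small sketch, assuming the file naming used in this exercise; read_files is a hypothetical helper that prints the files for one mate, separated by spaces as in the command above.

```shell
# Hypothetical helper: print the read files for one mate (1 or 2) across
# several samples, separated by single spaces.
read_files() {
    local mate=$1
    shift
    local s list=""
    for s in "$@"; do
        list="$list pdom-${s}-subsample-${mate}.fq.gz"
    done
    # Strip the leading space left by the loop.
    echo "${list# }"
}

# Example, inside the launch script:
#   left=$(read_files 1 q1 w1)
#   right=$(read_files 2 q1 w1)
#   then pass "--left $left --right $right" to Trinity, leaving the variables
#   unquoted so that each file becomes a separate argument
```

Listing the samples in the same order for both mates keeps the left and right files paired up.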
cgss15/transcript-assembly/exercise.txt · Last modified: 2015/04/02 22:20 by standage
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International