It's difficult to craft a scaled-down exercise for eukaryotic genome assembly. Any data set that would complete in 5-10 minutes probably isn't a good example of what you're likely to deal with in “the real world”, but we want to follow the pattern we've used so far in the class by starting off with a manageable example before moving on to full-scale analysis and interpretation.
For the eukaryotic genome assembly assignment, we'll be working with a data set from the paper wasp Polistes dominula. The complete data set has 5 Illumina libraries, but for a preliminary exercise we will only use the 500 bp insert library and the 3 kb insert library. I have applied digital normalization to these libraries to reduce their coverage while retaining their information content.
Despite my best efforts to encourage you to branch out, it seems like most everyone used Velvet for the prokaryotic genome assembly assignment. This exercise will introduce you to another popular assembler from BGI called SOAPdenovo. The assembler is pre-installed on your iPlant VMs, although there is not enough memory on those machines to do anything but the smallest of assemblies. SOAPdenovo is also available on Mason, however, and since we'll need Mason's computing power for the main assignment, we may as well use it for the exercise too.
Mason does not have the iRODS icommands (iinit, iget, etc.) installed by default. Install iRODS in your home directory (using the instructions below) before you begin the assignment.

/iplant/home/standage/CGS/pdom-500bp-1.fq
/iplant/home/standage/CGS/pdom-500bp-2.fq
/iplant/home/standage/CGS/pdom-3kb-1.fq
/iplant/home/standage/CGS/pdom-3kb-2.fq
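
Once the icommands are installed and you have run iinit, the four files listed above can be fetched with iget. The following is a minimal sketch rather than a required procedure: `print_iget_commands` is a hypothetical helper name, and it only prints the commands so you can review them before running anything.

```shell
#!/bin/bash
# Sketch: build one iget command per data file listed above.
# Run `iinit` once beforehand to set up your iRODS connection.
SRC=/iplant/home/standage/CGS
FILES="pdom-500bp-1.fq pdom-500bp-2.fq pdom-3kb-1.fq pdom-3kb-2.fq"

# print_iget_commands is a hypothetical helper, not part of iRODS.
# It prints the download commands without executing them.
print_iget_commands() {
  for f in $FILES; do
    echo "iget $SRC/$f ."
  done
}

# Review the commands first.
print_iget_commands
# If they look right, run them from your working directory with:
#   print_iget_commands | bash
```

Printing before executing is a deliberate choice here: the files are large, so it is worth confirming the paths and your current directory before starting the transfer.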
asmbleval.pl script?

Download this file to your Mason home directory and then follow the instructions below. The setup script will ask you several questions about what you want to build. Answer no to all the questions except Save configuration and Start iRODS build.
cd ~
tar xzf irods3.1.tgz
cd iRODS
./irodssetup
export PATH=~/iRODS/clients/icommands/bin:$PATH
# You'll probably want to add this last command to your .bashrc file.
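
If you do add the PATH line to your .bashrc, it helps to avoid appending it more than once. Here is one possible sketch; `add_path_line` is a hypothetical helper name, and the call that touches your real .bashrc is left commented out so you can decide when to run it.

```shell
#!/bin/bash
# add_path_line is a hypothetical helper: it appends the icommands PATH
# line to the given file only if that exact line is not already present.
add_path_line() {
  line='export PATH=~/iRODS/clients/icommands/bin:$PATH'
  rc="$1"
  # -x matches whole lines, -F treats the pattern as a fixed string.
  grep -qxF "$line" "$rc" 2>/dev/null || echo "$line" >> "$rc"
}

# Usage (uncomment to modify your real startup file):
#   add_path_line ~/.bashrc
```

After the line is in place, open a new shell (or run `source ~/.bashrc`) and check that `command -v iget` prints a path inside ~/iRODS.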
Using the example config from the SOAPdenovo website, I created a config file for these data sets.
[LIB]
#maximal read length
max_rd_len=100
#average insert size
avg_ins=500
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#in which order the reads are used while scaffolding
rank=1
#fastq files
q1=/path/to/your/workdir/pdom-500bp-1.fq
q2=/path/to/your/workdir/pdom-500bp-2.fq
[LIB]
max_rd_len=35
avg_ins=3000
reverse_seq=0
asm_flags=3
rank=2
q1=/path/to/your/workdir/pdom-3kb-1.fq
q2=/path/to/your/workdir/pdom-3kb-2.fq
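
A common mistake is to leave the placeholder /path/to/your/workdir in the config, or to mistype a file name. The sketch below shows one way to catch that before submitting a job. It assumes the config has one key=value setting per line, and `check_config` is a hypothetical helper, not part of SOAPdenovo.

```shell
#!/bin/bash
# check_config is a hypothetical helper: it reports every q1=/q2= read
# file named in a SOAPdenovo config that is missing or unreadable.
# Assumes one key=value per line and paths without spaces.
check_config() {
  status=0
  for fq in $(grep -E '^q[12]=' "$1" | cut -d= -f2-); do
    if [ ! -r "$fq" ]; then
      echo "missing or unreadable: $fq" >&2
      status=1
    fi
  done
  return $status
}

# Usage:
#   check_config soap.cnf && echo "all read files found"
```

The function returns a non-zero status if any file is missing, so it can also be used as a guard at the top of a launch script.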
With your config file prepped, you could run SOAPdenovo as follows.
module load soapdenovo2
time SOAPdenovo-63mer all -s soap.cnf -K 27 -o output-dir
However, remember that we cannot run jobs on the interactive node. We have to create a launch script and submit it to the queue for execution. Here is an example launch script.
#!/bin/bash
#PBS -N AssemblyExercise1
#PBS -l nodes=1:ppn=32,walltime=4:00:00,vmem=500gb
#PBS -k oe
#PBS -q shared
#PBS -m bea
#PBS -M youremail@indiana.edu

module load soapdenovo2
WORKDIR=/path/to/your/workdir/
SOAPdenovo-63mer all -s $WORKDIR/soap.cnf -K 27 -p 32 -o $WORKDIR/output-dir
With your config file and launch script in place, you can submit your job like so.
qsub run-soapdenovo.sh
In class we discussed using Mason's interactive queue to troubleshoot and test your jobs.
This is often a good idea for any task you need to do, not just this exercise.
Once you have your launch script, data files, and any other configuration files in place, you can use qsub to request an interactive session.
You probably want to run the command in a tmux session; otherwise you will have to keep your terminal open until the session begins.
If you use the command below, Mason will send you an email when your session begins, so you can close your terminal and log out if you need to.
Usually you want to request an interactive session with much smaller resource requests so that you will not have to wait as long in the queue. Instead of requesting all 32 processors for a node, request 4 or 8 or 16. Also, you probably won't need more than 30-60 minutes to troubleshoot, so requesting a shorter walltime will also reduce your waiting time in the queue.
qsub -I -q shared -l nodes=1:ppn=8,vmem=64gb,walltime=1:00:00 -M youremail@indiana.edu -m abe
Once the session begins, you can go to your working directory and run your launch script. If the SOAPdenovo command in your launch script uses 32 threads but you only requested 8 processors for your interactive session, make sure to change your thread count to 8 before running the script.
bash run-soapdenovo.sh
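
One way to avoid editing the thread count by hand is to derive it from the resources the scheduler actually granted. The sketch below assumes the scheduler exports a PBS_NUM_PPN variable inside a job, as TORQUE-based systems generally do; confirm this with `echo $PBS_NUM_PPN` in your interactive session before relying on it. `thread_count` is a hypothetical helper, and the fallback of 8 is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: derive the SOAPdenovo thread count from the PBS allocation.
# PBS_NUM_PPN is ASSUMED to be set by the scheduler inside a job;
# thread_count falls back to 8 when the variable is unset or empty.
thread_count() {
  echo "${PBS_NUM_PPN:-8}"
}

WORKDIR=/path/to/your/workdir
# Print the command that would be run, for inspection.
echo "SOAPdenovo-63mer all -s $WORKDIR/soap.cnf -K 27 -p $(thread_count) -o $WORKDIR/output-dir"
```

In a launch script you would use `-p $(thread_count)` in place of the hard-coded `-p 32`, so the same script works for both the 8-processor interactive session and the 32-processor batch job.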
If there is a problem with your launch script or your config file, the command will probably fail right away. You can then use the error message to fix the problem and try again. Once the command has run for 5-10 minutes without an error, you can be reasonably confident that everything is OK. You can then press Ctrl-C to cancel the run, type “exit” to close your interactive session, and submit your job to the queue with qsub.