Assignment: prokaryotic genome assembly

The exercise for this topic introduced you to genome assembly, using a nice, clean E. coli data set and the Velvet assembler. For this assignment you will continue exploring prokaryotic genome assembly, following the guidelines below.

  • Branch out and try different assemblers. Several are pre-installed on your VMs (as documented here); if there is another you're interested in trying out, go for it! Make sure to read each assembler's manual so that you configure the software correctly.
  • The data set we used for the exercise had already been subjected to quality control. For this assignment, think about a prokaryotic organism that interests you and search the NCBI Sequence Read Archive (SRA) to see whether there is (genomic!) NGS data available for that organism. Alternatively, if you or your colleagues have a data set you would like to analyze, you are welcome to use it. If you're lacking inspiration, consider this data set from Sinorhizobium meliloti.
  • We didn't really do any quality control for the exercise because the data was already so clean. Most of the time, however, quality control is crucial to getting an accurate assembly. Make sure to do quality control on your data set: adapter trimming (if needed), perhaps quality trimming, and error correction. Then, when you do the actual assembly, try assembling the genome twice: once with the raw data, and once with the cleaned up data.
  • Make liberal use of the assessment tools we've discussed: the assembly evaluation script, nucmer/mummerplot, and CGAL. Use these to assess the relative quality of your assemblies. What effect does skipping quality control have on your assembly? How does the assembly change for different values of k?
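As a rough illustration of what quality trimming does to a read, here is a minimal Python sketch of sliding-window trimming, similar in spirit to (but not identical to) the sliding-window mode offered by trimmers such as Trimmomatic. It assumes Phred+33 quality encoding; the function names, the window size of 4, the mean-quality threshold of 20, and the minimum length are illustrative choices, not recommendations. For your actual data, use an established trimmer and read its manual.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (assumes Phred+33 encoding)."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        threshold: float = 20.0,
                        offset: int = 33) -> Tuple[str, str]:
    """Scan 5' -> 3' and cut the read at the start of the first window whose
    mean quality drops below `threshold`. Reads shorter than `window`
    are returned unchanged."""
    scores = phred_scores(qual, offset)
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < threshold:
            return seq[:start], qual[:start]
    return seq, qual


def trim_records(records: Iterable[Record], min_length: int = 30,
                 **trim_args) -> Iterator[Record]:
    """Trim each read, then drop any read shorter than `min_length`."""
    for header, seq, qual in records:
        seq, qual = sliding_window_trim(seq, qual, **trim_args)
        if len(seq) >= min_length:
            yield header, seq, qual


if __name__ == "__main__":
    # Toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    reads = [("@read1", "ACGTACGTAC", "IIIIII####"),
             ("@read2", "ACGT", "####")]
    for rec in trim_records(reads, min_length=5):
        print(rec)
```

Cutting at the start of the failing window is deliberately conservative: it can discard a few good bases just before the quality drop. Real trimmers make different trade-offs, which is one more reason to compare assemblies built from raw and cleaned reads.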
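Assembly evaluation reports commonly include contiguity statistics such as the number of contigs, the total assembly length, and N50 (the length of the contig at which a running total of contig lengths, longest first, reaches half of the assembly size). The sketch below computes these from the lines of a FASTA file. It is an illustrative stand-in with hypothetical function names, not the course's assembly evaluation script, whose output may include different or additional metrics. Contiguity alone does not measure correctness; reference alignments (nucmer/mummerplot) and likelihood-based scores (CGAL) address that side of the question.

```python
import sys
from typing import Dict, Iterable, List, Optional


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None  # None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequence may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total (longest first)
    first reaches half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Basic contiguity summary for one assembly."""
    lengths = contig_lengths(lines)
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python assembly_stats.py contigs.fa
    with open(sys.argv[1]) as fasta:
        print(assembly_stats(fasta))
```

Running this on the contigs from each of your assemblies (raw vs. cleaned reads, different values of k) gives a quick side-by-side table before you move on to the alignment- and likelihood-based assessments.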
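Error correction and the choice of k are connected through k-mers, the fixed-length substrings that de Bruijn graph assemblers such as Velvet build their graphs from. The toy sketch below (hypothetical helper names) counts k-mers in a set of reads and reports the fraction that occur exactly once. A single substitution in the interior of a read creates up to k k-mers that are absent from the true genome, and at reasonable coverage most of them are rare; this is the intuition behind k-mer-based error correction, and it hints at why the best k depends on read length, coverage, and error rate. For simplicity the sketch counts forward-strand k-mers only, whereas assemblers typically also account for reverse complements.

```python
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer (forward strand only, for simplicity)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def singleton_fraction(reads: Iterable[str], k: int) -> float:
    """Fraction of distinct k-mers that were observed exactly once."""
    counts = kmer_counts(reads, k)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


if __name__ == "__main__":
    clean = ["ACGTACGT"] * 3
    with_error = clean + ["ACGTTCGT"]  # A->T substitution at 0-based position 4
    for k in (3, 4, 5):
        print(k, singleton_fraction(clean, k), singleton_fraction(with_error, k))
```

With real data you would plot the full k-mer count histogram (the "k-mer spectrum") rather than a single fraction; a distinct low-count peak is a typical sign of sequencing errors.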

It's important to keep a record of the technical steps you take, but, as always, what's most important is framing the intellectual focus of your assignment. What are you trying to do? Why is this important? Before you even get started, how will you evaluate the results? Once you have your results, how do you interpret them? These are the types of questions that make the difference between doing science and doing technical work, so make sure to spend enough time developing the intellectual aspect of the work before and after you do the technical parts.

cgss15/genome-assembly/assignment-prok.txt · Last modified: 2015/02/27 14:40 by standage
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International