Genome assembly from NGS data

The sequence of an individual's genomic DNA contains a wealth of biological information about that individual and the species to which it belongs. Procuring a genome sequence is an important step toward understanding the molecular, genetic, and evolutionary basis of life, both for a particular organism and across organisms. Ideally we would observe the entire sequence of a chromosome in a single experiment, but, as we discussed in previous topics, current technologies do not allow this. Procuring a genome sequence therefore requires us to observe many short sequences (reads) sampled randomly from throughout the genome, and then to reconstruct the complete genome sequence from these short, overlapping reads. This problem is inherently difficult, and the amount of data involved in a genome sequencing project makes assembling a genome sequence from raw short-read data a non-trivial technical challenge.
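To make the idea of reconstructing a sequence from overlapping reads concrete, here is a minimal toy sketch in Python. It is not how any production assembler works: it assumes error-free reads, all from the same strand, and a sequence with no repeats. The function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the example reads are all made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembly: repeatedly merge the pair with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        # Compare every ordered pair of contigs (quadratic; fine for a toy).
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads of length 7 sampled from the made-up sequence ACGTTGCAAGGCTA
reads = ["GCAAGGC", "ACGTTGC", "AAGGCTA", "TTGCAAG"]
print(greedy_assemble(reads))
```

Even this toy version compares every pair of contigs on every pass, which hints at why real data sets with millions of reads, sequencing errors, and repeated sequence make the problem so much harder.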

The genome assembly problem is a topic of very active research within the genomics and bioinformatics communities. Many genome assemblers exist, each with different strengths and weaknesses, and none of which is perfect. We will get some practical experience using different assembly programs to assemble both prokaryotic and eukaryotic genomes. We'll also take some time to discuss the current state of genome assembly in general, the issues current assembly programs face, and how we can evaluate the relative quality of two genome assemblies.
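As one illustration of comparing two assemblies, the sketch below computes N50, a widely used contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. This page does not say which metrics we will use in class, so treat the choice of N50 as an assumption; the function name `n50` and the contig lengths are invented for the example. Note that N50 measures contiguity only and says nothing about whether the contigs are correct.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 for an empty input)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (in bp) for two assemblies of the same genome
assembly_a = [80, 70, 50, 40, 30, 20]
assembly_b = [150, 60, 40, 20, 10, 10]
print(n50(assembly_a), n50(assembly_b))
```

Both made-up assemblies have the same total length, so here the one with the larger N50 is the more contiguous of the two.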

Common assembly programs

Here is a list of some common assembly programs. Some are pre-installed on your iPlant virtual machines. Many are pre-installed on Mason. Feel free to add to this list.

The nucleotid.es resource provides a catalog of assemblers and benchmarks them, reporting a range of performance metrics across a variety of microbial data sets. If you work primarily with microbes, this website should be a very useful resource.

Readings

There is no reading assignment for Mon Feb 23rd, but for Wed Feb 25th please come to class ready to discuss the following papers.

cgss15/genome-assembly/start.txt · Last modified: 2015/02/20 16:10 by standage
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International