Genome biology is an exciting field driven in large part by rapid advances in DNA sequencing technologies. In the not-too-distant past, assembling the genome of a eukaryotic organism required a huge, expensive, coordinated technical effort. So-called next-generation sequencing (NGS) platforms have drastically reduced the time, effort, and cost associated with genome sequencing, so much so that it has become a routine laboratory procedure. In addition to assembling genomes for organisms lacking a reference genome, these technologies have been adapted to measure genome-wide DNA methylation, genome-wide gene expression, and genome variation within a population (such as SNPs, indels, and inversions), and to support a host of other applications.

A primary challenge in analyzing NGS data is that genome biology evolves so rapidly. Sequencing platforms are constantly changing and improving, and new NGS analysis software is published every week. Because of the pace of change, few NGS software packages ever get bundled into polished apps with an easy-to-use point-and-click interface. Instead, scientists must use these tools through the command line, which requires some basic computing literacy.

Course objective

The objective of this course is to equip class participants with the skills necessary to conduct a wide range of computational genomics analyses. This includes locating & installing software, reading software documentation to successfully run the software, and (most importantly) critically evaluating the results of the analysis. By the end of the course, class participants will have experience with the fundamentals of computational genome science: read mapping & abundance estimation, genome & transcriptome assembly, and genome annotation. These skills provide an excellent foundation for whatever specific genomics challenges students may encounter in the course of their research.

NOTE: The objective of this course is NOT to teach students how to program, nor is programming experience a prerequisite. And while we may discuss experimental design briefly, this important topic will not be covered in depth.


In this course we will embrace the learning-by-doing philosophy. We will incorporate the occasional lecture or reading quiz, but only insofar as this helps us understand the basics of a particular biological question and how we can leverage available genomics tools to address that question. Ideally we'll devote as much class time as possible to actually analyzing data and interpreting results.

This class is typically composed of a fairly even mix of informatics and biology/biochemistry graduate students. We may adjust the course format according to the needs of a particular class, but course participation generally involves the following.

  • assigned reading and reading quizzes to introduce the student to each topic
  • simple exercises to teach the student the mechanics of running a particular software package
  • more in-depth assignments that require the student to invest more time in understanding the problem they are trying to solve and critically assessing their results
  • assignment peer reviews to provide helpful feedback and to learn from each other's successes and failures
  • a final project to explore one particular topic or data set in more depth

See this page for more details on class participation.


Final grades will be based on class participation. Class time will frequently be devoted to working on assignments, but satisfactory completion of most assignments will require additional work outside of class. Students will be expected to fully engage in reading quizzes, class discussions, and peer reviews. A short final presentation on a term project at the end of the course will take the place of a final exam.

Computing resources

Our classroom is outfitted with desktop computers, and students are welcome to bring personal laptops if they would like. However, many class assignments will involve data analysis tasks that are too computationally intensive for a typical desktop or laptop. Also, most of the software we use is designed specifically for UNIX/Linux operating systems and may be incompatible with a Windows or Mac computer. Therefore, exercises and assignments will be completed by connecting to Linux computers located outside of the classroom. See this page for more details.

cgss15/orientation/start.txt · Last modified: 2015/01/12 11:30 by standage
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International