Fundamental Models and Algorithms in Bioinformatics
INFO I519 (= I617) Fall 2019
Time & Location: Mon, Wed 2:30p - 3:45p (Credits: 3.0); Informatics West (I) 232;
Computer Laboratory, Fri 9:30a - 10:45a; Geological Sciences (GY) 226
Instructor: Volker Brendel (205C Simon Hall);
Assistant Instructor: Murat Öztürk (205A Simon Hall)
Email:
VB, vbrendel@indiana.edu;
MÖ, muroztur@iu.edu
WWW:
https://brendelgroup.org/
Office Hours:
Mon, Wed after class and by appointment.
Grades: Determined as described under Grading below.
Schedule:
https://brendelgroup.org/teaching/2019/I519F19schedule.php
Computing Resources:
You will have access to networked computer terminals during laboratory classes and will also need such basic access outside the classroom to complete assignments. An initial goal of the course will be to show you how to set up a laptop for class work and for bioinformatics work in general.
Synopsis
Biology has become one of the primary application domains of computer science and informatics. The term "Bioinformatics" covers a wide spectrum of data management and processing tasks associated with large-scale, high-throughput biological data generation. This class will focus on biomolecular sequence data (DNA and protein), which underpin much of modern biology, including genetics; ecology, evolution, and population biology; and structural biology. Applications in medicine and biotechnology are changing our society and our world. Many of the data analysis problems in the field have been mapped to tractable mathematical models amenable to algorithmic solutions. The course will cover fundamental models and algorithms in bioinformatics, with emphasis on the general principles underlying the modeling and algorithmic approaches. The course should be of interest to you if one or more of the following apply: (1) You are curious and would like to learn about a "hot topic"; (2) You want to expand your range of career options after graduate school; (3) You want to become or stay relevant in life science research in academia or industry; (4) You are considering a high-paying job in the biotechnology sector.
Prerequisites
This class is directed primarily at first- and second-year graduate students in Biology, Informatics, Computer Science, or Data Science; students of Mathematics, Statistics, and other fields may also find the course accessible and of interest. Although there are no formal prerequisites for the course, some basic knowledge of calculus and statistics will be necessary; these topics will be reviewed as needed, depending on students' backgrounds. Relevant biological concepts will be introduced as needed. Some classes will be taught as a computer lab. Students should be, or become, familiar with basic computer skills, including some knowledge of a programming (scripting) language. Class messages and materials, including assignments, will be shared through our Canvas site in addition to these web pages, and students are required to check these communication channels regularly. IU is committed to Creating a Positive Environment for teaching and learning. If you have any concerns or suggestions, please let the instructor know.
Learning Goals
The course seeks to provide students with a solid foundation for understanding models and algorithms in bioinformatics and to impart the basic practical skills to work on bioinformatics projects. Specific learning goals cover the following topics: (1) Basic bioinformatics data skills: Linux, scripting, R, virtual machines, containers. (2) Modeling biomolecular sequences: sequence probability spaces, principles of feature significance evaluation. (3) Pairwise sequence alignment: alignment types, representations, scoring, algorithms for determining optimal alignments. (4) Multiple sequence alignment and database searches: algorithms, index structures, statistical evaluation. (5) Basic models and approaches to molecular phylogeny: molecular clock, bifurcating trees, parsimony and distance matrix methods. (6) Hidden Markov Models for gene finding, spliced alignment, and protein motif identification: basic algorithms and applications. (7) Genome assembly, genome variation, and gene expression: introduction to problems, algorithms, and data analysis.
Assignments
The class material will be organized into topics (chapters), each occupying several class periods and laboratory sessions. Relevant reading material will be assigned for homework study. Additional assignments will involve computational work. There will be a total of four written tests (quizzes) spread throughout the term, which are designed to provide feedback on progress towards the stated learning goals. Each quiz will count a maximum of 20 points towards the grade in the class (see next section).
Grading
Grades will be based on a 100-point scale, derived as the total of the three best scores from the four quizzes (for a maximum total of 60 points), the homework score (20 points maximum), and the final examination score (20 points maximum). Absences during quizzes or the final will be counted as zeros. A rough translation into letter grades is: >=95, A+; >=90, A; >=85, A-; >=80, B+; >=75, B; >=70, B-; and so forth.
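The grading arithmetic above can be sketched in a few lines. This is a minimal, unofficial illustration, assuming scores are plain numbers and treating the letter-grade cutoffs as strict thresholds (the syllabus itself calls the translation "rough"). The function names `course_total` and `letter_grade` are hypothetical, and grades below B- are left unspecified because the syllabus only says "and so forth".

```python
def course_total(quizzes, homework, final):
    """Total on the 100-point scale: best three of four quizzes + homework + final.

    quizzes:  four quiz scores, each 0-20 (a missed quiz counts as 0)
    homework: homework score, 0-20
    final:    final examination score, 0-20 (a missed final counts as 0)
    """
    if len(quizzes) != 4:
        raise ValueError("expected exactly four quiz scores")
    # Keep the three highest quiz scores; the lowest one is dropped.
    best_three = sorted(quizzes, reverse=True)[:3]
    return sum(best_three) + homework + final


# Cutoffs as listed in the syllabus; treated here as exact thresholds (an assumption).
CUTOFFS = [(95, "A+"), (90, "A"), (85, "A-"), (80, "B+"), (75, "B"), (70, "B-")]


def letter_grade(total):
    """Map a 100-point total to a letter grade; None below 70 (not specified)."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return None


# Example: one missed quiz (scored 0) is the one that gets dropped.
total = course_total([18, 0, 15, 19], homework=17, final=16)
print(total, letter_grade(total))  # prints: 85 A-
```

Because only the three best quiz scores count, a single missed quiz (recorded as zero) is absorbed by the drop, as in the example; a missed final, by contrast, always costs the full 20 points.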
Textbook
The class is based on a draft textbook "Fundamental Models and Algorithms in Bioinformatics", V. Brendel (Indiana University) & K. Dorman (Iowa State University). Excerpts of the draft will be made available to the students as PDFs. We will make use of engaging, beautifully produced videos accompanying "Bioinformatics Algorithms - An Active Learning Approach" (Active Learning Publishers LLC) by Phillip Compeau & Pavel Pevzner. Students wishing to explore the biological background of class topics will find "Genomics and Personalized Medicine - What Everyone Needs to Know" (Oxford University Press) by Michael Snyder a concise, stimulating guide. For practical bioinformatics skills, we strongly recommend "Bioinformatics Data Skills" by Vince Buffalo (O'Reilly); topics and examples from this book will be explored in the Computer Laboratory part of the course.