Fundamental Models and Algorithms in Bioinformatics

INFO I519 (= I617) Fall 2021

**Time & Location:** Mon, Wed 3:15p - 4:30p (Credits: 3.0); Lindley Hall (LH) 025

Computer Laboratory, Fri 9:25a - 10:40a; Radio-TV (TV) 186

**Instructor:** Volker Brendel (205C Simon Hall);
**Assistant Instructor:** Shivani Vogiral (TBA)

**Email:**
VB, vbrendel@indiana.edu;
SV, svogiral@iu.edu

**WWW:**
https://brendelgroup.org/

**Virtual Office Hours:**
Mon, Wed after class and by appointment.

**Grades:** will be determined as described below.

**Schedule:**
https://brendelgroup.org/teaching/2021/I519F21schedule.php

**Computing Resources:**
You will need to bring a laptop to class to participate in exercises, group activities, and scheduled quizzes. An initial goal of the course will be to instruct you in setting up a Linux environment on your laptop for class work and bioinformatics work in general. Access to IU Linux and HPC resources will be reviewed as needed.

Synopsis

Biology has become one of the primary application domains of computer science and informatics approaches. The term "Bioinformatics" covers a wide spectrum of data management and processing associated with large-scale, high-throughput biological data generation. This class will focus on biomolecular sequence data (DNA and protein) that underpin much of modern biology, including for example genetics; ecology, evolution, and population biology; and structural biology. Applications in medicine and biotechnology are changing our societies and world. Many of the data analysis problems in the field have been mapped to tractable mathematical models amenable to algorithmic solutions. The course will cover fundamental models and algorithms in bioinformatics, with emphasis on the general principles involved in the modeling and algorithmic approaches.
**The course should be of interest to you if one or more of the following apply to you: (1) You are curious and would like to learn about a "hot topic"; (2) You want to expand your range of options for post-graduate school; (3) You want to become or stay relevant in life science research in academia or industry; (4) You are considering a high-paying job in the biotechnology sector.**

Prerequisites

This class is directed primarily at first- and second-year graduate students in Biology, Informatics, Computer Science, or Data Science; students of Mathematics, Statistics, and other fields may also find the course accessible and of interest. Although there are no formal prerequisites for the course, some basic knowledge of calculus and statistics will be necessary and will be reviewed as required by students' backgrounds. Relevant biological concepts will be introduced as needed. Some classes will be taught as a computer lab. Students will need to be or become familiar with basic computer operational skills, including some knowledge of a programming (scripting) language. Class messages and materials, including assignments, will be shared through our Canvas site in addition to these web pages, and students are required to check these communication channels regularly. IU is committed to a positive environment for teaching and learning. If you have any concerns or suggestions, please let the instructor know.

Learning Goals

The course seeks to provide students with a solid foundation for understanding models and algorithms in bioinformatics and to impart the basic practical skills to work on bioinformatics projects. Specific learning goals cover the following topics:
(1) Basic bioinformatics data skills: Linux, scripting, R, virtual machines, containers.
(2) Modeling biomolecular sequences: sequence probability spaces, principles of feature significance evaluation.
(3) Pairwise sequence alignment: alignment types, representations, scoring, algorithms for determining optimal alignments.
(4) Multiple sequence alignment and database searches: algorithms, index structures, statistical evaluation.
(5) Basic models and approaches to molecular phylogeny: molecular clock, bifurcating trees, parsimony and distance matrix methods.
(6) Hidden Markov Models for gene finding, spliced alignment, and protein motif identification: basic algorithms and applications.
(7) Genome assembly, genome variation, and gene expression: introduction to problems, algorithms, and data analysis.

Assignments and Grading

Grades will be based on a 100-point scale, derived as the total number of points gained from quizzes (for a maximum of 60 points), a project assignment (20 points maximum), and the final examination score (20 points maximum). A **rough** translation into letter grades is: >=95, A+; >=90, A; >=85, A-; >=80, B+; >=75, B; >=70, B-; and so forth.
Quizzes will be given at the end of each of the 6 basic course modules and will be administered via Canvas. The quizzes will be given either in class or as a homework assignment (to be announced for each quiz as the class schedule develops). Each quiz will count a maximum of 15 points towards your course total. However, we will only count the best 4 scores from the 6 quizzes. Thus, your total quiz score will be at most 60 points. This arrangement allows students to miss 2 of the 6 quizzes for any reason. Other accommodations for absences will only be made in exceptional circumstances.
A project assignment will be posted in the week before Thanksgiving Break and will be due before Reflection Week. The assignment is meant to give you an opportunity to develop a small project that will tie together concepts and tools learned in the class.
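
As an illustration only, the point arithmetic described in this section can be sketched in Python. The function names `course_total` and `rough_letter` are hypothetical and not part of any course tooling. The sketch assumes that each component is capped at its stated maximum, and it encodes only the letter-grade cutoffs that are listed explicitly above; lower cutoffs are not spelled out here, so the function returns `None` for them.

```python
def course_total(quiz_scores, project, final):
    """Return the course total on the 100-point scale.

    quiz_scores: the quiz scores (up to 6), each worth at most 15 points.
    project: project assignment score, at most 20 points.
    final: final examination score, at most 20 points.
    """
    # Only the best 4 quiz scores count, so missed quizzes can be entered as 0.
    best_four = sorted(quiz_scores, reverse=True)[:4]
    quiz_points = sum(min(score, 15) for score in best_four)
    return quiz_points + min(project, 20) + min(final, 20)


# Rough cutoffs as listed in this section; lower grades are not specified.
ROUGH_CUTOFFS = [(95, "A+"), (90, "A"), (85, "A-"), (80, "B+"), (75, "B"), (70, "B-")]


def rough_letter(total):
    """Map a total to a rough letter grade, or None below the listed cutoffs."""
    for cutoff, letter in ROUGH_CUTOFFS:
        if total >= cutoff:
            return letter
    return None


# Example: a student who missed two quizzes (entered as 0).
total = course_total([15, 12, 0, 14, 10, 0], project=18, final=17)
print(total, rough_letter(total))
```

In this example the two missed quizzes are simply dropped by the best-4-of-6 rule, so they do not lower the total.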

Textbook

The class is based on a draft textbook "Fundamental Models and Algorithms in Bioinformatics", V. Brendel (Indiana University) & K. Dorman (Iowa State University). Excerpts of the draft will be made available to the students as PDFs. We will make use of engaging, beautifully produced videos accompanying "Bioinformatics Algorithms - An Active Learning Approach" (Active Learning Publishers LLC) by Phillip Compeau & Pavel Pevzner. Other materials will be posted on the course web pages or our Canvas site. Students wishing to explore the biological background of class topics will find "Genomics and Personalized Medicine - What Everyone Needs to Know" (Oxford University Press) by Michael Snyder a concise, stimulating guide. For practical bioinformatics skills we strongly recommend "Bioinformatics Data Skills" by Vince Buffalo (O'Reilly); topics and examples from this book will be explored in the Computer Laboratory part of the course.