Fundamental Models and Algorithms in Bioinformatics

INFO I519 (= I617) Fall 2022


Course description

INFO I519 FMAB

Class times and locations

Mon, Wed 3:00p - 4:15p (Credits: 3.0); Biology Building (JH) 001
Computer Laboratory, Fri 9:45a - 11:00a; Radio-TV (TV) 186

Tentative schedule

HTML version of this schedule

DAY DATE LECTURE TOPIC LECTURER
Mon Aug 22 1.1 Orientation

Topics in bioinformatics, scope of class, and resources

Catch-up for non-life scientists: An introduction to DNA (Khan Academy)

Module 1: Bioinformatics resources and workspaces



National Center for Biotechnology Information (NCBI), a great starting point for "anything" bioinformatics

Sequence gazing: ... the famous TATA-box

Mapping biological questions onto computational problems: The Modeling Spiral

Precision matters: Webb Telescope
Aim high: AlphaFold

Ubuntu on Windows: ... a quick way to get to a command-line terminal

Virtual Machines: VirtualBox, VMware

Volker Brendel
Wed Aug 24 1.2 Basic bioinformatics toolkit acquisition (Part I)

Entrez Direct

Guide: NCBI E-Utilities
... slides
Background notes on: Installing NCBI EDirect
EDirect Sample Code Explained

Volker Brendel
Fri Aug 26 L1.1 Computer Laboratory: Linux Basics

Basic UNIX shell tutorial
The UNIX Shell
The UNIX Shell: Summary of Basic Commands
vi(m) editor tutorial

AI
Mon Aug 29 1.3 Mapping biological questions onto computational problems:
The modeling spiral

Polistes dominula proteins
SeqKit
SeqKit Tutorial

Entrez Direct
NCBI Sequence search fields
MEDLINE/PubMed search fields
Volker Brendel
Wed Aug 31 1.4 Basic statistical questions in bioinformatics


Quiz I
Volker Brendel
Fri Sept 2 L1.2 Computer Laboratory: Basic bioinformatics toolkit acquisition (Part II):

Getting code: git, GitHub, Brendel Group on GitHub

GitHub HowTo
git: working with branches
CodeFMAB
AI
Mon Sept 5 2.1

Module 2: Sequence Models and Spaces



Labor Day

no class
Wed Sept 7 2.2 Simple Sequence Models Volker Brendel
Fri Sept 9 L2.1 Computer Laboratory: Python Basics

Python tutorial
Python Scripting for Computational Molecular Science
Python for Everybody PY4E Lessons
J. Sundnes: Introduction to Scientific Programming with Python

Style matters ...
Learning Scientific Programming with Python - an intriguing resource

AI
Mon Sept 12 2.3 Markov Models for Sequences

Volker Brendel
Wed Sept 14 2.4 Applications of Markov Models

GENMARK

Some resources for statistics background review:
Event probabilities
Union of events
Bayes Theorem
Probability distribution
Expected value
Review: Conditional Probability
Sensitivity, specificity, and all that

Stat225 at Purdue - nice slides by Dr. Whitney Huang


Quiz II
Volker Brendel
Fri Sept 16 L2.2 Computer Laboratory:

Coding random sequence generation and pattern probability calculations
AI
Mon Sept 19 3.1

Module 3: Pairwise Sequence Alignment



Models for Pairwise Sequence Alignment
Representations of alignments
NW alignments
Number of alignments (nNW algorithm)
Volker Brendel
Wed Sept 21 3.2 PWSA: gNW algorithm. Volker Brendel
Fri Sept 23 L3.1 Computer Laboratory: Coding the nNW and gNW algorithms AI
Mon Sept 26 3.3 Algorithms for Pairwise Sequence Alignment: gSW, lSW, and other algorithms

How Do We Compare Biological Sequences?
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Wed Sept 28 3.4 PWSA: Review and extensions.

Quiz III part A in class
Quiz III part B assigned (due: Thu, Sept 29, 5pm)
Volker Brendel
Fri Sept 30 L3.2 Computer Laboratory: Coding PWSA algorithms. AI
Mon Oct 3 4.1

Module 4: Sequence Analysis with Scores





Sequence Analysis with Scores: Theory
Volker Brendel
Wed Oct 5 4.2 Sequence analysis with scores: Substitution scoring matrices
Biological Sequence Analysis I (Lecturer: Dr. Andy Baxevanis)

NCBI BLAST

slides for Sequence Analysis I presentation
handout for Sequence Analysis I presentation

NCBI BLAST download site
Volker Brendel
Fri Oct 7 L4.1 Computer Laboratory AI
Mon Oct 10 4.3 Sequence Analysis with Scores: Applications

Chance and Statistical Significance in Protein and DNA Sequence Analysis

Expected Values I
Expected Values II
Volker Brendel
Wed Oct 12 4.4 Brief review and outlook

Quiz IV
Volker Brendel
Fri Oct 14 L4.2 Fall Break: no class
Mon Oct 17 5.1

Module 5: Hidden Markov Models





Hidden Markov Models: Motivation

Rabiner's Tutorial
Volker Brendel
Wed Oct 19 5.2 Hidden Markov Models: Algorithms

Hidden Markov Models
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Fri Oct 21 L5.1 Computer Laboratory: Coding and applications of HMM algorithms AI
Mon Oct 24 5.3 Hidden Markov Models: Applications

Application examples
GENSCAN
Profile Hidden Markov Models
TagDust; TagDust2 on GitHub

Sequence motifs: models
Biological Sequence Analysis II (Lecturer: Dr. Andy Baxevanis)
InterPro
Volker Brendel
Wed Oct 26 5.4 Sequence motifs: algorithms

The MEME Suite
HOMER

Quiz V
Volker Brendel
Fri Oct 28 L5.2 Computer Laboratory: Implementation of HMM algorithms AI
Mon Oct 31 6.1

Module 6: Basic Concepts of Molecular Phylogenetics





Molecular Phylogeny: Models

The powers and pitfalls of parsimony
Volker Brendel
Wed Nov 2 6.2 Parsimony and Distance Matrix Methods

Lectures on molecular phylogeny
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Fri Nov 4 L6.1 Computer Laboratory: Molecular Phylogeny, applications AI
Mon Nov 7 6.3 Molecular Phylogeny: Applications

a typical paper ...

MEGA
A list of Web-servers for molecular phylogeny analyses

phylogeny.fr
Volker Brendel
Wed Nov 9 6.4 Brief review and outlook

Quiz VI
Volker Brendel
Fri Nov 11 L6.2 Literature review:
GelmanPNAS2021.pdf
FreschlinCOBIT2022.pdf
preprint

Optional: Seminar by Philip Romero
3:00 PM Chemistry CH033
Machine learning to navigate sequence-function landscapes for protein engineering
AI
Students
Mon Nov 14 7.1

Module 7: Genome Assembly and Annotation





DNA Sequencing
Sanger sequencing
Illumina sequencing
nanopore sequencing

Genome Resources
The Genomic Landscape circa 2016 (Lecturer: Dr. Eric Green)

Assembly basics
NCBI Assembly Help
NCBI Genome

Genome Assembly
Introduction to genome sequencing
How do we assemble genomes?
from Bioinformatics Algorithms: An Active Learning Approach

Volker Brendel
Wed Nov 16 7.2 Genome Annotation

Prokaryotic gene finding:
GeneMark.hmm prokaryotic
GeneMark article

Eukaryotic gene finding:
GeneMark.hmm eukaryotic
AUGUSTUS
GENSCAN (with accompanying paper)

Sensitivity, specificity, and all that
sample paper

How to evaluate gene structure prediction accuracy

Project Assignment Posted
Volker Brendel
Fri Nov 18 L7.1 Computer Laboratory:

wgsim - read generator
SoapDeNovo2 - assembler
AI
Mon Nov 21 THANKSGIVING BREAK n/a
Wed Nov 23 THANKSGIVING BREAK n/a
Fri Nov 25 THANKSGIVING BREAK n/a
Mon Nov 28 8.1

Module 8: Genetic Variation





Genetic variation
Interpreting an individual genome

NCBI dbSNP    How To
NCBI dbVar    How To
1000 Genomes Project    Nature 491:56
Example: rs1131769

file format specifications:

Variant Call Format (VCF)
Sequence Alignment Map format (SAM)
SAM flags explained
Pileup format (used by samtools)


Relevant code:

NCBI SRA Toolkit
samtools
bwa
freebayes
Volker Brendel
Wed Nov 30 9.1

Module 9: Protein Structure





peptide bond
Ramachandran plot ... very nice visualization thereof
(thanks to Prof. Eric Martz)

PDB: What is a protein?
PDB: How enzymes work
HIV I
HIV II

Guide to PDB

PDB Molecule of the Month
NCBI Protein

Secondary Structure

2struct server

Jpred - secondary structure prediction
SPIDER3

foldit
AlphaFold
Volker Brendel
Fri Dec 2 L7.2

Project Assignment Due
AI
Mon Dec 5

Review Sessions



Sequence Alignment

Bioinformatics Algorithms presentation
Volker Brendel
Wed Dec 7 Gene Expression Analyses
from Bioinformatics Algorithms: An Active Learning Approach

MIT Lecture: Gene Regulatory Networks
MIT Data Science: Clustering
Volker Brendel
Fri Dec 9 Zoom office hour AI
Mon Dec 12 12:40pm - 2:40pm Final Examination Students