Fundamental Models and Algorithms in Bioinformatics

INFO I519 (= I617) Fall 2023


Course description

INFO I519 FMAB

Class times and locations

Mon, Wed 3:00p - 4:15p (Credits: 3.0); Swain West (SW) 217
Computer Laboratory, Fri 9:45a - 11:00a; Radio-TV (TV) 186

Tentative schedule

HTML version of this schedule

DAY DATE LECTURE TOPIC LECTURER
Mon Aug 21 1.1 Orientation

Topics in bioinformatics, scope of class, and resources

Catch-up for non-life scientists: An introduction to DNA (Khan Academy)

Module 1: Bioinformatics resources and workspaces



National Center for Biotechnology Information (NCBI), a great starting point for "anything" bioinformatics

Sequence gazing: ... the famous TATA-box

Mapping biological questions onto computational problems: The Modeling Spiral

Precision matters: Webb Telescope
Aim high: AlphaFold

Ubuntu on Windows: ... a quick way to get to a command line terminal

Virtual Machines: VirtualBox, VMware

Volker Brendel
Wed Aug 23 1.2 Basic bioinformatics toolkit acquisition (Part I)

Entrez Direct

Guide: NCBI E-Utilities
... slides
Background notes on: Installing NCBI EDirect
EDirect Sample Code Explained

Volker Brendel
Fri Aug 25 L1.1 Computer Laboratory: Linux Basics

Basic UNIX shell tutorial
The UNIX Shell
The UNIX Shell: Summary of Basic Commands
vi(m) editor tutorial

AI
Mon Aug 28 1.3 Mapping biological questions onto computational problems:
The modeling spiral

Polistes dominula proteins
SeqKit
SeqKit Tutorial

Entrez Direct
NCBI Sequence search fields
MEDLINE/PubMed search fields
Volker Brendel
Wed Aug 30 1.4 IU Supercomputing for Everyone
IU Carbonate
IU Research Desktop
Apptainer
Apptainer User Guide
Volker Brendel
Fri Sept 1 L1.2 Computer Laboratory: Basic bioinformatics toolkit acquisition (Part II):

Getting code: git, GitHub, Brendel Group on GitHub

GitHub HowTo
git: working with branches
CodeFMAB
AI
Mon Sept 4 2.1

Module 2: Sequence Models and Spaces



Labor Day

no class
Wed Sept 6 2.2 Simple Sequence Models Volker Brendel
Fri Sept 8 L2.1 Computer Laboratory: Python Basics

Python tutorial
Python Scripting for Computational Molecular Science
Python for Everybody PY4E Lessons
J. Sundnes: Introduction to Scientific Programming with Python

Style matters ...
Learning Scientific Programming with Python - an intriguing resource

AI
Mon Sept 11 2.3 Markov Models for Sequences

Volker Brendel
Wed Sept 13 2.4 Applications of Markov Models

GENMARK

Some resources for statistics background review:
Event probabilities
Union of events
Bayes Theorem
Probability distribution
Expected value
Review: Conditional Probability
Sensitivity, specificity, and all that

Stat225 at Purdue - nice slides by Dr. Whitney Huang


Volker Brendel
Fri Sept 15 L2.2 Computer Laboratory:

Coding random sequence generation and pattern probability calculations
AI
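
A minimal Python sketch of the kind of exercise this lab names, assuming the simplest independence (i.i.d.) sequence model. The function names and the base frequencies are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
import random

def random_sequence(length, freqs, rng=None):
    """Draw an i.i.d. sequence; freqs maps each symbol to its probability."""
    if rng is None:
        rng = random.Random()
    symbols = list(freqs)
    weights = [freqs[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=length))

def pattern_probability(pattern, freqs):
    """Probability of seeing the pattern at one fixed position (independence model)."""
    p = 1.0
    for symbol in pattern:
        p *= freqs[symbol]
    return p

def count_occurrences(sequence, pattern):
    """Count occurrences of pattern in sequence, overlapping ones included."""
    k = len(pattern)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == pattern)

# Hypothetical base composition (assumption, for illustration only).
FREQS = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

if __name__ == "__main__":
    seq = random_sequence(10_000, FREQS, random.Random(519))
    p = pattern_probability("TATA", FREQS)
    expected = p * (len(seq) - len("TATA") + 1)
    print(f"P(TATA at a fixed position) = {p:.4f}")
    print(f"expected count = {expected:.1f}, observed = {count_occurrences(seq, 'TATA')}")
```

Comparing the expected and observed counts over several seeds gives a feel for how much a pattern count fluctuates by chance alone.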
Mon Sept 18 3.1

Module 3: Pairwise Sequence Alignment



Models for Pairwise Sequence Alignment
Representations of alignments
NW alignments
Number of alignments (nNW algorithm)
Volker Brendel
Wed Sept 20 3.2 PWSA: gNW algorithm. Volker Brendel
Fri Sept 22 L3.1 Computer Laboratory: Coding the nNW and gNW algorithms AI
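
A minimal Python sketch related to this lab, assuming that "nNW" refers to counting the number of possible alignments and "gNW" to global Needleman-Wunsch alignment; both readings are assumptions based on the lecture titles above. The counting convention (every column is a substitution, a gap in the first sequence, or a gap in the second) and the scoring parameters are illustrative choices, not the course's definitions.

```python
def count_alignments(m, n):
    """Count alignments of two sequences of lengths m and n.

    Each column is either a (mis)match, a gap in the first sequence,
    or a gap in the second, so the count follows the recursion
    N(i, j) = N(i-1, j-1) + N(i-1, j) + N(i, j-1), with N(i, 0) = N(0, j) = 1.
    """
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = table[i - 1][j - 1] + table[i - 1][j] + table[i][j - 1]
    return table[m][n]

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty (row-by-row DP)."""
    prev = [j * gap for j in range(len(b) + 1)]   # row 0: b aligned against gaps only
    for i in range(1, len(a) + 1):
        curr = [i * gap]                          # column 0: a aligned against gaps only
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[len(b)]

if __name__ == "__main__":
    print(count_alignments(3, 3))
    print(global_alignment_score("ACGT", "AGT"))
```

The count grows very quickly with sequence length, which is the usual motivation for dynamic programming over exhaustive enumeration.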
Mon Sept 25 3.3 Algorithms for Pairwise Sequence Alignment: gSW, lSW, and other algorithms

How Do We Compare Biological Sequences?
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Wed Sept 27 3.4 PWSA: Review and extensions.


Volker Brendel
Fri Sept 29 L3.2 Computer Laboratory: Coding PWSA algorithms. AI
Mon Oct 2 4.1

Module 4: Sequence Analysis with Scores





Sequence Analysis with Scores: Theory
Volker Brendel
Wed Oct 4 4.2 Sequence analysis with scores: Substitution scoring matrices
Biological Sequence Analysis I (Lecturer: Dr. Andy Baxevanis)

NCBI BLAST

slides for Sequence Analysis I presentation
handout for Sequence Analysis I presentation

NCBI BLAST download site
Volker Brendel
Fri Oct 6 Computer Laboratory: AI
Mon Oct 9 4.3 Sequence Analysis with Scores: Applications

Chance and Statistical Significance in Protein and DNA Sequence Analysis

Expected Values I
Expected Values II
Volker Brendel
Wed Oct 11 4.4 Brief review and outlook

Volker Brendel
Fri Oct 13 L4.2 Fall Break: no class
Mon Oct 16 5.1

Module 5: Hidden Markov Models





Hidden Markov Models: Motivation

Rabiner's Tutorial
Volker Brendel
Wed Oct 18 5.2 Hidden Markov Models: Algorithms

Hidden Markov Models
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Fri Oct 20 L5.1 Computer Laboratory: Coding and applications of HMM algorithms AI
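
A minimal Python sketch of one standard HMM algorithm, Viterbi decoding, written in log space. Which algorithms the lab actually covers is not stated here, so this choice is an assumption, and the two-state "AT-rich / GC-rich" model with its parameters is hypothetical.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most probable path.

    All probabilities used must be nonzero, because the sketch takes logs directly.
    """
    # scores[s]: best log-probability of any path that ends in state s
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}
    backpointers = []
    for symbol in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: scores[r] + math.log(trans_p[r][s]))
            new_scores[s] = (scores[best_prev] + math.log(trans_p[best_prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path

# Hypothetical model parameters (assumptions, for illustration only).
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {"AT": {"A": 0.45, "T": 0.45, "C": 0.05, "G": 0.05},
        "GC": {"A": 0.05, "T": 0.05, "C": 0.45, "G": 0.45}}

if __name__ == "__main__":
    logp, path = viterbi("AATTGGCC", STATES, START, TRANS, EMIT)
    print(path, round(logp, 3))
```

Working in log space avoids numerical underflow on long sequences, at the cost of requiring strictly positive probabilities in this simple form.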
Mon Oct 23 5.3 Hidden Markov Models: Applications

Application examples
GENSCAN
Profile Hidden Markov Models
TagDust; TagDust2 on GitHub

Sequence motifs: models
Biological Sequence Analysis II (Lecturer: Dr. Andy Baxevanis)
InterPro
Volker Brendel
Wed Oct 25 5.4 Sequence motifs: algorithms

HMMER
The MEME Suite
HOMER

Volker Brendel
Fri Oct 27 L5.2 Computer Laboratory: Implementation of HMM algorithms AI
Mon Oct 30 6.1

Module 6: Basic Concepts of Molecular Phylogenetics





The Molecular Clock
Linus Pauling

Molecular Phylogeny: Models

The powers and pitfalls of parsimony
Volker Brendel
Wed Nov 1 6.2 Parsimony and Distance Matrix Methods

Lectures on molecular phylogeny
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Fri Nov 3 L6.1 Computer Laboratory: Molecular Phylogeny, applications AI
Mon Nov 6 6.3 Molecular Phylogeny: Applications

... some proteins to analyze

MEGA
A list of Web-servers for molecular phylogeny analyses

phylogeny.fr
Volker Brendel
Wed Nov 8 6.4 Brief review and outlook

A first look at Genome Assembly
Volker Brendel
Fri Nov 10 L7.1 Exploration of genome sequencing by simulation and genome assembly AI
Mon Nov 13 7.1

Module 7: Genome Assembly and Annotation





DNA Sequencing
Sanger sequencing
Illumina sequencing
nanopore sequencing

Genome Resources
The Genomic Landscape circa 2016 (Lecturer: Dr. Eric Green)

Assembly basics
NCBI Assembly Help
NCBI Genome

Genome Assembly
Introduction to genome sequencing
How do we assemble genomes?
from Bioinformatics Algorithms: An Active Learning Approach

How to set up, execute, and document a project
Volker Brendel
Wed Nov 15 7.2 Guest Lecture: Dr. Ryan Bracewell: A nematode sequencing project

Genome Annotation

Prokaryotic gene finding:
GeneMark.hmm prokaryotic
GeneMark article

Eukaryotic gene finding:
GeneMark.hmm eukaryotic
AUGUSTUS
GENSCAN; paper see here

Sensitivity, specificity, and all that
sample paper

How to evaluate gene structure prediction accuracy

Project Assignment Posted
Volker Brendel
Fri Nov 17 L7.1 Computer Laboratory:

wgsim - read generator
SOAPdenovo2 - assembler
AI
Mon Nov 20 THANKSGIVING BREAK n/a
Wed Nov 22 THANKSGIVING BREAK n/a
Fri Nov 24 THANKSGIVING BREAK n/a
Mon Nov 27 8.1

Module 8: Genetic Variation





Genetic variation
Interpreting an individual genome

NCBI dbSNP    How To
NCBI dbVar    How To
1000 Genomes Project    Nature 491:56
Example: rs1131769

file format specifications:

Variant Call Format (VCF)
Sequence Alignment Map format (SAM)
SAM flags explained
Pileup format (used by samtools)
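
As a small illustration of the SAM FLAG field listed above, the following Python sketch decodes a FLAG integer into its component bits. The bit values follow the SAM specification; the short descriptions are paraphrased, and the function name is hypothetical.

```python
# (bit value, paraphrased meaning) pairs for the SAM FLAG field
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]

def decode_sam_flag(flag):
    """Return the descriptions of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]

if __name__ == "__main__":
    for value in (99, 147, 4):
        print(value, decode_sam_flag(value))
```

For example, 99 = 64 + 32 + 2 + 1, so it decodes to a paired read, mapped in a proper pair, whose mate is on the reverse strand, and which is the first read of the pair.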


Relevant code:

NCBI SRA Toolkit
samtools
bwa
freebayes
Volker Brendel
Wed Nov 29 9.1

Module 9: Protein Structure





peptide bond
Ramachandran plot ... very nice visualization thereof
(thanks to Prof. Eric Martz)

PDB: What is a protein?
PDB: How enzymes work
HIV I
HIV II

Guide to PDB

PDB Molecule of the Month
NCBI Protein

Secondary Structure

2struct server

Jpred - secondary structure prediction
SPIDER3

foldit
AlphaFold
Volker Brendel
Fri Dec 1 L7.2

Project Assignment Due
AI
Mon Dec 4

Review Sessions



Sequence Alignment

Bioinformatics Algorithms presentation
Volker Brendel
Wed Dec 6 Gene Expression Analyses
from Bioinformatics Algorithms: An Active Learning Approach

MIT Lecture: Gene Regulatory Networks
MIT Data Science: Clustering
Volker Brendel
Fri Dec 8 Office hour AI
Wed Dec 13 7:20pm Final Project due Students