Digital Biology: A survey of topics in bioinformatics and functional genomics

BIOL-L/MLS-M 388/BIOL-Z 620 Spring 2025


Course description

BIOL L388/Z620 DB

Class times and locations

Tues, Thur 11:10a - 12:25p (Credits: 3.0); Biology Building (JH) 001

Tentative schedule (class notes can be viewed with

HTML version of this schedule

DAY DATE LECTURE TOPIC LECTURER
Tues Jan 14 0.1

Orientation



What is Digital Biology?
Topics in bioinformatics, scope of class, and resources

Motivation

COVID-19 pandemic and a precedent: The Fastest Outbreak - connection to bioinformatics work


Bioinformatics Databases and Computing Resources
National Center for Biotechnology Information (NCBI), a great starting point for "anything" bioinformatics

Mapping biological questions onto computational problems:
The modeling spiral
Sequence gazing: the famous TATA-box
AlphaFold
Volker Brendel
Thur Jan 16 1.1

Module 1: Bioinformatics resources and workspaces




Ubuntu on Windows: ... a quick way to get to a command line terminal

Virtual Machines: VirtualBox

Linux Basics

Basic UNIX shell tutorial
Command-line bootcamp

The UNIX Shell
The UNIX Shell: Summary of Basic Commands
nano editor HowTo
vi(m) editor tutorial

Volker Brendel
Tues Jan 21 1.2 Customizing your Linux work space
Getting code with git

Working with NCBI data

Homework Assignment 1 posted. Due: Jan 27, 4:00pm
Volker Brendel
Thur Jan 23 1.3 Command line access to NCBI data: EDirect

Entrez Direct
NCBI Sequence search fields
MEDLINE/PubMed search fields
Volker Brendel
Tues Jan 28 1.4 Review: Command-line bootcamp Volker Brendel
Thur Jan 30 1.5 Review: Statistical evaluation of sequence features
Using generic Linux commands to process sequence data (at least for previews or consistency checks ...)

Homework Assignment 2 posted. Due: Feb 3, 4:00pm
Volker Brendel
Tues Feb 4 2.1

Module 2: Pairwise Sequence Alignment




Motivation

How Do We Compare Biological Sequences?
from Bioinformatics Algorithms: An Active Learning Approach
Volker Brendel
Thur Feb 6 2.2 PWSA: Definition and representation of "alignments" Volker Brendel
Tues Feb 11 2.3 Global alignment (Needleman-Wunsch)
How to calculate the number of NW alignments

Homework Assignment 3 posted. Due: Feb 17, 4:00pm
Volker Brendel
Thur Feb 13 2.4 Scoring alignments
How to calculate the optimal alignment score
(and find an optimal alignment)
PWSA: allowing "double-gaps"
PWSA: local alignment (Smith-Waterman)
Volker Brendel
Tues Feb 18 2.5 Sequence analysis with scores: Concepts and statistical foundations

Homework Assignment 4 posted. Due: Feb 24, 4:00pm
Volker Brendel
Thur Feb 20 2.6 Sequence analysis with scores: Practice

BLAST and Substitution scoring matrices
Biological Sequence Analysis I (Lecturer: Dr. Andy Baxevanis)

Slides for Sequence Analysis I presentation
Handout for Sequence Analysis I presentation
Volker Brendel
Tues Feb 25 3.1

Module 3: Basic Concepts in Molecular Phylogenetics




Molecular Phylogeny: Models

The powers and pitfalls of parsimony
Volker Brendel
Thur Feb 27 3.2 Lectures on molecular phylogeny
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Tues Mar 4 3.3 Methods I: Parsimony

Homework Assignment 5 posted. Due: Mar 10, 4:00pm
Volker Brendel
Thur Mar 6 3.4 Methods II: Distance matrix methods Volker Brendel
Tues March 11 4.1

Module 4: Hidden Markov Models




Hidden Markov Models: Concepts and Algorithms

Hidden Markov Models
(from Bioinformatics Algorithms: An Active Learning Approach)

Volker Brendel
Thur March 13 4.2 Hidden Markov Models: Applications

Profile Hidden Markov Models
TagDust; TagDust2 on GitHub
GeneMark.hmm prokaryotic
GENSCAN; paper see here

Background:
GeneMark article
Review: Conditional Probability
Volker Brendel
Tues March 18 Spring Break
Thur March 20 Spring Break
Tues March 25 5.1

Module 5: Genome Assembly and Annotation




Eukaryotic gene finding:

GeneMark
AUGUSTUS

Homework Assignment 6 posted. Due: Mar 31, 4:00pm
Volker Brendel
Thur March 27 5.2 Genome Annotation: Evaluation

Sensitivity, specificity, and all that
sample paper

How to evaluate gene structure prediction accuracy
Volker Brendel
Tues April 1 5.3 Genome Sequencing:

Illumina sequencing
nanopore sequencing

Assembly basics
NCBI Assembly Help
Volker Brendel
Thur April 3 5.4 Genome assembly:

How do we assemble genomes?
from Bioinformatics Algorithms: An Active Learning Approach

Volker Brendel
Tues April 8 6.1

Module 6: Genetic Variation




Homework Assignment 7 posted. Due: Apr 14, 4:00pm
Volker Brendel
Thur April 10 6.2 Sequence Alignment Map format (SAM)
SAM flags explained
Pileup format (used by samtools)

Relevant code:

samtools
bwa
Volker Brendel
Tues April 15 6.3 Relevant file format specifications:

Variant Call Format (VCF)

Relevant code:

NCBI SRA Toolkit
freebayes
Volker Brendel
Thur April 17 6.4 Gene Expression Analyses
from Bioinformatics Algorithms: An Active Learning Approach

MIT Lecture: Gene Regulatory Networks
MIT Data Science: Clustering

Final Project posted. Due: in finals week, TBD
Volker Brendel
Tues April 22 7.1

Module 7: Protein Structure




PDB: What is a protein?
PDB: How enzymes work
Guide to PDB

PDB Molecule of the Month
NCBI Protein
Volker Brendel
Thur April 24 7.2 peptide bond
Ramachandran plot ... very nice visualization thereof
(thanks to Prof. Eric Martz)

Secondary Structure

2StruCompare server

Jpred - secondary structure prediction
SPIDER3

HMMer
PFAM

foldit
AlphaFold

Volker Brendel
Tues April 29 8.1 Review: Managing workflows

loops in bash
Volker Brendel
Thur May 1 8.2 Review Volker Brendel
Thur May 8 12:20pm Final Project submission due Students