Digital Biology: A survey of topics in bioinformatics and functional genomics

BIOL-L/MLS-M 388/BIOL-Z 620 Spring 2024


Course description

BIOL L388/Z620 DB

Class times and locations

Tues, Thur 11:30a - 12:45p (Credits: 3.0); Biology Building (JH) 001

Tentative schedule (class notes can be viewed with

HTML version of this schedule

DAY  DATE  LECTURE  TOPIC  LECTURER
Tues Jan 9 0.1

Orientation



What is Digital Biology?
Topics in bioinformatics, scope of class, and resources

Motivation

COVID-19 pandemic and a precedent: The Fastest Outbreak - connection to bioinformatics work


Bioinformatics Databases and Computing Resources
National Center for Biotechnology Information (NCBI), a great starting point for "anything" bioinformatics

Mapping biological questions onto computational problems:
The modeling spiral
Sequence gazing: the famous TATA-box
AlphaFold
Volker Brendel
Thur Jan 11 1.1

Module 1: Bioinformatics resources and workspaces




Ubuntu on Windows: ... a quick way to get to a command line terminal

Virtual Machines: VirtualBox

Linux Basics

Basic UNIX shell tutorial
Command-line bootcamp

The UNIX Shell
The UNIX Shell: Summary of Basic Commands
nano editor HowTo
vi(m) editor tutorial

Volker Brendel
Tues Jan 16 1.2 Customizing your Linux work space
Getting code with git

Working with NCBI data
Volker Brendel
Thur Jan 18 1.3 Command line access to NCBI data: EDirect

Entrez Direct
NCBI Sequence search fields
MEDLINE/PubMed search fields
Volker Brendel
Tues Jan 23 1.4 Review: Command-line bootcamp Volker Brendel
Thur Jan 25 1.5 Review: Statistical evaluation of sequence features
Using generic Linux commands to process sequence data (at least for previews or consistency checks ...)

Home Work Assignment 1 posted. Due: Feb. 1, 6:00pm
Volker Brendel
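
As a companion to the preview and consistency checks mentioned for lecture 1.5, the following is a minimal Python sketch that reads FASTA-formatted text and reports base composition and GC fraction. The function names (`read_fasta`, `base_composition`, `gc_fraction`) and the example records are hypothetical and not taken from the class notes.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split(maxsplit=1)
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def base_composition(seq):
    """Count every symbol in the sequence (including ambiguity codes such as N)."""
    return Counter(seq)


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    counts = Counter(seq)
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


# Hypothetical example input, for illustration only.
example = """>seq1 hypothetical example
ACGTACGT
GGCC
>seq2
ATATATNN
"""

records = read_fasta(example)
for name, seq in records.items():
    print(name, len(seq), dict(base_composition(seq)), round(gc_fraction(seq), 3))
```

Ambiguous symbols such as N are counted in the composition but excluded from the GC denominator; that is one possible convention, not necessarily the one used in class.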
Tues Jan 30 2.1

Module 2: Pairwise Sequence Alignment




Motivation

How Do We Compare Biological Sequences?
from Bioinformatics Algorithms: An Active Learning Approach
Volker Brendel
Thur Feb 1 2.2 PWSA: Definition and representation of "alignments"

Home Work Assignment 1 due at 6:00pm
Volker Brendel
Tues Feb 6 2.3 Global alignment (Needleman-Wunsch)
How to calculate the number of NW alignments
Volker Brendel
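
One common way to count global alignments is sketched below. It assumes that every column is a match/mismatch, a gap in the first sequence, or a gap in the second sequence, and that alignments differing only in the order of adjacent gaps count as distinct. Under that assumption the count follows the recurrence f(m, n) = f(m-1, n) + f(m, n-1) + f(m-1, n-1) with f(m, 0) = f(0, n) = 1. The counting convention used in lecture may differ; the function name `count_alignments` is hypothetical.

```python
def count_alignments(m, n):
    """Number of global alignments of sequences of lengths m and n.

    Assumes each column is one of: aligned pair, gap in sequence 1,
    gap in sequence 2. Computed row by row to avoid deep recursion.
    """
    row = [1] * (n + 1)  # f(0, j) = 1 for all j
    for _ in range(m):
        new_row = [1]  # f(i, 0) = 1
        for j in range(1, n + 1):
            # up + left + diagonal
            new_row.append(row[j] + new_row[j - 1] + row[j - 1])
        row = new_row
    return row[n]


for length in (1, 2, 3, 10):
    print(length, count_alignments(length, length))
```

The rapid growth of these counts is the usual motivation for dynamic programming instead of enumerating alignments.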
Thur Feb 8 2.4 Scoring alignments
How to calculate the optimal alignment score
(and find an optimal alignment)
PWSA: allowing "double-gaps"
PWSA: local alignment (Smith-Waterman)

Home Work Assignment 2 posted. Due: Feb. 15, 6:00pm
Volker Brendel
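
The optimal-score calculations for global (Needleman-Wunsch) and local (Smith-Waterman) alignment can be sketched as follows. This is a minimal illustration under assumed scoring parameters (match = 1, mismatch = -1, linear gap penalty = -2); these values and the function names are not taken from the lecture, and the sketch returns scores only, without traceback.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: all gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: all gaps
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score: same recurrence, floored at zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,
                       prev[j - 1] + sub,
                       prev[j] + gap,
                       curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(nw_score("ACGT", "AGT"))
print(sw_score("TTTACGTTT", "GGACGGG"))
```

The only differences between the two functions are the boundary initialization, the floor at zero, and where the final score is read from (the last cell versus the maximum over all cells).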
Tues Feb 13 2.5 Sequence analysis with scores: Concepts and statistical foundations

Volker Brendel
Thur Feb 15 2.6 Sequence analysis with scores: Practice

BLAST and Substitution scoring matrices
Biological Sequence Analysis I (Lecturer: Dr. Andy Baxevanis)

Slides for Sequence Analysis I presentation
Handout for Sequence Analysis I presentation
Volker Brendel
Tues Feb 20 3.1

Module 3: Basic Concepts in Molecular Phylogenetics




Molecular Phylogeny: Models

The powers and pitfalls of parsimony
Volker Brendel
Thur Feb 22 3.2 Lectures on molecular phylogeny
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Tues Feb 27 3.3 Methods I: Parsimony Volker Brendel
Thur Feb 29 3.4 Methods II: Distance matrix methods

Home Work Assignment 3 posted. Due: March 7, 9:00pm
Volker Brendel
Tues March 5 4.1

Module 4: Hidden Markov Models




Hidden Markov Models: Concepts and Algorithms

Hidden Markov Models
(from Bioinformatics Algorithms: An Active Learning Approach)

Volker Brendel
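
Among the standard HMM algorithms, Viterbi decoding finds the most probable state path for an observed sequence. The sketch below is a minimal log-space version. The two-state model (H for GC-rich, L for AT-rich) and all of its probabilities are hypothetical values chosen for illustration, and the code assumes a non-empty observation sequence and strictly positive probabilities.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log probability).

    Assumes observations is non-empty and all probabilities are > 0.
    """
    # Initialization: log P(start in s) + log P(first symbol | s)
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}
    backpointers = []
    for symbol in observations[1:]:
        new_scores = {}
        pointers = {}
        for s in states:
            # Best predecessor state for arriving in s
            best_prev = max(states,
                            key=lambda r: scores[r] + math.log(trans_p[r][s]))
            new_scores[s] = (scores[best_prev]
                             + math.log(trans_p[best_prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Traceback from the best final state
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]


# Hypothetical two-state model, for illustration only.
states = ("H", "L")
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.9, "L": 0.1},
           "L": {"H": 0.1, "L": 0.9}}
emit_p = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
          "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

path, log_prob = viterbi("GGCGCAATTA", states, start_p, trans_p, emit_p)
print("".join(path), log_prob)
```

Working with log probabilities avoids numerical underflow on long sequences, which is why the products of the textbook recurrence appear here as sums.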
Thur March 7 4.2 Hidden Markov Models: Applications

Profile Hidden Markov Models
TagDust; TagDust2 on GitHub
GeneMark.hmm prokaryotic
GENSCAN (for the paper, see here)

Background:
GeneMark article
Review: Conditional Probability
Volker Brendel
Tues March 12 Spring Break
Thur March 14 Spring Break
Tues March 19 5.1

Module 5: Genome Assembly and Annotation




Eukaryotic gene finding:

GeneMark
AUGUSTUS
Volker Brendel
Thur March 21 5.2 Genome Annotation: Evaluation

Sensitivity, specificity, and all that
sample paper

How to evaluate gene structure prediction accuracy
Volker Brendel
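
The accuracy measures for lecture 5.2 can be illustrated at the nucleotide level with the sketch below. It assumes each position is labelled C (coding) or N (non-coding) in both the reference annotation and the prediction; the labels, example strings, and function names are hypothetical. Note that parts of the gene-prediction literature use "specificity" for TP / (TP + FP), which is elsewhere called precision, so both quantities are reported.

```python
def confusion_counts(truth, prediction, positive="C"):
    """Count TP, FP, TN, FN over two equal-length label strings."""
    if len(truth) != len(prediction):
        raise ValueError("label strings must have equal length")
    tp = fp = tn = fn = 0
    for t, p in zip(truth, prediction):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t != positive and p != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when undefined."""
    return numerator / denominator if denominator else None


def accuracy_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true coding found
        "specificity": ratio(tn, tn + fp),  # fraction of true non-coding kept
        "precision": ratio(tp, tp + fp),    # fraction of predictions correct
    }


# Hypothetical reference annotation and prediction, for illustration only.
truth = "CCCCNNNNCC"
prediction = "CCCNNNNCCC"
counts = confusion_counts(truth, prediction)
metrics = accuracy_metrics(*counts)
print(counts, metrics)
```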
Tues March 26 5.3 Genome Sequencing:

Illumina sequencing
nanopore sequencing

Assembly basics
NCBI Assembly Help
Volker Brendel
Thur March 28 5.4 Genome assembly:

How do we assemble genomes?
from Bioinformatics Algorithms: An Active Learning Approach

Volker Brendel
Tues April 2 6.1

Module 6: Genetic Variation






Volker Brendel
Thur April 4 6.2 Sequence Alignment Map format (SAM)
SAM flags explained
Pileup format (used by samtools)

Relevant code:

samtools
bwa
Volker Brendel
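
To make the SAM flag field concrete, here is a minimal Python sketch that decodes the bitwise FLAG value into its components. The bit values follow the SAM specification; the short descriptions are paraphrased, and the function name `decode_sam_flag` is hypothetical.

```python
# (bit value, paraphrased description) pairs for the SAM FLAG field
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_sam_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG integer."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (99, 147, 4):
    print(flag, decode_sam_flag(flag))
```

Flags 99 and 147 are a typical pair for a properly paired read and its mate: 99 = 1 + 2 + 32 + 64 and 147 = 1 + 2 + 16 + 128.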
Tues April 9 6.3 Relevant file format specifications:

Variant Call Format (VCF)

Relevant code:

NCBI SRA Toolkit
freebayes
Volker Brendel
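
As an illustration of the VCF layout, the sketch below splits one tab-delimited data line into the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and expands the INFO field. It ignores header lines and per-sample genotype columns, and the function name `parse_vcf_line` and the example record are for illustration only; real analyses would normally rely on an established VCF library.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of a single VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, ident, ref, alt, qual, filt, info = fields[:8]

    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            if "=" in entry:
                key, value = entry.split("=", 1)
                info_dict[key] = value
            else:
                info_dict[entry] = True  # flag-type INFO key without a value

    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position
        "id": ident,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


# Example data line, for illustration only.
line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
record = parse_vcf_line(line)
print(record)
```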
Thur April 11 6.4 Gene Expression Analyses
from Bioinformatics Algorithms: An Active Learning Approach

MIT Lecture: Gene Regulatory Networks
MIT Data Science: Clustering
Volker Brendel
Tues April 16 7.1

Module 7: Protein Structure




PDB: What is a protein?
PDB: How enzymes work
Guide to PDB

PDB Molecule of the Month
NCBI Protein

Volker Brendel
Thur April 18 7.2 peptide bond
Ramachandran plot ... very nice visualization thereof
(thanks to Prof. Eric Martz)

Secondary Structure

2StruCompare server

Jpred - secondary structure prediction
SPIDER3

HMMer
PFAM

foldit
AlphaFold

Volker Brendel
Tues April 23 8.1 Review: Managing workflows

loops in bash
Volker Brendel
Thur April 25 8.2 Review Volker Brendel
Tues April 30 12:30pm Final Project submission due Students