Digital Biology: A survey of topics in bioinformatics and functional genomics

BIOL-L/MLS-M 388 Spring 2023


Course description

BIOL L388 DB

Class times and locations

Tues, Thur 11:30 a.m. - 12:45 p.m. (Credits: 3.0); Swain East (SE) 010

Tentative schedule


DAY DATE LECTURE TOPIC LECTURER
Tues Jan 10 0.1

Orientation



What is Digital Biology?
Topics in bioinformatics, scope of class, and resources

Motivation
New virus outbreak: BBC Jan. 19, 2020
New virus outbreak: BBC Jan. 20, 2020
2023: XBB.1.5
NCBI SARS-CoV-2
Precedent: The Fastest Outbreak - connection to bioinformatics work


Bioinformatics Databases and Computing Resources
National Center for Biotechnology Information (NCBI), a great starting point for "anything" bioinformatics

Mapping biological questions onto computational problems:
The modeling spiral
Sequence gazing: the famous TATA-box
AlphaFold
Volker Brendel
Thur Jan 12 1.1

Module 1: Bioinformatics resources and workspaces




Ubuntu on Windows: ... a quick way to get to a command-line terminal

Virtual Machines: VirtualBox

Linux Basics

Basic UNIX shell tutorial
Command-line bootcamp

The UNIX Shell
The UNIX Shell: Summary of Basic Commands
vi(m) editor tutorial

Volker Brendel
Tues Jan 17 1.2 Customizing your Linux work space
Getting code with git

Working with NCBI data
Volker Brendel
Thur Jan 19 1.3 Command-line access to NCBI data: EDirect

Entrez Direct
NCBI Sequence search fields
MEDLINE/PubMed search fields
Volker Brendel
Tues Jan 24 1.4 Review: Command-line bootcamp Volker Brendel
Thur Jan 26 1.5 Review: Statistical evaluation of sequence features
Using generic Linux commands to process sequence data (at least for previews or consistency checks ...)

Homework Assignment 1 posted. Due: Feb. 2, 6:00pm
Volker Brendel
Tues Jan 31 2.1

Module 2: Pairwise Sequence Alignment




Motivation

How Do We Compare Biological Sequences?
from Bioinformatics Algorithms: An Active Learning Approach
Volker Brendel
Thur Feb 2 2.2 PWSA: Definition and representation of "alignments"

Homework Assignment 1 due at 6:00pm
Volker Brendel
Tues Feb 7 2.3 Global alignment (Needleman-Wunsch)
How to calculate the number of NW alignments
Volker Brendel
Thur Feb 9 2.4 Scoring alignments
How to calculate the optimal alignment score
(and find an optimal alignment)
PWSA: allowing "double-gaps"
PWSA: local alignment (Smith-Waterman)

Homework Assignment 2 posted. Due: Feb. 17, 6:59pm
Volker Brendel
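
The optimal global alignment score covered in lectures 2.3 and 2.4 can be sketched with the standard Needleman-Wunsch dynamic programming recurrence. This is a minimal illustration, not the course's reference code: the function name nw_score and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example, and only the score is returned, not the alignment itself.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    Only two rows of the dynamic programming matrix are kept in memory.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # a[i-1] aligned to a gap
                curr[j - 1] + gap,           # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("ACGT", "AGT"))
```

Recovering an optimal alignment, rather than just its score, would additionally require keeping the full matrix (or traceback pointers) and walking back from the bottom-right cell.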
Tues Feb 14 2.5 Sequence analysis with scores: Concepts and statistical foundations

Volker Brendel
Thur Feb 16 2.6 Sequence analysis with scores: Practice

BLAST and Substitution scoring matrices
Biological Sequence Analysis I (Lecturer: Dr. Andy Baxevanis)

Slides for Sequence Analysis I presentation
Handout for Sequence Analysis I presentation
Volker Brendel
Tues Feb 21 3.1

Module 3: Basic Concepts in Molecular Phylogenetics




Molecular Phylogeny: Models

The powers and pitfalls of parsimony
Volker Brendel
Thur Feb 23 3.2 Lectures on molecular phylogeny
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Tues Feb 28 3.3 Methods I: Parsimony

Homework Assignment 3 posted. Due: March 7, 11:00pm
Volker Brendel
Thur March 2 3.4 Methods II: Distance matrix methods

Volker Brendel
Tues March 7 4.1

Module 4: Hidden Markov Models




Hidden Markov Models: Concepts and Algorithms

Hidden Markov Models
(from Bioinformatics Algorithms: An Active Learning Approach)

Volker Brendel
Thur March 9 4.2 Hidden Markov Models: Applications

Profile Hidden Markov Models
TagDust; TagDust2 on GitHub
GeneMark.hmm prokaryotic
GENSCAN; see the paper

Background:
GeneMark article
Review: Conditional Probability
Volker Brendel
Tues March 14 Spring Break
Thur March 16 Spring Break
Tues March 21 5.1

Module 5: Genome Assembly and Annotation




Eukaryotic gene finding:

GeneMark
AUGUSTUS
Volker Brendel
Thur March 23 5.2 Genome Annotation: Evaluation

Sensitivity, specificity, and all that
sample paper

How to evaluate gene structure prediction accuracy
Volker Brendel
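
The evaluation measures named in lecture 5.2 can be illustrated from counts of true/false positives and negatives. The sketch below is an assumption-laden example, not the course's evaluation procedure: the function name prediction_accuracy and the sample counts are hypothetical. One point worth noting is that parts of the gene-prediction literature report TP / (TP + FP) under the name "specificity", whereas the general statistical definition of specificity is TN / (TN + FP); the code reports the former as "precision" to keep the two apart.

```python
def prediction_accuracy(tp, fp, tn, fn):
    """Basic accuracy measures from confusion-matrix counts.

    tp, fp, tn, fn: counts of true positives, false positives,
    true negatives, and false negatives (e.g. per nucleotide or per exon).
    """
    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true features found
        "specificity": ratio(tn, tn + fp),  # fraction of negatives kept negative
        "precision": ratio(tp, tp + fp),    # fraction of predictions that are true
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in prediction_accuracy(tp=80, fp=20, tn=880, fn=20).items():
        print(f"{name}: {value:.3f}")
```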
Tues March 28 5.3 Genome Sequencing:

Illumina sequencing
nanopore sequencing

Assembly basics
NCBI Assembly Help
Volker Brendel
Thur March 30 5.4 Genome assembly:

How do we assemble genomes?
from Bioinformatics Algorithms: An Active Learning Approach

Volker Brendel
Tues April 4 6.1

Module 6: Genetic Variation






Homework Assignment 5 posted. Due: April 11, 7:00pm
Homework Assignment 6 posted. Due: April 14, 11:00pm
Volker Brendel
Thur April 6 6.2 Sequence Alignment Map format (SAM)
SAM flags explained
Pileup format (used by samtools)

Relevant code:

samtools
bwa
Volker Brendel
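
As a companion to the "SAM flags explained" resource in lecture 6.2, the bitwise FLAG field of a SAM record can be decoded with a few lines of code. This is a minimal sketch: the bit values follow the SAM format specification, while the function name decode_sam_flag and the short labels are assumptions made for this example.

```python
# Bit values as defined in the SAM format specification;
# the short labels are illustrative names chosen for this sketch.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_sam_flag(flag):
    """Return the labels of all bits set in an integer SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for value in (99, 147, 4):
        print(value, decode_sam_flag(value))
```

In practice, samtools provides the same information through its flags subcommand; the sketch only makes the bit arithmetic explicit.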
Tues April 11 6.3 Relevant file format specifications:

Variant Call Format (VCF)

Relevant code:

NCBI SRA Toolkit
freebayes
Volker Brendel
Thur April 13 6.4 Gene Expression Analyses
from Bioinformatics Algorithms: An Active Learning Approach

MIT Lecture: Gene Regulatory Networks
MIT Data Science: Clustering
Volker Brendel
Tues April 18 7.1

Module 7: Protein Structure




PDB: What is a protein?
PDB: How enzymes work
Guide to PDB

PDB Molecule of the Month
NCBI Protein

Volker Brendel
Thur April 20 7.2 Peptide bond
Ramachandran plot ... very nice visualization thereof
(thanks to Prof. Eric Martz)

Secondary Structure

2StruCompare server

Jpred - secondary structure prediction
SPIDER3

HMMER
Pfam

Foldit
AlphaFold

Volker Brendel
Tues April 25 8.1 Review: Managing workflows

Loops in Bash
Volker Brendel
Thur April 27 8.2 Review Volker Brendel
Thur May 4 12:20pm Final Project submission due Students