Digital Biology: A survey of topics in bioinformatics and functional genomics

BIOL-L/MLS-M 388 Spring 2022


Course description

BIOL L388 DB

Class times and locations

Tues, Thur 11:30a - 12:45p (Credits: 3.0); Biology Building (JH) A106

Tentative schedule

HTML version of this schedule

DAY DATE LECTURE TOPIC LECTURER
Tues Jan 11 0.1

Orientation



What is Digital Biology?
Topics in bioinformatics, scope of class, and resources

Motivation
New virus outbreak: BBC Jan. 19, 2020
New virus outbreak: BBC Jan. 20, 2020
Precedent: The Fastest Outbreak - connection to bioinformatics work


Bioinformatics Databases and Computing Resources
National Center for Biotechnology Information (NCBI), a great starting point for "anything" bioinformatics

Mapping biological questions onto computational problems:
The modeling spiral
Sequence gazing: the famous TATA-box
AlphaFold
Volker Brendel
Thur Jan 13 1.1

Module 1: Bioinformatics resources and workspaces




Ubuntu on Windows: ... a quick way to get to a command line terminal

Virtual Machines: VirtualBox

Linux Basics

Basic UNIX shell tutorial
Command-line bootcamp

The UNIX Shell
The UNIX Shell: Summary of Basic Commands
vi(m) editor tutorial

Volker Brendel
Tues Jan 18 1.2 Customizing your Linux work space
Getting code with wget
Volker Brendel
Thur Jan 20 1.3 Basic Linux system maintenance
Working with NCBI data
Volker Brendel
Tues Jan 25 1.4 Getting code with git Volker Brendel
Thur Jan 27 1.5 Command line access to NCBI data: EDirect Volker Brendel
Tues Feb 1 2.1

Module 2: Pairwise Sequence Alignment




Motivation

How Do We Compare Biological Sequences?
from Bioinformatics Algorithms: An Active Learning Approach
Volker Brendel
Thur Feb 3 2.2 PWSA: Definition and representation of "alignments" Volker Brendel
Tues Feb 8 2.3 Global alignment (Needleman-Wunsch)
How to calculate the number of NW alignments
Volker Brendel
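As a rough illustration of the counting question in lecture 2.3, the sketch below tabulates the number of global alignments of two sequences of lengths m and n. It assumes an alignment is a sequence of columns, each either pairing two residues or pairing one residue with a gap (never gap with gap), and that alignments differing only in the order of adjacent gaps count as distinct. The function name count_alignments is made up for this example; the lecture may use a different definition or notation.

```python
def count_alignments(m, n):
    """Count global alignments of sequences of lengths m and n.

    Assumption: every column is residue/residue, residue/gap, or
    gap/residue; columns pairing a gap with a gap are not allowed.
    """
    # table[i][j] = number of alignments of prefixes of lengths i and j.
    # A prefix aligned against an empty prefix has exactly one alignment.
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # The last column is a residue pair, a gap in the second
            # sequence, or a gap in the first sequence.
            table[i][j] = (table[i - 1][j - 1]
                           + table[i - 1][j]
                           + table[i][j - 1])
    return table[m][n]


if __name__ == "__main__":
    for length in (1, 2, 3, 10):
        print(length, count_alignments(length, length))
```

Under these assumptions the count grows very quickly with sequence length, which is why enumerating all alignments is impractical and a dynamic programming approach is used instead.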
Thur Feb 10 2.4 Scoring alignments
How to calculate the optimal alignment score
(and find an optimal alignment)
Volker Brendel
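The following is a minimal sketch of how an optimal global alignment score, and one optimal alignment, can be computed with Needleman-Wunsch dynamic programming. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name nw_align are assumptions chosen for illustration, not the scoring scheme used in class.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    The scoring parameters are illustrative assumptions (linear gap cost).
    """
    m, n = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = nw_align("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

When several alignments share the optimal score, the traceback above prefers a residue pair over a gap, so it reports only one of the possible optima.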
Tues Feb 15 2.5 Computers within computers: VMs and Singularity/Apptainer containers

PWSA: NW algorithm with no end-gap penalties
PWSA: allowing "double-gaps"
PWSA: local alignment (Smith-Waterman)

Home Work Assignment 1 posted. Due: Feb. 22
Volker Brendel
Thur Feb 17 2.6 Sequence analysis with scores: Concepts and statistical foundations

Practice: BLAST and Substitution scoring matrices
Biological Sequence Analysis I (Lecturer: Dr. Andy Baxevanis)

Slides for Sequence Analysis I presentation
Handout for Sequence Analysis I presentation
Volker Brendel
Tues Feb 22 3.1

Module 3: Basic Concepts in Molecular Phylogenetics




Molecular Phylogeny: Models

The powers and pitfalls of parsimony
Volker Brendel
Thur Feb 24 3.2 Lectures on molecular phylogeny
(from Bioinformatics Algorithms: An Active Learning Approach)
Volker Brendel
Tues March 1 3.3 Volker Brendel
Thur March 3 3.4

Volker Brendel
Tues March 8 4.1

Module 4: Hidden Markov Models




Hidden Markov Models: Concepts and Algorithms

Hidden Markov Models
(from Bioinformatics Algorithms: An Active Learning Approach)

Home Work Assignment 2 posted. Due: March 11
Volker Brendel
Thur March 10 4.2 Hidden Markov Models: Algorithms

Review: Conditional Probability

GeneMark.hmm prokaryotic
GeneMark article
Volker Brendel
Tues March 15 Spring Break
Thur March 17 Spring Break
Tues March 22 5.1

Module 5: Genome Assembly and Annotation




Home Work Assignment 3 posted. Due: March 29
Volker Brendel
Thur March 24 5.2 Genome Annotation: Evaluation

Sensitivity, specificity, and all that
sample paper

How to evaluate gene structure prediction accuracy
Volker Brendel
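To make the evaluation measures from lecture 5.2 concrete, here is a small sketch of nucleotide-level accuracy for a gene structure prediction. It assumes the annotated and predicted coding positions are given as sets of integer coordinates; the function name and the example coordinates are hypothetical. Note that the gene-finding literature (for example Burset and Guigó, 1996) commonly uses "specificity" for TP / (TP + FP), which other fields call precision; the linked course material may define the terms differently.

```python
def nucleotide_accuracy(true_coding, predicted_coding):
    """Nucleotide-level sensitivity and precision of a gene prediction.

    Both arguments are sets of coding positions (assumed representation).
    """
    tp = len(true_coding & predicted_coding)   # coding, predicted coding
    fn = len(true_coding - predicted_coding)   # coding, missed
    fp = len(predicted_coding - true_coding)   # non-coding, predicted coding
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Called "specificity" in much of the gene-prediction literature.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"TP": tp, "FN": fn, "FP": fp,
            "sensitivity": sensitivity, "precision": precision}


if __name__ == "__main__":
    # Hypothetical example: annotated exon at 100..199, predicted at 150..249
    annotated = set(range(100, 200))
    predicted = set(range(150, 250))
    print(nucleotide_accuracy(annotated, predicted))
```

Exon-level and gene-level measures follow the same pattern but count whole features with exactly matching boundaries rather than individual positions.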
Tues March 29 5.3 Eukaryotic gene finding:

GeneMark
AUGUSTUS
GENSCAN (see the accompanying paper)

Genome Assembly



Illumina sequencing
nanopore sequencing

Assembly basics
NCBI Assembly Help
How do we assemble genomes?
from Bioinformatics Algorithms: An Active Learning Approach
Volker Brendel
Thur March 31 5.4

Home Work Assignment 4 posted. Due: April 5
Volker Brendel
Tues April 5 6.1

Module 6: Genetic Variation




Volker Brendel
Thur April 7 6.2

Home Work Assignment 5 posted. Due: April 12
Volker Brendel
Tues April 12 6.3 Relevant file format specifications:

Variant Call Format (VCF)
Sequence Alignment Map format (SAM)
SAM flags explained
Pileup format (used by samtools)


Relevant code:

NCBI SRA Toolkit
samtools
bwa
freebayes
Volker Brendel
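As a companion to the "SAM flags explained" resource listed above, the sketch below decodes the FLAG field of a SAM record into its individual bits. The bit values follow the SAM format specification; the short label strings and the function name decode_sam_flag are chosen for this example only.

```python
# Bit values from the SAM specification; labels are informal shorthand.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_sam_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for value in (0, 4, 99, 147):
        print(value, decode_sam_flag(value))
```

Flags 99 and 147 are a typical combination for the two mates of a properly paired read, with the first mate on the forward strand and the second on the reverse strand.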
Thur April 14 6.4 Gene Expression Analyses
from Bioinformatics Algorithms: An Active Learning Approach

MIT Lecture: Gene Regulatory Networks
MIT Data Science: Clustering
Volker Brendel
Tues April 19 7.1

Module 7: Protein Structure




PDB: What is a protein?
PDB: How enzymes work
Guide to PDB

PDB Molecule of the Month
NCBI Protein

Volker Brendel
Thur April 21 7.2 Peptide bond
Ramachandran plot ... very nice visualization thereof
(thanks to Prof. Eric Martz)

Secondary Structure

2StruCompare server

Jpred - secondary structure prediction
SPIDER3

HMMER
Pfam

Foldit
AlphaFold

Home Work Assignment 6 posted. Due: April 27
Volker Brendel
Tues April 26 8.1 Review Volker Brendel
Thur April 28 8.2 Review Volker Brendel
Thur May 5 12:40pm - 2:40pm Final Examination Students