    Digital Biology: A survey of topics in bioinformatics and functional genomics

    BIOL-L/MLS-M 388 Spring 2022


    Course description

    BIOL L388 DB

    Class times and locations

    Tues, Thur 11:30a - 12:45p (Credits: 3.0); Biology Building (JH) A106

    Tentative schedule

    DAY  DATE  LECTURE  TOPIC  LECTURER
    Tues Jan 11 0.1

    Orientation



    What is Digital Biology?
    Topics in bioinformatics, scope of class, and resources

    Motivation
    New virus outbreak: BBC Jan. 19, 2020
    New virus outbreak: BBC Jan. 20, 2020
    Precedent: The Fastest Outbreak - connection to bioinformatics work


    Bioinformatics Databases and Computing Resources
    National Center for Biotechnology Information (NCBI), a great starting point for "anything" bioinformatics

    Mapping biological questions onto computational problems:
    The modeling spiral
    Sequence gazing: the famous TATA-box
    AlphaFold
    Volker Brendel
    Thur Jan 13 1.1

    Module 1: Bioinformatics resources and workspaces




    Ubuntu on Windows: ... a quick way to get to a command line terminal

    Virtual Machines: VirtualBox

    Linux Basics

    Basic UNIX shell tutorial
    Command-line bootcamp

    The UNIX Shell
    The UNIX Shell: Summary of Basic Commands
    vi(m) editor tutorial

    Volker Brendel
    Tues Jan 18 1.2 Customizing your Linux work space
    Getting code with wget
    Volker Brendel
    Thur Jan 20 1.3 Basic Linux system maintenance
    Working with NCBI data
    Volker Brendel
    Tues Jan 25 1.4 Getting code with git Volker Brendel
    Thur Jan 27 1.5 Command line access to NCBI data: EDirect Volker Brendel
    Tues Feb 1 2.1

    Module 2: Pairwise Sequence Alignment




    Motivation

    How Do We Compare Biological Sequences?
    from Bioinformatics Algorithms: An Active Learning Approach
    Volker Brendel
    Thur Feb 3 2.2 PWSA: Definition and representation of "alignments" Volker Brendel
    Tues Feb 8 2.3 Global alignment (Needleman-Wunsch)
    How to calculate the number of NW alignments
    Volker Brendel
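As an illustration of the counting question in lecture 2.3, here is a minimal sketch. It assumes one common convention that may differ from the one used in class: an alignment is a sequence of columns, each column pairs two residues or one residue with a gap, no column holds two gaps, and alignments that differ only in the order of adjacent gaps count as distinct. Under that assumption the count follows a three-term recurrence (the Delannoy numbers). The function name is hypothetical.

```python
def count_global_alignments(m: int, n: int) -> int:
    """Count global alignments of sequences of lengths m and n.

    Assumed convention: every column is (residue, residue), (residue, gap),
    or (gap, residue); a column of two gaps is not allowed.
    """
    # counts[i][j] = number of alignments of the first i and first j residues.
    # With one sequence empty there is exactly one alignment (all gaps).
    counts = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            counts[i][j] = (
                counts[i - 1][j - 1]  # last column pairs two residues
                + counts[i - 1][j]    # last column: residue over gap
                + counts[i][j - 1]    # last column: gap over residue
            )
    return counts[m][n]


if __name__ == "__main__":
    for length in (1, 2, 3, 10):
        print(length, count_global_alignments(length, length))
```

The count grows exponentially with sequence length, which is why enumerating alignments is not practical and dynamic programming is used to find an optimal one.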
    Thur Feb 10 2.4 Scoring alignments
    How to calculate the optimal alignment score
    (and find an optimal alignment)
    Volker Brendel
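The optimal-score calculation in lecture 2.4 can be sketched with the Needleman-Wunsch dynamic program. This is a minimal version under stated assumptions, not the course's reference code: the scores (match +1, mismatch -1, linear gap penalty -2) and the function name are hypothetical choices, and when several alignments are optimal the traceback returns only one of them.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (optimal score, aligned a, aligned b) for a global alignment."""
    m, n = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, n + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] opposite a gap
                score[i][j - 1] + gap,      # b[j-1] opposite a gap
            )

    # Traceback from the bottom-right cell; prefers the diagonal move on ties.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    best, row_a, row_b = needleman_wunsch("ACGT", "AGT")
    print(best)
    print(row_a)
    print(row_b)
```

Filling the table takes time and memory proportional to the product of the two sequence lengths, in contrast to the exponential number of candidate alignments.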
    Tues Feb 15 2.5 Computers within computers: VMs and Singularity/Apptainer containers

    PWSA: NW algorithm with no end-gap penalties
    PWSA: allowing "double-gaps"
    PWSA: local alignment (Smith-Waterman)

    Homework Assignment 1 posted. Due: Feb. 22
    Volker Brendel
    Thur Feb 17 2.6 Sequence analysis with scores: Concepts and statistical foundations

    Practice: BLAST and Substitution scoring matrices
    Biological Sequence Analysis I (Lecturer: Dr. Andy Baxevanis)

    Slides for Sequence Analysis I presentation
    Handout for Sequence Analysis I presentation
    Volker Brendel
    Tues Feb 22 3.1

    Module 3: Basic Concepts in Molecular Phylogenetics




    Molecular Phylogeny: Models

    The powers and pitfalls of parsimony
    Volker Brendel
    Thur Feb 24 3.2 Lectures on molecular phylogeny
    (from Bioinformatics Algorithms: An Active Learning Approach)
    Volker Brendel
    Tues March 1 3.3 Volker Brendel
    Thur March 3 3.4

    Volker Brendel
    Tues March 8 4.1

    Module 4: Hidden Markov Models




    Hidden Markov Models: Concepts and Algorithms

    Hidden Markov Models
    (from Bioinformatics Algorithms: An Active Learning Approach)

    Homework Assignment 2 posted. Due: March 11
    Volker Brendel
    Thur March 10 4.2 Hidden Markov Models: Algorithms

    Review: Conditional Probability

    GeneMark.hmm prokaryotic
    GeneMark article
    Volker Brendel
    Tues March 15 Spring Break
    Thur March 17 Spring Break
    Tues March 22 5.1

    Module 5: Genome Assembly and Annotation




    Homework Assignment 3 posted. Due: March 29
    Volker Brendel
    Thur March 24 5.2 Genome Annotation: Evaluation

    Sensitivity, specificity, and all that
    sample paper

    How to evaluate gene structure prediction accuracy
    Volker Brendel
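To make the evaluation measures in lecture 5.2 concrete, here is a minimal sketch of nucleotide-level accuracy for a predicted gene structure. It rests on assumptions that may not match the lecture: exons are given as 1-based inclusive intervals, sensitivity is TP / (TP + FN), and the second measure is TP / (TP + FP). The gene-finding literature often calls that second measure "specificity", while other fields call it precision or positive predictive value. All names in the code are hypothetical.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive (assumption)


def coding_positions(exons: Iterable[Interval]) -> Set[int]:
    """Expand exon intervals into the set of covered nucleotide positions."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_level_accuracy(
    annotated: Iterable[Interval], predicted: Iterable[Interval]
) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) at the nucleotide level."""
    truth = coding_positions(annotated)
    calls = coding_positions(predicted)
    tp = len(truth & calls)  # coding and predicted coding
    fn = len(truth - calls)  # coding but missed by the prediction
    fp = len(calls - truth)  # predicted coding but not annotated as coding
    sensitivity = tp / (tp + fn) if truth else float("nan")
    ppv = tp / (tp + fp) if calls else float("nan")
    return sensitivity, ppv


if __name__ == "__main__":
    reference = [(1, 10), (21, 30)]
    prediction = [(6, 10), (21, 35)]
    print(nucleotide_level_accuracy(reference, prediction))
```

Exon-level and gene-level measures use the same two ratios but count whole exons or genes as correct only when their boundaries match, so they are usually stricter than the nucleotide-level values.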
    Tues March 29 5.3 Eukaryotic gene finding:

    GeneMark
    AUGUSTUS
    GENSCAN; for the paper, see here

    Genome Assembly



    Illumina sequencing
    nanopore sequencing

    Assembly basics
    NCBI Assembly Help
    How do we assemble genomes?
    from Bioinformatics Algorithms: An Active Learning Approach
    Volker Brendel
    Thur March 31 5.4

    Homework Assignment 4 posted. Due: April 5
    Volker Brendel
    Tues April 5 6.1

    Module 6: Genetic Variation




    Volker Brendel
    Thur April 7 6.2

    Homework Assignment 5 posted. Due: April 12
    Volker Brendel
    Tues April 12 7.1 Relevant file format specifications:

    Variant Call Format (VCF)
    Sequence Alignment Map format (SAM)
    SAM flags explained
    Pileup format (used by samtools)


    Relevant code:

    NCBI SRA Toolkit
    samtools
    bwa
    freebayes
    Volker Brendel
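As a companion to the "SAM flags explained" resource listed for this lecture, the following minimal sketch decodes the FLAG field of a SAM record. The bit values follow the SAM format specification; the short descriptions and the function name are my own wording, not official labels.

```python
from typing import List

# FLAG bits as defined in the SAM specification; descriptions are paraphrased.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_sam_flag(flag: int) -> List[str]:
    """List the properties whose bits are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    for value in (99, 147, 4):
        print(value, decode_sam_flag(value))
```

Because the FLAG is a bitwise sum, a value such as 99 (1 + 2 + 32 + 64) describes a paired read, mapped in a proper pair, whose mate is on the reverse strand and which is the first read of the pair.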
    Thur April 14 7.2 Gene Expression Analyses
    from Bioinformatics Algorithms: An Active Learning Approach

    MIT Lecture: Gene Regulatory Networks
    MIT Data Science: Clustering
    Volker Brendel
    Tues April 19 7.1

    Module 7: Protein Structure




    PDB: What is a protein?
    PDB: How enzymes work
    Guide to PDB

    PDB Molecule of the Month
    NCBI Protein

    Volker Brendel
    Thur April 21 7.2 peptide bond
    Ramachandran plot ... very nice visualization thereof
    (thanks to Prof. Eric Martz)

    Secondary Structure

    2StruCompare server

    Jpred - secondary structure prediction
    SPIDER3

    HMMer
    PFAM

    Foldit
    AlphaFold

    Homework Assignment 6 posted. Due: April 27
    Volker Brendel
    Tues April 26 8.1 Review Volker Brendel
    Thur April 28 8.2 Review Volker Brendel
    Thur May 5 12:40pm - 2:40pm Final Examination Students