Biological and Linguistic Sequence Analysis

Instructor: Brian Roark

Class time: M-F, 6:00-9:00 PM, Dec.1-19, 2008

Class location: Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay

Required textbook: None

Optional textbooks:
Roark and Sproat, Computational Approaches to Morphology and Syntax
Dan Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology
Richard Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids

Grading

10% of your grade will depend on in-class discussion, 60% on the homework assignments, and 30% on the final exam.

What we'll cover and an approximate schedule

Date     Topic     Assignment
Dec 1(a) Introduction to biological and linguistic sequences and strings; overview of main problems; course overview     HW 1.1 (due Dec.8)
(b) Introduction to approximate matching; edit distance, dynamic programming, local alignment
Dec 2(a) Chomsky hierarchy; weighted finite-state automata and transducers; n-gram language modeling     HW 1.2 (due Dec.8)
(b) Hidden Markov models; POS-tagging; Viterbi algorithm
Dec 3(a) Supervised and unsupervised sequence learning; Expectation Maximization (EM) with forward-backward algorithm; discriminative learning
(b) Gene prediction
Dec 4(a) Protein and RNA secondary structure; Introduction to context-free grammars
(b) Context-free parsing; Chomsky Normal Form; CYK parsing; Earley's algorithm; Parser evaluation
Dec 5(a) "Mildly" context-sensitive grammars for pseudoknots and cross-serial dependencies (Tentatively rescheduled to Dec.11)
(b) Machine translation alignment; HMM alignment models; Transduction grammars (Tentatively rescheduled to Dec.18)
Dec 8(a) Deterministic exact string matching; a simple linear algorithm for exact match     HW 2.1 (due Dec.15)
(b) Knuth-Morris-Pratt and Boyer-Moore algorithms for exact match
Dec 9(a) Aho-Corasick algorithm for sets of patterns; regular expression patterns     HW 2.2 (due Dec.15)
(b) Efficient approximate match: linear space, bounded approximate matching and exclusion methods
Dec 10(a) Suffix trees
(b) Suffix automata; suffix arrays; Lempel-Ziv compression; Lowest common ancestor retrieval
Dec 11(a) "Mildly" context-sensitive grammars for pseudoknots and cross-serial dependencies (Originally scheduled Dec.5)
(b) High(er) accuracy context-free parsing; Dependency grammars and parsing; efficient inference for projective grammars; minimum spanning trees for non-projective dependency parsing
Dec 12(a) Topics in Context-free parsing I: grammar induction and inference (Tentatively rescheduled to Dec.17)
(b) Topics in Context-free parsing II: finite-state approximations to context-free grammars; pipelined systems; fast context-free parsing with finite-state pre-processing
Dec 15(a) Introduction to multiple sequence alignments; families; profile HMMs; Perceptron algorithm for learning profile models     HW 3 (due Dec.19)
(b) Aligning multiple sequences; minimum sum-of-pairs alignment; higher dimensional dynamic programming; iterative pairwise alignment
Dec 16(a) Introduction to phylogenetic tree building; ultrametric and additive distance trees; distance-based tree construction; parsimony
(b) Probabilistic models of phylogeny
Dec 17 Full course review; Topics in context-free processing (Originally scheduled Dec.12)
Dec 18 Machine translation alignment; HMM alignment models; Transduction grammars (Originally scheduled Dec.5)
Dec 19 Final exam