Insertion-Deletion (Indel) Processes and the Basic Statistical Alignment Model

This and the next two lectures deal with stochastic models of insertions and deletions, which are an alternative to score-based alignment methods. This approach should appeal to statisticians. It is a more consistent approach to the analysis of homologous sequences than optimization alignment, which uses optimization to deal with insertion-deletions but statistics to deal with substitution events. Statistical alignment gives a distribution on alignments and allows estimation of all evolutionary parameters, including insertion-deletion rates.

The first statistical alignment paper was published in 1986 by Bishop and Thompson, but it was overshadowed by the model (TKF91) published in 1991 by Thorne, Kishino and Felsenstein, which was more consistent and more thorough. Since then there have been very few new models compared to, for instance, models of amino acids or nucleotides. In 1992 Thorne, Kishino and Felsenstein published a model that allowed longer insertion-deletions under some very unbiological constraints. In 2004 Miklos, Lunter and Holmes published a more flexible model of long insertion-deletions.

In this lecture Jotun will focus on the basic models and on how to formulate a dynamic programming algorithm that can calculate the probability of two homologous sequences and define a distribution on all possible alignments.
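
To give a flavour of the kind of dynamic programming involved, here is a minimal sketch in Python of the forward algorithm for a generic three-state pair hidden Markov model. It sums the probability of every alignment of two sequences to obtain the probability of the pair. This is not the TKF91 model itself. The function name pair_probability, the parameters delta (gap opening), eps (gap extension), tau (termination) and same (probability that an aligned pair is identical), the uniform nucleotide frequencies and all default values are assumptions chosen only for illustration.

```python
def pair_probability(a, b, delta=0.1, eps=0.3, tau=0.1, same=0.7):
    """Sum the probability of sequences a and b over all alignments.

    Illustrative pair HMM with states M (aligned pair), X (letter of a
    against a gap) and Y (letter of b against a gap). All parameter
    values are hypothetical, not estimates from any real data set.
    """
    alphabet = "ACGT"
    q = 1.0 / len(alphabet)  # assumed uniform letter frequency

    def p(x, y):
        # Joint probability of an aligned pair; sums to 1 over all pairs.
        return q * (same if x == y else (1.0 - same) / (len(alphabet) - 1))

    n, m = len(a), len(b)
    # M[i][j], X[i][j], Y[i][j]: probability of emitting a[:i] and b[:j]
    # and ending in the respective state, summed over all alignments.
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0  # the start state behaves like M

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                M[i][j] = p(a[i - 1], b[j - 1]) * (
                    (1.0 - 2.0 * delta - tau) * M[i - 1][j - 1]
                    + (1.0 - eps - tau) * (X[i - 1][j - 1] + Y[i - 1][j - 1])
                )
            if i > 0:
                X[i][j] = q * (delta * M[i - 1][j] + eps * X[i - 1][j])
            if j > 0:
                Y[i][j] = q * (delta * M[i][j - 1] + eps * Y[i][j - 1])

    # Every state moves to the end state with probability tau.
    return tau * (M[n][m] + X[n][m] + Y[n][m])


if __name__ == "__main__":
    print(pair_probability("ACGT", "ACGT"))
    print(pair_probability("ACGT", "AGT"))
```

Replacing each sum over predecessor states by a maximum would give the single most probable alignment instead. Keeping the sums is what yields a probability for the sequence pair and, after normalisation, a distribution over alignments.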

However, only little effort goes into the development of statistical alignment compared to other approaches.

Preliminary slides can be found here.