Long Indel [MLH04] and the irreversible generalization

Summary: Long Indel [MLH04] and the irreversible generalization is the sixth of 12 lectures on Biological Sequence Analysis [BSA] that I give at ANU, Canberra, Australia, from September to November 2018. BSA is a huge field, since sequences are presently so abundant.

This lecture, together with the previous and the next one, deals with stochastic models of insertions and deletions, which are an alternative to score-based alignment methods. In this lecture I will focus on how to incorporate long insertions and deletions in a stochastic model of sequence evolution. There are very few papers on this, and it is a difficult topic. Three papers are Miklos, Lunter and Holmes (2004) [MLH04], Levy Karin, Ashkenazy, Hein and Pupko (2018) [KAHP18] and, hopefully, Edwards, Golden and Hein (2019) [EGH19]. MLH04 introduced a time-reversible model of insertions and deletions and a dynamic programming algorithm for calculating the probability of a pair of sequences. Several conceptual novelties had to be introduced to solve these problems: i. ChopZones [CZ], which are the analogue of a column in optimization alignment or of the p-functions in TKF91; ii. embedding the actual sequence in an imaginary, infinitely long sequence and defining deletions as the time mirror-image of insertions to maintain time reversibility; iii. an effective way of summing over the possible evolutionary paths in a CZ; and more. KAHP18 introduced a Gordian knot solution to the path integration and investigated how the maximal probability alignment compares to that of MAFFT and HOMSTRAD. EGH19 relaxes the reversibility assumption, which can have major advantages.

Preliminary slides can be found here