Identification of 2000 novel protein isoforms in the human and mouse genomes

Through differential splicing, a single gene can encode several different proteins. We previously identified an unusual outcome of alternative splicing in which the first exon of a gene can be translated in two different reading frames (Theodoratos et al. 2010). We have since extended this work to investigate how prevalent this form of splicing is in the mouse and human genomes, and have discovered over 2000 potential new protein isoforms (Wilson et al. 2014). In each case, the new isoform has an entirely different amino-terminal sequence but is otherwise identical to the canonical protein. These proteins were missed entirely by previous genome annotation pipelines. Because changing the amino terminus of a protein can have profound effects on its localisation and function, identifying these isoforms can open up new avenues of research into many different biological processes.

A number of undergraduate projects are likely to be offered to verify the expression and investigate the function of our bioinformatically identified protein isoforms.