
International Conference on Mathematical Biology and Annual Meeting of The Society for Mathematical Biology

July 27-30, 2009

University of British Columbia, Vancouver


Program

Poster PS22B
Jenny Qian
Genome Sciences Centre, British Columbia Cancer Research Centre
Title: Computational Methods for Detecting Novel Isoforms from De Novo Transcriptome Assembly
Abstract: De novo transcriptome assembly with ABySS (Assembly By Short Sequences) provides a unique window for discovering novel transcripts. As demonstrated recently, assembly data harbours valuable information for identifying both small-scale sequence variations, such as single nucleotide polymorphisms (SNPs), and larger-scale structural variations, such as novel isoforms. The ABySS assembler uses a distributed representation of the de Bruijn graph, which makes assembling large genomes manageable. The assembly process has two stages: 1) at the single end tag (SET) stage, read sub-sequences of length k (k-mers) are extended one base at a time, and two sequences are joined into a longer sequence (a contig) when they share a unique (k-1)-base overlap; 2) at the paired end tag (PET) stage, SET contigs are merged into longer contigs only if sufficiently many read pairs support the merge. At the SET stage, two contigs cannot always be joined unambiguously, in particular when two parallel contigs share the same (k-1)-base overlaps. In a transcriptome assembly, such parallel contigs imply the presence of isoforms, and the shorter of the two has length 2(k-1). We call this shorter contig a junction contig, since it contains no sequence other than its (k-1)-base overlaps with its two neighbors. Junction contigs can be used to identify four types of novel events: unannotated skipped exon(s), retained intron(s), other alternative splicing, or additional exon(s). The detection procedure can be summarized as follows:
1. Align the junction contigs (ungapped) to both the reference transcriptome and the reference genome, and parse the output to associate junction contigs with known annotations.
2. Align the junction contigs (gapped) to the PET contigs, and parse the output to associate each junction contig with its corresponding PET contigs.
3. Based on the associations established in steps 1 and 2, identify the novel events.
We applied this classification scheme to the transcriptome assembly of a follicular lymphoma tumor sample. Our preliminary results show that, of the complete set of 1859 junction contigs, 324 (17.4%) align to the reference transcriptome only, 660 (35.5%) do not align to either reference, and 126 (6.8%) align to the reference genome only. Further categorization of the junction contigs in each pool allows us to identify and characterize novel isoforms from this transcriptome assembly systematically.
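The SET-stage extension and the origin of junction contigs can be illustrated with a toy, single-machine de Bruijn graph. This is a minimal sketch, not ABySS code: it assumes error-free reads on one strand (no reverse complements), keeps the whole graph in one in-memory set instead of ABySS's distributed representation, and uses hypothetical exon sequences and helper names (`build_graph`, `unitigs`, `junction_contigs`). Isoform 1 contains exons A, B and C; isoform 2 skips exon B, so the path through B and the path jumping from A straight to C run in parallel between the same (k-1)-base overlaps.

```python
from collections import defaultdict

BASES = "ACGT"


def build_graph(reads, k):
    """Return the set of distinct k-mers (nodes); edges are implicit (k-1)-base overlaps."""
    nodes = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            nodes.add(read[i:i + k])
    return nodes


def successors(kmer, nodes):
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in nodes]


def predecessors(kmer, nodes):
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in nodes]


def unique_successor(kmer, nodes):
    """Next k-mer if the (k-1) overlap is unambiguous in both directions, else None."""
    nxt = successors(kmer, nodes)
    if len(nxt) == 1 and nxt[0] != kmer and len(predecessors(nxt[0], nodes)) == 1:
        return nxt[0]
    return None


def unique_predecessor(kmer, nodes):
    """Previous k-mer if the (k-1) overlap is unambiguous in both directions, else None."""
    prv = predecessors(kmer, nodes)
    if len(prv) == 1 and prv[0] != kmer and len(successors(prv[0], nodes)) == 1:
        return prv[0]
    return None


def unitigs(nodes):
    """Extend each unvisited k-mer one base at a time, right then left, while the join is unique."""
    seen, contigs = set(), []
    for seed in sorted(nodes):
        if seed in seen:
            continue
        path = [seed]
        seen.add(seed)
        nxt = unique_successor(path[-1], nodes)
        while nxt is not None and nxt not in seen:
            path.append(nxt)
            seen.add(nxt)
            nxt = unique_successor(path[-1], nodes)
        prv = unique_predecessor(path[0], nodes)
        while prv is not None and prv not in seen:
            path.insert(0, prv)
            seen.add(prv)
            prv = unique_predecessor(path[0], nodes)
        # Consecutive k-mers overlap by k-1 bases, so each adds one new base.
        contigs.append(path[0] + "".join(km[-1] for km in path[1:]))
    return contigs


def junction_contigs(contigs, k):
    """Contigs of length 2(k-1) parallel to another contig (same first and last k-1 bases)."""
    by_ends = defaultdict(list)
    for c in contigs:
        by_ends[(c[:k - 1], c[-(k - 1):])].append(c)
    return [c for group in by_ends.values() if len(group) > 1
            for c in group if len(c) == 2 * (k - 1)]


# Hypothetical toy exons: isoform 1 = A+B+C, isoform 2 skips exon B.
K = 5
A, B, C = "ATGCAC", "GGTTAA", "TCCGAT"
contigs = unitigs(build_graph([A + B + C, A + C], K))
print(contigs)
print(junction_contigs(contigs, K))
```

With k = 5 the exon-skipping path compacts into the 8-base contig GCACTCCG, the last k-1 bases of A followed by the first k-1 bases of C, which matches the 2(k-1) length stated above; the parallel contig through B has length |B| + 2(k-1) = 14.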
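The PET-stage rule (merge SET contigs only when enough read pairs support it) reduces, in its simplest form, to counting the read pairs that link two contigs and applying a threshold. The sketch below assumes each read pair has already been placed on contigs, ignores mate orientation and insert-size estimates, and treats `supported_links`, `min_pairs` and the contig names as hypothetical; it is not ABySS's actual merging logic.

```python
from collections import Counter


def supported_links(pair_hits, min_pairs):
    """Count read pairs whose mates fall in two different contigs and keep
    contig pairs supported by at least `min_pairs` read pairs."""
    counts = Counter()
    for contig_a, contig_b in pair_hits:
        if contig_a != contig_b:
            # Treat (a, b) and (b, a) as the same undirected link.
            counts[tuple(sorted((contig_a, contig_b)))] += 1
    return {link: n for link, n in counts.items() if n >= min_pairs}


# Hypothetical placements of five read pairs on SET contigs.
hits = [("c1", "c2"), ("c2", "c1"), ("c1", "c2"), ("c1", "c3"), ("c3", "c3")]
print(supported_links(hits, min_pairs=3))  # {('c1', 'c2'): 3}
```

Raising `min_pairs` trades contiguity for confidence: a single spurious pair (here c1 to c3) no longer triggers a merge.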
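Step 1 places each junction contig in a pool according to which references it aligns to, and each reported percentage is that pool's count divided by the 1859 junction contigs. A minimal sketch of this bookkeeping, assuming the ungapped alignment results have already been reduced to two booleans per contig (the pool labels, `pool_of` and `pool_summary` are hypothetical names):

```python
from collections import Counter


def pool_of(on_transcriptome, on_genome):
    """Assign a junction contig to a pool from its ungapped alignment results."""
    if on_transcriptome and on_genome:
        return "both"
    if on_transcriptome:
        return "transcriptome only"
    if on_genome:
        return "genome only"
    return "no alignment"


def pool_summary(alignments):
    """Map each pool to (count, percentage of all junction contigs)."""
    counts = Counter(pool_of(t, g) for t, g in alignments.values())
    total = len(alignments)
    return {pool: (n, round(100 * n / total, 1)) for pool, n in counts.items()}


# Hypothetical alignment flags for four junction contigs.
toy = {"j1": (True, True), "j2": (True, False), "j3": (False, False), "j4": (False, True)}
print(pool_summary(toy))

# Percentages for the pool sizes reported in the abstract (1859 junction contigs in total).
for pool, n in {"transcriptome only": 324, "no alignment": 660, "genome only": 126}.items():
    print(f"{pool}: {n} ({100 * n / 1859:.1f}%)")
```

Because every contig lands in exactly one pool, the pool counts always sum to the total, which gives a quick consistency check on reported figures.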
Coauthors: Shaun Jackman, Cydney Nielsen, Marco Marra, Steven JM Jones, İnanç Birol
Location: Woodward Lobby (Wednesday-Thursday)