Title: DNA fragment assembly: graph structures and chloroplast genome scaffolding
Abstract
To obtain the nucleotide sequence of a DNA molecule, sequencing technology fragments the molecule, and the resulting fragments, called reads, are then assembled. Reads are subject to sequencing errors and must be considered in two orientations: that of their original DNA strand, or the reverse complement, which corresponds to the other strand. Assembly relies on pairwise overlaps between oriented reads and consists of three phases: assembling the reads to obtain contigs (sequences longer than the reads), scaffolding the contigs to obtain scaffolds (orderings of oriented contigs), and completing the scaffolds (finding the nucleotide sequences that separate the oriented contigs within each scaffold).
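As an illustration of the two read orientations and of pairwise overlaps between oriented reads, the following is a minimal Python sketch. It is not the method of this thesis: the function names (`reverse_complement`, `overlap_length`, `oriented_overlaps`) and the `min_len` threshold are hypothetical, and overlaps are detected by exact suffix-prefix matching, which ignores sequencing errors.

```python
# Illustrative sketch only: all names are hypothetical, and overlaps are
# exact suffix-prefix matches (sequencing errors are not tolerated).

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence read on the other strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_length(left: str, right: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no such overlap of at least `min_len` nucleotides exists.
    """
    shortest_allowed = max(min_len, 1)
    for k in range(min(len(left), len(right)), shortest_allowed - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def oriented_overlaps(a: str, b: str, min_len: int = 3) -> dict:
    """Overlap lengths for the four orientation pairs of reads `a` and `b`.

    "+" is the read as given, "-" is its reverse complement. Only pairs
    where oriented `a` is followed by oriented `b` with an overlap of at
    least `min_len` are reported.
    """
    orientations = {"+": lambda s: s, "-": reverse_complement}
    found = {}
    for sign_a, orient_a in orientations.items():
        for sign_b, orient_b in orientations.items():
            k = overlap_length(orient_a(a), orient_b(b), min_len)
            if k:
                found[(sign_a, sign_b)] = k
    return found
```

Each reported pair can be read as a succession relation between two oriented reads, which is the kind of relation the graph structures discussed in this thesis represent. A practical overlap detector would additionally need to tolerate mismatches and indels.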
In this thesis, we compare graph structures that represent succession relations between oriented DNA sequences and that are useful at different phases of assembly. We then address the scaffolding problem for chloroplast genomes, proposing a new formulation, an exact solution method, and an implementation.