Parallel algorithm for center-star sequence alignments with l applications to short reads

Parallel algorithm for center-star sequence alignments with l applications to short reads Conference

Saeed, F, Khokhar, AA. (2012). Parallel algorithm for center-star sequence alignments with l applications to short reads . 131-136.

The center-star alignment method is one of the few alignment algorithms that has theoretical guarantees of giving a 2-approx solution. The method requires finding a center sequence to generate the alignment. However, finding the center sequence is of the order of O(N²) which is not feasible for large number of sequences. With the advent of fast sequencing techniques (454, Solexa), the number of sequences that have to be aligned and processed is huge and can reach up to one billion sequences. The alignment of these sequences with the reference genome is one of the basic steps in mapping, alignments and sequence analysis. In this paper we present a highly scalable parallel algorithm to find the Center-star sequence and alignments on multiprocessor platforms. Our method relies on ranking and sampling of sequences using k-mer rank. The proposed algorithm is capable of finding the center sequence in a short time with near linear speedups with increasing number of processors. The proposed algorithm has been implemented on a cluster of workstations using MPI library and results to find the Center sequence for up to 6.4 million sequences are presented. Detailed computation and communication complexity analysis of the algorithm is given along with quality assessment for the alignments.