Title: Comparison between Hirschberg’s algorithm and Needleman-Wunsch algorithm in finding optimal alignment in terms of search space and time complexity
Authors: Fahad Almsned, Arwa Alhamed
DOI: https://dx.doi.org/10.18535/jmscr/v5i3.168
Abstract
Background
DNA sequences, which are the data representation in this work, are not neatly arranged sequences that store organisms genetic. Specifically, the information is encoded using four key chemicals, adenine, thymine, guanine and cytosine (abbrevia-ted as A, T, G and C)1. This biological sequence data can be obtained from variety of public and private databases. With the growing amount of data, it became impractical to analyze DNA sequences manually, so faster algorithms and tools are needed. Sequence analysis is the process used to find information about a nucleotide or amino acid sequence using computational methods2. Sequence comparison, which is the fundamental procedure of analyzing sequence content, is regarded as one of the most fundamental problems of computational biology that usually solved with a technique known as sequence alignment. Sequence alignment can be defined as the problem of finding, which parts of the sequences are similar and which parts are different. Sequence alignment is a computationally challenging problem of paramount importance and is a fundamental operation performed in computational biology research. The goal is to produce the best alignment for a pair of DNA or protein sequences (represented as strings of characters). A good alignment has zero or more gaps inserted into the sequences to maximize the number of positions in the aligned strings that match. For example, consider aligning the sequences ATTGGC and AGGAC. By inserting gaps (-) in the appropriate place, the number of positions where the two sequences agree can be maximized: ATTGG-C and AGGAC3. The sequence alignment problem can be illustrated as, given a scoring function that measures the score of aligning characters at the same position from each sequence, calculate the total score of the alignment by adding the scores of all positions and find the maximum total score of every possible alignment.
Fahad Almsned
School of Systems Biology
George Mason University, Fairfax, VA 22030