Title: Comparison between Hirschberg’s algorithm and Needleman-Wunsch algorithm in finding optimal alignment in terms of search space and time complexity

Authors: Fahad Almsned, Arwa Alhamed

 DOI:  https://dx.doi.org/10.18535/jmscr/v5i3.168

Abstract

Background

DNA sequences, which are the data representation in this work, are not neatly arranged sequences that store organisms genetic. Specifically, the information is encoded using four key chemicals, adenine, thymine, guanine and cytosine (abbrevia-ted as A, T, G and C)1. This biological sequence data can be obtained from variety of public and private databases. With the growing amount of data, it became impractical to analyze DNA sequences manually, so faster algorithms and tools are needed. Sequence analysis is the process used to find information about a nucleotide or amino acid sequence using computational methods2. Sequence comparison, which is the fundamental procedure of analyzing sequence content, is regarded as one of the most fundamental problems of computational biology that usually solved with a technique known as sequence alignment. Sequence alignment can be defined as the problem of finding, which parts of the sequences are similar and which parts are different. Sequence alignment is a computationally challenging problem of paramount importance and is a fundamental operation performed in computational biology research. The goal is to produce the best alignment for a pair of DNA or protein sequences (represented as strings of characters). A good alignment has zero or more gaps inserted into the sequences to maximize the number of positions in the aligned strings that match. For example, consider aligning the sequences ATTGGC and AGGAC. By inserting gaps (-) in the appropriate place, the number of positions where the two sequences agree can be maximized: ATTGG-C and AGGAC3. The sequence alignment problem can be illustrated as, given a scoring function that measures the score of aligning characters at the same position from each sequence, calculate the total score of the alignment by adding the scores of all positions and find the maximum total score of every possible alignment. 

References

1.      S. K. Moore. 2000. Understanding the Human Genome. IEEE Press. 11: 34-35.

2.      Todd, J. Vision and Aoife McLysaght. 2003. Computational Tools and Resources in-Plant Genome Informatics.

3.      K. Charter, J. Schaeffer, and D. Szafron. 2000. Sequence Alignment using Fast LSA. International Conference on Math-ematics and Engineering Techniques in Medicine and Biological Sciences. 239-245.

4.      S. B. Needleman and Christian, D. Wunsch. 1970. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Sequences. Journal of Molecular Biology. 48: 443-453.

5.      Hirschberg, D. S. (1975). ”A linear space algorithm for computing maxi- mal common subsequences”. Communications of the ACM. 18 (6): 341343.

Corresponding Author

Fahad Almsned

School of Systems Biology

George Mason University, Fairfax, VA 22030