The Experiments of Lossless Seeds Filtration in Pattern Matching Problems

In bioinformatics, a DNA sequence is the ordered series of nucleotides within a DNA molecule. Sequencing technologies and algorithms determine the order of the four bases, adenine, guanine, cytosine, and thymine, represented by {A, G, C, T}, in a strand of DNA. The development of DNA pattern searching in bioinformatics has led to large-scale genome sequencing, and the number of applications and tools for analyzing DNA segments has grown. The filtration algorithm uses the FM-index, which is built on the Burrows-Wheeler Transform, to match a pattern against a given text T. A further goal is to find close approximations of the pattern in T: the pattern is partitioned into segments, and the Levenshtein distance is then applied to measure how closely a candidate matches the pattern within k errors. This thesis explores the FM-index and experiments extensively with filtration to find the seeds of the pattern in the text T. All algorithms are implemented in the Bwolo program, written in C++.
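
The partition-then-verify idea can be illustrated with a minimal C++ sketch. It is not taken from Bwolo: the function names make_seeds, levenshtein, and has_seed_hit are hypothetical, the pattern is simply cut into k+1 contiguous seeds of near-equal length, and std::string::find stands in for an FM-index lookup. The sketch relies on the pigeonhole argument that k errors can touch at most k of the k+1 seeds, so any occurrence of the pattern within k errors leaves at least one seed intact.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein (edit) distance between a and b, computed with two DP rows.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({sub, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Assumed partitioning scheme: split the pattern into k+1 contiguous seeds,
// giving the first (size % (k+1)) seeds one extra character.
std::vector<std::string> make_seeds(const std::string& pattern, std::size_t k) {
    std::vector<std::string> seeds;
    const std::size_t parts = k + 1;
    std::size_t start = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        std::size_t len = pattern.size() / parts
                        + (i < pattern.size() % parts ? 1 : 0);
        seeds.push_back(pattern.substr(start, len));
        start += len;
    }
    return seeds;
}

// Filtration step: report whether any non-empty seed occurs exactly in the text.
// std::string::find is a stand-in for an exact FM-index query.
bool has_seed_hit(const std::string& text, const std::string& pattern,
                  std::size_t k) {
    for (const std::string& seed : make_seeds(pattern, k)) {
        if (!seed.empty() && text.find(seed) != std::string::npos) return true;
    }
    return false;
}
```

In a full pipeline, each exact seed hit would mark a candidate region of T, and only those regions would be checked against the pattern with the edit distance, which is what makes the filtration lossless for up to k errors.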