| Student | 林聖淵 Lin, Sheng-Yuan |
|---|---|
| Thesis title | 運行於多核心處理器之高效能近似字串比對平行演算法 / A High Performance Parallel Algorithm for Approximate String Matching on Multi-core Processor |
| Advisor | 張世杰 Chang, Shih-Chieh |
| Oral defense committee | 韓永楷 Hon, Wing-Kai; 劉邦鋒 Liu, Pangfeng; 林政宏 Lin, Cheng-Hung |
| Degree | Master |
| College / Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Year of publication | 2013 |
| Academic year of graduation | 101 (ROC calendar) |
| Language | English |
| Pages | 29 |
| Keywords | Approximate string matching, Edit distance, Bit-parallelism, Nondeterministic finite automaton, Parallel algorithm, Parallel programming, Parallel computing, OpenMP |
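Approximate string matching is a fundamental technique widely used in many fields, including computational biology, information retrieval, and audio recognition. Given a pattern and a text, it lists every position in the text where a substring approximately matches the pattern. Online approximate string matching, which searches without preprocessing the text, has matured over the past few decades. In recent years, several studies have used parallel computing to accelerate online approximate string matching; however, these works typically adopt only the simplest parallel strategies and algorithms. As the volume of information grows rapidly, matching techniques must keep pursuing better performance.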
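In this thesis, we propose a new parallel algorithm for online approximate string matching, called high-density parallelism. Its main idea is to assign many lightweight threads to every character to perform the matching. Within this framework, we design several strategies that reduce the workload of each thread while making memory usage independent of the pattern length. Experiments show that the algorithm effectively reduces matching time in a variety of settings and outperforms previous parallel approaches, particularly for long patterns. In our implementation, we apply both the proposed high-density parallelism and traditional parallel strategies to several state-of-the-art sequential algorithms, providing a comprehensive study of parallel online approximate string matching.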
Approximate string matching has been widely applied in various research and application domains, including computational biology, information retrieval, and voice recognition. Given a text and a pattern, approximate string matching finds all positions of substrings in the text that approximately match the pattern. Online approximate string matching, which searches without preprocessing the text, has been studied extensively over the past decades. In recent years, several researchers have focused on accelerating approximate string matching with parallel computing; however, these works apply only a straightforward parallel approach, leaving room for improvement. As the volume of information increases rapidly, matching algorithms need ever better performance.
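The problem described above can be illustrated with the classic dynamic-programming formulation of online approximate matching under edit distance, due to Sellers (1980). The sketch below is an illustrative baseline only, not the algorithm proposed in this thesis; the function name `sellers_search` and its output convention (0-based end positions of occurrences) are hypothetical. It keeps a single DP column of m + 1 cells and reads the text once from left to right, so the text needs no preprocessing.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sellers-style online search: report every 0-based end position j in text
 * such that some substring ending at j is within edit distance k of pattern.
 * Up to max_out end positions are stored in out; the return value is the
 * total number of matching end positions. Returns 0 if allocation fails. */
static size_t sellers_search(const char *pattern, const char *text, int k,
                             size_t *out, size_t max_out)
{
    size_t m = strlen(pattern), n = strlen(text), count = 0;
    /* col[i] holds C[i][j]: the smallest edit distance between pattern[0..i)
     * and any substring of text ending at the current position. */
    int *col = malloc((m + 1) * sizeof *col);
    if (col == NULL)
        return 0;
    for (size_t i = 0; i <= m; i++)
        col[i] = (int)i;               /* before any text: i deletions */
    for (size_t j = 0; j < n; j++) {
        int diag = col[0];             /* C[i-1][j-1] for i = 1 */
        col[0] = 0;                    /* an occurrence may start anywhere */
        for (size_t i = 1; i <= m; i++) {
            int left = col[i];         /* C[i][j-1], previous column */
            int best = diag + (pattern[i - 1] != text[j]); /* match or substitute */
            if (left + 1 < best)
                best = left + 1;       /* text[j] is an extra (inserted) char */
            if (col[i - 1] + 1 < best)
                best = col[i - 1] + 1; /* pattern[i-1] is deleted */
            diag = left;
            col[i] = best;
        }
        if (col[m] <= k) {             /* C[m][j] <= k: occurrence ends at j */
            if (count < max_out)
                out[count] = j;
            count++;
        }
    }
    free(col);
    return count;
}
```

For example, `sellers_search("survey", "surgery", 2, ends, 8)` returns 3 with end positions 4, 5 and 6. Each text character costs O(m) work, which is the cost that bit-parallel methods (one of the keywords above) reduce by packing the column into machine words.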
In this work, we propose a novel parallel algorithm called high-density parallelism for online approximate string matching. The main idea of high-density parallelism is to assign many lightweight threads to every character to perform the matching. Under this parallel architecture, we develop several strategies that decrease the workload of each thread while also making memory usage independent of the pattern size. Experiments show that the proposed parallel algorithm reduces matching runtime more effectively than the traditional parallel approach, especially for large patterns. By applying both the proposed and the traditional parallel approaches to several state-of-the-art serial algorithms, we provide a comprehensive study of parallel online approximate string matching.
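The "traditional parallel approach" mentioned above is commonly implemented by partitioning the text among threads. The following is a minimal sketch of that baseline under stated assumptions; it is not the proposed high-density parallelism, whose details are not given in this record. Each OpenMP thread scans one chunk, starting m + k - 1 characters early because an occurrence within k edits of an m-character pattern spans at most m + k characters, and reports only matches that end inside its own chunk. As the serial kernel it uses the Wu-Manber bit-parallel NFA (shift-and extended to k errors), chosen here for illustration. The names `wm_scan` and `parallel_search` are hypothetical, and the code assumes 1 <= m <= 64, 0 <= k < m, and byte-sized characters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Wu-Manber bit-parallel NFA with up to k edit errors (shift-and form).
 * Bit i of R[d] is set when pattern[0..i] matches some suffix of the text
 * read so far with at most d errors. Scans text[from..to) as if the text
 * started at `from`, and sets hit[j] = 1 for every match ending at
 * j >= report_from. Assumes 1 <= m <= 64 and 0 <= k < m. */
static void wm_scan(const char *pattern, size_t m, int k, const char *text,
                    size_t from, size_t to, size_t report_from,
                    unsigned char *hit)
{
    uint64_t mask[256] = {0};          /* bit i set where pattern[i] == c */
    for (size_t i = 0; i < m; i++)
        mask[(unsigned char)pattern[i]] |= (uint64_t)1 << i;

    uint64_t R[64];                    /* one state word per error level */
    for (int d = 0; d <= k; d++)
        R[d] = ((uint64_t)1 << d) - 1; /* first d pattern chars can be deleted */
    const uint64_t goal = (uint64_t)1 << (m - 1);

    for (size_t j = from; j < to; j++) {
        uint64_t b = mask[(unsigned char)text[j]];
        uint64_t prev_old = R[0];      /* R[d-1] before this character */
        R[0] = ((R[0] << 1) | 1) & b;  /* exact matching only */
        for (int d = 1; d <= k; d++) {
            uint64_t old = R[d];
            R[d] = (((old << 1) | 1) & b)              /* match */
                 | prev_old                            /* insertion of text[j] */
                 | (((prev_old | R[d - 1]) << 1) | 1); /* substitution, deletion */
            prev_old = old;
        }
        if (j >= report_from && (R[k] & goal))
            hit[j] = 1;
    }
}

/* Text partitioning: chunk c owns end positions [lo, hi) and starts scanning
 * m + k - 1 characters earlier so occurrences straddling a boundary are still
 * found. hit must hold strlen(text) bytes. Returns the number of matching
 * end positions. */
static size_t parallel_search(const char *pattern, const char *text, int k,
                              int nchunks, unsigned char *hit)
{
    size_t m = strlen(pattern), n = strlen(text);
    size_t overlap = m + (size_t)k - 1;
    memset(hit, 0, n);

    #pragma omp parallel for schedule(static)
    for (int c = 0; c < nchunks; c++) {
        size_t lo = n * (size_t)c / (size_t)nchunks;
        size_t hi = n * ((size_t)c + 1) / (size_t)nchunks;
        size_t from = lo > overlap ? lo - overlap : 0;
        wm_scan(pattern, m, k, text, from, hi, lo, hit); /* writes hit[lo..hi) only */
    }

    size_t count = 0;
    for (size_t j = 0; j < n; j++)
        count += hit[j];
    return count;
}
```

Compile with `-fopenmp` (GCC or Clang) to run the chunks in parallel; without it the pragma is ignored and the loop runs serially with identical results. Because each chunk writes only its own range of `hit`, no synchronization is needed, but every chunk rescans an overlapped prefix of m + k - 1 characters, so this redundant work grows with the pattern length.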
Full-text release date: not authorized for public release (campus network only).