
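Graduate student: Wu, Dai-Yang (吳岱洋)
Thesis title: Accelerating Protein Alignment by GPU (以圖形處理器加速蛋白質序列比對)
Advisor: Hon, Wing-Kai (韓永楷)
Oral defense committee: Lee, Che-Rung (李哲榮); Lu, Chin-Lung (盧錦隆)
Degree: Master
Department:
Year of publication: 2018
Academic year of graduation: 106
Language: English
Number of pages: 34
Keywords (Chinese, translated): protein sequence alignment, graphics processing unit
Keywords (English): Protein Alignment, GPU, CUDA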
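  • Aligning biological sequences against protein databases is an important step in bioinformatics research and applications. With the rapid growth of sequencing technology, sequence data has become difficult to process. BLASTX, provided by NCBI, is the best-known alignment tool because of its good sensitivity. However, it is very slow when aligning large amounts of data against a database.

    In 2015, DIAMOND, a tool proposed by Buchfink, Xie, and Huson (Nature Methods, 2015), significantly accelerated the alignment process while keeping sensitivity close to that of BLASTX. However, DIAMOND is still slow on large amounts of data. Several techniques have been studied to accelerate DIAMOND: for example, AC-DIAMOND (Mai et al., Proc. BIBE, 2016) uses CPU SIMD instructions and obtains a 4-fold speedup, and HAMOND (Yu et al., J. Biotechnology, 2017) parallelizes DIAMOND on the Hadoop distributed system.

    Although there have been many recent successes in accelerating algorithms with GPUs, there is no GPU-accelerated version of DIAMOND. In this thesis, we propose CU-DIAMOND, an efficient GPU acceleration of DIAMOND. Experimental results show that CU-DIAMOND achieves a 10-fold speedup in the most time-consuming part of DIAMOND and is 4 times faster than DIAMOND overall, with the same sensitivity as DIAMOND.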


    Aligning biological sequences against a protein database is an important step in bioinformatics research and applications. Due to the rapid growth of sequencing technologies, sequence data has become more difficult to handle. BLASTX, a tool provided by NCBI, is the most popular alignment tool due to its high sensitivity. However, it is too slow when aligning large datasets against a database.

    In 2015, DIAMOND, a tool proposed by Buchfink, Xie, and Huson (Nature Methods, 2015), sped up the alignment process significantly while maintaining sensitivity similar to that of BLASTX. However, DIAMOND is still slow when the query data is large. Several acceleration techniques have been studied to improve the speed of DIAMOND. For instance, AC-DIAMOND (Mai et al., Proc. BIBE, 2016) utilizes CPU SIMD instructions and reports a 4-fold overall speedup over DIAMOND; HAMOND (Yu et al., J. Biotechnology, 2017) parallelizes DIAMOND on the Hadoop distributed system. Despite the many recent successes in applying GPU technology to speed up algorithms, there is no GPU-accelerated version of DIAMOND.

    In this thesis, we present CU-DIAMOND, an efficient GPU acceleration of DIAMOND. Experimental results show that CU-DIAMOND achieves a 10-fold speedup in the most time-consuming alignment part of DIAMOND and gains a 4-fold overall speedup over DIAMOND (and a 33% speedup over AC-DIAMOND), while sensitivity remains the same.
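
The two reported figures (10-fold on the most time-consuming part, 4-fold overall) can be related with Amdahl's law. The sketch below is only a back-of-the-envelope check, not something taken from the thesis: it assumes that everything outside the accelerated alignment part runs at unchanged speed, which may not hold exactly since the Methods chapter also lists changes to indexing. The function names `amdahl_speedup` and `implied_fraction` are hypothetical.

```python
def amdahl_speedup(fraction: float, part_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime is sped up by `part_speedup`.

    Amdahl's law: S = 1 / ((1 - p) + p / s).
    Assumption (not from the thesis): the remaining (1 - p) is unchanged.
    """
    if not 0.0 <= fraction <= 1.0 or part_speedup <= 0.0:
        raise ValueError("fraction must be in [0, 1] and part_speedup positive")
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


def implied_fraction(part_speedup: float, overall_speedup: float) -> float:
    """Solve Amdahl's law for p, given the part speedup s and overall speedup S.

    From S = 1 / ((1 - p) + p / s) it follows that p = (1 - 1/S) / (1 - 1/s).
    """
    if part_speedup <= 1.0 or overall_speedup < 1.0:
        raise ValueError("expects part_speedup > 1 and overall_speedup >= 1")
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / part_speedup)


# Figures quoted in the abstract: 10-fold on the alignment part, 4-fold overall.
p = implied_fraction(part_speedup=10.0, overall_speedup=4.0)
print(f"implied share of original runtime: {p:.3f}")
print(f"round-trip overall speedup: {amdahl_speedup(p, 10.0):.2f}")
```

Under that assumption, the accelerated part would account for about five sixths of DIAMOND's original runtime, which is consistent with the abstract describing it as the most time-consuming part.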

    1 Introduction
    2 Preliminaries
        2.1 Protein Alignment
        2.2 Smith-Waterman Alignment Algorithm
        2.3 Seed-and-Extend Paradigm
        2.4 GPU architecture and CUDA programming model
        2.5 SIMD instructions in CPU
    3 Review on DIAMOND and AC-DIAMOND
        3.1 Indexing
        3.2 Match Filtering
        3.3 Final Scoring
        3.4 Bottlenecks
        3.5 AC-DIAMOND
    4 Methods
        4.1 Indexing
        4.2 Final Scoring
    5 Experimental Results
    6 Conclusion and Further Work

    [1] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, 1990.
    [2] Benjamin Buchfink, Chao Xie, and Daniel H. Huson. Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12(1):59, 2015.
    [3] Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 390–398. IEEE, 2000.
    [4] Osamu Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708, 1982.
    [5] Daniel H. Huson and Chao Xie. A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA. Bioinformatics, 30(1):38–39, 2014.
    [6] Ali Khajeh-Saeed, Stephen Poole, and J. Blair Perot. Acceleration of the Smith–Waterman Algorithm Using Single and Multiple Graphics Processors. Journal of Computational Physics, 229(11):4247–4258, 2010.
    [7] Chi-Man Liu, Thomas Wong, Edward Wu, Ruibang Luo, Siu-Ming Yiu, Yingrui Li, Bingqiang Wang, Chang Yu, Xiaowen Chu, Kaiyong Zhao, Ruiqiang Li, and Tak-Wah Lam. SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics, 28(6):878–879, 2012.
    [8] Weiguo Liu, Bertil Schmidt, Gerrit Voss, and Wolfgang Müller-Wittig. GPU-ClustalW: Using graphics hardware to accelerate multiple sequence alignment. In Yves Robert, Manish Parashar, Ramamurthy Badrinath, and Viktor K. Prasanna, editors, High Performance Computing - HiPC 2006, pages 363–374, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
    [9] Yongchao Liu, Adrianto Wirawan, and Bertil Schmidt. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformatics, 14(1):117, Apr 2013.
    [10] Huijun Mai, Dinghua Li, Yifan Zhang, Henry Chi-Ming Leung, Ruibang Luo, Hing-Fung Ting, and Tak-Wah Lam. AC-DIAMOND: Accelerating protein alignment via better SIMD parallelization and space-efficient indexing. In Francisco Ortuño and Ignacio Rojas, editors, Bioinformatics and Biomedical Engineering, pages 426–433, Cham, 2016. Springer International Publishing.
    [11] Torbjørn Rognes. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation. BMC Bioinformatics, 12(1):221, Jun 2011.
    [12] Torbjørn Rognes and Erling Seeberg. Six-fold speed-up of Smith–Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics, 16(8):699–706, 2000.
    [13] T.F. Smith and M.S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195–197, 1981.
    [14] Shuji Suzuki, Masanori Kakuta, Takashi Ishida, and Yutaka Akiyama. GHOSTX: An improved sequence homology search algorithm using a query suffix array and a database suffix array. PLOS ONE, 9(8):1–8, Aug 2014.
    [15] Jia Yu, Jochen Blom, Alexander Sczyrba, and Alexander Goesmann. Rapid protein alignment in the cloud: HAMOND combines fast DIAMOND alignments with Hadoop parallelism. Journal of Biotechnology, 257:58–60, 2017. Dedicated to Prof. Dr. Alfred Pühler on the occasion of his 75th birthday.
    [16] J. Zhang, H. Wang, and W. c. Feng. cuBLASTP: Fine-grained parallelization of protein sequence search on CPU+GPU. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(4):830–843, July 2017.
    [17] Yongan Zhao, Haixu Tang, and Yuzhen Ye. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics, 28(1):125–126, 2012.
