
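Author: Lin, Kuan-Wu (林冠吾)
Thesis title: FastQuery Tuning and dynamic scheduling with hybrid parallel mechanism
Advisor: Chou, Jerry (周志遠)
Committee members: Che-Rung Lee (李哲榮), Hung-Chang Hsiao (蕭宏章)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2014
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 43
Keywords (translated from Chinese): FastQuery, I/O performance, parameter tuning
Keywords (English): I/O performance, Tuning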
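  • FastQuery is a parallel indexing and querying system that we developed to accelerate the analysis and visualization of scientific data. We have applied it to many different high-performance computing applications and demonstrated its flexibility and capability with a trillion-scale simulation experiment. In our experiments, however, I/O results were strongly affected by the choice of parameters. In this thesis we describe how to tune these parameters, analyze each one, and discuss its impact. We show that the tuned parameters have a significant effect on the results.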


    FastQuery is a parallel indexing and querying system we developed to accelerate the analysis and visualization of scientific data. We have applied it to a wide variety of HPC applications and, in our previous work, demonstrated its capability and scalability on a petascale trillion-particle simulation. Yet in our experience the performance of reading and writing data with FastQuery, as with many other HPC applications, can be significantly affected by tunable parameters throughout the parallel I/O stack. In this paper, we describe how we tuned the performance of FastQuery on a Lustre parallel file system. We study and analyze the impact of parameters and tunable settings at the file system, MPI-IO library, and HDF5 library levels of the I/O stack. We demonstrate that a combined optimization strategy significantly improves the performance and I/O bandwidth of FastQuery. In our tests with a trillion-particle dataset, the time to index the dataset was reduced by more than half. We also present a hybrid architecture for overlapping CPU and I/O time. FastQuery builds indexes iteratively, so overall performance is bound by the cost of each iteration. We combine threads and MPI to overcome this limitation. The results show that the hybrid architecture overlaps CPU and I/O time and yields a significant improvement.
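    The idea of overlapping CPU and I/O time within an iterative index build can be illustrated with a minimal sketch. It is not the thesis implementation: it uses a single process with one reader thread instead of MPI plus threads, and `read_chunk`, `build_index`, `index_sequential`, and `index_overlapped` are hypothetical stand-ins chosen for illustration. The reader thread fetches the next chunk while the main thread indexes the current one.

```python
import queue
import threading


def read_chunk(i, chunk_size=4):
    # Hypothetical stand-in for a parallel file read (real code would go
    # through HDF5 / MPI-IO). Returns a small list of values for chunk i.
    start = i * chunk_size
    return list(range(start, start + chunk_size))


def build_index(chunk, n_bins=2):
    # Hypothetical stand-in for index construction: group positions by a
    # toy bin of the value. This is not the FastBit/FastQuery algorithm.
    index = {}
    for pos, value in enumerate(chunk):
        index.setdefault(value % n_bins, []).append(pos)
    return index


def index_sequential(n_chunks):
    # Baseline: each iteration waits for its read, then builds the index.
    return [build_index(read_chunk(i)) for i in range(n_chunks)]


def index_overlapped(n_chunks, depth=2):
    # A reader thread prefetches up to `depth` chunks so that reading
    # chunk i+1 can proceed while chunk i is being indexed.
    chunks = queue.Queue(maxsize=depth)

    def reader():
        for i in range(n_chunks):
            chunks.put(read_chunk(i))
        chunks.put(None)  # sentinel: no more chunks

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    results = []
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        results.append(build_index(chunk))
    worker.join()
    return results
```

    Both functions return the same list of per-chunk indexes; only the scheduling differs. The bounded queue keeps the reader from running arbitrarily far ahead, which limits memory use in this sketch.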

    1 Introduction
    2 Background and Motivation
      2.1 Searching in Scientific Data
      2.2 Tuning parallel I/O performance
      2.3 Application Use Case: VPIC
    3 Tuning Options
      3.1 FastQuery overview
      3.2 FastQuery Parallel I/O Strategy
      3.3 HDF5 and MPI-IO Collective I/O
      3.4 Lustre Striping
    4 Hybrid Parallelism Architecture
    5 Experimental Setup
      5.1 Tuning optimization Testbed
      5.2 Datasets
      5.3 Methodology
    6 Tuning Optimization Experimental Results
      6.1 Overall Performance Evaluation
      6.2 Stripe Count
      6.3 Stripe Size & Collective Buffering
      6.4 Thread Count per MPI Task
    7 Hybrid Parallelism Experimental Results
      7.1 Overall
      7.2 Monitor
      7.3 Histogram dispatch
      7.4 Overall dispatch
    8 Conclusions


    Full text: not authorized for public release (on-campus or off-campus networks)