
Graduate student: Wang, Shao-Chung (王紹仲)
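Thesis title: Evaluation and Design of Programming Models for Heterogeneous Multi-Core Systems (異質多核心上之程式設計模型評估與設計)
Advisor: Lee, Jenq Kuen (李政崑)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar, 2009-2010)
Language: English
Number of pages: 36
Chinese keywords: 異質多核心 (heterogeneous multi-core), 程式設計模型 (programming model), 遠端程序呼叫 (remote procedure call), 遠端串流程序呼叫 (streaming remote procedure call), 軟體快取 (software cache), 立體視覺 (stereo vision)
English keywords: multi-core, heterogeneous, programming model, remote procedure call, streaming, software cache, stereo vision, belief propagation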
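  • Writing parallel programs on a heterogeneous multi-core architecture raises many problems:
    the MPU and the DSP have different instruction sets, their memory architectures are not uniform,
    and the MPU runs an operating system and therefore sees virtual memory, while the DSP sees
    physical memory. Programming in this environment takes considerable engineering effort to reach
    good performance. We therefore propose a programming model for heterogeneous multi-core systems,
    consisting of the Multicore Software API (MSA) and the Software Cache API, to help programmers
    write parallel programs.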

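    MSA is a middle layer that hides the underlying hardware details. It comprises three modules:
    a remote procedure call module, a message passing module, and a streaming module. The remote
    procedure call module provides an API that lets users offload programs to the DSP and invoke
    procedures on the DSP by function name. The message passing and streaming modules provide
    non-streaming and streaming data transfer, respectively, so that procedures can set up
    communication channels and exchange data.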
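
    The remote procedure call module described above can be pictured with a small sketch. This is
    not the MSA interface, which this record does not list. It is a hypothetical Python stand-in in
    which a worker thread plays the DSP, and the names RemoteCore, offload, and call are assumptions
    made for illustration only. It shows the two steps the abstract mentions: offloading a procedure
    to the remote core, then invoking it by function name and receiving the result over a reply channel.

```python
import queue
import threading


class RemoteCore:
    """Hypothetical stand-in for a DSP: a worker thread that runs procedures by name."""

    def __init__(self):
        self._procedures = {}            # name -> callable registered on the "DSP"
        self._requests = queue.Queue()   # request channel from the caller to the worker
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def offload(self, name, func):
        """Register a procedure on the remote core under a function name."""
        self._procedures[name] = func

    def call(self, name, *args):
        """Invoke a procedure by name; returns a one-slot reply channel."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((name, args, reply))
        return reply

    def _serve(self):
        # Worker loop: take a request, run the named procedure, send the result back.
        while True:
            name, args, reply = self._requests.get()
            if name is None:             # shutdown marker
                break
            reply.put(self._procedures[name](*args))

    def shutdown(self):
        self._requests.put((None, (), None))
        self._worker.join()


dsp = RemoteCore()
dsp.offload("dot", lambda a, b: sum(x * y for x, y in zip(a, b)))
pending = dsp.call("dot", [1, 2, 3], [4, 5, 6])   # returns immediately
result = pending.get(timeout=5)                   # block until the "DSP" replies
dsp.shutdown()
print(result)  # 32
```

    Returning a reply channel instead of the value lets the caller keep working while the remote
    procedure runs. Error handling is omitted in this sketch: an unknown name or a failing procedure
    stops the worker, and the caller's get() then times out.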

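    In embedded systems, heavy access to external memory degrades performance.
    The Software Cache API is designed to cope with increasingly complex memory hierarchies.
    It provides an API that helps programmers manage the movement of data between external and
    internal memory, which simplifies software development while still meeting
    high-performance requirements.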
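
    As an illustration of what a software-managed cache does, the following is a minimal sketch and
    not the thesis's Software Cache API. The class name SoftwareCache, its methods, the line size, and
    the choice of LRU replacement with write-back are all assumptions. A Python list stands in for
    external memory, and a small lookup table keyed by line base address stands in for internal memory.

```python
from collections import OrderedDict


class SoftwareCache:
    """Hypothetical software-managed cache over a list that models external memory."""

    def __init__(self, external, line_size=4, num_lines=2):
        self.external = external
        self.line_size = line_size
        self.num_lines = num_lines
        self.lines = OrderedDict()   # lookup table: line base address -> [data, dirty]
        self.hits = 0
        self.misses = 0

    def _line(self, addr):
        base = addr - addr % self.line_size
        if base in self.lines:
            self.hits += 1
            self.lines.move_to_end(base)          # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.num_lines:
                # Evict the least recently used line, writing it back if modified.
                old_base, (data, dirty) = self.lines.popitem(last=False)
                if dirty:
                    self.external[old_base:old_base + self.line_size] = data
            # Copy the line from external memory into the cache.
            self.lines[base] = [self.external[base:base + self.line_size], False]
        return base, self.lines[base]

    def read(self, addr):
        base, entry = self._line(addr)
        return entry[0][addr - base]

    def write(self, addr, value):
        base, entry = self._line(addr)
        entry[0][addr - base] = value
        entry[1] = True                           # line now differs from external memory

    def flush(self):
        """Write every modified line back to external memory."""
        for base, entry in self.lines.items():
            if entry[1]:
                self.external[base:base + self.line_size] = entry[0]
                entry[1] = False


external = list(range(16))
cache = SoftwareCache(external, line_size=4, num_lines=2)
first = cache.read(0)     # miss: loads line 0..3
second = cache.read(1)    # hit: same line
cache.write(5, 99)        # miss: loads line 4..7 and marks it dirty
cache.read(9)             # miss: evicts clean line 0..3
cache.read(13)            # miss: evicts dirty line 4..7, which is written back
```

    After the last read, external[5] holds 99 because the dirty line was written back on eviction.
    The program never touches external memory directly, which is the convenience the abstract
    attributes to a software cache interface.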

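    Finally, we implemented MSA and the Software Cache API on a SID-based multi-core simulator
    and use a stereo vision application to demonstrate how to develop parallel programs with
    our programming model.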


    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Introduction
      1.2 Overview of the Thesis
    2 The Design of Multi-core Software API
      2.1 Invoking Remote Procedure
      2.2 Task Communication
      2.3 Guided Programming Sample Using RPC and Communication Module
    3 Software-Managed Cache Support
      3.1 Internal memory management
      3.2 Address lookup table
      3.3 Replacement Policy
      3.4 Stride and Fast Block Access
      3.5 Guided Programming Sample
    4 Case Study
      4.1 NAS Parallel Benchmarks
        4.1.1 Overview of NAS Parallel Benchmarks
        4.1.2 NAS Parallel Benchmarks in MSA
      4.2 Stereo vision with belief propagation
        4.2.1 Overview of the Belief Propagation
        4.2.2 Parallelize Belief Propagation in MSA
    5 Experiment Results
      5.1 Experiment Environment
      5.2 Experiment
        5.2.1 Performance of NAS benchmark
        5.2.2 Performance of belief propagation with stereo vision
        5.2.3 Performance of software cache
    6 Conclusion
      6.1 Summary
      6.2 Future Work


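    Full text: not authorized for public release (campus network)
    Full text: not authorized for public release (off-campus network)
    Full text: not authorized for public release (National Central Library: Taiwan theses and dissertations system)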