
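Author: Chien, Hsiao Wei (簡孝偉)
Thesis title: Design of Embedded Tracer for Many-Core Prototyping Platform (應用於多核心雛型平台之內嵌式追蹤器設計)
Advisor: Huang, Chih Tsun (黃稚存)
Oral defense committee: Huang, Juinn Dar (黃俊達); Liou, Jing Jia (劉靖家)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2016
Graduation academic year: 104
Language: English
Number of pages: 91
Keywords (Chinese): 多核心追蹤器 (many-core tracer)
Keywords (English): many-core, tracer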
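    As silicon process technology keeps advancing, the number of processors that can be integrated into a system-on-chip (SoC) design grows, and design complexity rises with it. Many-core SoC designs built on a network-on-chip (NoC) greatly reduce that complexity and are a common architecture today. Traditional tracing and debugging methods focus on the processors and the communication network, but the parallel programs running on the cores are an equally important part. On a many-core system, however, tracing and debugging hardware and software together is a very challenging task.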

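    In this thesis, we propose an embedded tracer, built on a many-core prototyping platform, that traces the traffic conditions of the communication network and the communication behavior between cores. We also fix several shortcomings of the previous-generation tracer, for example by increasing the information content of the trace data and simplifying the tracer design. The proposed tracer architecture comprises a trace configuration unit, communication trace units, NoC switch trace units, and a trace bus system. The trace configuration unit lets users configure the tracer components while the system is running; the communication trace unit monitors communication events in a processing element's communication unit; the NoC switch trace unit monitors the events in NoC traffic that the user is interested in; and the trace bus system wraps the trace data produced by the communication trace units and NoC switch trace units into packets and stores them in the embedded trace buffer.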

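    Our case studies show that performance analysis and debugging can be done quickly from the trace data produced by the proposed tracer. The experimental analysis shows that the proposed tracer reduces the volume of NoC trace data by 87.96% to 99.98%. With our proposed time high offset method, the number of time information packets is reduced by 85.72% to 95.19%. In addition, the proposed tracer occupies only 1% of the area of the many-core system platform and causes no performance degradation.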


    As silicon technology advances, the complexity of System-on-a-Chip (SoC) design grows rapidly with the number of cores. Many-core SoCs built around a Network-on-Chip (NoC) are widely used today. Traditional verification and debugging approaches target individual cores or the on-chip communication network, yet the parallel application is just as important on a many-core system. Verifying the design and debugging a parallel application together with hardware that contains many processor cores is among the most challenging tasks, and it has become extremely important for many-core systems.

    In this thesis, we present the architecture of the proposed embedded tracer on our many-core platform, together with a methodology for application-level communication tracing that focuses on NoC traffic conditions and inter-core communication. We also address several shortcomings of the previous tracer: for example, finding the cause of packet congestion required a huge amount of trace data, and the previous communication event trace lost some useful information. The proposed communication tracer consists of a trace configuration unit, communication trace units, NoC switch trace units, and a trace bus system. The trace configuration unit allows the tracer to be configured at run time. The communication trace unit monitors communication events on each PE's communication unit. The NoC switch trace unit monitors the NoC transactions that users may be interested in. The trace bus system packetizes trace data from the communication trace units and NoC switch trace units and stores it in the embedded trace buffer.
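    The data flow described above (run-time configuration, trace units producing data, the trace bus packetizing it, and the embedded trace buffer storing it) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the thesis's RTL design: the class names, the packet fields (`source_id`, `timestamp`, `payload`), and the drop-oldest policy of the buffer are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Set


@dataclass(frozen=True)
class TracePacket:
    source_id: int   # hypothetical ID of the trace unit that produced the data
    timestamp: int   # hypothetical cycle count at capture time
    payload: bytes   # raw trace data from the unit


class EmbeddedTraceBuffer:
    """Fixed-capacity buffer; dropping the oldest packet when full is an assumed policy."""

    def __init__(self, capacity: int) -> None:
        self._packets: Deque[TracePacket] = deque(maxlen=capacity)

    def store(self, packet: TracePacket) -> None:
        self._packets.append(packet)

    def dump(self) -> List[TracePacket]:
        return list(self._packets)


class TraceBus:
    """Wraps trace data from enabled units into packets and stores them."""

    def __init__(self, buffer: EmbeddedTraceBuffer) -> None:
        self._buffer = buffer
        self._enabled: Set[int] = set()

    def configure(self, source_id: int, enable: bool) -> None:
        # Stands in for the trace configuration unit: enable or disable a unit at run time.
        if enable:
            self._enabled.add(source_id)
        else:
            self._enabled.discard(source_id)

    def submit(self, source_id: int, timestamp: int, payload: bytes) -> bool:
        # Data from a disabled unit is ignored; otherwise it is packetized and stored.
        if source_id not in self._enabled:
            return False
        self._buffer.store(TracePacket(source_id, timestamp, payload))
        return True


COMM_TRACE_UNIT_PE0 = 0   # hypothetical unit IDs
NOC_SWITCH_TRACE_UNIT_0 = 1

etb = EmbeddedTraceBuffer(capacity=2)
bus = TraceBus(etb)
bus.configure(COMM_TRACE_UNIT_PE0, True)
bus.configure(NOC_SWITCH_TRACE_UNIT_0, True)
bus.submit(COMM_TRACE_UNIT_PE0, 10, b"send")
bus.submit(NOC_SWITCH_TRACE_UNIT_0, 12, b"hop")
bus.submit(COMM_TRACE_UNIT_PE0, 15, b"recv")
print([p.timestamp for p in etb.dump()])  # [12, 15]
```

    With a capacity of two, the third packet pushes out the oldest one, so only the packets captured at cycles 12 and 15 remain for off-line analysis.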

    The case study shows that the proposed communication tracer effectively collects information from the communication unit and the NoC for inter-PE communication performance analysis and debugging. Experimental results show that the proposed communication tracer reduces NoC trace data by 87.96% to 99.98%, depending on the case. The proposed time high offset method reduces time beacon packets by 85.72% to 95.19%. In addition, the proposed communication tracer costs only 1% of the area of the many-core platform, with no speed penalty.
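    To make the reported percentages concrete, the snippet below computes a reduction ratio as the fraction of baseline trace volume that is eliminated. The abstract does not define the metric, so this formula is an assumed reading, and the byte counts are hypothetical values chosen only to reproduce the two reported bounds; they are not measurements from the thesis.

```python
def reduction_ratio(baseline_size: int, reduced_size: int) -> float:
    """Fraction of the baseline trace volume that was eliminated (assumed definition)."""
    if baseline_size <= 0:
        raise ValueError("baseline_size must be positive")
    if not 0 <= reduced_size <= baseline_size:
        raise ValueError("reduced_size must be between 0 and baseline_size")
    return 1.0 - reduced_size / baseline_size


# Hypothetical sizes in bytes, picked to match the reported lower and upper bounds.
print(f"{reduction_ratio(10_000, 1_204):.2%}")  # 87.96%
print(f"{reduction_ratio(10_000, 2):.2%}")      # 99.98%
```

    Read this way, an 87.96% reduction leaves about 12% of the original NoC trace volume, while 99.98% leaves roughly 2 bytes out of every 10,000.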

    1 Introduction
        1.1 Introduction of SoC Tracing and Debugging
        1.2 The Challenge of Tracing and Debugging on Many-Core SoC
        1.3 The Lack of Previous Work
        1.4 Motivation and Contribution
        1.5 Organization of the Thesis
    2 Previous Work
        2.1 Background
            2.1.1 OpenRISC
            2.1.2 Bus Protocol
                2.1.2.1 Wishbone
                2.1.2.2 Open Core Protocol
                2.1.2.3 Advanced Trace Bus Protocol
        2.2 Overview of Existent Many-Core Platform
            2.2.1 Processing Element
            2.2.2 Communication Unit
            2.2.3 Network on Chip
        2.3 On-chip Communication Library
        2.4 Overview of Previous Many-Core Tracer
            2.4.1 Communication Event Generator
            2.4.2 NoC Event Generator
    3 Component of Proposed Communication Tracer and Implementation
        3.1 Overview of Communication Tracer
        3.2 Trace Configuration Engine
        3.3 Communication Trace Unit
            3.3.1 Communication Event monitor
            3.3.2 FIFO Checking and Utilization Unit
        3.4 NoC Switch Tracer
            3.4.1 The Top View of NoC Switch Tracer
            3.4.2 NoC Switch Trace Unit
            3.4.3 Hop Count Generator & Turn Direction Generator
            3.4.4 NoC Switch Tracer implement in NoC with FIFO Channel
        3.5 Pending Switch, Hierarchical Trace Bus, Funnel and Embedded Trace Buffer
        3.6 The Multiple Clock Domain Issue
    4 Proposed Tracing and Debugging Methodology and Demonstrations
        4.1 The Debugging and Tracing Flow on Many-Core Platform
        4.2 off-line Analyzing
        4.3 Demonstration Environment and Setup for Debugging and Tracing
        4.4 Case Study
            4.4.1 FIFO Control Bug
            4.4.2 Odd-Even Sort
                4.4.2.1 Introduction of Odd-Even Sort
                4.4.2.2 Trace Data Analysis and Improvement
                4.4.2.3 Result of Improvement
            4.4.3 3D Parallel Graphics Pipeline Program
                4.4.3.1 Introduction of 3D Parallel Graphics Pipeline Program
                4.4.3.2 Loading Balancing and Performance Issue
                4.4.3.3 Trace Data Analysis and Improvement
                4.4.3.4 Result of Improvement
    5 Experiment Result and Analysis
        5.1 The Reduction Rate of Trace Data
            5.1.1 The Reduction Rate of NoC Trace Data
            5.1.2 The Reduction Rate of Time Beacon Packet
        5.2 The Rate of Generating Trace Data
        5.3 Synthesis Result
    6 Conclusion and Future Work
        6.1 Conclusion
        6.2 Future Work
    Appendices
        Appendix A Global Addressing Space on Many-Core Platform
        Appendix B Communication Unit Addressing Space Used in A Processing Element


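    Full text release date: full text not authorized for public release (campus network)
    Full text release date: full text not authorized for public release (off-campus network)
    Full text release date: full text not authorized for public release (National Central Library: Taiwan thesis and dissertation system)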