
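Author: 林承諺
Lin, Cheng-Yen
Thesis title: 異質多核心功耗模擬器之設計與實驗
The Design and Experiments of A Heterogeneous Multicore Power Simulator
Advisor: 李政崑
Lee, Jenq Kuen
Oral defense committee: 黃錫瑜
楊武
張志偉
蘇泓萌
賴尚宏
黃慶育
許雅三
Degree: Doctoral
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2015
Academic year of graduation: 103
Language: English
Number of pages: 69
Keywords (translated from Chinese): multicore systems, embedded systems, digital signal processors, simulators, power optimization techniques, system tool software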
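  • Embedded multicore systems play an increasingly important role in the design of today's consumer electronics. Such systems are equipped with multiple heterogeneous computing units to meet the computational demands of emerging applications. Because these products are often battery-powered mobile devices, application developers must treat both performance and power as optimization goals. However, the SID simulator development platform, which is widely used by software application developers, lacks power estimation, so developers cannot use the platform to develop power optimization techniques.

    In this thesis, we use the SID simulator development tools to build a heterogeneous multicore power simulator. Besides full-system simulation, our power simulator can estimate the power of the hardware IPs simulated in the system, including the main processing core, digital signal processors, instruction cache, memory system, and related peripherals. Our power estimation flow has two phases: IP power model construction and system power simulation. In the first phase, we use the PowerMixer{IP} tool to build a power model for each hardware IP simulated in the system. These power models are then used in the second phase, where our proposed method analyzes the log files produced during simulation and extracts the parameters that affect power; the CPE power analysis module we developed then produces the final power estimates. We also provide a set of power analysis functions that software application developers can call freely to analyze the power of specific code sections and thereby reduce the power their software consumes. Our heterogeneous multicore power simulator can simulate multiple digital signal processors in parallel. To maintain simulation performance, we also developed a manager component that maps each simulated digital signal processor to a thread on the simulation host; this component also handles the processors' requests to access shared resources and performs synchronization.

    Finally, we demonstrate the capabilities of the heterogeneous multicore power simulator through a series of experiments. First, we use the DSPstone benchmark to discuss how different compiler optimization options affect the performance and power of digital signal processing software. We then evaluate several multicore applications, including an FIR digital filter, a face recognition and expression exaggeration application, and a preceding-vehicle detection application. In doing so, we show that software application developers can use the power evaluation data that our simulator provides from different perspectives to develop corresponding power optimization techniques.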


    Embedded multicore systems play an increasingly important role in the design of consumer electronics.
    Such systems may provide heterogeneous computing power to meet the performance demands of modern applications.
    Because many of these devices are battery-powered, such systems must be optimized for both performance and power.
    However, popular application design platforms that developers use to build their applications, such as SID, currently provide no power metrics.
    This hinders the ability of application developers to optimize power consumption.
    In this thesis we present the design and experiments of a SID-based power-aware simulation framework for heterogeneous multicore systems.
    The proposed power simulator allows a full system simulation with power estimation support for the intellectual properties (IPs) used in the platform, including a main processing unit, digital signal processors (DSPs), an instruction cache, memory subsystems, interconnections, and DMA.
    Our power estimation flow includes two phases: IP-level power modeling and power-aware system simulation.
    The first phase employs PowerMixer{IP} to construct the power models for the processor IP and other major IPs.
    The second phase applies a power abstract interpretation method to summarize the simulation trace; a CPE module then estimates the power consumption from the summarized trace information and the IP power models.
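
The two-phase flow can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the abstract does not give the format of the PowerMixer{IP} models, the trace, or the CPE module, so the per-event energy table, the `summarize_trace` and `estimate_energy_nj` helpers, and all numbers below are hypothetical. Trace summarization is reduced here to plain event counting.

```python
from collections import Counter

# Hypothetical IP power models (phase 1 output): energy per event in
# nanojoules plus a constant leakage power in milliwatts. All values are
# invented for illustration and are not taken from the thesis.
IP_MODELS = {
    "dsp": {"events": {"alu": 0.5, "load": 1.0}, "leakage_mw": 2.0},
    "dma": {"events": {"transfer": 4.0}, "leakage_mw": 1.0},
}

def summarize_trace(trace):
    """Collapse a list of (ip, event) records into per-IP event counts."""
    summary = {}
    for ip, event in trace:
        summary.setdefault(ip, Counter())[event] += 1
    return summary

def estimate_energy_nj(summary, models, elapsed_us):
    """Estimate energy per IP: event counts times per-event energy, plus leakage."""
    result = {}
    for ip, model in models.items():
        counts = summary.get(ip, Counter())
        dynamic = sum(counts[e] * nj for e, nj in model["events"].items())
        # Leakage: milliwatts times microseconds gives nanojoules.
        leakage = model["leakage_mw"] * elapsed_us
        result[ip] = dynamic + leakage
    return result

# Phase 2: summarize a (tiny, made-up) simulation trace and estimate energy.
trace = [("dsp", "alu"), ("dsp", "alu"), ("dsp", "load"), ("dma", "transfer")]
energy = estimate_energy_nj(summarize_trace(trace), IP_MODELS, elapsed_us=10.0)
print(energy)
```

Keeping the models separate from the trace summary mirrors the split described above: the models are built once per IP, while the summary changes with every simulated application.
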
    In addition, a Manager component is devised to map each DSP component to a host thread and to manage access to shared resources.
    The aim is to maintain the simulation performance as the number of simulated DSP components increases.
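
As a rough sketch of the thread-mapping idea, the Python snippet below runs one host thread per simulated DSP, guards a shared resource with a lock, and keeps the DSPs in step with a barrier. The `Manager` class here, the per-quantum barrier, and the shared counter are assumptions made for illustration; the abstract does not describe the actual synchronization scheme used in the thesis.

```python
import threading

class Manager:
    """Hypothetical manager: one host thread per simulated DSP."""

    def __init__(self, num_dsps, steps):
        self.num_dsps = num_dsps
        self.steps = steps
        self.shared_memory = {"counter": 0}          # stands in for a shared resource
        self.lock = threading.Lock()                 # serializes shared accesses
        self.barrier = threading.Barrier(num_dsps)   # keeps the DSPs in step

    def _run_dsp(self, dsp_id):
        for _ in range(self.steps):
            # Simulate one time quantum of this DSP, then touch shared state.
            with self.lock:
                self.shared_memory["counter"] += 1
            # Wait until every DSP has finished the current quantum.
            self.barrier.wait()

    def run(self):
        threads = [
            threading.Thread(target=self._run_dsp, args=(i,))
            for i in range(self.num_dsps)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.shared_memory["counter"]

total = Manager(num_dsps=4, steps=100).run()
print(total)
```

In CPython the global interpreter lock prevents these threads from executing Python code truly in parallel, so the sketch shows only the structure (mapping, locking, synchronization), not the speedup a native simulator would aim for.
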
    A power-profiling API is also provided so that developers of embedded software can tune the granularity of power profiling for a specific code section of the target application.
    We demonstrate via case studies and experiments how application developers can use our SID-based power simulator to optimize the power consumption of their applications.
    We characterize the power consumption of DSP applications with the DSPstone benchmark and discuss how compiler optimization levels with SIMD intrinsics influence performance and power consumption.
    We also evaluate a wide range of multicore applications to demonstrate how developers can use our power simulator during optimization to obtain different views of an application's power dissipation.
    Finally, we summarize the major contributions of this thesis as follows:

    - We propose a SID-based heterogeneous multicore power simulator. We incorporate two processors into the virtual platform to simulate a heterogeneous multicore system: a 32-bit Andes processor and a PAC-DSP. Our design allows simultaneous simulation of multiple PAC-DSP components by using a parallel simulation approach. A PAC Manager component is devised to map each PAC-DSP component to a host thread and to manage access to shared resources. This makes it possible to maintain the simulation performance as the number of simulated PAC-DSP components increases.

    - We construct a power model for each target system IP, including the Andes processor, PAC-DSPs, memory subsystem, interconnection, and DMA. Hence our power simulator allows a full system simulation with power estimation support for the target IPs used in the virtual platform.

    - We also propose a power-profiling API that helps developers of embedded software tune the granularity of power profiling for a specific code section of the target application.

    - We evaluate a wide range of DSP and multicore applications, including DSPstone, a dual-core FIR application, multicore RMS, and a vehicle detection application, to demonstrate how application developers can use our power simulator to exploit power optimizations in their applications.
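
The power-profiling API in the contributions above could be used roughly as in the following Python sketch, which attributes energy to a named code region. Everything in it is hypothetical: `FakeSimulator`, `PowerProfiler`, and `region` are invented names, and in the thesis the API would be called from target code running on the simulated platform rather than from host-side Python.

```python
from contextlib import contextmanager

class FakeSimulator:
    """Stand-in for a simulator exposing a running energy counter (nJ)."""

    def __init__(self):
        self.energy_nj = 0.0

    def execute(self, cost_nj):
        # Pretend to run some target code that costs `cost_nj` nanojoules.
        self.energy_nj += cost_nj

class PowerProfiler:
    """Hypothetical profiling API: accumulate energy per named code region."""

    def __init__(self, sim):
        self.sim = sim
        self.regions = {}

    @contextmanager
    def region(self, name):
        start = self.sim.energy_nj
        try:
            yield
        finally:
            delta = self.sim.energy_nj - start
            self.regions[name] = self.regions.get(name, 0.0) + delta

sim = FakeSimulator()
prof = PowerProfiler(sim)

sim.execute(5.0)                  # outside any region: not attributed
with prof.region("fir_loop"):
    sim.execute(2.0)
    sim.execute(3.0)
with prof.region("fir_loop"):     # the same region can be entered again
    sim.execute(1.0)

print(prof.regions)
```

Wrapping a smaller or larger block in `region` is how this sketch models tuning the profiling granularity: only code inside the marked section contributes to the reported figure.
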

    Chinese Abstract
    Abstract
    Acknowledgement
    1 Introduction
    2 Related Work
        2.1 Parallel Simulation
        2.2 Power Simulation
        2.3 Related Processor Technology
    3 SID-Based Multicore Simulator
        3.1 Overview of SID Simulation Framework
        3.2 Parallel Multicore Simulation
    4 Multicore Power Simulator
        4.1 IP-Level Power Modeling
        4.2 Power-Aware Simulation
    5 Exploiting the Multicore Power Simulator
        5.1 Power-Consumption Characterization for DSP Applications
        5.2 Parallel Patterns for Low Power
            5.2.1 Case Study: Pipe and Filter
            5.2.2 Case Study: Shared Coefficient Object
            5.2.3 Case Study: Puppeteer Pattern
        5.3 Simulation Performance Evaluation
    6 Conclusion
    Bibliography

