
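Graduate student: 蔡杰霖
Tsai, Chieh-Lin
Thesis title: 利用指令執行軌跡之興趣區間擷取進行有效率的軌跡驅動晶片模擬
Extraction of region of interest from instruction trace for effective trace-driven simulation
Advisor: 金仲達
King, Chung-Ta
Oral defense committee: 陳添福
Chen, Tien-Fu
李哲榮
Lee, Che-Rung
Degree: Master
Department:
Year of publication: 2017
Academic year of graduation: 105
Language: English
Number of pages: 29
Keywords (Chinese): trace-driven chip simulation, region of interest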
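  • During chip development, developers often use simulators to verify a chip design and to estimate how the chip will perform in practice. When only certain components of the chip need to be simulated, such as the cache or the network-on-chip, trace-driven simulation is commonly used. It takes a trace collected from a program's execution as input and simulates the behavior of the targeted component, which is faster than actually executing the program on a simulator. In such simulations, developers are usually more interested in specific parts of the program than in the complete trace; these parts are called the region of interest. A trace used for simulation generally contains only the executed instructions or the accessed memory addresses, so without additional annotation it is difficult for developers to extract the part of the trace that corresponds to the region of interest.
    This thesis proposes a method that extends previous work by converting the program trace into a finite-state machine and assigning a weight to each state. The instruction trace thus becomes a weighted graph; the subgraphs with higher weights are taken to correspond to the developer's region of interest and are extracted from the trace. Finally, the method is applied to trace-driven simulation of a network-on-chip to verify its feasibility and to demonstrate that region-of-interest extraction enables more effective and faster trace-driven simulation. Experimental results show that, compared with traces obtained by explicitly annotating the region-of-interest functions, the network latency reported by the network-on-chip simulation differs by 2.55%, which indicates that the method is feasible.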
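    The idea of turning a trace into a weighted graph and keeping its heavy part can be illustrated with a minimal sketch. It is not the thesis's algorithm: the state definition (one state per distinct event label), the weight (visit count), the `ratio` cutoff, and the function names `build_weighted_fsm` and `heavy_subgraph` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_weighted_fsm(trace):
    """Collapse a trace into states (distinct labels) and transitions.

    Assumption: the weight of a state is simply how often it is visited.
    """
    weights = Counter(trace)
    edges = Counter(zip(trace, trace[1:]))  # (src, dst) -> transition count
    return weights, edges

def heavy_subgraph(weights, edges, ratio=0.5):
    """Return the connected group of heavy states around the heaviest state.

    Assumption: a state is "heavy" if its weight is at least `ratio`
    times the maximum weight; connectivity ignores edge direction.
    """
    if not weights:
        return set()
    cutoff = ratio * max(weights.values())
    heavy = {s for s, w in weights.items() if w >= cutoff}

    neighbors = defaultdict(set)
    for src, dst in edges:
        if src in heavy and dst in heavy:
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    # Depth-first search starting from the heaviest state.
    start = max(heavy, key=lambda s: weights[s])
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical trace: a short prologue, a loop executed three times, an epilogue.
trace = ["init", "load", "loop_head", "body", "loop_head", "body",
         "loop_head", "body", "loop_head", "exit"]
weights, edges = build_weighted_fsm(trace)
roi_states = heavy_subgraph(weights, edges)
```

    With this toy trace the loop states dominate the weights, so they are the ones selected; on real instruction traces the choice of state granularity and cutoff would need far more care.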


    In electronic system-level (ESL) design for computer systems, trace-driven simulation is commonly used to simulate specific components of a computer system, e.g. the cache or the network-on-chip (NoC), in order to verify their designs and evaluate the resulting performance. Traces are often collected by third parties who do not know how they will be used later, whereas ESL designers are usually interested only in specific portions of a trace, e.g. the main loop of the original program or a particular phase of the computation, which are referred to as the region of interest (ROI). The challenge is thus to extract the ROI for a specific design from a trace that was collected without knowledge of the desired ROI and therefore without annotations for it. In this thesis, we take on this challenge and consider the extraction of the ROI from instruction traces produced by executing (benchmark) programs. To solve this problem, we propose first to infer the high-level structure of the program from the trace by reducing the trace to a finite-state machine (FSM), called the trace-derived FSM (TD-FSM). Next, the ROI is extracted by selecting appropriate subgraphs of the TD-FSM. Finally, a replay procedure generates a trace that is very similar to the original trace but contains only the ROI. To demonstrate the effectiveness of the proposed methodology, we show how the trace corresponding to the main loop of a program can be extracted to drive trace-driven NoC simulation. The simulation results are compared with those obtained using traces produced by instrumenting the original program source code. The comparison shows that the simulation results obtained with our methodology and with the annotated traces differ by only 2.55% on average. The work presented in this thesis is only a starting point, and the encouraging results are subject to certain restrictions, which are discussed in detail.
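    The last two steps of the abstract, producing an ROI-only trace and comparing a simulation metric against a reference, can be sketched as follows. This is a simplified stand-in, not the replay procedure of the thesis: it filters records by state label instead of walking the TD-FSM, and the record layout `(label, address)`, the function names, and all numeric values are hypothetical.

```python
def extract_roi_trace(records, roi_states):
    """Keep only the records whose state label belongs to the ROI.

    Assumption: each record is a (label, address) tuple and the original
    order of the surviving records is preserved.
    """
    return [rec for rec in records if rec[0] in roi_states]

def relative_difference(measured, reference):
    """Relative difference of a metric against a reference, as a fraction."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(measured - reference) / abs(reference)

# Hypothetical trace records: (state label, memory address).
records = [
    ("init", 0x1000),
    ("loop_head", 0x2000),
    ("body", 0x2008),
    ("loop_head", 0x2000),
    ("body", 0x2010),
    ("exit", 0x3000),
]
roi_trace = extract_roi_trace(records, {"loop_head", "body"})

# Hypothetical average latencies (in cycles) from two simulation runs:
# one driven by the extracted trace, one by an annotated reference trace.
diff = relative_difference(102.0, 100.0)
```

    A comparison of this kind, applied to the network latency of the two simulation runs, is the sort of metric behind the reported 2.55% average difference; the numbers above are placeholders only.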

    1 Introduction
    2 Methodology
      2.1 Trace-Derived Finite State Machine
      2.2 Classifying Different Types of Events in TD-FSM
      2.3 Finding High Weighted Subgraphs in TD-FSM
    3 Application
      3.1 Trace Collection
      3.2 TD-FSM Derivation
      3.3 ROI Extraction
      3.4 Trace Replaying
      3.5 NoC Trace-driven Simulation
    4 Evaluation
      4.1 Experiment Setup
      4.2 Comparison of Replayed and Original Trace
      4.3 Evaluation of ROI Extraction Method
      4.4 Evaluation of ROI Iteration Reduction
      4.5 Discussions
      4.6 Limitations
    5 Conclusion

