| Graduate student | 周仲韓 Chou, Chung Han |
|---|---|
| Thesis title | Clock Tree Design Under Ultra Low Voltage and Energy Recycling |
| Advisor | 張世杰 Chang, Shih Chieh |
| Oral defense committee | 李毅郎 Li, Yih Lang; 黃婷婷 Hwang, TingTing; 吳凱強 Wu, Kai Chiang; 陳宏明 Chen, Hung Ming |
| Degree | Doctoral (Ph.D.) |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Year of publication | 2016 |
| Academic year of graduation | 105 (ROC calendar) |
| Language | English |
| Pages | 82 |
| Keywords | Resonant clock, Power grid simulation, Ultra-low voltage |
The clock signal is the common timing reference for all synchronous sequential components in an integrated circuit. The design of the clock distribution network directly affects chip performance metrics such as clock skew and power consumption. At the beginning of this dissertation, we build a fast power/ground noise simulation environment. We then address the clock design issues of two different low-power techniques and propose two architectures, one to resolve each.
First, we propose an IR drop simulation algorithm based on the Preconditioned Conjugate Gradient (PCG) method. PCG has been demonstrated to be effective in solving large-scale linear systems whose coefficient matrices are sparse and symmetric positive definite. One critical problem in PCG is designing a good preconditioner, which can significantly reduce runtime while keeping memory usage efficient. Universal preconditioners are simple and easy to construct, but their effectiveness is highly problem-dependent. Domain-specific preconditioners that exploit the underlying physical meaning of the matrices, on the other hand, usually work better but are difficult to design. In this part, we study the problem in the context of the power grid simulation contest and develop a novel preconditioner, derived from the power grid structure through simple circuit simulations. Experimental results show a 43% reduction in the number of iterations and a 23% speedup over existing universal preconditioners.
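The PCG iteration that an IR drop solver of this kind builds on can be sketched as follows. This is a minimal example under assumptions introduced here: a toy n × n resistive mesh with equal branch conductances, a few pad nodes tied to VDD, and a uniform current sink at every node. The names `build_grid`, `matvec`, and `pcg` are hypothetical, and the Jacobi (diagonal) preconditioner is a generic universal preconditioner swapped in for illustration; it is not the power-grid-based preconditioner proposed in the dissertation.

```python
# Toy PCG solver for an IR-drop style power grid (illustrative assumptions only).
# The Jacobi preconditioner below is a generic stand-in, not the thesis's method.


def build_grid(n, g=1.0, g_pad=100.0, pads=((0, 0),), i_load=1e-3, vdd=1.0):
    """Assemble nodal equations G v = b for an n x n resistive mesh (n >= 2).

    Each node sinks i_load amps; each pad node connects to VDD through g_pad.
    G is stored row-wise as dicts {column: conductance}.
    """
    def idx(r, c):
        return r * n + c

    rows = [dict() for _ in range(n * n)]
    b = [-i_load] * (n * n)  # load currents leave every node
    for r in range(n):
        for c in range(n):
            k = idx(r, c)
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and down neighbours
                if rr < n and cc < n:
                    m = idx(rr, cc)
                    rows[k][k] = rows[k].get(k, 0.0) + g
                    rows[m][m] = rows[m].get(m, 0.0) + g
                    rows[k][m] = rows[k].get(m, 0.0) - g
                    rows[m][k] = rows[m].get(k, 0.0) - g
    for r, c in pads:
        k = idx(r, c)
        rows[k][k] += g_pad      # pad conductance to the VDD source
        b[k] += g_pad * vdd      # Norton-equivalent injection from VDD
    return rows, b


def matvec(rows, x):
    """Sparse matrix-vector product with the row-dict representation."""
    return [sum(val * x[j] for j, val in row.items()) for row in rows]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(rows, b, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient with a Jacobi preconditioner M = diag(G)."""
    n = len(b)
    inv_diag = [1.0 / rows[k][k] for k in range(n)]  # M^-1 applied elementwise
    x = [0.0] * n
    r = b[:]                                          # r = b - G x for x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]        # z = M^-1 r
    p = z[:]
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        gp = matvec(rows, p)
        alpha = rz / dot(p, gp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * gpi for ri, gpi in zip(r, gp)]
        if dot(r, r) ** 0.5 <= tol * b_norm:          # relative residual test
            return x, it
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    rows, b = build_grid(8, pads=((0, 0), (7, 7)))
    v, iters = pcg(rows, b)
    print(f"converged in {iters} iterations; worst IR drop = {1.0 - min(v):.4f} V")
```

A domain-specific preconditioner would change only the steps that build and apply `inv_diag`; the rest of the iteration stays the same, which is why preconditioner design is the main lever on iteration count.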
After that, we focus on low-power issues. Resonant clocking has been proposed for energy recycling in high-performance designs. However, resonant clocks often suffer from area overhead because large decoupling capacitors must be inserted. To overcome this area overhead, we propose a novel resonant clock mesh structure called the Ping-Pong mesh. A Ping-Pong mesh contains two sub-meshes whose clocks operate in completely opposite phases, and each sub-mesh plays the role of the other's decoupling capacitor. As a result, a Ping-Pong mesh does not need the additional decoupling capacitors required in previous works. In addition, a Ping-Pong mesh reduces the power/ground surge current to about half that of previous works.
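One first-order intuition for the surge-current claim is that splitting the clock load into two sub-meshes that switch in opposite phases spreads the charge drawn from VDD over two edges per cycle instead of one. The sketch below is an idealized model under assumptions introduced here (triangular current pulses, equal edge widths, no resonant energy recovery), not the dissertation's circuit analysis; `triangle` and `peak_vdd_current` are hypothetical helpers.

```python
# First-order model of peak VDD current: one mesh vs. two opposite-phase halves.
# Idealized: triangular current pulses; resonance and energy recovery are ignored.


def triangle(t, t0, width, peak):
    """Triangular pulse that starts at t0, lasts `width`, and peaks at `peak`."""
    half = width / 2
    if t < t0 or t > t0 + width:
        return 0.0
    return peak * (1.0 - abs(t - t0 - half) / half)


def peak_vdd_current(meshes, vdd, width, period, n_cycles=2, steps=2000):
    """Sample the summed VDD current over n_cycles periods and return its maximum.

    meshes: list of (capacitance, phase_offset). On every rising edge a mesh
    draws a triangular pulse whose area equals C * VDD (the charge it needs).
    """
    t_end = n_cycles * period
    peak = 0.0
    for s in range(steps + 1):
        t = t_end * s / steps
        total = 0.0
        for cap, phase in meshes:
            amp = 2.0 * cap * vdd / width  # triangle area = amp * width / 2 = C * VDD
            for k in range(n_cycles + 1):
                total += triangle(t, phase + k * period, width, amp)
        peak = max(peak, total)
    return peak


if __name__ == "__main__":
    period, width, vdd = 1.0, 0.1, 1.0              # normalized units
    single = [(2.0, 0.0)]                           # one mesh of capacitance 2C
    ping_pong = [(1.0, 0.0), (1.0, period / 2)]     # two halves, opposite phase
    ratio = (peak_vdd_current(ping_pong, vdd, width, period)
             / peak_vdd_current(single, vdd, width, period))
    print(f"peak current ratio (ping-pong / single) = {ratio:.3f}")
```

In this model the charge drawn from VDD per clock period is the same in both cases (2C·VDD), so only the peak changes; the energy recycling of a resonant mesh is deliberately left out.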
Finally, multi-power-mode design is another useful technique for lowering chip power without sacrificing circuit speed. However, when the supply voltage is scaled down to the ultra-low-voltage level, a large clock skew may occur among different power modes. If conventional power-mode-aware buffers (PMABs) are used to eliminate this skew, a large number of buffers must be inserted, introducing a large power overhead. In this part, we propose a new PMAB architecture that minimizes clock skew while saving power. The proposed PMAB is composed of two serially connected sub-PMABs operating at two different voltage levels: the front sub-PMAB uses the low voltage level for coarse-grained clock skew minimization, and the back sub-PMAB then uses the high voltage level for fine-grained clock skew minimization.
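The coarse/fine split of the proposed PMAB can be illustrated with a simple delay-assignment sketch. The step sizes, stage limits, per-mode delay targets, and the helper `assign_delay` are all hypothetical; the sketch only shows how a coarse front stage followed by a fine back stage bounds the residual skew to about half a fine step, not how the dissertation sizes or places its sub-PMABs.

```python
# Hypothetical coarse/fine delay assignment for a two-stage PMAB.
# Step sizes and per-mode targets are invented for illustration.


def assign_delay(required_ps, coarse_step, max_coarse, fine_step, max_fine):
    """Split a required compensation delay (required_ps >= 0) into two stages.

    Front (low-voltage) stage: whole coarse steps, never overshooting.
    Back (high-voltage) stage: the nearest number of fine steps for the rest.
    Returns (coarse_steps, fine_steps, residual_ps).
    """
    n_coarse = min(int(required_ps // coarse_step), max_coarse)
    remainder = required_ps - n_coarse * coarse_step
    n_fine = min(max(round(remainder / fine_step), 0), max_fine)
    residual = remainder - n_fine * fine_step
    return n_coarse, n_fine, residual


if __name__ == "__main__":
    # Hypothetical delay each power mode needs at one sink to cancel skew (ps).
    required = {"mode_A": 0.0, "mode_B": 137.0, "mode_C": 412.0}
    for mode, d in required.items():
        print(mode, assign_delay(d, coarse_step=50.0, max_coarse=10,
                                 fine_step=5.0, max_fine=10))
```

Here the fine stage only needs enough range to cover one coarse step, so most of the delay sits in the coarse stage; one plausible reading of the architecture is that placing that bulk delay at the low voltage level is what saves power.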