Author: | 羅偉恆 Lo, Wei Hen |
---|---|
Thesis title: | 在電路以及架構層級對良率以及效能上優化的設計 Yield Improvement and High-Performance Design in Circuit Level and Architecture Level |
Advisor: | 黃婷婷 Hwang, TingTing |
Committee members: | 金仲達 King, Chung-Ta; 黃俊達 Huang, Juinn Dar; 江蕙如 Jiang, Iris Hui-Ru; 王廷基 Wang, Ting-Chi; 王俊堯 Wang, Chun-Yao |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2015 |
Academic year of graduation: | 104 |
Language: | English |
Number of pages: | 106 |
Keywords (translated from Chinese): | testing, architecture, redundant TSV, data migration |
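As process technology advances, yield and performance have become increasingly important. To improve system yield and performance, we study three problems at the circuit level and the system level: improving the diagnosis resolution of faulty scan chains, fault tolerance in 3D-ICs, and memory interference among data-parallel applications.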
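In modern process technology, faulty scan chains can account for up to 50% of yield loss. We observe that the effectiveness of scan chain diagnosis techniques depends not only on the logic dependency of the circuit but also on the controllability between scan cells. In this dissertation, we therefore propose a method that uses the circuit structure to partition and reorder scan chains in order to improve the diagnosis resolution.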
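In addition, the delay of global interconnects has become a bottleneck for both performance and power, and 3D-ICs offer many improvements on both fronts. During fabrication and die stacking, however, TSVs may be damaged by problems such as uneven surface roughness and impurities. Many studies further report that faulty TSVs tend to appear in clusters. For clustered faulty TSVs, we design a simple ring-based redundant TSV architecture that still maintains yield.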
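At the system level, many data-parallel applications perform their computation on shared memory regions, and the parallel threads experience bank conflicts in memory. We design a system architecture that dynamically migrates data blocks, which reduces bank conflicts in memory and improves system performance.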
With the advances of VLSI design technology, yield loss and performance have become increasingly important. To improve the yield of circuits and the performance of systems, we target three problems at the circuit level and the system level: the scan chain diagnosis problem, the fault tolerance problem in 3D-ICs, and the memory interference problem for data-parallel multi-threaded applications.
First, to increase the diagnosis resolution of scan chains, a scan chain partitioning algorithm and a scan chain reordering algorithm are proposed. In modern technology, scan design is used to detect combinational failures in a circuit and to improve its testability. However, defects in the scan chains themselves are also a critical source of yield loss. Since scan chains can take up a large portion of the chip area, faulty scan chains can be responsible for up to 50% of yield loss [1]. We observe that the effectiveness of scan chain diagnosis methods depends not only on logic dependency but also on the controllability between scan flip-flops. Hence, in this dissertation, we propose a scan chain partitioning algorithm to increase the detectable values of scan cells in the faulty scan chain, and a scan chain reordering algorithm to reduce the range of suspect faulty scan cells and to minimize the routing overhead. The experimental results show that our method can reduce the number of suspect scan cells from 378-31 to at most 3 for most cases of the ITC’99 benchmarks.
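The link between controllability and the suspect range can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not taken from the dissertation: a single permanent stuck-at-0 fault, cell 0 nearest scan-in, and every value that has to pass through the faulty cell during unload reading as 0. The names `unload_with_stuck_at_0` and `suspect_range` are hypothetical.

```python
from typing import List, Tuple


def unload_with_stuck_at_0(captured: List[int], fault_pos: int) -> List[int]:
    """Simulate unloading a chain whose cell `fault_pos` is stuck at 0.

    Cell 0 is nearest scan-in and the last cell is nearest scan-out, so the
    values of cells 0..fault_pos must pass through the faulty cell and are
    observed as 0. Cells after the fault are observed correctly.
    """
    return [0 if i <= fault_pos else bit for i, bit in enumerate(captured)]


def suspect_range(captured: List[int], observed: List[int]) -> Tuple[int, int]:
    """Return the inclusive (lower, upper) range of suspect cell positions.

    Only cells that captured a 1 carry information about a stuck-at-0 fault:
    a 1 observed as 0 means the fault is at or after that cell, and a 1
    observed as 1 means the fault is before that cell.
    """
    lower, upper = 0, len(captured) - 1
    for i, (cap, obs) in enumerate(zip(captured, observed)):
        if cap != 1:
            continue  # a captured 0 cannot distinguish faulty from fault-free
        if obs == 0:
            lower = max(lower, i)
        else:
            upper = min(upper, i - 1)
    return lower, upper


if __name__ == "__main__":
    # Two capture patterns for the same 8-cell chain with the fault at cell 4.
    sparse = [0, 1, 0, 0, 0, 0, 0, 1]  # few cells can be driven to 1
    dense = [0, 1, 0, 0, 1, 1, 0, 1]   # cells next to the fault also hold 1
    for captured in (sparse, dense):
        observed = unload_with_stuck_at_0(captured, fault_pos=4)
        print(suspect_range(captured, observed))
```

In this toy model, the pattern that drives more cells near the fault to the sensitizing value yields a narrower suspect range, which is the intuition behind increasing the detectable values of scan cells.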
Second, a ring-based redundant TSV architecture is proposed to improve the yield of 3D-ICs. The fabrication and bonding of TSVs may fail because of many factors, such as the winding level of the thinned wafers, the surface roughness and cleanness of silicon dies, and the bonding technology. In addition, faulty TSVs tend to cluster together because of imperfect bonding technology. To resolve this problem, the router-based redundant TSV architecture was proposed [2]. That method enables faulty TSVs to be repaired by redundant TSVs that are farther apart. In this dissertation, a new hardware-efficient redundant TSV architecture for clustered faults is proposed. Simulation results show that, for a given number of TSVs (8 × 8), a TSV failure rate of 1%, and a careful selection of grouping ratios, our design achieves a 58.9% area reduction of MUXes per signal, a 54.6% total area reduction per signal, and a 50.54% total wire length reduction, while the yield of our ring-based redundant TSV architecture still remains at 98.47% to 99.00%, as compared with the router-based design [2]. The minimum shifting length of our ring-based redundant TSV architecture is at most 1, which guarantees the minimum timing overhead of each signal. The maximum extra shifting latency of our ring-based design is reduced by 74.7% compared to that of the router-based design when the number of faulty TSVs is set to 8.
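To show why grouping signal TSVs with spares raises yield at all, the following minimal sketch computes yield under a textbook binomial model. It is not the dissertation's yield model: it assumes independent TSV failures (the dissertation targets clustered faults) and assumes any spare in a group can repair any faulty TSV in the same group. The function names and the group size of 4 signals plus 1 spare are hypothetical choices for illustration.

```python
from math import comb


def group_yield(signals: int, spares: int, fail_rate: float) -> float:
    """Probability that a group of `signals + spares` TSVs is repairable.

    Assumes independent failures: the group works if at most `spares`
    of its TSVs are faulty.
    """
    n = signals + spares
    return sum(
        comb(n, k) * fail_rate**k * (1.0 - fail_rate) ** (n - k)
        for k in range(spares + 1)
    )


def chip_yield(total_signals: int, group_size: int,
               spares_per_group: int, fail_rate: float) -> float:
    """Yield of a chip whose signal TSVs are split into equal groups."""
    if total_signals % group_size != 0:
        raise ValueError("total_signals must be a multiple of group_size")
    groups = total_signals // group_size
    return group_yield(group_size, spares_per_group, fail_rate) ** groups


if __name__ == "__main__":
    # 8 x 8 = 64 signal TSVs at a 1% failure rate, as in the setting above.
    print(chip_yield(64, 64, 0, 0.01))  # no redundancy
    print(chip_yield(64, 4, 1, 0.01))   # hypothetical 4 signals + 1 spare
```

Under clustered faults the independence assumption breaks down, since several faults can land in one group, which is why the architecture has to consider how spares are shared across neighbouring TSVs.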
Finally, a dynamic data migration method is proposed to eliminate memory interference of data-parallel multi-threaded applications in multi-core systems. Data parallelism is a common parallel programming model that performs operations on a data set that is often regularly structured in an array. In other words, many threads may access the same shared data set. Thus, as the number of threads increases, the probability of memory interference also increases. To address this issue, we provide a new software/hardware cooperative dynamic data migration method that exploits the update-and-reuse property. Experimental evaluation on a 16-core x86 system with 8 memory banks shows that our method improves system performance by 13.2% compared to the traditional OS page coloring method [3] and by 9% compared to the parallel application memory scheduling method [4].
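The kind of interference described above can be made concrete with a minimal sketch of row-buffer conflicts. It assumes a simplified DRAM model with one open row per bank and is not the dissertation's migration method. It only shows that two threads touching different rows of one bank keep evicting each other's row, and that the conflicts disappear if one thread's data sits in another bank. The helper names and the bank and row numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Access = Tuple[int, int]  # (bank, row)


def count_row_conflicts(trace: List[Access]) -> int:
    """Count accesses that find a different row open in their bank."""
    open_row: Dict[int, int] = {}
    conflicts = 0
    for bank, row in trace:
        if bank in open_row and open_row[bank] != row:
            conflicts += 1  # the open row must be closed before this access
        open_row[bank] = row
    return conflicts


def interleave(a: List[Access], b: List[Access]) -> List[Access]:
    """Alternate the accesses of two threads, one access each per turn."""
    return [access for pair in zip(a, b) for access in pair]


if __name__ == "__main__":
    thread0 = [(0, 10)] * 4         # thread 0 streams through row 10 of bank 0
    thread1_shared = [(0, 20)] * 4  # thread 1 uses row 20 of the same bank
    thread1_moved = [(1, 20)] * 4   # same data placed in bank 1 instead
    print(count_row_conflicts(interleave(thread0, thread1_shared)))
    print(count_row_conflicts(interleave(thread0, thread1_moved)))
```

Static partitioning such as page coloring fixes the thread-to-bank placement up front, whereas a dynamic scheme can change the placement while the program runs.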
[1] S. Kasapi, J. Liao, B. Cory, “Laser Voltage Imaging (LVI) for ATPG Scan Chain Diagnosis on 40nm CMOS,” LSI Testing Symposium, Osaka, Japan, pp. 1422-1426, November 2010.
[2] L. Jiang, Q. Xu, B. Eklow, “On effective TSV repair for 3D-stacked ICs,” DATE’12, pp. 793-798, March 2012.
[3] L. Liu, Z. Cui, M. Xing, Y. Bao, M. Chen, C. Wu, “A software memory partition approach for eliminating bank-level interference in multicore,” PACT’12, pp. 367-376, September 2012.
[4] E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, O. Mutlu, and Y. N. Patt, “Parallel Application Memory Scheduling,” MICRO’11, pp. 362-373, December 2011.
[5] S. Kundu, “Diagnosing Scan Chain Faults,” IEEE TVLSI, Vol. 2, No. 4, pp. 512-516, December 1994.
[6] K. Stanley, “High Accuracy Flush and Scan Software Diagnostic,” Proc. IEEE YOT 2000, pp. 56-62, Oct. 2000.
[7] R. Guo, S. Venkataraman, “A Technique For Fault Diagnosis of Defects in Scan Chains,” ITC, pp. 268-277, 2001.
[8] Y. Huang, W.-T. Cheng, S. M. Reddy, C.-J. Hsieh, Y.-T. Hung, “Statistical Diagnosis for Intermittent Scan Chain Hold Time Fault,” ITC, pp. 319-328, 2003.
[9] J. S. Yang, S. Huang, “Quick Scan Chain Diagnosis Using Signal Profiling,” ICCD, 2004.
[10] Y. Huang, W.-T. Cheng, and G. Crowell, “Using Fault Model Relaxation to Diagnose Real Scan Chain Defects,” ASP-DAC, pp. 1176-1179, 2005.
[11] A. Crouch, “Debugging and Diagnosing Scan Chains,” EDFAS, pp. 16-24, Feb. 2005.
[12] J. Li, “Diagnosis of Single Stuck-at Faults and Multiple Timing Faults in Scan Chains,” IEEE TVLSI, Vol. 13, No. 6, pp. 708-718, June 2005.
[13] J. Li, “Diagnosis of Multiple Hold-Time and Setup-Time Faults in Scan Chains,” IEEE TC, Vol. 54, No. 11, pp. 1467-1472, Nov. 2005.
[14] R. Guo, S. Venkataraman, “An algorithmic technique for diagnosis of faulty scan chains,” IEEE Trans. on CAD, pp. 1861-1868, Sept. 2006.
[15] Y. Huang, W.-T. Cheng, N. Tamarapalli, J. Rajski, R. Klingenberg, W. Hsu, and Y.-S. Chen, “Diagnosis with Limited Failure Information,” ITC, paper 22.2, 2006.
[16] R. Guo, Y. Huang, W.-T. Cheng, “A complete test set to diagnose scan chain failures,” ITC, pp. 1-10, Oct. 2007.
[17] J. Hirase, N. Shindou, and K. Akahori, “Scan Chain Diagnosis using IDDQ Current Measurement,” Proc. ATS, pp. 153-157, 1999.
[18] R. Agarwal, W. Zhang, P. Limaye, R. Labie, B. Dimcic, A. Phommahaxay, and P. Soussan, “Cu/Sn Microbumps Interconnect for 3D TSV Chip Stacking,” Proceedings of Electronic Components and Technology Conference (ECTC’10), pp. 858-863, 2010.
[19] N. Lin, J. Miao, P. Dixit, “Void formation over limiting current density and impurity analysis of TSV fabricated by constant-current pulse-reverse modulation,” Microelectronics Reliability, vol. 53, pp. 1943-1953, 2013.
[20] J. U. Knickerbocker, et al., “Three-dimensional silicon integration,” IBM Journal of Research and Development, 52(6):553-569, November 2008.
[21] J. Schafer, F. Policastri, R. Mcnulty, “Partner SRLs for Improved Shift Register Diagnostics,” Proc. VTS, pp. 198-201, 1992.
[22] S. Edirisooriya, G. Edirisooriya, “Diagnosis of Scan Path Failures,” Proc. VTS, pp. 250-255, 1995.
[23] K. De, A. Gunda, “Failure Analysis for Full-Scan Circuits,” Proc. ITC, pp. 636-645, Mar. 1995.
[24] S. Narayanan, A. Das, “An Efficient Scheme to Diagnose Scan Chains,” Proc. ITC, pp. 704-713, 1997.
[25] Y. Wu, “Diagnosis of Scan Chain Failures,” Proc. Int’l Symp. on Defect and Fault Tolerance in VLSI Systems, pp. 217-222, 1998.
[26] P. Song, F. Motika, D. Knebel, R. Rizzolo, M. Kusko, J. Lee, and M. McManus, “Diagnostic techniques for the IBM S/390 600MHz G5 Microprocessor,” Proc. ITC, pp. 1073-1082, 1999.
[27] C. L. Kong, M. R. Islam, “Diagnosis of Multiple Scan Chain Faults,” International Symposium for Testing and Failure Analysis, pp. 510-516, November 2005.
[28] F. Motika, P. Nigh, P. Song, “Stuck-at fault scan chain diagnostic method,” US Pat. 7010735, March 7, 2006.
[29] A. Anderson, T. M. Burdine, D. O. Forlenza, O. P. Forlenza, W. J. Hurley, P. T. Tran, “Method, apparatus, and computer program product for implementing deterministic based broken scan chain diagnostics,” US Pat. 20050229057, July 1, 2008.
[30] J. Ye, Y. Huang, Y. Hu, W. Cheng, R. Guo, L. Lai, et al., “Diagnosis and layout aware (DLA) scan chain stitching,” in Proc. IEEE ITC, pp. 1-10, Sep. 2013.
[31] L. Goldstein, “Controllability/observability analysis of digital circuits,” ISCAS, pp. 685-693, 1979.
[32] B. W. Kernighan, S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell System Technical Journal, vol. 49, pp. 291-307, 1970.
[33] Tessent Diagnosis, Mentor Graphics, Wilsonville, OR, USA, 2012.
[34] ITC’99 benchmarks, http://www.cerc.utexas.edu/itc99/benchmarks/bench.html, 2009.
[35] “Design Compiler”, Synopsys, 2010.
[36] “SoC Encounter”, Cadence, 2012.
[37] R. Patti, “Three-Dimensional Integrated Circuits and the Future of System-on-Chip Designs,” Proc. of the IEEE, vol. 94, no. 6, June 2006.
[38] A. W. Topol, J. D. C. La Tulipe, L. Shi, et al., “Three Dimensional Integrated Circuits,” IBM Journal of Research and Development, vol. 50, no. 4/5, pp. 491-506, July/September 2006.
[39] L. Jiang, Y. Liu, L. Duan, Y. Xie, and Q. Xu, “Modeling TSV open defects in 3D-stacked DRAM,” ITC’10, pp. 1-9, November 2010.
[40] N. Lin, J. Miao, P. Dixit, “Void formation over limiting current density and impurity analysis of TSV fabricated by constant-current pulse-reverse modulation,” Microelectronics Reliability, vol. 53, pp. 1943-1953, 2013.
[41] K. H. Lu, S. Ryu, Q. Zhao, X. Zhang, J. Im, R. Huang, and P. S. Ho, “Thermal Stress Induced Delamination of Through Silicon Vias in 3-D Interconnects,” ECTC’10, pp. 40-45, June 2010.
[42] U. Kang, et al., “8 Gb 3-D DDR3 DRAM using through-silicon-via technology,” IEEE Journal of Solid-State Circuits, 45(1):111-119, Jan. 2010.
[43] A. Hsieh, T. Hwang, M. Chan, M. Tsai, C. Tseng, H. Li, “TSV Redundancy: Architecture and Design Issues in 3D IC,” DATE’10, pp. 166-171, March 2010.
[44] I. Loi, et al., “A low-overhead fault tolerance scheme for TSV-based 3D network on chip links,” in Proc. Intl Conf. on Computer-Aided Design, pp. 598-602, 2008.
[45] I. Koren and Z. Koren, “Defect tolerance in VLSI circuits: techniques and yield analysis,” Proc. of the IEEE, 86(9):1819-1838, 1998.
[46] B. T. Murphy, “Cost-Size Optima of Monolithic Integrated Circuits,” Proc. IEEE, vol. 52, no. 12, pp. 1537-1545, 1964.
[47] B. C. Arnold, “Pareto Distributions,” International Co-operative Publishing House, 1983
[48] B. J. Ho, B. Nader, “A Generic Traffic Model for On-Chip Interconnection Networks,” The First International Workshop on Networks-on-Chip Architectures, 2009.
[49] Y. Kim, D. Han, O. Mutlu, M. Harchol-Balter, “ATLAS: A scalable and high performance scheduling algorithm for multiple memory controllers,” HPCA’10, pp. 1-12, January 2010.
[50] A. V. Goldberg and S. Rao, “Beyond the flow decomposition barrier,” Journal of the ACM, 45(5):783-797, 1998.
[51] Nangate, “The Nangate 45nm Open Cell Library,” http://www.nangate.com.
[52] LEDA Library, http://www.algorithmic-solutions.com.
[53] L. Huaguo, C. Hao, L. Yang, W. Wei, C. Tian, and X. Hui, “Optimized Mid-bond Order for 3D-Stacked ICs Considering Failed Bonding,” VLSI-DAT’14, pp. 1-4, April 2014.
[54] V. Bandishti, I. Pananilath, and U. Bondhugula, “Tiling Stencil Computations to Maximize Parallelism,” SC’12, pp. 1-11, November 2012.
[55] M. M. Baskaran, N. Vydyanathan, U. K. Bondhugula, J. Ramanujam, A. Rountev, P. Sadayappan, “The Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors,” PPoPP’09, pp. 219-228, April 2009.
[56] Y. Kim, D. Han, O. Mutlu, M. Harchol-Balter, “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA’10, pp. 1-12, January 2010.
[57] Y. Kim, M. Papamichael, O. Mutlu, M. Harchol-Balter, “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO’10, pp. 65-76, December 2010.
[58] O. Mutlu, T. Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA ’08, pp. 63-74, June 2008.
[59] S. P. Muralidhara et al., “Reducing memory interference in multicore systems via application-aware memory channel partitioning,” MICRO ’11, pp. 374-385, December 2011.
[60] S. Rixner et al., “Memory access scheduling,” ISCA ’00, May 2000.
[61] C. Bienia, S. Kumar, J. Pal Singh, K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” PACT ’08, September, 2008.
[62] JEDEC. Standard No. 21-C. Annex K: Serial Presence Detect (SPD) for DDR3 SDRAM Modules, 2011.
[63] M. Awasthi et al., “Handling the problems and opportunities posed by multiple on-chip memory controllers,” PACT’10, September 2010.
[64] J. Demme, S. Sethumadhavan, “Rapid Identification of Architectural Bottlenecks via Precise Event Counting,” ISCA’11, pp. 353-364, June 2011.
[65] P. Magnusson et al. “Simics: A full system simulation platform.” Computer, 35(2), Feb 2002.
[66] X. Tang, “Diagnosis of VLSI circuit defects: defects in scan chain and circuit logic,” dissertation, University of Iowa, 2010
[67] M. M. K. Martin et al. “Multifacets general execution-driven multiprocessor simulator (GEMS)”
[68] U. Bondhugula, J. Ramanujam, and P. Sadayappan. “Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer.” Technical Report OSU-CISRC-10/07-TR70, The Ohio State University, Oct. 2007.
[69] M. Christen, O. Schenk, H. Burkhart, “PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures,” IPDPS ’11, pp. 676-687, May 2011.
[70] Y. Zhao, S. Khursheed, B. M. Al-Hashimi, “Cost-Effective TSV Grouping for Yield Improvement of 3D-ICs,” ATS ’11, pp. 201-206, November 2011.
[71] U. Kang, et al., “8 Gb 3-D DDR3 DRAM using through-silicon-via technology,” IEEE Journal of Solid-State Circuits, 45(1):111-119, January 2010.
[72] D. H. Kim, S. Kim, S. K. Lim, “Impact of nano-scale through-silicon vias on the quality of today and future 3D IC designs,” in Proc. SLiP, pp. 1-8, June 2011.
[73] C. H. Stapper, F. M. Armstrong, and K. Saji, “Integrated circuit yield statistics,” Proc. IEEE, vol. 71, pp. 453-470, 1983.