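Graduate student: | 張苑瑩 |
---|---|
Thesis title: | On Maximizing the Quality of Allocation and Transmission in Network-on-Chip |
Advisor: | 金仲達 |
Oral defense committee: | 金仲達, 蔡仁松, 王廷基, 洪士灝, 單智君 |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Publication year: | 2013 |
Graduation academic year: | 101 |
Language: | English |
Number of pages: | 70 |
Keywords (Chinese): | 晶片網路 (network-on-chip) |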
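As semiconductor technology advances, chip development has extended to 3D chips, and a growing number of cores, caches, and memories can be integrated into a single system-on-chip. To connect this large number of components on a 3D chip, a network-on-chip (NoC) becomes indispensable [3,4], and how to design and implement the NoC is becoming increasingly important. We therefore need appropriate evaluation metrics to judge the quality of an NoC design and implementation. A good metric gives insight into where the problem lies and lets designers concentrate on solving only the most critical issue.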
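In this thesis, we approach NoC design and implementation from the perspective of quality optimization and discuss how quality-related metrics help develop high-quality designs. First, we discuss the router microarchitecture of the NoC. We optimize the metric of the switch allocator, the Quality-of-Allocation, and propose a novel router design, TS-Router. In the router microarchitecture, switch allocation is the most critical stage. Switch allocation is responsible for assigning the packets received at the input ports to the output ports they are headed for. In essence, this allocation is a matching between input and output ports, so an efficient router design should maximize the number of input-output matches. Prior work considers the switch allocation of each cycle separately. In this thesis, we argue that the switch allocations of different cycles are related, so the matching problem has a time dimension (it must consider both the past and the future). Switch allocation must therefore incorporate the notion of time to maximize the Quality-of-Allocation. Based on this idea, TS-Router predicts the future to help make good allocation decisions in the present, thereby maximizing the number of matches. This approach requires very little additional overhead and can easily be incorporated into other routers.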
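To make the matching view of switch allocation concrete, the following is a minimal single-cycle sketch, not the TS-Router algorithm itself. It compares a simple first-come greedy allocator with a maximum bipartite matching found by augmenting paths. The function names, the request format (input port mapped to a list of requested output ports), and the example request pattern are all assumptions made for illustration, and the time-aware prediction described above is deliberately not modeled.

```python
def greedy_allocation(requests):
    """Grant each input port the first still-free output port it requests."""
    granted = {}  # output port -> input port that won it
    for inp in sorted(requests):
        for out in requests[inp]:
            if out not in granted:
                granted[out] = inp
                break
    return {inp: out for out, inp in granted.items()}


def maximum_allocation(requests):
    """Maximum bipartite matching of inputs to outputs via augmenting paths."""
    owner = {}  # output port -> input port currently matched to it

    def try_assign(inp, visited):
        for out in requests[inp]:
            if out in visited:
                continue
            visited.add(out)
            # Take a free output, or move its current owner to another output.
            if out not in owner or try_assign(owner[out], visited):
                owner[out] = inp
                return True
        return False

    for inp in sorted(requests):
        try_assign(inp, set())
    return {inp: out for out, inp in owner.items()}


# Hypothetical request pattern for one cycle: input port -> requested outputs.
requests = {0: [0, 1], 1: [0], 2: [1, 2]}
greedy = greedy_allocation(requests)
best = maximum_allocation(requests)
print("greedy matches:", len(greedy), "maximum matches:", len(best))
```

In this hypothetical pattern the greedy allocator leaves input 1 unserved because input 0 takes the only output that input 1 can use, whereas the maximum matching serves all three inputs. The number of matches per cycle is the quantity that the abstract calls the Quality-of-Allocation.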
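Next, we use the Quality-of-Transmission to examine NoC implementation on 3D chips connected by through-silicon vias (TSVs). TSVs are a very common and efficient way to connect the different dies of a 3D chip. However, TSVs must be bundled together in tens or hundreds for transmission, and this arrangement easily induces severe crosstalk that degrades transmission quality. We therefore propose a new dynamic mechanism, ShieldUS, to solve this problem. ShieldUS dynamically observes the patterns of data transmission and assigns the data bits that change less often to the TSVs that can act as shields isolating crosstalk, which greatly reduces the occurrence of severe crosstalk.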
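The idea of steering rarely changing bits onto shielding TSVs can be sketched as follows. This is only an illustration under stated assumptions, not the ShieldUS hardware: it counts per-bit toggles over a window of transmitted words and picks the least active bit positions as shield candidates. The function names, the 8-bit word width, the number of shield positions, the tie-breaking rule (prefer higher bits), and the example trace are all hypothetical.

```python
def toggle_counts(words, width):
    """Count, per bit position, how often the bit flips between consecutive words."""
    counts = [0] * width
    for prev, cur in zip(words, words[1:]):
        changed = prev ^ cur  # bits that differ between the two words
        for bit in range(width):
            counts[bit] += (changed >> bit) & 1
    return counts


def pick_shield_bits(words, width, num_shields):
    """Return the num_shields bit positions that toggled least in the window.

    Ties are broken in favor of higher bit positions (an arbitrary assumption).
    """
    counts = toggle_counts(words, width)
    ranked = sorted(range(width), key=lambda bit: (counts[bit], -bit))
    return sorted(ranked[:num_shields])


# Hypothetical window of 8-bit address words: upper bits stable, lower bits vary.
trace = [0xA0, 0xA1, 0xA3, 0xA2, 0xA6, 0xA7]
counts = toggle_counts(trace, 8)
shields = pick_shield_bits(trace, 8, 2)
print("toggles per bit:", counts)
print("bits routed to shield TSVs:", shields)
```

A bit that rarely switches behaves almost like a static wire, so placing it between more active signals limits the simultaneous opposite transitions that cause the worst crosstalk. How the selected bits map to physical TSV positions depends on the TSV array layout, which this sketch does not model.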
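In the thesis, we evaluate the proposed mechanisms experimentally using the corresponding metrics. First, we evaluate TS-Router with synthetic traffic models and real applications. The results show that TS-Router achieves more matches and lower latency. We also implemented TS-Router in the hardware description language Verilog, so the thesis reports power and area results as well. Second, we record the address of every memory access made by the applications, and we evaluate ShieldUS with these real traces and with probabilistically synthesized models of varying degrees of similarity. The results show that ShieldUS can learn the data transmission pattern from past history and respond appropriately to reduce crosstalk interference. Furthermore, we design a method that dynamically searches for the interval, the IEU (Interval Equilibration Unit), which helps identify application characteristics and further strengthens ShieldUS.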
[1] G. Michelogiannakis, N. Jiang, D. Becker, and W. J. Dally, “Packet chaining:
efficient single-cycle allocation for on-chip networks,” in Proceedings of the 44th
Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-
44 ’11. New York, NY, USA: ACM, 2011, pp. 83–94. [Online]. Available:
http://doi.acm.org/10.1145/2155620.2155631
[2] M. Ahn and E. J. Kim, “Pseudo-circuit: Accelerating communication for on-chip
interconnection networks,” in Proceedings of the 2010 43rd Annual IEEE/ACM
International Symposium on Microarchitecture, ser. MICRO ’43. Washington,
DC, USA: IEEE Computer Society, 2010, pp. 399–408. [Online]. Available:
http://dx.doi.org/10.1109/MICRO.2010.10
[3] R. Kumar, V. Zyuban, and D. Tullsen, “Interconnections in multi-core architectures:
Understanding mechanisms, overheads and scaling,” in Computer Architecture, 2005.
ISCA'05. Proceedings. 32nd International Symposium on. IEEE, 2005, pp. 408–419.
[4] W. Dally and B. Towles, “Route packets, not wires: On-chip interconnection networks,”
in Design Automation Conference, 2001. Proceedings. IEEE, 2001, pp. 684–689.
[5] M. Hayenga and M. Lipasti, “The NoX router,” in Proceedings of the 44th Annual
IEEE/ACM International Symposium on Microarchitecture. ACM, 2011, pp. 36–46.
[6] D. Sanchez, G. Michelogiannakis, and C. Kozyrakis, “An analysis of on-chip interconnection
networks for large-scale chip multiprocessors,” ACM Transactions on Architecture
and Code Optimization (TACO), vol. 7, no. 1, p. 4, 2010.
[7] H. Eberle, P. Garcia, J. Flich, J. Duato, R. Drost, N. Gura, D. Hopkins, and W. Olesinski,
“High-radix crossbar switches enabled by proximity communication,” in Proceedings
of the 2008 ACM/IEEE conference on Supercomputing. IEEE Press, 2008, p. 32.
[8] J. Kim, J. Balfour, and W. Dally, “Flattened butterfly topology for on-chip
networks,” in Proceedings of the 40th Annual IEEE/ACM International Symposium on
Microarchitecture, ser. MICRO 40. Washington, DC, USA: IEEE Computer Society,
2007, pp. 172–182. [Online]. Available: http://dx.doi.org/10.1109/MICRO.2007.15
[9] J. Kim, W. Dally, B. Towles, and A. Gupta, “Microarchitecture of a high-radix router,”
in ACM SIGARCH Computer Architecture News, vol. 33, no. 2. IEEE Computer
Society, 2005, pp. 420–431.
[10] L.-S. Peh and W. Dally, “A delay model and speculative architecture for pipelined
routers,” in High-Performance Computer Architecture, 2001. HPCA. The Seventh
International Symposium on, 2001, pp. 255–266.
[11] H. Matsutani, M. Koibuchi, H. Amano, and T. Yoshinaga, “Prediction router: Yet
another low latency on-chip router architecture,” in High Performance Computer
Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on. IEEE, 2009,
pp. 367–378.
[12] D. U. Becker and W. J. Dally, “Allocator implementations for network-on-chip
routers,” in Proceedings of the Conference on High Performance Computing Networking,
Storage and Analysis, ser. SC ’09. New York, NY, USA: ACM, 2009, pp. 52:1–52:12.
[Online]. Available: http://doi.acm.org/10.1145/1654059.1654112
[13] R. Mullins, A. West, and S. Moore, “Low-latency virtual-channel routers for on-chip
networks,” in Proceedings of the 31st annual international symposium on Computer
architecture, ser. ISCA ’04. Washington, DC, USA: IEEE Computer Society, 2004,
pp. 188–. [Online]. Available: http://dl.acm.org/citation.cfm?id=998680.1006717
[14] S. Pasricha, “Exploring serial vertical interconnects for 3D ICs,” in Design Automation
Conference, 2009. DAC'09. 46th ACM/IEEE. IEEE, 2009, pp. 581–586.
[15] J.-S. Kim, C. S. Oh, H. Lee, D. Lee, H.-R. Hwang, S. Hwang, B. Na, J. Moon, J.-G.
Kim, H. Park, J.-W. Ryu, K. Park, S.-K. Kang, S.-Y. Kim, H. Kim, J.-M. Bang, H. Cho,
M. Jang, C. Han, J.-B. Lee, K. Kyung, J.-S. Choi, and Y.-H. Jun, “A 1.2V 12.8GB/s 2Gb
mobile wide-I/O DRAM with 4x128 I/Os using TSV-based stacking,” in Solid-State Circuits
Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, Feb. 2011,
pp. 496–498.
[16] H. Kaul, D. Sylvester, and D. Blaauw, “Active shields: a new approach to shielding
global wires,” in Proceedings of the 12th ACM Great Lakes symposium on VLSI. ACM,
2002, pp. 112–117.
[17] N. Satyanarayana, A. Vinaya Babu, and M. Mutyam, “Delay-efficient bus encoding
techniques,” Microprocess. Microsyst., vol. 33, no. 5-6, pp. 365–373, Aug. 2009.
[Online]. Available: http://dx.doi.org/10.1016/j.micpro.2009.05.003
[18] J. Ma and L. He, “Formulae and applications of interconnect estimation considering
shield insertion and net ordering,” in Computer Aided Design, 2001. ICCAD 2001.
IEEE/ACM International Conference on. IEEE, 2001, pp. 327–332.
[19] R. Arunachalam, E. Acar, and S. Nassif, “Optimal shielding/spacing metrics for low
power design,” in VLSI, 2003. Proceedings. IEEE Computer Society Annual Symposium
on. IEEE, 2003, pp. 167–172.
[20] Y. Tamir and H. C. Chi, “Symmetric crossbar arbiters for VLSI communication
switches,” IEEE Trans. Parallel Distrib. Syst., vol. 4, no. 1, pp. 13–27, Jan. 1993.
[Online]. Available: http://dx.doi.org/10.1109/71.205650
[21] L. Ford and D. Fulkerson, “Maximal flow through a network,” Canadian Journal of
Mathematics, vol. 8, no. 3, pp. 399–404, 1956.
[22] N. McKeown, “The iSLIP scheduling algorithm for input-queued switches,” Networking,
IEEE/ACM Transactions on, vol. 7, no. 2, pp. 188–201, 1999.
[23] N. Agarwal, T. Krishna, L. Peh, and N. Jha, “Garnet: A detailed on-chip network model
inside a full-system simulator,” in Performance Analysis of Systems and Software, 2009.
ISPASS 2009. IEEE International Symposium on. IEEE, 2009, pp. 33–42.
[24] W. Dally, “Virtual-channel flow control,” Parallel and Distributed Systems, IEEE
Transactions on, vol. 3, no. 2, pp. 194–205, 1992.
[25] W. Dally and B. Towles, Principles and practices of interconnection networks. Morgan
Kaufmann, 2004.
[26] H. Matsutani, M. Koibuchi, H. Amano, and T. Yoshinaga, “Prediction router: A low-latency
on-chip router architecture with multiple predictors,” vol. 60, no. 6. IEEE,
2011, pp. 783–799.
[27] H. Wang, X. Zhu, L. Peh, and S. Malik, “Orion: a power-performance simulator for
interconnection networks,” in Microarchitecture, 2002.(MICRO-35). Proceedings. 35th
Annual IEEE/ACM International Symposium on. IEEE, 2002, pp. 294–305.
[28] H. Wang, L. Peh, and S. Malik, “Power-driven design of router microarchitectures in on-chip
networks,” in Proceedings of the 36th annual IEEE/ACM International Symposium
on Microarchitecture. IEEE Computer Society, 2003, p. 105.
[29] N. Binkert, B. Beckmann, G. Black, S. Reinhardt, A. Saidi, A. Basu, J. Hestness,
D. Hower, T. Krishna, S. Sardashti et al., “The gem5 simulator,” ACM SIGARCH
Computer Architecture News, vol. 39, no. 2, pp. 1–7, 2011.
[30] M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu,
A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood, “Multifacet’s
general execution-driven multiprocessor simulator (GEMS) toolset,” SIGARCH Comput.
Archit. News, vol. 33, no. 4, pp. 92–99, Nov. 2005. [Online]. Available:
http://doi.acm.org/10.1145/1105734.1105747
[31] J. Hestness, B. Grot, and S. Keckler, “Netrace: dependency-driven trace-based network-on-chip
simulation,” in Proceedings of the Third International Workshop on Network on
Chip Architectures. ACM, 2010, pp. 31–36.
[32] J. Balfour and W. J. Dally, “Design tradeoffs for tiled CMP on-chip networks,”
in Proceedings of the 20th annual international conference on Supercomputing, ser.
ICS ’06. New York, NY, USA: ACM, 2006, pp. 187–198. [Online]. Available:
http://doi.acm.org/10.1145/1183401.1183430
[33] J. L. Henning, “SPEC CPU2006 benchmark descriptions,” SIGARCH Comput.
Archit. News, vol. 34, no. 4, pp. 1–17, Sep. 2006. [Online]. Available:
http://doi.acm.org/10.1145/1186736.1186737
[34] R. Hoare, Z. Ding, and A. Jones, “A near-optimal real-time hardware scheduler for large
cardinality crossbar switches,” in Proceedings of the 2006 ACM/IEEE Conference on
Supercomputing. ACM, 2006, p. 94.
[35] G. Yuan, A. Bakhoda, and T. Aamodt, “Complexity effective memory access scheduling
for many-core accelerator architectures,” in Microarchitecture, 2009. MICRO-42. 42nd
Annual IEEE/ACM International Symposium on. IEEE, 2009, pp. 34–44.
[36] A. Kumar, L.-S. Peh, P. Kundu, and N. K. Jha, “Express virtual channels: towards
the ideal interconnection fabric,” in Proceedings of the 34th annual international
symposium on Computer architecture, ser. ISCA ’07. New York, NY, USA: ACM,
2007, pp. 150–161. [Online]. Available: http://doi.acm.org/10.1145/1250662.1250681
[37] M. Galles, “Spider: a high-speed network interconnect,” Micro, IEEE, vol. 17, no. 1,
pp. 34–39, 1997.
[38] L. Li, N. Vijaykrishnan, M. Kandemir, and M. Irwin, “A crosstalk aware interconnect
with variable cycle transmission,” in Design, Automation and Test in Europe Conference
and Exhibition, 2004. Proceedings, vol. 1, Feb. 2004, pp. 102–107.
[39] B. Victor and K. Keutzer, “Bus encoding to prevent crosstalk delay,” in Proceedings
of the 2001 IEEE/ACM international conference on Computer-aided design, ser.
ICCAD ’01. Piscataway, NJ, USA: IEEE Press, 2001, pp. 57–63. [Online]. Available:
http://dl.acm.org/citation.cfm?id=603095.603107
[40] M. Mutyam, “Selective shielding: a crosstalk-free bus encoding technique,” in
Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on,
Nov. 2007, pp. 618–621.
[41] K. Karmarkar and S. Tragoudas, “Scalable codeword generation for coupled buses,” in
Proceedings of the Conference on Design, Automation and Test in Europe, ser. DATE ’10.
Leuven, Belgium: European Design and Automation Association, 2010,
pp. 729–734. [Online]. Available: http://dl.acm.org/citation.cfm?id=1870926.1871102
[42] B.-J. Kwak, N.-O. Song, and L. Miller, “Performance analysis of exponential backoff,”
Networking, IEEE/ACM Transactions on, vol. 13, no. 2, pp. 343–355, Apr. 2005.
[43] J. Lau, S. Schoenmackers, and B. Calder, “Transition phase classification and prediction,”
in High-Performance Computer Architecture, 2005. HPCA-11. 11th International
Symposium on, Feb. 2005, pp. 278–289.
[44] T. Song, C. Liu, D. H. Kim, S. K. Lim, J. Cho, J. Kim, J. Pak, S. Ahn, J. Kim, and
K. Yoon, “Analysis of TSV-to-TSV coupling with high-impedance termination in 3D ICs,”
in ISQED, 2011, pp. 122–128.
[45] C. Liu, T. Song, J. Cho, J. Kim, J. Kim, and S. K. Lim, “Full-chip TSV-to-TSV coupling
analysis and optimization in 3D IC,” in DAC, 2011, pp. 783–788.
[46] Z. Xu, A. Beece, D. Zhang, Q. Chen, K.-N. Chen, K. Rose, and J.-Q. Lu, “Crosstalk
evaluation, suppression and modeling in 3D through-strata-via (TSV) network,” in 3D
Systems Integration Conference (3DIC), 2010 IEEE International, Nov. 2010, pp. 1–8.
[47] R. Weerasekera, M. Grange, D. Pamunuwa, H. Tenhunen, and L.-R. Zheng, “Compact
modelling of through-silicon vias (TSVs) in three-dimensional (3-D) integrated circuits,”
in 3D System Integration, 2009. 3DIC 2009. IEEE International Conference on, Sept.
2009, pp. 1–8.
[48] Y. Shin and T. Sakurai, “Coupling-driven bus design for low-power application-specific
systems,” in Design Automation Conference, 2001. Proceedings, 2001, pp. 750–753.
[49] F. Wang, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, “On-chip bus thermal analysis
and optimization,” in Proceedings of the conference on Design, automation and
test in Europe: Proceedings, ser. DATE ’06. Leuven, Belgium:
European Design and Automation Association, 2006, pp. 850–855. [Online]. Available:
http://dl.acm.org/citation.cfm?id=1131481.1131721
[50] C. Minkenberg and M. Gusat, “Speculative flow control for high-radix datacenter interconnect
routers,” in Parallel and Distributed Processing Symposium, 2007. IPDPS
2007. IEEE International. IEEE, 2007, pp. 1–10.