研究生: |
葉柏廷 Yeh, Po-Ting |
---|---|
論文名稱: |
具容錯的群播路由機制應用於Fat-Tree網路 A Fault-Tolerant Multicast Routing in Fat-Tree Networks |
指導教授: |
邱瀞德
Chiu, Ching-Te |
口試委員: |
林華君
Lin, Hwa-Chun 黃志煒 Huang, Chih-Wei |
學位類別: |
碩士 Master |
系所名稱: |
電機資訊學院 - 資訊工程學系 Computer Science |
論文出版年: | 2011 |
畢業學年度: | 99 |
語文別: | 英文 |
論文頁數: | 67 |
中文關鍵詞: | 群播路由 、胖樹 、動態容錯 、資料中心 |
外文關鍵詞: | Multicast Routing, Fat-Tree, Dynamic Fault-Tolerance, Data Center |
相關次數: | 點閱:1 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
性能和可靠性是大規模fat-tree網路的兩個主要問題。 我們提出具容錯的群播演算機制,其保證無死結以及在k個以上錯誤同時發生時的連接性應用於k-ray n-tree的fat-tree網路。我們合併了群播和容錯的方法。 在群播方面,我們提出三個方法去提高性能及避開死結。 路徑選擇演算法會利用交換機及目標的標籤來決定要送往的路徑。 切割封包成flit循環送出的方法及群播先行的方法來避面免死節的發生。 在容錯方法,one-hop向量儲存每個交換機下方連線的狀態並且計算出有故障的交換機。 end-routing向量紀錄一個封包遇到故障交換機的數量並在適當時機停止重新路由的機制。另外,我們建立了額外的廣播樹來解決原本重新路由演算法無法解決的狀況。在實驗方面我們使用SystemC發展k-ray n-tree的模擬器來評量性能和可靠性。最後與之前相關的論文做比較,我們提出的方法在傳輸量分別比單播、多重區域標識、硬體為群播路由演算法高出220%、85%、16%。在容錯方面,我們也有相當好的性能及可靠度在微量的面積負擔。
Performance and fault-tolerance are two dominant issues in large scale fat-trees. In this work, we present a local fault-tolerant multicast routing that guarantees connection and deadlock-free in k-ray n-tree up to k simultaneous faults. For multicast, the path selection utilizes labels of the fat-tree to decide outgoing ports. The multicast-first and the flit-by-flit round robin scheduling are adopted to solve multicast deadlocks. For fault-tolerance, the one-hop vectors stores and determines the link fault status of a switch. The end-routing vector records the number of switch faults in a routing path to avoid unnecessarily re-routing. In addition, we build a spare broadcast tree for fault-tolerance under the worst fault cases including switch faults at the bottom level of the fat-tree. To evaluate performance of the proposed schemes, we have developed a systemC simulator for k-ray n-tree. Compared to the unicast, the multiple local identifier routing scheme(MLID), and the hardware-based Multicast(HWm), the proposed method increases throughput by up to 220%, 85%, and 16% respectively in the adjacent traffic pattern. The average latency of the proposed algorithm is 21% more than the unicast, 62% less than that of MLID and 25% less than that of HWm under the random traffic. Techniques used to realize faulttolerance are often at the expense of considerable performance degradation. Our proposed scheme has a graceful degradation in performance and a moderate increase in area overhead as the number of link or switch faults in the system increases.
1] M. Al-Fares, A. Loukissas, and A. Vahdat, “A Scalable, Commodity Data Center Network Architecture,” in Proc. ACM Special Interest Group on Data Communications, Aug. 2008, pp. 63-74.
[2] J. Zhou, X. Y. Lin, C. H. Wu, and Y. C. Chung, “Multicast in Fat-Tree-Based InfiniBand Networks,” in Proc. IEEE International Symposium on Network Computing and Applications, Jul. 2005, pp. 239-242.
[3] X. Lin, Y. Chung, and T. Huang, “A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks,” in Proc. IEEE International Parallel and Distributed Processing Symposium, Apr. 2004, pp. 1-13.
[4] S. Coll, F. J. Mora, J. Duato, and F. Petrini, “Efficient and Scalable Hardware-Based Multicast in Fat-Tree Networks,” IEEE Transactions on Parallel and Distributed System, vol. 20, no. 9, pp. 1285-1298, Sep. 2009.
[5] C. G. Requena, M. G. Requena, P. L. Rodriguez, and J. F. Duato,“FT2EI: A Dynamic Fault-Tolerant Routing Methodology for Fat Trees with Exclusion Intervals,” IEEE Transactions on Parallel and Distributed System, vol. 20, no. 6, pp. 802-817, Jun. 2009.
[6] F. O. Sem-Jacobsen, T. Skeie, O. Lysne, and J. Duato, “Dynamic Fault Tolerance in Fat Trees,” IEEE Transactions On Computers, vol. 60, no. 4, pp. 508-525, Apr. 2011.
[7] F. A. Samman, T. Hollstein, and M. Glesner “Adaptive and Deadlock-Free Tree-Based Multicast Routing for Networks-on-Chip,” IEEE Transactions On Very Large Scale Integration Systems, vol. 18, no. 7, pp. 1067-1080, Jul. 2010.
[8] S. Kumar and L. V. Kale, “Scaling All-to-All Multicast on Fat-tree Networks,” in Proc. International Conference on Parallel and Distributed Systems, Jul. 2004, pp. 205-214.
[9] F. O. Sem-Jacobsen, T. Skeie, O. Lysne, O. T rudbakken, E. Rongved, and B. Johnsen, “Siamese-Twin: A Dynamically Fault-Tolerant Fat Tree,” in Proc. International Parallel and Distributed Processing Symposium, Apr. 2005, pp. 100b.
[10] N. F. Tzeng, P. C. Yew, and C. Q. Zhu, “A Fault-Tolerant Scheme for Multistage Interconnection Networks,” in Proc. IEEE Computer Society Press, May 1985, pp. 368V375.
[11] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach, Revised Edition, Morgan Kaufmann, 2002.
[12] K. Hwang, Advanced Computer Architecture - Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
[13] F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, 1992.
[14] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu, “DCell A Scalable and Fault-Tolerant Network Structure for Data Centers,” in Proc. ACM Special Interest Group on Data Communications, Aug. 2008, pp. 75-86.
[15] M. Kliegl, J. Lee, J. Li, X. Zhang, C. Guo, and D. Rincon, “Generalized DCell Structure for Load-Balanced Data Center Networks,” in Proc. IEEE International Conference on Computer Communications, Mar. 2010, pp. 1-5.
[16] D. Li, C. Guo, H. Wu, K. Tan, Y. Zhang, and S. Lu, “FiConn: Using Backup Port for Server Interconnection in Data Centers,” in Proc. IEEE International Conference on Computer Communications, Apr. 2009, pp. 2276-2285.
[17] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, “BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers,” in Proc. IEEE International Conference on Computer Communications, Apr. 2009, pp. 63-74.
[18] C. E. Leiserson. “Fat-trees: Universal networks hardware efficient supercomputing,” IEEE Transactions on Computers, vol. 34, no. 10, pp.892-901, Oct. 1985.
[19] P. K. McKinley, Y. j. Tsai, and D. F. Robinson, “Collective Communication in Wormhole-Routed Massively Parallel Computers,” Computer, vol. 28, no. 12, pp. 39-50, Dec. 1995.
[20] V. Dvorak, and J. Jaros , “Optimizing Collective Communications on 2D-Mesh and Fat Tree NoC,” in Proc. International Conference on Networks, Apr. 2010, pp. 22-27.
[21] R. Suzuki, S. Fukumoto, and K. Iwasakio, “Adaptive Checkpointing for Time Warp Technique with a Limited Number of Checkpoints,” in Proc. International Conference on Distributed Computing Systems Workshops, Jul. 2002, pp. 95-100.
[22] B. Kim, Jong Kim, S. Hong,and S. Lee, “A Real-Time Communication Method for Wormhole Switching Networks,” in Proc. International Conference on Parallel Processing, Jun. 1998, pp. 527-534.