Graduate student: | 張俊傑 Chun-Chieh Chang |
---|---|
Thesis title: | 應用關鍵字搜尋於使用繞送為基準之資源定位方法的對等式資源分享網路 Applying Keyword-Searching to Routing-Based Resource-Locating Scheme in Peer-to-Peer Networks |
Advisor: | 林華君 Hwa-Chun Lin |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2002 |
Academic year of graduation: | 90 (ROC calendar) |
Language: | Chinese |
Pages: | 44 |
Keywords: | Peer-to-Peer, Keyword-Searching, Distributed, Load-Balancing |
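With the growth of the Internet, more and more people have begun sharing the resources they own over it. A resource here can be a file, a service, or a computer's processing power. Among the many sharing applications, file sharing is currently the most popular. In all of these applications, how to locate the resources one needs efficiently has long been one of the central topics of network research.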
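Traditional search architectures are built on the client-server model: all available resources are indexed in the database of a fixed server, and resource requesters (clients) search for the resources they need through that server. This architecture has several drawbacks. The main one is that the failure of a single server interrupts the entire sharing service. In addition, the limited hardware resources of a single server constrain the performance of the whole sharing mechanism.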
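The peer-to-peer (P2P) sharing model overcomes these drawbacks of the client-server model. Unlike a client-server architecture, a P2P architecture has no fixed server acting as a central management point for shared resources: the shared resources are spread across the nodes of the P2P network rather than indexed centrally on a fixed server. A node can be a personal computer, a workstation, or any network terminal device able to run P2P service software. When a resource is needed, it must be searched for on the P2P network, and each node may, as required, act as a resource provider (server), a resource requester (client), or a message forwarder (router). A node plays the router role when it cannot itself provide the requested resource and forwards the query to other nodes so that the search can continue.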
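The server and router roles described above can be sketched as a query handler that runs on every node. This is a minimal, hypothetical Python illustration, not code from the thesis: the `Node` class, the `flood_search` function, and the hop limit `ttl` are assumptions, and forwarding to every neighbor follows the flooding-like search style rather than any specific protocol's message format.

```python
from collections import deque


class Node:
    """Hypothetical overlay node: a set of shared resource names plus neighbor links."""

    def __init__(self, name, resources=()):
        self.name = name
        self.resources = set(resources)
        self.neighbors = []


def flood_search(origin, wanted, ttl):
    """Breadth-first, flooding-style search with a hop limit.

    Returns (providers, messages): names of nodes that acted as server for
    `wanted`, and the number of query messages sent between nodes.
    """
    providers = []
    messages = 0
    seen = {origin.name}
    queue = deque([(origin, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if wanted in node.resources:
            providers.append(node.name)  # server role: this node can supply the resource
            continue                     # a provider answers instead of forwarding
        if hops_left == 0:
            continue                     # hop limit reached: stop forwarding here
        for peer in node.neighbors:      # router role: pass the query on
            messages += 1
            if peer.name not in seen:    # duplicates are counted but not re-forwarded
                seen.add(peer.name)
                queue.append((peer, hops_left - 1))
    return providers, messages
```

For example, with A linked to B, and B linked to C and D (both sharing `song.mp3`), a search from A with `ttl=3` returns `(['C', 'D'], 4)`: B acts as a router, C and D act as servers, and the count includes the query B sends back to A, which A discards as a duplicate. With `ttl=1` the query dies at B and the result is `([], 1)`.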
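Many P2P sharing architectures are currently being designed and developed, and they differ in how they locate resources. Classifying current P2P sharing architectures by their resource-locating method, we divide them roughly into two kinds: Flooding-Based Resource-Locating and Routing-Based Resource-Locating architectures. In a Flooding-Based Resource-Locating architecture, the location of a shared resource cannot be known in advance, so a search must use a flooding-like method that broadly queries the nodes on the P2P network. The representative of this class is Gnutella [7], a file-sharing system already in wide use. Freenet [6], which searches the P2P network using depth-first search, also belongs to this class. In a Routing-Based Resource-Locating architecture, a hash function maps the name or identifier of each shared resource to a node address on the network, and the resource itself or its location information is placed on the node corresponding to that address. Therefore, when the name or identifier of the desired resource is known, a search only needs to route to the node holding that resource's information to obtain it. Architectures of this kind include Pastry, Tapestry, CAN, and Chord [8,9,10,11].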
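The hash-based placement used by Routing-Based Resource-Locating schemes can be illustrated with a Chord-style identifier circle. This is a hypothetical Python sketch under stated assumptions: resource names and node addresses are hashed with SHA-1 (the Secure Hash Standard listed as [19]) and reduced to a small `M`-bit identifier space, and a resource is assigned to its successor, the first node whose identifier equals or follows the resource's identifier, as in Chord. Pastry, Tapestry, and CAN place keys differently, and the names `ident`, `successor`, and `responsible_node` are illustrative only.

```python
import hashlib
from bisect import bisect_left

# Assumed identifier length in bits; Chord itself uses the full 160-bit SHA-1 output.
M = 16


def ident(key: str, m: int = M) -> int:
    """Hash a resource name or node address onto the identifier circle [0, 2**m)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(key_id: int, node_ids: list[int]) -> int:
    """Return the first node ID equal to or following key_id, wrapping around the circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]  # past the largest ID, wrap to the smallest


def responsible_node(resource_name: str, node_ids: list[int], m: int = M) -> int:
    """Node that stores the resource (or its location information) under Chord-style placement."""
    return successor(ident(resource_name, m), node_ids)


if __name__ == "__main__":
    # Hypothetical node addresses; each node's ID is the hash of its address.
    nodes = [ident(f"10.0.0.{i}:6346") for i in range(1, 9)]
    name = "song.mp3"
    print(f"{name} -> key {ident(name)} -> node {responsible_node(name, nodes)}")
```

Any node can compute the responsible node from the resource name alone, so a lookup goes to one place instead of being flooded; the price is that the exact name must be known. For brevity the sketch assumes a global list of node IDs, whereas a real Chord node reaches the successor in O(log N) routing hops using its finger table.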
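Because a Routing-Based Resource-Locating architecture can address a shared resource explicitly without broadly querying all nodes, it generates less network traffic during a search than a Flooding-Based Resource-Locating architecture. For this reason, much research has tried to develop more efficient Routing-Based Resource-Locating architectures, and one proposal even modifies Gnutella, which by this classification originally belongs to the Flooding-Based class, into an approximately Routing-Based Resource-Locating architecture [17]. However, the sharing and search mechanisms of Routing-Based Resource-Locating architectures depend on a hash function that maps a resource's name or identifier exactly to a node address on the P2P network. As a result, these architectures cannot offer more flexible search methods such as keyword search. We define keyword search as follows: each resource has its own set of keywords, and related resources can be queried by keyword. The keyword set of a shared resource may come from meaningful substrings of the resource name or from the resource's meta-data [26]; meta-data is data that describes data, such as the data's provider, its type, or a brief description of its content.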
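The keyword-search definition above, and why exact-name hashing cannot serve it, can be made concrete with a short sketch. It is a hypothetical Python illustration, not the thesis's mechanism: keywords are assumed to be the lowercase substrings of a file name (minus its extension and a few stopwords) separated by punctuation, whitespace, or underscores; a query is assumed to match when every query keyword is in the resource's keyword set (conjunctive matching); and `name_key` stands in for the hash a Routing-Based scheme would route on.

```python
import hashlib
import re


def keywords_from_name(name: str, stopwords=frozenset({"the", "a", "of", "and"})) -> set[str]:
    """Hypothetical keyword extraction from a resource name.

    Drops the file extension, lowercases, splits on punctuation, whitespace and
    underscores (Unicode letters and digits are kept; no word segmentation is
    attempted for Chinese), and removes a few very common words.
    """
    stem = name.rsplit(".", 1)[0]                # drop the extension, if any
    tokens = re.split(r"[\W_]+", stem.lower())
    return {t for t in tokens if t and t not in stopwords}


def keyword_match(query: set[str], resource_keywords: set[str]) -> bool:
    """Assumed conjunctive semantics: every query keyword must be in the resource's set."""
    return {q.lower() for q in query} <= resource_keywords


def name_key(name: str) -> str:
    """Exact-name routing key: any change to the string yields an unrelated hash."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()
```

For `"The_Beatles-Let_It_Be.mp3"` the extracted set is {beatles, let, it, be}, so the query {"Beatles", "Let"} matches. Yet `name_key("beatles")` bears no relation to `name_key("The_Beatles-Let_It_Be.mp3")`, so routing on the hash of a single keyword would not reach the node that holds this resource.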
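How can keyword search be provided on a Routing-Based Resource-Locating architecture? No researcher has yet proposed a solution to this problem. We propose a concrete mechanism that provides keyword search on a Routing-Based Resource-Locating architecture, removing the original architecture's inability to support it, and we analyze the mechanism's impact on the original Routing-Based Resource-Locating architecture.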
[1] S. Lawrence and C. L. Giles
“Searching the World Wide Web”
Science, vol. 280, no. 5360, pp. 98-100, 1998. Available online at http://citeseer.nj.nec.com/lawrence98searching.html
[2] S. Lawrence and C. L. Giles
“Accessibility of Information on the Web”
Nature, vol. 400, pp. 107-109, 1999.
[3] Roger M. Needham
“Denial of Service: An Example”
Communications of the ACM, vol. 37, no. 11, pp. 42-46, November 1994.
[4] Li Gong
“Peer-to-Peer in Action”
IEEE Internet Computing, vol. 6, no. 1, pp. 37-39, January/February 2002.
[5] S. Botros and S. Waterhouse
“Search in JXTA and Other Distributed Networks”
In Proceedings of the First International Conference on Peer-to-Peer Computing, pp. 30-35, August 2001.
[6] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong
“Freenet: A Distributed Anonymous Information Storage and Retrieval System”
In Workshop on Design Issues in Anonymity and Unobservability, ICSI, Berkeley, CA, USA, pp. 311-320, July 2000.
[7] The Gnutella Protocol Specification, 2000
Available at http://www.gnutella.co.uk/library/pdf/gnutella_protocol_0.4.pdf
[8] A. Rowstron and P. Druschel
“Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems”
Accepted for Middleware, November 2001. Available at http://research.microsoft.com/~antr/PAST/
[9] Ben Y. Zhao, John D. Kubiatowicz, and Anthony D. Joseph
“Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing”
U.C. Berkeley Technical Report UCB/CSD-01-1141, April 2001.
[10] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker
“A Scalable Content-Addressable Network”
In Proceedings of ACM SIGCOMM, 2001.
[11] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan
“Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications”
In Proceedings of ACM SIGCOMM 2001, San Diego, CA, August 2001.
[12] Li Gong
“JXTA: A Network Programming Environment”
IEEE Internet Computing, vol. 5, no. 3, pp. 88-95, May/June 2001.
[13] L. D. Paulson
“Microsoft, Sun Announce P2P Technologies”
Computer, vol. 34, no. 9, p. 21, September 2001.
[14] Steve Waterhouse, David M. Doolin, Gene Kan, and Yaroslav Faybishenko
“Distributed Search in P2P Networks”
IEEE Internet Computing, vol. 6, no. 1, pp. 68-72, January/February 2002.
[15] D. Heimbigner
“Adapting Publish/Subscribe Middleware to Achieve Gnutella-like Functionality”
In Proceedings of the ACM Symposium on Applied Computing, 2001.
[16] M. Portmann, P. Sookavatana, S. Ardon, and A. Seneviratne
“The Cost of Peer Discovery and Searching in the Gnutella Peer-to-Peer File Sharing Protocol”
In Proceedings of the Ninth IEEE International Conference on Networks, 2001.
[17] K. Aberer, M. Punceva, M. Hauswirth, and R. Schmidt
“Improving Data Access in P2P Systems”
IEEE Internet Computing, vol. 6, no. 1, pp. 58-67, January/February 2002.
[18] Ben Shneiderman
“Universal Usability”
Communications of the ACM, vol. 43, no. 5, May 2000.
[19] FIPS 180-1, Secure Hash Standard. U.S. Department of Commerce/NIST, National Technical Information Service, Springfield, VA, April 1995.
[20] Napster
Available at http://www.napster.com/
[21] B. Pinkerton
“Finding What People Want: Experience with the WebCrawler”
In Proceedings of the Second International World Wide Web Conference, Chicago, Illinois, USA, 1994.
[22] K. Sripanidkulchai
“The Popularity of Gnutella Queries and Its Implications on Scalability”
February 2001. Available at http://www.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
[23] W. Li
“Zipf’s Law”
January 1, 1999. Available at http://linkage.rockefeller.edu/wli/zipf/
[24] W. Li
“Random Texts Exhibit Zipf’s-Law-like Word Frequency Distribution”
IEEE Transactions on Information Theory, vol. 38, no. 6, pp. 1842-1845, 1992.
[25] Strategy Alley
“White Paper on the Viability of the Internet for Business”
April 1998. Available at http://www.gvu.gatech.edu/user_surveys/other_papers/
[26] W. Cathro
“Matching Discovery and Recovery”
In Proceedings of the Seminar on Standards Australia, August 1997. Available at http://www.nla.gov.au/nla/staffpaper/cathro3.html
[27] John McKechnie, Sameh Shaaban, and Stephen Lockley
“Computer Assisted Processing of Large Unstructured Document Sets”
In Proceedings of the ACM Symposium on Document Engineering, November 2001.
[28] Stephane Zrehen and Michael A. Arbib
“Understanding Jokes: A Neural Approach to Content-Based Information Retrieval”
In Proceedings of the Second International Conference on Autonomous Agents, May 1998.
[29] D. Shinkai, T. Yoshida, and S. Nishida
“Complement Keywords for Query toward Efficient Information Retrieval”
In Proceedings of the 1999 IEEE International Conference on Systems, Man, and Cybernetics (SMC ’99), vol. 1, pp. 916-921, 1999.
[30] The Information Mapping project of the CSLI research group at Stanford University, under the direction of Stanley Peters
Available at http://www-csli.stanford.edu/semlab/infomap.html
[31] Mei Kobayashi and Koichi Takeda
“Information Retrieval on the Web”
ACM Computing Surveys (CSUR), vol. 32, no. 2, June 2000.
[32] Craig Silverstein, Monika Henzinger, Hannes Marais, and Michael
“Analysis of a Very Large AltaVista Query Log”
Technical Report 1998-014, Compaq Systems Research Center, 1998. Available at http://citeseer.nj.nec.com/context/1043722/70663
[33] Bernard J. Jansen
“An Investigation into the Use of Simple Queries on Web IR Systems”
Information Research: An Electronic Journal, vol. 6, no. 1, 2000. Available at http://jimjansen.tripod.com/academic/pubs/ir2000/ir2000.html
[34] Christine L. Borgman
“Why Are Online Catalogs Still Hard to Use?”
Journal of the American Society for Information Science, vol. 47, no. 7, pp. 493-503, 1996.