An Efficient Algorithm for Answering Top-k Typical Range Representatives Query

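Graduate student: 劉富翃 (Liu, Fu-Home)
Thesis title: An Efficient Algorithm for Answering Top-k Typical Range Representatives Query (Chinese title: 回答典型範圍代表者詢問的一個有效率的演算法)
Advisor: 韓永楷 (Hon, Wing-Kai)
Oral examination committee:
Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar, i.e., 2008–2009)
Language: English
Number of pages: 39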
Keywords: Typicality, Representative, Typical Range Representatives

When we explain a new concept to our colleagues, the most formal way is to give its definition and describe its attributes. However, if our listener is a child, we normally avoid the formal approach because it is difficult to understand. A better approach in this case is to first give a “typical example” of the concept.
A similar situation arises when searching large databases, for instance when we search Google with some keywords. When we search a database and receive a set of results, what we generally want is a few “typical” examples among the reported results.
We define the top-k typical range representatives query for retrieving the most typical data, and we design efficient algorithms for this query. In the RAM model, our algorithm answers the query in O(n log n) time, and we conjecture that this bound is optimal. In the external-memory model, we propose another algorithm that answers the query in O(SORT(n)) I/Os, where SORT(n) is the lower bound on the number of I/Os needed to sort n items in that model.
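The abstract does not spell out how typicality is scored. Purely as a point of reference, the sketch below ranks one-dimensional values by a Gaussian kernel density estimate, loosely modeled on the kernel-density-based simple typicality of Hua et al. (VLDB 2007). It is not the thesis's new range-representatives definition, and the function names `simple_typicality` and `top_k_typical`, the bandwidth `h`, and the choice to exclude each point from its own estimate are all assumptions made for illustration. Scoring every pair this way costs Θ(n²) kernel evaluations, which is the kind of quadratic baseline that an O(n log n) algorithm is meant to beat.

```python
import math

def simple_typicality(points, h):
    """Score each 1-D point by a Gaussian kernel density estimate.

    ASSUMPTION (illustrative only): the typicality of p is the average
    Gaussian kernel value between p and every *other* point, with
    bandwidth h. This is not the thesis's range-representatives definition.
    """
    n = len(points)
    if n < 2:
        return [0.0] * n  # typicality is undefined without other points
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * h)
    scores = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:  # exclude the point itself from its own estimate
                total += norm * math.exp(-((p - q) ** 2) / (2.0 * h * h))
        scores.append(total / (n - 1))
    return scores

def top_k_typical(points, k, h=1.0):
    """Brute force: score all points (Theta(n^2) kernel evaluations), keep the k best."""
    scores = simple_typicality(points, h)
    # Sort indices by descending score; ties keep their input order (stable sort).
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return [points[i] for i in order[:k]]

if __name__ == "__main__":
    data = [1.0, 1.1, 1.2, 5.0, 9.0]
    print(top_k_typical(data, 2, h=0.5))  # [1.1, 1.2]
```

In the usage example, 1.1 sits in the middle of the tight cluster around 1.0–1.2, so it scores highest, while the isolated value 9.0 scores lowest. The bandwidth `h` controls how far "neighborhood" extends; a hypothetical range-restricted variant would only sum kernel values over points inside a query range.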

Table of Contents
1 Introduction
  1.1 Thesis Organization
2 Problem Definition
  2.1 New Definition on Top-k Typicality Query
  2.2 Our Problem: Reporting Top-k Typical Range Representatives
3 Efficient Algorithm in RAM Model
  3.1 A Brute-Force Algorithm
  3.2 Our Proposed Algorithm
    3.2.1 Speed-Up in Computing Typicality Values
    3.2.2 Speed-Up in Removing Data Points
    3.2.3 Time Analysis
4 Efficient Algorithm in External-Memory Model
  4.1 A Simple External Memory Algorithm
  4.2 An Efficient External Memory Algorithm
    4.2.1 Algorithm of Merging Two Tidy Data Sets
    4.2.2 Correctness of Merging Two Tidy Data Sets
    4.2.3 Algorithm of Merging Multiple Tidy Data Sets
5 Conclusion

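Full-text release (on-campus network): full text not authorized for public release
Full-text release (off-campus network): full text not authorized for public release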