Student: | 黃敏維 Min-Wei Huang |
---|---|
Thesis Title: | Efficient Computation of Sub-space Top-K Probabilistic Skylines on Uncertain Data |
Advisor: | 陳良弼 Arbee L.P. Chen |
Oral Defense Committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of Publication: | 2008 |
Academic Year of Graduation: | 96 (ROC calendar, i.e., 2007-2008) |
Language: | English |
Number of Pages: | 42 |
Keywords: | Probabilistic Skyline Query, Top-K Query, Uncertain Data |
The skyline query finds the set of non-dominated objects in a multi-dimensional dataset. Recently, it has been applied to uncertain data to support advanced analysis in important applications such as environmental monitoring and market analysis. Because of the uncertainty, the dominance relationship between objects is itself uncertain and is associated with a probability. Previous work defines the skyline on uncertain data as the probabilistic skyline and provides query processing methods that require a probability threshold to determine which objects qualify. However, users without prior knowledge of the data find it difficult to choose a suitable threshold. Furthermore, this work considers only full-dimensional skyline queries. Since different users may be interested in different dimensions, subspace skyline queries are more practical in many applications.

In this thesis, we propose a novel skyline query processing method on uncertain data that returns the top-k objects with the highest probabilities of being in the skyline of the user-specified subspace. Specifying how many answers they want is more user-friendly for users than specifying a probability threshold. Our method uses two strategies to efficiently prune objects that cannot be among the top-k answers, substantially reducing the computation cost. The first strategy filters out objects that have no chance of being in the skyline and computes an upper bound on the skyline probability of each remaining object for further pruning. The second strategy then provides a tight bound for pruning objects whose probability upper bounds fall below the k-th highest probability. Finally, extensive experiments on real datasets demonstrate the efficiency and scalability of our approach.
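The abstract does not spell out the probability model or the two pruning strategies, so the following is only a minimal Python sketch of the query being answered, under stated assumptions: each uncertain object is a set of discrete weighted instances, objects are mutually independent, and smaller values are better on every dimension. An instance stays in the skyline only if no other object dominates it in the chosen subspace, so an object's skyline probability is the sum, over its instances, of the instance probability times the product of (1 - Pr(v dominates the instance)) over all other objects v. The pruning shown is a generic substitute, not the thesis's strategies: evaluating that product over only a few "probe" objects gives an upper bound (each omitted factor is at most 1), and any object whose bound does not exceed the current k-th highest probability is skipped. The names `UncertainObject`, `top_k_prob_skyline`, and `num_probes`, and the probe-selection heuristic, are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class UncertainObject:
    name: str
    # Assumed model: discrete instances (point, probability) whose probabilities sum to 1.
    instances: List[Tuple[Point, float]]


def dominates(a: Point, b: Point, dims: Sequence[int]) -> bool:
    """a dominates b in the subspace `dims` (assumption: smaller is better)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)


def prob_dominated_by(v: UncertainObject, point: Point, dims: Sequence[int]) -> float:
    """Pr(v dominates point): v's instances are mutually exclusive, so their probabilities add."""
    return sum(p for q, p in v.instances if dominates(q, point, dims))


def skyline_probability(u: UncertainObject, others: Sequence[UncertainObject],
                        dims: Sequence[int]) -> float:
    """Pr(u is in the subspace skyline), assuming objects are mutually independent.

    Passing only a subset of the other objects yields an upper bound, because
    every omitted factor (1 - Pr(v dominates inst)) is at most 1.
    """
    total = 0.0
    for point, p in u.instances:
        survive = p
        for v in others:
            if v is u:
                continue
            survive *= 1.0 - prob_dominated_by(v, point, dims)
            if survive == 0.0:
                break  # this instance is certainly dominated
        total += survive
    return total


def top_k_prob_skyline(objects: List[UncertainObject], dims: Sequence[int],
                       k: int, num_probes: int = 2) -> List[Tuple[float, str]]:
    """Top-k objects by subspace skyline probability, with upper-bound pruning.

    Hypothetical heuristic: the `num_probes` objects with the smallest expected
    coordinate sum in `dims` act as cheap "probe" dominators for the bound.
    """
    if k <= 0:
        return []

    def expected_sum(o: UncertainObject) -> float:
        return sum(p * sum(q[d] for d in dims) for q, p in o.instances)

    ordered = sorted(objects, key=expected_sum)  # likely-strong objects first
    probes = ordered[:num_probes]
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    for u in ordered:
        if len(heap) == k and skyline_probability(u, probes, dims) <= heap[0][0]:
            continue  # the upper bound cannot beat the current k-th probability
        prob = skyline_probability(u, objects, dims)
        if len(heap) < k:
            heapq.heappush(heap, (prob, u.name))
        elif prob > heap[0][0]:
            heapq.heapreplace(heap, (prob, u.name))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    objs = [
        UncertainObject("A", [((1, 1, 9), 0.6), ((3, 3, 9), 0.4)]),
        UncertainObject("B", [((2, 2, 1), 1.0)]),
        UncertainObject("C", [((4, 4, 1), 1.0)]),
    ]
    print(top_k_prob_skyline(objs, dims=(0, 1), k=2))  # [(0.6, 'A'), (0.4, 'B')]
```

Passing `dims=(0, 1)` restricts dominance to the first two dimensions, where C is pruned by its bound without an exact evaluation. On the same three objects, the subspace `(2,)` gives A probability 0 and both B and C probability 1, since B and C tie on that dimension and neither strictly dominates the other; this is why the answer depends on the user's chosen subspace.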