Efficient Computation of Sub-space Top-K Probabilistic Skylines on Uncertain Data


Graduate student:	黃敏維 Min-Wei Huang
Thesis title:	Efficient Computation of Sub-space Top-K Probabilistic Skylines on Uncertain Data
Advisor:	陳良弼 Arbee L.P. Chen
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication:	2008
Academic year of graduation:	96 (ROC calendar)
Language:	English
Number of pages:	42
Keywords (Chinese):	機率天際線查詢、Top-K查詢、不確定的資料
Keywords (English):	Probabilistic Skyline Query, Top-K Query, Uncertain Data

The skyline query finds the set of non-dominated objects in a multi-dimensional dataset. Recently, it has been applied to uncertain data to support advanced analysis in important applications such as environmental monitoring and market analysis. Because the data are uncertain, the dominance relationship between two objects is itself uncertain and holds only with some probability. Previous works define the skyline model on uncertain data as the probabilistic skyline and provide query processing methods that require a probability threshold to compute the qualified skyline objects. However, it is difficult for users to choose a suitable probability threshold without prior knowledge of the data. Furthermore, these works consider only full-dimensional skyline queries. Since different users may be interested in different dimensions, sub-space skyline queries are much more practical in many applications. In this thesis, we propose a novel skyline query processing method on uncertain data that returns the top-k objects with the highest probabilities of being in the skyline of the user-specified subspace. Specifying the number of desired answers is easier for users than specifying a probability threshold. Our method uses two strategies to efficiently prune objects that cannot be among the top-k answers, which substantially reduces the computation cost. The first strategy filters out the objects that have no chance of being in the skyline and derives a probability upper bound for each remaining object for further pruning. The second strategy then provides a tight bound for pruning objects whose probability upper bounds are much lower than the k-th highest probability. Finally, extensive experiments on real datasets demonstrate the efficiency and scalability of our approach.
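To make the query concrete, the following is a minimal brute-force sketch of a top-k probabilistic skyline in a chosen subspace. It is not the thesis's method and contains no pruning. It assumes the common instance-based uncertainty model, in which each uncertain object is a set of equally likely instances, objects are independent, and smaller values are better; these assumptions, along with the names `dominates`, `skyline_probability`, and `top_k_probabilistic_skyline` and the sample data, are illustrative and do not come from the abstract.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(u: Point, v: Point, dims: Sequence[int]) -> bool:
    """u dominates v in the subspace `dims`: no worse everywhere, strictly better somewhere.

    Assumption: smaller values are preferred on every dimension.
    """
    return all(u[d] <= v[d] for d in dims) and any(u[d] < v[d] for d in dims)


def skyline_probability(name: str, objects: Dict[str, List[Point]],
                        dims: Sequence[int]) -> float:
    """Probability that object `name` is in the skyline of the subspace `dims`.

    Assumption: instances of an object are equally likely and objects are independent.
    """
    instances = objects[name]
    total = 0.0
    for u in instances:
        # Probability that no other object takes an instance dominating u.
        p = 1.0
        for other, other_instances in objects.items():
            if other == name:
                continue
            dominating = sum(dominates(v, u, dims) for v in other_instances)
            p *= 1.0 - dominating / len(other_instances)
        total += p
    return total / len(instances)


def top_k_probabilistic_skyline(objects: Dict[str, List[Point]],
                                dims: Sequence[int],
                                k: int) -> List[Tuple[str, float]]:
    """Score every object exhaustively and return the k most probable skyline objects."""
    scored = [(name, skyline_probability(name, objects, dims)) for name in objects]
    # Highest probability first; ties broken by name for a deterministic order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical data: three uncertain objects with two 3-dimensional instances each.
objects = {
    "A": [(1, 5, 9), (2, 4, 9)],
    "B": [(3, 3, 1), (1, 6, 1)],
    "C": [(4, 6, 0), (5, 7, 0)],
}

print(top_k_probabilistic_skyline(objects, dims=(0, 1), k=2))  # [('A', 1.0), ('B', 0.75)]
print(top_k_probabilistic_skyline(objects, dims=(2,), k=1))    # [('C', 1.0)]
```

The two calls show why the subspace matters: object C never reaches the skyline on dimensions 0 and 1 but is the certain winner on dimension 2. This exhaustive version compares every instance against every other object, which is the cost that bound-based pruning strategies such as those described in the abstract aim to avoid.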

Introduction
Related Work
Problem Definition and a Basic Method
1  Problem Definition
2  Bound-Based Top-K Probabilistic Skylines Computation
3  A Basic Method
Top-K Probabilistic Skylines Computation on Uncertain Data
1  WorstSky
2  BestSky
Experiments
1  Experimental Settings
2  Pruning Capability of WorstSky
3  Computation Time
4  Scalability with Size of Dataset
5  Scalability with Dimensionality
6  Scalability with Different Values of k
Conclusions and Future Work
Reference

                                


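Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)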
