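Graduate student: 蘇惠珠 (Su, Amber Hui-Zhu)
Thesis title: 具不確定性資料串流之連續型機率天際線查詢 (Continuous Probabilistic Skyline Queries over Uncertain Data Streams)
Advisor: 陳良弼 (Chen, Arbee L.P.)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2010
Academic year of graduation: 98
Language: English
Number of pages: 29
Keywords (Chinese): 不確定性資料, 資料串流, 連續型查詢, 機率天際線
Keywords (English): Uncertain data, Data stream, Continuous query, Probabilistic skyline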
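    Abstract (translated from the Chinese): In recent years, many methods have been proposed for processing probabilistic skyline queries over uncertain data. In these methods, an uncertain object is defined as an object composed of multiple instances, each carrying a probability. The probabilistic skyline consists of all uncertain objects that are not dominated by any other uncertain object and whose skyline probability exceeds a user-specified threshold. In many real-time applications, however, data enter the system continuously and without interruption. This thesis therefore investigates how to process continuous probabilistic skyline queries over uncertain data streams. In addition, as data accumulate over time, older data become less valuable as a reference, so a sliding window is used to keep only the data from the most recent time units. To avoid recomputing the skyline probabilities of all uncertain objects every time the sliding window changes, upper and lower bounds on each object's skyline probability are estimated, which reduces the computation cost and makes it possible to decide quickly whether the object can be an answer. Two methods are proposed to solve this problem. The first is based on an existing method for computing skylines over static, certain data, adapted to the setting of this thesis; it continuously updates the upper and lower bounds of the skyline probability in order to decide efficiently whether an uncertain object is an answer. The second records the dominance relationships among some of the instances of the uncertain objects in a purpose-built data structure and uses that structure to compute the upper and lower bounds of the skyline probability. Both methods are also applied to the problem of continuously processing top-k probabilistic skyline queries over uncertain data streams. Finally, a series of experiments verifies the performance of the two proposed methods.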


    Abstract (English): Recently, several approaches to finding probabilistic skylines on uncertain data have been proposed. In these approaches, a data object is composed of instances, each associated with a probability. The probabilistic skyline is then defined as the set of objects whose probability of not being dominated is at least a given threshold. In many applications, data are generated in the form of continuous data streams. Accordingly, in this thesis we make the first attempt to study the problem of continuously returning probabilistic skylines over uncertain data streams, under the sliding window model. To avoid recomputing, for each uncertain object, the probability of not being dominated from the instances contained in the current window, our main idea is to estimate bounds on these probabilities so that objects can be pruned or returned as results early. We first propose a basic algorithm, adapted from an existing approach to answering skyline queries on static and certain data, which updates these bounds by repeatedly processing the instances of each object. Then, we design a novel data structure that keeps the dominance relations between some instances in order to tighten these bounds rapidly, and propose a progressive algorithm based on this structure. Moreover, both algorithms are adapted to solve the problem of continuously maintaining top-k probabilistic skylines. Finally, a set of experiments is performed to evaluate these algorithms; the results show that the progressive algorithm substantially outperforms the basic one, demonstrating the effectiveness of the newly designed structure.
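    The definitions in the abstract can be illustrated with a small Python sketch. It is not the thesis's algorithm: it assumes the commonly used instance-level formulation (an object's skyline probability is the sum, over its instances, of the instance probability times the probability that no other object places a dominating instance), assumes that smaller values are preferred in every dimension, and ignores sliding-window maintenance entirely. All names (`dominates`, `skyline_probability`, `probability_bounds`, `probabilistic_skyline`, `example`) are hypothetical. The bounds function shows one simple way partial processing could bracket the true probability: each unprocessed instance contributes somewhere between zero and its full probability.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Instance = Tuple[Point, float]        # (coordinates, probability)
Dataset = Dict[str, List[Instance]]   # object id -> its instances


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.
    Assumption: smaller values are preferred in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def instance_skyline_prob(point: Point, owner: str, data: Dataset) -> float:
    """Probability that no other object has an instance dominating `point`."""
    prob = 1.0
    for oid, instances in data.items():
        if oid == owner:
            continue
        dominating_mass = sum(p for q, p in instances if dominates(q, point))
        prob *= 1.0 - dominating_mass
    return prob


def skyline_probability(oid: str, data: Dataset) -> float:
    """Exact skyline probability of one object (assumed formulation)."""
    return sum(p * instance_skyline_prob(pt, oid, data) for pt, p in data[oid])


def probability_bounds(oid: str, data: Dataset, processed: int) -> Tuple[float, float]:
    """Bounds after examining only the first `processed` instances of `oid`.
    Each unexamined instance contributes between 0 and its own probability."""
    instances = data[oid]
    lower = sum(p * instance_skyline_prob(pt, oid, data)
                for pt, p in instances[:processed])
    upper = lower + sum(p for _, p in instances[processed:])
    return lower, upper


def probabilistic_skyline(data: Dataset, threshold: float) -> List[str]:
    """Objects whose skyline probability is at least `threshold`."""
    return [oid for oid in data if skyline_probability(oid, data) >= threshold]


# Illustrative data only: three objects in two dimensions.
example: Dataset = {
    "A": [((1.0, 4.0), 0.5), ((3.0, 3.0), 0.5)],
    "B": [((2.0, 2.0), 0.5), ((5.0, 5.0), 0.5)],
    "C": [((4.0, 1.0), 1.0)],
}
```

    With a threshold of 0.6, `probabilistic_skyline(example, 0.6)` keeps A and C and drops B. An object can be pruned as soon as its upper bound falls below the threshold and reported as soon as its lower bound reaches it, which is the kind of early decision the abstract describes.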

    Table of contents:
    Acknowledgement, p. i
    Abstract, p. ii
    Table of Contents, p. iii
    List of Figures, p. iv
    1 Introduction, p. 1
    2 Related Works, p. 5
    3 Preliminaries, p. 7
      3.1 Problem Definition, p. 7
      3.2 The Kernel Idea of Our Solutions, p. 9
    4 A Basic Algorithm, p. 11
      4.1 A Data Structure for Keeping All Instances in the Sliding Window, p. 11
      4.2 The Basic Algorithm, p. 12
      5.1 Time Dominant Graph, p. 14
      5.2 Maintenance of TDG, p. 15
      5.3 The TDG Algorithm, p. 17
    6 An Extension, p. 20
    7 Performance Evaluation, p. 21
    8 Conclusion, p. 27
    References, p. 28


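    Full-text release date (on-campus network): full text not authorized for public release
    Full-text release date (off-campus network): full text not authorized for public release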
