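Graduate student: | 薛佩如 Juliana Hsieh |
---|---|
Thesis title: | 資料串流環境下查詢方案的最佳化策略 An Optimization Strategy for Efficient Query Execution over Streaming Sources |
Advisor: | 陳良弼 Arbee L.P. Chen |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2006 |
Academic year of graduation: | 94 |
Language: | English |
Number of pages: | 37 |
Chinese keywords: | 查詢最佳化 (query optimization), 多元連接 (multiway join), 資料串流 (data streams), 連續查詢 (continuous queries) |
English keywords: | query optimization, multiway join, streaming, continuous queries |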
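As the Internet grows, network security becomes increasingly important. In response to cybercrime and the spread of viruses, Internet service providers have begun to monitor network data streams. To identify abnormal user addresses on the network, the data of the streams passing through network servers or switches must be joined. Known techniques offer the following kinds of query plans for joining multiple data streams: (1) binary joins, (2) multi-way joins, and (3) hybrid joins. Under different network conditions, such as stream rates and join selectivities, different join methods yield different processing times. To cope with the huge volume of network data, we must find the query plan with the least processing time for the given rates and selectivities, a task known as query optimization. Based on the number of data streams, we generate all query plans. The exhaustive method computes the cost of every generated plan and then selects the best one. In our experiments, although the exhaustive method always finds the optimal query plan, the time and memory that optimization requires allow us to handle at most 6 to 7 data streams. We therefore need to reduce the number of plans generated, and we propose a greedy method. In the plan-generation step, for all query plans produced after a data stream is added, we first compute the expected cost of each plan from the information about the streams not yet added, and we generate new query plans only from the plan with the smallest expected cost. Each time, we compare the expected costs of all plans generated so far and set this cost as a threshold, then compare the threshold with the expected costs of all incomplete plans; if any query plan has an expected cost below this threshold, the plan at the threshold becomes our final near-optimal plan. In our experiments, although the greedy method is not guaranteed to find the optimal plan every time, the plans it finds are quite close to the optimal plan, and it can be applied to environments with more than 20 data streams. The proposed method therefore solves the multi-stream join problem efficiently in both time and space.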
Continuous queries over data streams, particularly joins of streams, have gained popularity as the scope of their applications has widened in recent years. Applications range from network monitoring to sensor processing for environmental monitoring or inventory tracking. The cost of evaluating such queries over streaming sources may vary according to the order in which the joins of streams are processed. To lower the cost of executing a query, the query optimizer needs to generate an execution plan that better fits the current conditions of the environment. Existing optimizers try to resolve this problem by finding a better probing order for multi-way join operators or by choosing a better sequence for the binary join operators. However, there are cases where the performance of a hybrid plan (a query plan containing both types of operators) exceeds that of query plans composed of a single multi-way operator or of trees of binary join operators. We address the problem of finding a low-cost execution plan for continuous multi-way join queries over infinite data streams. The search space encompasses plans consisting of a single multi-way operator, plans composed of binary join operators, and hybrid plans. We propose heuristics with a partial cost-based optimization technique to address the three main components of a query optimizer, namely the search space, the cost model, and the search strategy. The cost model is used to evaluate all feasible query plans, and the heuristics are used to prune candidates that cannot lead to good plans. We evaluate the proposed approach by comparing the time needed to produce a low-cost query execution plan, and the quality of the resulting plan, against the optimum solution and against a single multi-way operator with a probing order. The results show that our methodology finds a plan that better fits the current environment and is close to the optimal query plan.
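The following is a minimal sketch of why join order affects cost and of how exhaustive enumeration compares with a greedy heuristic. It is not the thesis's optimizer or cost model. It assumes a toy cost (the sum of intermediate result rates), considers only left-deep sequences of binary joins (no multi-way or hybrid plans), and uses hypothetical stream names, rates, and selectivities.

```python
from itertools import permutations


def plan_cost(order, rate, sel):
    """Toy cost of a left-deep join order: the sum of intermediate result rates.

    Assumed model (not from the thesis): joining a new stream multiplies the
    current rate by that stream's rate and by the selectivity between it and
    every stream already joined; missing pairs default to selectivity 1.0.
    """
    total = 0.0
    current = rate[order[0]]
    for i in range(1, len(order)):
        s = order[i]
        current *= rate[s]
        for j in order[:i]:
            current *= sel.get(frozenset((j, s)), 1.0)
        total += current
    return total


def exhaustive(streams, rate, sel):
    """Evaluate every permutation; optimal for this model but factorial in size."""
    return min(permutations(streams), key=lambda o: plan_cost(o, rate, sel))


def greedy(streams, rate, sel):
    """Start from the slowest stream, then always add the cheapest next stream."""
    remaining = sorted(streams)
    order = [min(remaining, key=lambda s: rate[s])]
    remaining.remove(order[0])
    while remaining:
        nxt = min(remaining, key=lambda s: plan_cost(order + [s], rate, sel))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


# Hypothetical inputs, chosen only for illustration.
streams = ["A", "B", "C", "D"]
rate = {"A": 100.0, "B": 50.0, "C": 200.0, "D": 10.0}
sel = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.002,
    frozenset(("C", "D")): 0.05,
    frozenset(("A", "D")): 0.1,
}

best = exhaustive(streams, rate, sel)
approx = greedy(streams, rate, sel)
print("exhaustive:", best, plan_cost(best, rate, sel))
print("greedy:    ", approx, plan_cost(approx, rate, sel))
```

The exhaustive search examines n! orders, so its running time grows quickly with the number of streams. The greedy variant evaluates only a quadratic number of partial plans but can return a plan that costs more than the optimum.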