
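Author: Lee, Tse-Wei (李澤威)
Thesis title: TbLSM: Temperature-based Data Compaction for Log-Structured Merge Tree with NVM
Advisor: Shih, Wei-Kuan (石維寬)
Oral defense committee: Chang, Yuan-Hao (張原豪); Liang, Yu-Pei (梁郁珮)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2022
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 31
Keywords (Chinese, translated): log-structured merge tree, non-volatile memory, database system, data temperature, data compaction
Keywords (English): data compaction, data temperature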
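  • Non-volatile memory (NVM) not only guarantees that data survives a power loss but also offers read and write speeds on par with DRAM. The log-structured merge tree (LSM-Tree), built on a DRAM-SSD architecture, is a hierarchical file management method widely used in key-value storage systems, and it can achieve better performance with a hybrid architecture that incorporates NVM. Much prior work has explored this direction, but most of it does not consider how keys are actually used. In this thesis, we therefore propose TbLSM, an LSM-Tree on a DRAM-NVM-SSD architecture that uses a data compaction method based on key temperature. Besides reducing the serialization-deserialization overhead between DRAM and SSD, it shortens the read latency of frequently accessed keys through a tiered persistent-storage design. The core of the design is that TbLSM exploits the byte-addressability and low latency of NVM to classify keys by their access frequency. We compare TbLSM with the following LSM-Tree-based systems: TLSM and LevelDB-NVM. Our evaluation shows that TbLSM, with temperature-based compaction, performs better than both TLSM and LevelDB-NVM.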


    Non-volatile memory (NVM) not only provides data persistence but is also almost as fast as DRAM. The Log-Structured Merge-tree (LSM-Tree) is a file management data structure that is based on the DRAM-SSD architecture and is widely used in key-value storage systems. It can achieve better performance by using a hybrid architecture with NVM. There is a lot of research on this topic, but most of it does not take key usage into account. In this thesis, we propose TbLSM, a temperature-based data compaction scheme for the LSM-tree with NVM, to reduce the serialization-deserialization overhead between DRAM and SSD and to shorten the read latency of frequently accessed keys through a tiered persistent storage design. The core idea is that TbLSM leverages the byte-addressability and low latency of NVM to classify keys by their access frequency. We compare TbLSM with two LSM-Tree-based systems, TLSM and LevelDB-NVM. Our evaluations show that TbLSM, which uses temperature-based compaction, performs better than both TLSM and LevelDB-NVM.
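The idea of classifying keys by access frequency can be illustrated with a small sketch. This is not the thesis's algorithm, which this record does not describe in detail. It is a toy model under stated assumptions: plain dictionaries stand in for the NVM and SSD tiers, and the names `Entry`, `TieredStore`, `hot_threshold`, and `ref_count` are hypothetical. During compaction, keys read fewer than `hot_threshold` times since the last compaction are treated as cold and moved to the lower tier, while hot keys stay in the faster tier.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A value plus a hypothetical per-key reference counter."""
    value: bytes
    ref_count: int = 0


@dataclass
class TieredStore:
    """Toy two-tier store; dicts stand in for the NVM and SSD levels."""
    hot_threshold: int = 2                   # assumed reference-count threshold
    nvm: dict = field(default_factory=dict)  # key -> Entry (fast tier)
    ssd: dict = field(default_factory=dict)  # key -> bytes (slow tier)

    def put(self, key, value):
        # New writes land in the fast tier with a fresh counter.
        self.nvm[key] = Entry(value)
        self.ssd.pop(key, None)  # drop any stale copy in the slow tier

    def get(self, key):
        entry = self.nvm.get(key)
        if entry is not None:
            entry.ref_count += 1  # record the access ("temperature")
            return entry.value
        return self.ssd.get(key)

    def compact(self):
        """Move cold keys to the slow tier and return their names."""
        moved = []
        for key in sorted(self.nvm):
            entry = self.nvm[key]
            if entry.ref_count < self.hot_threshold:
                self.ssd[key] = entry.value
                moved.append(key)
            else:
                entry.ref_count = 0  # start a new observation window
        for key in moved:
            del self.nvm[key]
        return moved
```

In this sketch, a key written once and read twice before `compact()` stays in `nvm`, while a key read only once is moved to `ssd` and remains readable from there. A real system would also need persistence, sorted on-disk tables, and a policy for promoting keys back to the fast tier, none of which is modeled here.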

    Contents
    Abstract (Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Background & Related work
      2.1 LSM-tree
        2.1.1 LevelDB
      2.2 NVM
        2.2.1 PMDK
      2.3 LSM-tree with NVM
        2.3.1 NoveLSM
        2.3.2 TLSM
    3 Motivation
    4 Methodology
      4.1 Design of Node Structure
      4.2 Temperature-based Compaction
        4.2.1 Level definition
        4.2.2 Trigger Compaction
        4.2.3 Discussion
        4.2.4 Algorithms
    5 Evaluation
      5.1 Experimental Setup
      5.2 Workloads
        5.2.1 DB Bench
        5.2.2 YCSB
      5.3 Set Reference Times Threshold
      5.4 Overhead And Performance
        5.4.1 Overhead
        5.4.2 Performance
      5.5 The Impact Of Data Size
    6 Conclusion
    Bibliography

