Author: Lin, Po-Chen (林柏臣)
Thesis Title: End-to-End Table Extraction from Datasheets for Embedded Systems Design (嵌入式系統設計中規格手冊之端對端表格擷取)
Advisor: Chou, Pai H. (周百祥)
Committee Members: Hon, Wing-Kai (韓永楷); Hsieh, Sun-Yuan (謝孫源); Lee, Huang-Chen (李皇辰)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Publication Year: 2024
Graduation Academic Year: 112 (ROC calendar; 2023-2024)
Language: English
Number of Pages: 72
Keywords: Embedded System, Electronic Design Automation, Information Extraction, Table, Transformer

Abstract:

    This thesis proposes TABLET, a table parser for extracting the tabular structure and content from datasheets for commercial off-the-shelf (COTS) components. Tables are among the most prevalent and crucial forms for presenting technical data across domains in a concise, easy-to-read way. However, because these tables are drawn mainly for human consumption, there is no rigorous standard for how they are drawn and interpreted. Moreover, although PDF is the standard electronic document format, the PDF files for these datasheets are often created by tools that leave little or no structure and no metadata to assist with table extraction. Even powerful LLM-based AI tools such as ChatGPT-4o and Gemini still fail to parse many such tables correctly. Automatic extraction of these tables is necessary for constructing the component library for a new generation of design tools for COTS-based embedded systems, because those tools need the datasheet knowledge to assist designers with correctness checking, component recommendations, and connection suggestions.

    Our proposed TABLET is a vision-transformer approach that integrates structure recognition and optical character recognition (OCR) in a single end-to-end model. It takes as input a rasterized image of a table from the PDF file and outputs the recognized table in HTML or any other tabular data format. Experimental results show that our tool outperforms state-of-the-art table extraction approaches in both speed and accuracy.
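
    To make the dataflow concrete, the following minimal sketch shows what such an image-in, HTML-out pipeline looks like in code. It is an illustration under assumptions, not TABLET's actual implementation: it assumes a Donut-style Swin-encoder/BART-decoder checkpoint served through the Hugging Face VisionEncoderDecoderModel API, and the checkpoint name, task prompt token, page index, and crop box are hypothetical placeholders.

        # Illustrative sketch (not TABLET's released code): rasterize a datasheet
        # table from a PDF and decode it to HTML with a Swin-encoder/BART-decoder
        # model behind Hugging Face's VisionEncoderDecoderModel API. The checkpoint
        # name, prompt token, page index, and crop box are hypothetical.
        from pdf2image import convert_from_path
        from transformers import DonutProcessor, VisionEncoderDecoderModel

        CKPT = "your-org/swin-bart-table-checkpoint"  # placeholder checkpoint name
        processor = DonutProcessor.from_pretrained(CKPT)
        model = VisionEncoderDecoderModel.from_pretrained(CKPT)

        # Rasterize the page holding the table, then crop to its bounding box.
        page = convert_from_path("datasheet.pdf", dpi=200)[3]  # page 4 of the PDF
        table_img = page.crop((50, 120, 1100, 800))            # (left, top, right, bottom)

        # Encode the image; decode autoregressively into one HTML token sequence,
        # using beam search as in the decoding strategy of Section 5.6.2.
        pixel_values = processor(table_img, return_tensors="pt").pixel_values
        prompt_ids = processor.tokenizer(
            "<s_table>", add_special_tokens=False, return_tensors="pt"
        ).input_ids
        output_ids = model.generate(
            pixel_values,
            decoder_input_ids=prompt_ids,
            max_length=1024,
            num_beams=3,
        )
        html = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
        print(html)  # e.g. "<table><tr><td>Symbol</td><td>Min</td>...</table>"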

Contents:

    Acknowledgments
    1 Introduction
      1.1 Motivation
        1.1.1 COTS-based System Design
        1.1.2 New Design Tool: Sysmaker
        1.1.3 Table Extraction from Datasheets
      1.2 Contributions
      1.3 Thesis Organization
    2 Related Work
      2.1 Rule-Based Methods
      2.2 Object Detection-Based Methods
      2.3 Graph-Based Methods
      2.4 Sequence-Based Methods
    3 Methodology
      3.1 Sysmaker Design Flow
        3.1.1 Design Entry
        3.1.2 Specification Validation
        3.1.3 Interface Validation
        3.1.4 Resource Validation
        3.1.5 Structural Transformation
        3.1.6 Simulation
      3.2 Component Modeling
      3.3 Scope of This Thesis
    4 Problem Statement
      4.1 Input
      4.2 Output
        4.2.1 Conditions
        4.2.2 Additional Constraints
    5 Technical Approach
      5.1 Overview
        5.1.1 Vision Transformer Approach
        5.1.2 Combined Tasks within the Transformer Model
      5.2 Model Architecture
        5.2.1 Swin Transformer
        5.2.2 BART
      5.3 Output Format
        5.3.1 Advantages of HTML
        5.3.2 Disadvantages of Alternatives
      5.4 Training Process
        5.4.1 Pre-Training
        5.4.2 Fine-Tuning
      5.5 Loss Function
      5.6 Decoding
        5.6.1 Greedy
        5.6.2 Beam Search
      5.7 Enhancement Algorithm
    6 Implementation
      6.1 Environment and Training Setup
      6.2 Visualization Tool
      6.3 Extraction Pipeline
    7 Results and Evaluation
      7.1 Metrics
        7.1.1 Accuracy
        7.1.2 Character Error Rate (CER)
        7.1.3 Grid-based Table Structure (GriTS)
        7.1.4 Tree-Edit-Distance-based Similarity (TEDS)
        7.1.5 Structural Tree-Edit-Distance-based Similarity (S-TEDS)
      7.2 Comparison
        7.2.1 PubTables-1M
        7.2.2 Datasheet Table Case Study
      7.3 Repetitions
    8 Conclusions and Future Work
      8.1 Conclusions
      8.2 Future Work
    Appendix A
