
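Author: Shih-Neng Lin (林士能)
Thesis title: Semantic Information Extraction and Comparison for Patent Documents (專利文件語意之擷取與比對)
Advisor: Von-Wun Soo (蘇豐文)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: Chinese
Number of pages: 159
Keywords (Chinese): 專利申請範圍, 語意結構, 正規表示式, 相似度
Keywords (English): claim, semantic structure, regular expression, similarity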
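  • In the 21st century, intellectual property has become a key to economic competitiveness, and patent documents give intellectual property effective legal protection. However, because the number of patent documents is growing explosively and the formats of patent claims vary widely, existing techniques for patent retrieval, analysis, and comparison face a serious bottleneck.
    This thesis proposes a method for extracting and comparing the semantic structure of patent claims. For extraction, when the user selects patent documents from a given domain to be parsed, the user must first build a domain thesaurus and the ontology of that domain so that the system can carry out further analysis. The system then uses natural language processing techniques to parse and annotate each claim, and uses regular expressions to extract the important information. Finally, this information is converted into a semantic structure in machine-readable formats, XML and OWL, to speed up knowledge sharing and knowledge inference; the structure can also be presented graphically.
    For comparison, the user takes one patent as the basis and selects from its semantic structure the invention component to query. The system then runs a similarity algorithm to determine whether other patents contain highly similar invention components. The algorithm considers the components' semantic annotations, structure, and attributes when computing the similarity of two components.
    Experiments show that the method can effectively parse and extract the semantic content of patent claims and present it graphically, helping users grasp the key points of a claim. In addition, the similarity algorithm can for the most part find similar invention components, offering users a patent retrieval method that differs from keyword search.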


    In the 21st century, intellectual property has become a key factor in the competitive strength of the global economy, and patent documents can protect intellectual property effectively under the law. Unfortunately, because the number of patent documents has exploded and claim formats vary widely, technologies for patent search, analysis, and comparison face serious bottlenecks.
    In this thesis, I propose an approach that extracts the semantic structure of claims and compares claims on the basis of those structures. For semantic structure extraction, if the user chooses patents from one domain for the system to analyze, the user must first construct a domain thesaurus and ontology for further parsing. Next, the system parses and annotates every claim using natural language processing techniques and extracts the important information from the claims with regular expressions. Finally, this information is translated into a semantic structure in machine-readable formats, XML and OWL, to speed up knowledge sharing and knowledge inference; the structure can also be displayed as a graph.
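The extraction step described above can be illustrated with a minimal sketch. It assumes a deliberately simple English claim sentence and three hand-written patterns; the claim text, the pattern names (`HEAD`, `PART`, `HAS`), and the `<claim>`/`<component>` element names are hypothetical and are not the thesis's eight classes of regular expressions or its actual XML schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical claim text, written for this sketch (not quoted from any patent).
CLAIM = ("A polishing pad comprising a first layer and a second layer, "
         "wherein the first layer has a hole.")

# Illustrative patterns only; they are assumptions, not the thesis's own expressions.
HEAD = re.compile(r"An? (?P<subject>[\w ]+?) comprising (?P<parts>.+)")
PART = re.compile(r"\ban? ([\w ]+?)(?= and |$)")
HAS = re.compile(r"the ([\w ]+?) has an? ([\w ]+)")


def extract(claim: str) -> ET.Element:
    """Build a nested <component> tree from one simple claim sentence."""
    head, _, tail = claim.rstrip(".").partition(", wherein ")
    match = HEAD.match(head)
    if match is None:
        raise ValueError("claim does not fit the illustrative pattern")
    root = ET.Element("claim")
    subject = ET.SubElement(root, "component", name=match.group("subject"))
    # Components listed after "comprising" become children of the subject.
    nodes = {}
    for name in PART.findall(match.group("parts")):
        nodes[name] = ET.SubElement(subject, "component", name=name)
    # "the X has a Y" attaches Y under X (or under the subject if X is unknown).
    for owner, part in HAS.findall(tail):
        ET.SubElement(nodes.get(owner, subject), "component", name=part)
    return root


if __name__ == "__main__":
    # Serialize the extracted structure as XML text.
    print(ET.tostring(extract(CLAIM), encoding="unicode"))
```

A real system would first annotate the claim with a domain thesaurus and a parser, as the abstract describes, so that the patterns match annotated tokens rather than raw text; the sketch skips that step to stay short.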
    For claim comparison, when the user chooses an invention component in the semantic structure of a patent to query, the system executes the similarity algorithm to see whether similar components exist in other patents. The algorithm takes the two components' semantic annotations, semantic structures, and attributes into consideration to calculate their similarity.
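One way to picture a similarity score built from these three signals is the sketch below. It is an assumption-laden illustration, not the thesis's algorithm: the `Component` fields, the use of the Jaccard coefficient for the structural and attribute parts, the exact-match test on the semantic class, and the weights `(0.4, 0.3, 0.3)` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    semantic_class: str                  # hypothetical thesaurus label
    triples: frozenset = frozenset()     # (relation, class of related component)
    attributes: frozenset = frozenset()  # attribute names


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard coefficient |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x: Component, y: Component, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of semantic, structural, and attribute similarity."""
    w_sem, w_struct, w_attr = weights  # assumed weights, not from the thesis
    semantic = 1.0 if x.semantic_class == y.semantic_class else 0.0
    return (w_sem * semantic
            + w_struct * jaccard(x.triples, y.triples)
            + w_attr * jaccard(x.attributes, y.attributes))


# Two hypothetical components taken from two different patents.
PAD_A = Component("polishing pad", "Pad",
                  frozenset({("contains", "Layer"), ("contains", "Hole")}),
                  frozenset({"transparent"}))
PAD_B = Component("pad", "Pad",
                  frozenset({("contains", "Layer")}),
                  frozenset({"transparent", "porous"}))

if __name__ == "__main__":
    print(round(similarity(PAD_A, PAD_B), 3))
```

Keeping the three signals as separate terms makes it easy to see which one drives a match, and the weights can be tuned against labeled component pairs.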
    The experimental results show that the approach can effectively parse and extract the semantic content of claims and, through a GUI environment, help users understand the focal points of the claims. In addition, the similarity algorithm can for the most part find similar invention components, providing a way to search patents that differs from keyword search.

    Chinese Abstract
    Abstract
    Acknowledgements
    Chapter 1 Introduction
        1.1 Preface
        1.2 Purpose and motivation
            1.2.1 Purpose
            1.2.2 Motivation
            1.2.3 Advantages of converting patent documents into a machine-readable structure
        1.3 Research limitations
        1.4 Thesis organization
    Chapter 2 Background
        2.1 Ontology and the Semantic Web
        2.2 Introduction to patent documents
        2.3 Patent document retrieval
        2.4 Chemical mechanical polishing
    Chapter 3 Analysis and extraction of the semantic structure of patent documents
        3.1 Importance of extracting the semantic structure of patent documents
        3.2 Building the domain dictionary
        3.3 Syntactic/semantic annotation of patent documents
        3.4 Extracting semantic structure with regular expressions
            3.4.1 What is a regular expression
            3.4.2 Eight classes of regular expressions for extracting patent semantic structure
                3.4.2.1 General class
                3.4.2.2 Claim class
                3.4.2.3 Component class
                3.4.2.4 Reference class
                3.4.2.5 Attribute class
                3.4.2.6 Functional description class
                3.4.2.7 Subordinate relation class
                3.4.2.8 Spatial relation class
            3.4.3 Example of extracting the semantic structure of a claim
            3.4.4 Creating machine-readable files for claims
        3.5 Graphical presentation of the semantic structure of patent documents
    Chapter 4 Similarity comparison of invention components in claims
        4.1 Related work on similarity comparison
        4.2 Information collection before similarity comparison
            4.2.1 Component information collection
            4.2.2 Relation information collection
        4.3 Component similarity comparison
            4.3.1 Similarity comparison of technical terms in the dictionary
            4.3.2 Triple statistics comparison of components
            4.3.3 Attribute similarity comparison of components
            4.3.4 Similarity comparison algorithm
    Chapter 5 Experiments
        5.1 Claim component extraction experiment
            5.1.1 Purpose
            5.1.2 Design
            5.1.3 Results
                5.1.3.1 Results of the component part-of-speech statistics experiment
                5.1.3.2 Results of extracting components by part of speech
            5.1.4 Discussion of problems
            5.1.5 Summary
        5.2 Experiment on extracting the semantic structure of patent documents with regular expressions
            5.2.1 Purpose
            5.2.2 Design
            5.2.3 Results
                5.2.3.1 Extracting semantic structure from training data
                5.2.3.2 Extracting semantic structure from test data
            5.2.4 Discussion of problems
            5.2.5 Summary
        5.3 Similarity comparison algorithm experiment
            5.3.1 Purpose
            5.3.2 Design
            5.3.3 Results
            5.3.4 Discussion of problems
            5.3.5 Summary
    Chapter 6 System demonstration and operation
        6.1 System demonstration
            6.1.1 Patent maintenance tool
            6.1.2 Dictionary editing tool
            6.1.3 Ontology construction tool
            6.1.4 Regular expression maintenance tool
            6.1.5 Patent semantic structure extraction tool
            6.1.6 Patent semantic structure diagram tool
            6.1.7 Patent comparison tool
        6.2 System operation walkthrough
    Chapter 7 Conclusions and future research directions
        7.1 Summary of results
        7.2 Future research directions
            7.2.1 Automated analysis of claim writing patterns
            7.2.2 Extending the scope of patent semantic structure extraction
            7.2.3 Extending the similarity comparison algorithm
    References
    Appendices
        Appendix 1 Table of regular expressions
            1.1 General class
            1.2 Claim class
            1.3 Component class
            1.4 Reference class
            1.5 Attribute class
            1.6 Functional description class
            1.7 Subordinate relation class
            1.8 Spatial relation class
        Appendix 2 Domain dictionary for chemical mechanical polishing
            2.1 Dictionary structure diagram
            2.2 Ontology in OWL format
        Appendix 3 Claims converted to XML and OWL formats
            3.1 XML format
            3.2 OWL format, using claim 1 of patent 6524176 as an example

    List of figures
        Figure 1 Layers of Semantic Web technologies
        Figure 2 RDF example
        Figure 3 Schematic of a wafer and a polishing pad
        Figure 4 Triple diagram of the wafer and the polishing pad
        Figure 5 Describing the relation between the wafer and the polishing pad in OWL
        Figure 6 Patent 6524176 from the U.S. patent office, page 1
        Figure 7 Patent 6524176 from the U.S. patent office, page 3
        Figure 8 Patent analysis flowchart
        Figure 9 Schematic of the motion mechanism of the polishing head and polishing pad
        Figure 10 Flowchart of semantic structure extraction from patent documents
        Figure 11 Dictionary construction flowchart
        Figure 12 Semantic/syntactic annotation flowchart
        Figure 13 Parse tree obtained by parsing a sentence with the JAVANLP parser
        Figure 14 Semantic/syntactic annotation example
        Figure 15 Example of parsing a claim with attribute-class regular expressions
        Figure 16 Schematic of the semantic structure of a claim
        Figure 17 Patent 6524176, claim 1 (independent claim)
        Figure 18 Structure of the plug and hole on the polishing pad
        Figure 19 The two layers of the polishing pad compared with an actual micrograph
        Figure 20 Example semantic structure diagram of a claim
        Figure 21 Ontology generated from the domain dictionary (partial), expressed in OWL
        Figure 22 Ontology of relations between components (partial), expressed in OWL
        Figure 23 Flowchart of similarity comparison of invention components in claims
        Figure 24 Jaccard's coefficient
        Figure 25 Triple statistics table
        Figure 26 Spatial relation diagram
        Figure 27 Example of a biological taxonomy dictionary
        Figure 28 Example of similarity comparison of a component's superordinate triples
        Figure 29 System architecture
        Figure 30 Claim maintenance tool
        Figure 31 Results of identifying technical terms in claims
        Figure 32 Vocabulary thesaurus editing tool
        Figure 33 Ontology construction tool
        Figure 34 Regular expression maintenance tool
        Figure 35 Patent semantic structure extraction tool
        Figure 36 Example of regular expression extraction results
        Figure 37 Patent semantic structure diagram tool
        Figure 38 Similarity comparison input interface
        Figure 39 Similarity comparison output interface
        Figure 40 Comparison of the semantic structures of the claims of two patents

    List of tables
        Table 1 Description of patent document fields
        Table 2 Regular expression metacharacters and their functions
        Table 3 The eight classes of regular expressions for extracting claim semantic structure
        Table 4 Monitored parameters
        Table 5 Part-of-speech descriptions
        Table 6 Problems in extracting components by part of speech
        Table 7 Description of regular expression evaluation items
        Table 8 Results of extracting semantic structure from training data
        Table 9 Results of extracting semantic structure from test data
        Table 10 Problems in extracting semantics with regular expressions
        Table 11 Execution results of the similarity comparison algorithm experiment
        Table 12 Accuracy of the similarity comparison algorithm
        Table 13 List of problems discussed for the similarity comparison results


    Full text: not authorized for public release (on-campus network or off-campus network).
