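Graduate student: | 林士能 Shih-Neng Lin |
---|---|
Thesis title: | 專利文件語意之擷取與比對 Semantic Information Extraction and Comparison for Patent Documents |
Advisor: | 蘇豐文 Von-Wun Soo |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2005 |
Academic year of graduation: | 93 (ROC calendar) |
Language: | Chinese |
Number of pages: | 159 |
Chinese keywords: | 專利申請範圍 (patent claims), 語意結構 (semantic structure), 正規表示式 (regular expression), 相似度 (similarity) |
Foreign-language keywords: | claim, semantic structure, regular expression, similarity |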
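In the 21st century, intellectual property assets have become the key to economic competitiveness, and patent documents provide effective legal protection for those assets. However, because the number of patent documents is growing explosively and the format of patent claims varies widely, existing techniques for patent retrieval, analysis, and comparison face a serious bottleneck.
In this thesis I propose a method for extracting and comparing the semantic structure of patent claims. For semantic structure extraction, when the user selects patent documents from a particular domain for parsing, the user must first build a domain-specific lexicon and the domain's ontology so that the system can carry out further analysis. Next, the system uses natural language processing techniques to parse and annotate each claim and uses regular expressions to extract the important information. Finally, this information is converted into a semantic structure in machine-readable formats, XML and OWL, in order to speed up knowledge sharing and knowledge inference, and the structure can be presented graphically.
For patent comparison, the user takes one patent as the basis and selects from its semantic structure the invention component to be queried; the system then runs a similarity algorithm to find out whether other patents contain highly similar invention components. The algorithm considers each component's semantic annotation information, structure, and attributes when computing the similarity of two components.
Experiments verify that the method can effectively parse and extract the semantic content of patent claims and present it graphically, helping users grasp the key points of the claims. In addition, the similarity algorithm can by and large find similar invention components, giving users a patent retrieval method different from keyword search.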
In the 21st century, intellectual property has become a key factor in the competitive strength of the global economy, and patent documents can protect intellectual property effectively under the law. Unfortunately, because the number of patent documents has exploded and claim formats vary widely, the technologies for patent search, analysis, and comparison face a serious bottleneck.
In my thesis, I propose an approach to extract the semantic structure of claims and to compare claims on the basis of their semantic structures. For semantic structure extraction, a user who chooses patents from one domain for the system to analyze must first construct a domain thesaurus and ontology to support further parsing. The system then parses and annotates every claim using natural language processing techniques and extracts the important information from the claims with regular expressions. Finally, this information is translated into a semantic structure in machine-readable formats, XML and OWL, to speed up knowledge sharing and knowledge inference; the structure can also be displayed as a graph.
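The regular-expression extraction and XML conversion steps described above can be illustrated with a minimal sketch. The claim text, the pattern `CLAIM_RE`, the helper names, and the XML element names are all hypothetical choices made for illustration; the thesis's actual patterns, annotation scheme, and output schema are not given in this abstract.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical claim text, invented for this sketch (not taken from the thesis).
CLAIM = (
    "A polishing apparatus comprising: a rotatable platen; "
    "a polishing pad mounted on the platen; and a carrier head for holding a wafer."
)

# Illustrative pattern: article, claimed subject, the transition word
# "comprising:", then the claim body up to an optional final period.
CLAIM_RE = re.compile(
    r"^(?:An?|The)\s+(?P<subject>.+?)\s+comprising:\s*(?P<body>.+?)\.?$",
    re.IGNORECASE,
)


def extract_components(claim: str) -> tuple[str, list[str]]:
    """Return the claimed subject and its semicolon-separated components."""
    match = CLAIM_RE.match(claim.strip())
    if match is None:
        raise ValueError("claim does not match the illustrative pattern")
    parts = [p.strip() for p in match.group("body").split(";") if p.strip()]
    # Drop a leading "and" and a leading article from each component phrase.
    components = [
        re.sub(r"^(?:and\s+)?(?:an?|the)\s+", "", p, flags=re.IGNORECASE)
        for p in parts
    ]
    return match.group("subject"), components


def to_xml(subject: str, components: list[str]) -> str:
    """Serialize the extracted structure as a small XML document (assumed schema)."""
    root = ET.Element("claim")
    ET.SubElement(root, "invention").text = subject
    for name in components:
        ET.SubElement(root, "component").text = name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    subject, components = extract_components(CLAIM)
    print(to_xml(subject, components))
```

A single pattern like this covers only one claim format; handling the wide variety of claim formats mentioned above would presumably require a set of patterns driven by the domain thesaurus and ontology.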
For claim comparison, when the user chooses an invention component in the semantic structure of a patent as a query, the system executes the similarity algorithm to determine whether similar components exist in other patents. The algorithm takes the two components' semantic annotations, semantic structure, and attributes into consideration when calculating their similarity.
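As a rough illustration of how three such factors might be combined, the sketch below scores two components with a weighted sum of set overlaps. This is an assumption for illustration only: the abstract does not state the thesis's actual formula, and the `Component` class, the Jaccard measure, and the weights are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical representation of an invention component."""
    annotations: set[str] = field(default_factory=set)  # semantic annotation labels
    children: set[str] = field(default_factory=set)     # sub-component names (structure)
    attributes: set[str] = field(default_factory=set)   # attribute descriptors


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(x: Component, y: Component,
               weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of annotation, structure, and attribute overlap.

    The weights are arbitrary placeholders, not values from the thesis.
    """
    w_ann, w_struct, w_attr = weights
    return (
        w_ann * jaccard(x.annotations, y.annotations)
        + w_struct * jaccard(x.children, y.children)
        + w_attr * jaccard(x.attributes, y.attributes)
    )


if __name__ == "__main__":
    query = Component({"pad"}, {"groove", "layer"}, {"circular"})
    candidate = Component({"pad"}, {"groove"}, {"circular", "porous"})
    print(similarity(query, candidate))
```

Scoring a query component against every component in other patents and ranking the results would give the kind of component-level search described above, as opposed to matching keywords in the raw claim text.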
The experimental results show that the approach can effectively parse and extract the semantic content of claims and, through a graphical interface, help users understand the focal points of claims. In addition, the similarity algorithm can for the most part find similar invention components, providing a way to search patents that differs from keyword search.