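Author: | 劉柏菁 Po-Ching Liu |
---|---|
Thesis title: | 從MSN query log分析使用者的查詢需求 (Are We Searching for the Same Thing? A Large-Scale Analysis of Search Engine Logs) |
Advisor: | 陳宜欣 Yi-Shin Chen |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2007 |
Academic year of graduation: | 95 (ROC calendar) |
Language: | English |
Number of pages: | 46 |
Keywords (Chinese): | 語意不清的查詢 (ambiguous query), 使用者意向 (user intention), 資訊需求 (information need), 使用者目的 (user goal), 查詢日誌分析 (query log analysis) |
Keywords (English): | ambiguous query, query intention, information need, user goal, query log analysis |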
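When different people enter the same search string, their user intention may differ. Many web search techniques are motivated by this concern, yet no one has offered strong evidence on whether it actually happens in real searches. This study therefore analyzes users' search behavior to measure how much intentions diverge when users search with the same string, as a reference for related web search techniques.
Analyzing users' actual query and browsing behavior is an important way to understand user intention. This study obtained MSN search engine query logs and user click logs from Microsoft Research, more than 20 million records of real user query data in total, for analysis.
We treat the pages that users click after querying as pseudo relevance feedback and extract from them the data related to user intention: the user goal and the information need. The analysis shows that query strings whose goal is finding a website (Navigational) mostly have highly consistent user intentions, whereas query strings whose goal is gathering knowledge (Informational) or obtaining resources (Resource) show more noticeable variation from person to person. We also propose an automated method for quickly classifying user goals. Together, these results can help search engines decide which technique to apply to different query strings.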
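As an illustration of treating clicks as relevance feedback, the sketch below scores how consistently users click for each query string. The abstract does not say how consistency was measured in the thesis; click entropy is used here only as an assumed stand-in, and the function name `click_entropy`, the `(query, clicked_url)` record layout, and the toy log are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def click_entropy(click_log):
    """Return {query: entropy in bits of its clicked-URL distribution}.

    click_log is an iterable of (query, clicked_url) pairs. This record
    layout is an assumption, not the actual MSN log schema.
    """
    clicks_by_query = defaultdict(Counter)
    for query, url in click_log:
        clicks_by_query[query][url] += 1

    entropy = {}
    for query, counts in clicks_by_query.items():
        total = sum(counts.values())
        # Shannon entropy: 0 when every click lands on one URL,
        # larger when clicks are spread over many URLs.
        entropy[query] = sum(
            -(n / total) * math.log2(n / total) for n in counts.values()
        )
    return entropy

# Hypothetical toy log, invented for illustration only.
toy_log = [
    ("ebay", "ebay.com"), ("ebay", "ebay.com"), ("ebay", "ebay.com"),
    ("java", "java.com"), ("java", "en.wikipedia.org/wiki/Java"),
    ("java", "oracle.com"), ("java", "java.com"),
]
scores = click_entropy(toy_log)
```

In this toy data the navigational-style query gets an entropy of zero because every click goes to one site, while the informational-style query scores higher, which mirrors the qualitative pattern the abstract reports.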
Many researchers have worked on advanced search techniques such as personalization, query reformulation, and collaborative filtering to improve the quality of search results. A common motivation for this work is the ambiguous-query problem. However, few studies have evaluated how ambiguous query strings actually are. In this study we analyze 15 million queries from MSN search engine logs and address three major questions: 1) How many people use the same query string to search? 2) Are they searching for the same thing when they use the same query string? 3) Can we distinguish ambiguous from unambiguous query strings?
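The three questions can be made concrete with a small sketch. It counts distinct users per query string (question 1), measures the share of clicks that go to the most-clicked URL (question 2), and applies a cutoff to label the query (question 3). Everything here is an assumption for illustration: the `(user_id, query, clicked_url)` layout, the `summarize_queries` name, the top-click-share measure, and the 0.8 threshold are not taken from the thesis.

```python
from collections import Counter, defaultdict

def summarize_queries(records, agreement_threshold=0.8):
    """Return {query: (n_users, top_click_share, label)}.

    records is an iterable of (user_id, query, clicked_url) tuples.
    The layout and the threshold value are hypothetical.
    """
    users = defaultdict(set)
    clicks = defaultdict(Counter)
    for user_id, query, url in records:
        users[query].add(user_id)
        clicks[query][url] += 1

    summary = {}
    for query, counts in clicks.items():
        total = sum(counts.values())
        # Share of all clicks for this query that hit its most-clicked URL.
        top_share = counts.most_common(1)[0][1] / total
        label = "unambiguous" if top_share >= agreement_threshold else "ambiguous"
        summary[query] = (len(users[query]), top_share, label)
    return summary

# Hypothetical records, invented for illustration only.
records = [
    ("u1", "ebay", "ebay.com"), ("u2", "ebay", "ebay.com"),
    ("u3", "ebay", "ebay.com"),
    ("u1", "java", "java.com"), ("u2", "java", "oracle.com"),
    ("u3", "java", "en.wikipedia.org/wiki/Java"), ("u4", "java", "java.com"),
]
summary = summarize_queries(records)
```

A single threshold on click share is the simplest possible rule; it is shown only to make the questions concrete and is not the classification method proposed in the thesis.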