Graduate student: | 劉劭芃 Liu, Shao Pong |
---|---|
Thesis title: | Establishing a Classifier of Abnormal Phone Calls Based on Call Data and Subjective Comments (以通話資料及主觀評論建立惡意電話篩選系統) |
Advisor: | 王茂駿 |
Oral examination committee: | 王茂駿, 石裕川, 郭建甫 |
Degree: | Master |
Department: | College of Engineering, Department of Industrial Engineering and Engineering Management |
Year of publication: | 2013 |
Academic year of graduation: | 101 (ROC calendar) |
Language: | Chinese |
Number of pages: | 59 |
Chinese keywords: | malicious call filtering, mobile call data, fraud, sales, harassment, users' subjective comments |
English keywords: | Abnormal call filtering, Mobile call data, User’s subjective comments, Different kinds of spam calls |
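The purpose of this study is to build a classifier that screens different types of phone calls, using call data collected through a smartphone application together with users' subjective comments on those calls. Drawing on expert opinion and the comment data, the study divides calls that users do not want to answer into three categories: fraud, sales, and harassment. Users' subjective comments are used to assign each call to a category, and these labels, combined with attributes extracted from the call data, are used to train a classification model for malicious calls.
The attributes used to train the model were defined with reference to the literature and expert opinion. After the raw call-log data are read in, a purpose-written program automatically extracts and computes the required attributes. These attributes describe the call data more completely and make the calling behavior of each number easier to observe.
To build classifiers for the different call types, the study performed significance tests and descriptive statistics on the call attributes of each type, identifying the calling patterns and behaviors in which fraud, sales, and harassment calls differ from normal calls. These patterns also serve as a basis for building the classification system.
The trained model can be used in a smartphone application: if an incoming call is predicted to be malicious, the user receives a warning, which serves as a phone security mechanism. The prediction also encourages users to submit subjective comments, and the returned comments can be continuously added to the training data to improve the system.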
This study used call logs collected by a smartphone application to establish a classifier for unwanted phone calls of three kinds: Fraud, Sales, and Harassment. Each phone number was labeled according to users' subjective comments, and these class labels, together with attributes derived from the call-log data, were used to train a classifier that differentiates normal calls from abnormal calls.
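The abstract does not say how several comments on the same number are combined into one class label. As a minimal sketch, assuming each comment has already been mapped to one of the categories and that a simple majority vote decides the number's class, the labeling step could look like this (the function `label_number`, the `min_votes` threshold, and the sample numbers are hypothetical):

```python
from collections import Counter

# Hypothetical label set: the three unwanted-call categories plus "Normal".
LABELS = ("Fraud", "Sales", "Harassment", "Normal")


def label_number(comment_tags, min_votes=2):
    """Return the majority label for one phone number, or None.

    Assumption: each user comment has already been mapped to one of LABELS.
    Numbers with fewer than `min_votes` usable comments, or with a tie for
    the most frequent tag, stay unlabeled and are left out of training.
    """
    tags = [t for t in comment_tags if t in LABELS]
    if len(tags) < min_votes:
        return None
    ranked = Counter(tags).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no clear majority
    return ranked[0][0]


# Hypothetical comment tags grouped by phone number.
comments_by_number = {
    "0912000001": ["Fraud", "Fraud", "Sales"],
    "0912000002": ["Sales"],
    "0912000003": ["Normal", "Harassment"],
}
labels = {num: label_number(tags) for num, tags in comments_by_number.items()}
print(labels)
# → {'0912000001': 'Fraud', '0912000002': None, '0912000003': None}
```

Discarding ties and thinly commented numbers trades training-set size for cleaner labels; a weighted vote would be an equally plausible choice.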
The attributes used in the training model were based on the literature and on expert opinion. We wrote a parser to read the original call-log data and compute the attributes we specified. These attributes describe the original data in more detail, making it easier to identify the behavior patterns of the calls.
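The parser itself is not reproduced here. The sketch below assumes a hypothetical CSV log with `caller`, `callee`, `start`, and `duration_s` columns and aggregates it into a few illustrative per-caller attributes (call count, distinct callees, mean duration, share of short calls, share of night-time calls). These attribute names and the 10-second and 22:00 to 06:00 thresholds are assumptions, not the thesis's actual attribute set.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical raw log layout: caller,callee,start,duration_s
RAW_LOG = """caller,callee,start,duration_s
0912000001,0933000001,2012-05-01T09:15:00,5
0912000001,0933000002,2012-05-01T09:16:10,0
0912000001,0933000003,2012-05-01T23:40:00,12
0922000009,0933000001,2012-05-02T18:00:00,240
"""


def extract_attributes(log_text, short_call_s=10):
    """Aggregate raw call records into per-caller behavioural attributes."""
    calls = defaultdict(list)
    for row in csv.DictReader(io.StringIO(log_text)):
        calls[row["caller"]].append(
            (row["callee"], datetime.fromisoformat(row["start"]), int(row["duration_s"]))
        )
    attrs = {}
    for caller, records in calls.items():
        durations = [d for _, _, d in records]
        n = len(records)
        attrs[caller] = {
            "n_calls": n,
            "n_distinct_callees": len({callee for callee, _, _ in records}),
            "mean_duration_s": mean(durations),
            # Share of calls shorter than the (assumed) short-call threshold.
            "short_call_ratio": sum(d < short_call_s for d in durations) / n,
            # Share of calls started between 22:00 and 06:00 (assumed "night").
            "night_call_ratio": sum(t.hour >= 22 or t.hour < 6 for _, t, _ in records) / n,
        }
    return attrs


for number, a in extract_attributes(RAW_LOG).items():
    print(number, a)
```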
To characterize the behavior of each type of call, various statistical analyses were performed, including significance tests and descriptive statistics. The results of these analyses can be used to build a classifier that distinguishes the different types of calls.
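One common way to test whether an attribute differs across call types is a one-way analysis of variance (see [1]). The sketch below computes the F statistic for invented attribute values grouped by class; it illustrates the technique and is not the thesis's reported analysis.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups maps a class label (e.g. "Normal", "Fraud") to the list of values
    of one attribute observed for phone numbers in that class.
    """
    samples = [list(values) for values in groups.values()]
    k = len(samples)                      # number of groups
    n = sum(len(s) for s in samples)      # total observations
    grand_mean = sum(sum(s) for s in samples) / n
    means = [mean(s) for s in samples]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical short-call ratios per phone number, grouped by call type.
short_call_ratio = {
    "Normal": [0.10, 0.20, 0.15],
    "Fraud": [0.70, 0.80, 0.75],
    "Sales": [0.40, 0.50, 0.45],
}
f_stat, df_b, df_w = one_way_anova_f(short_call_ratio)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # → F(2, 6) = 108.0
```

For this toy data, F = 108 with (2, 6) degrees of freedom is far above the 5% critical value of about 5.14, so the group means would be judged different. `scipy.stats.f_oneway` returns the same statistic together with a p-value. A post-hoc comparison would then show which call types differ from normal calls.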
The classifier developed in this study can be applied on mobile phones as a warning mechanism: if an incoming call is predicted to be abnormal, the system displays a warning message. Continuously collecting users' subjective comments can further improve the effectiveness of the classifier and the warning system.
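The learning algorithm is not named here. Purely to illustrate the predict, warn, and feedback cycle, the sketch below substitutes a simple nearest-centroid classifier over two hypothetical attributes (short-call ratio and night-call ratio). The class name, warning text, and training values are assumptions.

```python
from math import dist
from statistics import mean


class NearestCentroidCallFilter:
    """Toy abnormal-call classifier (stand-in; not the thesis's algorithm).

    Each class is summarized by the mean of its attribute vectors (its
    centroid); a new number gets the class of the closest centroid.
    """

    def __init__(self):
        self.training = []   # list of (attribute_vector, label) pairs
        self.centroids = {}  # label -> mean attribute vector

    def fit(self, samples):
        self.training = list(samples)
        by_label = {}
        for x, y in self.training:
            by_label.setdefault(y, []).append(x)
        self.centroids = {
            y: tuple(mean(col) for col in zip(*xs)) for y, xs in by_label.items()
        }
        return self

    def predict(self, x):
        # Euclidean distance to every centroid; the smallest one wins.
        return min(self.centroids, key=lambda y: dist(x, self.centroids[y]))

    def warn(self, number, x):
        """Return a warning string for a predicted abnormal call, else None."""
        label = self.predict(x)
        if label == "Normal":
            return None
        return f"Warning: the call from {number} may be a {label} call."

    def add_feedback(self, x, user_label):
        """Feedback loop: add a comment-derived label and refit the model."""
        self.fit(self.training + [(x, user_label)])


# Hypothetical training data: (short_call_ratio, night_call_ratio) -> label.
train = [
    ((0.10, 0.05), "Normal"), ((0.20, 0.10), "Normal"),
    ((0.80, 0.60), "Fraud"), ((0.90, 0.70), "Fraud"),
    ((0.60, 0.00), "Sales"), ((0.70, 0.10), "Sales"),
]
clf = NearestCentroidCallFilter().fit(train)
print(clf.warn("0912000001", (0.85, 0.60)))  # a Fraud warning
print(clf.warn("0922000009", (0.15, 0.10)))  # None: predicted Normal
```

Refitting from scratch after every comment keeps the sketch short. Batching feedback and retraining periodically would scale better.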
[1] 呂金河 (1992). 變異數分析 [Analysis of variance]. Taipei: 三民書局.
[2] 林震岩 (2006). 多變量分析: SPSS的操作與應用 [Multivariate analysis: Operation and application of SPSS]. Taipei: 智勝文化.
[3] 洪漢周 (2003). 新興詐欺犯罪趨勢與對策研究 [A study of emerging fraud crime trends and countermeasures]. 警學叢刊 (Central Police University), Vol.34, No.1, 141.
[4] The Economic Report of the President, website. http://www.gpo.gov/fdsys/browse/collection.action?collectionCode=ERP
[5] The Economic Report of the President (2009), Chapter 9.
[6] Federal Communications Commission (US), website. http://www.fcc.gov/guides/unwanted-telephone-marketing-calls
[7] Telephone Preference Service (UK), website. http://www.tpsonline.org.uk/tps/index.html
[8] 陳順宇 (1998). 多變量分析 [Multivariate analysis]. Taipei: 華泰書局.
[9] Wikipedia, National Do Not Call List. http://en.wikipedia.org/wiki/National_Do_Not_Call_List
[10] Commonwealth of Australia Law, Do Not Call Register Act 2006. http://www.comlaw.gov.au/Series/C2006A00088
[11] Australian Communications and Media Authority, website. https://www.donotcall.gov.au/
[12] 蕭正松 (2010). 防制新興詐欺犯罪關鍵成功因素之研究 [A study of the key success factors in preventing emerging fraud crimes]. Master's thesis, Graduate Institute of Adult Education, National Kaohsiung Normal University.
[13] National Police Agency, 165 Anti-Fraud Hotline, official website. http://165.gov.tw/work_stat.aspx
[14] Balasubramaniyan, V.A., Ahamad, M., Park, H. (2007). CallRank: Combating SPIT using call duration, social networks and global reputation. in Fourth Conference on Email and Anti-Spam (CEAS 2007).
[15] Barandela, R., Sánchez, J., García, V., Rangel, E. (2003). Strategies for Learning in Class Imbalance Problems. Pattern Recognition, Vol. 36, No. 3, 849–851.
[16] Becker, R., Volinsky, C., Wilks, A. (2010). Fraud Detection in Telecommunications: History and Lessons Learned. Technometrics, Vol. 52, No. 1, 20-33.
[17] Chaisamran, N., Okuda, T., Blanc, G., Yamaguchi, S. (2011). Trust-based VoIP spam detection based on call duration and human relationships. In Proc. of the 11th Int. Symp. on Applications and the Internet (SAINT 2011).
[18] Chawla, N.V., Japkowicz, N., Kotcz, A. (2004). Editorial: Special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, Vol.6, No.1, 1-6.
[19] Dong, Z., Song, G., Xie, K., Sun, Y., Wang, J. (2009). Adequacy of Data for Mining Individual Friendship Pattern from Cellular Phone Call Logs. Fuzzy Systems and Knowledge Discovery, FSKD '09. Vol. 5, 573-577.
[20] Eagle, N., Pentland, A. S., & Lazer, D. (2009). Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences, Vol.106, No.36, 15274-15278.
[21] Ghosh, M. (2010). Telecoms fraud. Computer Fraud & Security, Vol.2010, No.7, 14-17.
[22] Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A. L. (2008). Understanding individual human mobility patterns. Nature, Vol.453, No.7196, 779-782.
[23] Grabosky, P.N., Smith, R.G., Wright, P. (1996). Crime and Telecommunications. Trends and Issues in Crime and Criminal Justice, Vol.59, 1-6.
[24] Han, J., Kamber, M., Pei, J. (2012). Data Mining: Concepts and Techniques, Third Edition. Morgan Kaufmann Publishers, USA.
[25] Hilas, C.S., Mastorocostas, P.A. (2008). An application of supervised and unsupervised learning approaches to telecommunications fraud detection. Knowledge Based Systems, Vol. 21, No.7, 721-726.
[26] Japkowicz, N. (2000). The class imbalance problem: Significance and strategies. In Proceedings of the 2000 International Conference on Artificial Intelligence (IC-AI’2000), Vol. 1, 111-117.
[27] Jolliffe, I. T. (2010). Principal Component Analysis (2nd ed.). New York: Springer.
[28] Kubat, M., & Matwin, S. (1997). Addressing the curse of imbalanced training sets: One-sided selection. In Machine Learning: International Workshop Then Conference, 179-186. Morgan Kaufmann Publishers, Inc.
[29] Lane, N.D., Miluzzo, E., Lu, H., Peebles, D. (2010). A survey of mobile phone sensing. IEEE Communications Magazine, Vol.48, No.9, 140-150.
[30] Longadge, R., Dongre, S. S., Malik, L. (2013). Class Imbalance Problem in Data Mining: Review. International Journal of Computer Science and Network, Vol.2, No.1, 83-89.
[31] Lopes, J., Belo, O., Vieira, C. (2011). Applying user signatures on fraud detection in telecommunications networks. In ICDM'11 Proceedings of the 11th International Conference on Advances in Data Mining: Applications and Theoretical Aspects, 286-299.
[32] Laurila, J. K., Gatica-Perez, D., Aad, I., Blom, J., Bornet, O., Do, T. M. T., ... & Miettinen, M. (2012). The mobile data challenge: Big data for mobile computing research. In Proceedings of the Workshop on the Nokia Mobile Data Challenge, in Conjunction with the 10th International Conference on Pervasive Computing. 1-8.
[33] Lynch, C. (2008). Big data: How do your data grow? Nature, Vol.455, No.7209, 28-29.
[34] Rainie, L., & Wellman, B. (2012). Networked: The New Social Operating System. Cambridge: MIT Press.
[35] Schommer, C. (2001). Discovering Fraud Behaviour in Call Detailed Records. Mining your own Business. IBM Redbook series. Vol. 3 Telecommunications.
[36] Seshadri, M., Machiraju, S., Sridharan, A., Bolot, J., Faloutsos, C. (2008). Mobile call graphs: beyond power-law and lognormal distributions. in 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 596-604.
[37] Shover, N., Coffey, G.S., Sanders, C.R. (2004). Dialing for Dollars: Opportunities, Justifications, and Telemarketing Fraud. Qualitative Sociology, Vol.27, No.1, 59-75.
[38] Snijders, C., Matzat, U., Reips, U. (2012). International Journal of Internet Science, Vol.7, No.1, 1-5.
[39] Stefanowski, J., & Wilk, S. (2006). Rough sets for handling imbalanced data: combining filtering and rule-based classifiers. Fundamenta Informaticae, Vol.72, No.1, 379-391.
[40] Vaz de Melo, P.O.S., Akoglu, L., Faloutsos, C., Loureiro, A.A.F. (2010). Surprising Patterns for the Call Duration Distribution of Mobile Phone Users. Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science. Vol. 6323, 354-369.
[41] Xu, W., Pang, Y., Ma, J., Wang, S., Hao, G., Zeng, S., Qian, Y. (2008). Fraud detection in telecommunication: A rough fuzzy set based approach. International Conference on Machine Learning and Cybernetics, 1249-1253.
[42] Yang, Q., Wu, X. (2006). 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making, Vol.5, No.4, 597-604.
[43] Zikopoulos, P., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class hadoop and streaming data. McGraw-Hill Osborne Media.