Graduate student: | 黃川洁 Huang, Chuan Chi |
---|---|
Thesis title: | De-Identification: 找出最小k-anonymity的高效演算法 De-Identification: Efficient algorithm to find minimal k-anonymity |
Advisor: | 黃之浩 Huang, Chih Hao |
Oral defense committee: | 翁詠祿, 易志偉 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Communications Engineering |
Year of publication: | 2015 |
Academic year of graduation: | 104 |
Number of pages: | 32 |
Chinese keywords: | De-identification, k-anonymity, minimal generalization, generalization |
English keywords: | De-identification, k-anonymity, minimal generalization, generalization |
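In today's world, data collection goes on continuously, and personal privacy is therefore increasingly important. Research such as health or demographic studies requires that data be released, so attributes that plainly identify a person, such as name and medical record number, must be removed. Because data collection never stops, however, victims can still be re-identified through attributes found in other data sets, such as ZIP code, gender, and age; we call this a linking attack.
The method for preventing victims from being re-identified is called de-identification. k-anonymity is one effective method for preventing linking attacks: it uses generalization or suppression to ensure that no victim can be singled out from a group of k. In this thesis we discuss what k-anonymity is and then find the minimal generalization according to the definition of minimal generalization proposed by Samarati. We introduce the hash-based algorithm proposed by X. SUN, and we propose a new algorithm that both improves performance and adds flexibility in finding the minimal generalization that achieves k-anonymity.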
Abstract
Nowadays, data collection is an ongoing process, so personal privacy is becoming more and more important. We need to publish data for purposes such as public health and demographic research, and attributes that clearly identify individuals (such as name and medical record ID) are generally removed. Nevertheless, the released data can sometimes be joined with other public databases on attributes such as ZIP code, gender, and age to re-identify individuals; this is called a "linking attack".
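The linking attack described above can be illustrated with a minimal Python sketch. All records, names, and the `link` helper are hypothetical and invented for illustration; they do not come from the thesis. The sketch joins a released table (direct identifiers removed) with a public table on the attributes ZIP code, gender, and age, and reports every person who matches exactly one released record.

```python
# Hypothetical toy data: every record and name here is invented for illustration.
released = [  # published data with names removed
    {"zip": "30010", "gender": "F", "age": 34, "diagnosis": "flu"},
    {"zip": "30010", "gender": "M", "age": 34, "diagnosis": "asthma"},
    {"zip": "30050", "gender": "F", "age": 51, "diagnosis": "diabetes"},
]
public = [  # some other public data set that still carries names
    {"name": "Alice", "zip": "30010", "gender": "F", "age": 34},
    {"name": "Bob", "zip": "30010", "gender": "M", "age": 34},
    {"name": "Carol", "zip": "30050", "gender": "F", "age": 29},
]
QUASI_IDENTIFIERS = ("zip", "gender", "age")


def link(released_rows, public_rows, quasi_identifiers):
    """Join the two tables on the given attributes.

    Returns a dict mapping each name to a diagnosis whenever that person
    matches exactly one released record.
    """
    index = {}
    for row in released_rows:
        key = tuple(row[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(row)

    reidentified = {}
    for person in public_rows:
        key = tuple(person[q] for q in quasi_identifiers)
        matches = index.get(key, [])
        if len(matches) == 1:  # a unique match re-identifies the individual
            reidentified[person["name"]] = matches[0]["diagnosis"]
    return reidentified


result = link(released, public, QUASI_IDENTIFIERS)
print(result)
```

In this toy data, two of the three people in the public table match a single released record each, so removing names alone does not protect them.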
Protecting individuals from re-identification is called "de-identification". k-anonymity is an effective way to prevent the linking attack: it applies generalization or suppression so that no individual can be uniquely distinguished within a group of size k. In this thesis, we discuss what k-anonymity is and find the minimal generalization in the sense of [13], described by Samarati. We introduce the hash-based algorithm proposed by X. SUN, and propose a more efficient and more flexible algorithm for finding the minimal generalization that achieves k-anonymity.
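To make the notions of k-anonymity and minimal generalization concrete, here is a small Python sketch. It is a brute-force search over generalization levels, not the hash-based algorithm of X. SUN and not the algorithm proposed in the thesis. The data, the two generalization hierarchies (masking trailing ZIP digits, bucketing age by decade), and all function names are assumptions made for illustration, and suppression is omitted for brevity.

```python
from collections import Counter
from itertools import product


def generalize_zip(value, level):
    """Level n replaces the last n digits of the ZIP code with '*'."""
    if level == 0:
        return value
    return value[:-level] + "*" * level


def generalize_age(value, level):
    """Level 0: exact age, level 1: decade range, level 2: fully generalized."""
    if level == 0:
        return str(value)
    if level == 1:
        low = value // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


# Hypothetical hierarchies: (generalization function, maximum level) per attribute.
HIERARCHIES = [(generalize_zip, 2), (generalize_age, 2)]

# Hypothetical records: (ZIP code, age).
ROWS = [("30010", 34), ("30012", 36), ("30050", 51), ("30051", 58)]


def apply_levels(rows, levels):
    """Generalize every row with one level per attribute."""
    return [
        tuple(fn(value, level)
              for (fn, _), value, level in zip(HIERARCHIES, row, levels))
        for row in rows
    ]


def is_k_anonymous(rows, k):
    """True if every combination of attribute values occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


def minimal_generalizations(rows, k):
    """Exhaustively find the level vectors that give k-anonymity and are not
    dominated by a componentwise-smaller level vector that also does."""
    all_levels = product(*(range(max_level + 1) for _, max_level in HIERARCHIES))
    satisfying = [levels for levels in all_levels
                  if is_k_anonymous(apply_levels(rows, levels), k)]

    def dominated(a):
        return any(b != a and all(x <= y for x, y in zip(b, a))
                   for b in satisfying)

    return [a for a in satisfying if not dominated(a)]


print(minimal_generalizations(ROWS, 2))
print(apply_levels(ROWS, (1, 1)))
```

For this toy table with k = 2, the search returns the single level vector (1, 1): masking one ZIP digit and bucketing age by decade already groups the records in pairs, so any further generalization loses information without being needed. Exhaustive search grows exponentially with the number of attributes, which is the motivation for more efficient algorithms.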
[1] R. J. BAYARDO and R. AGRAWAL, Data privacy through optimal k-anonymization, Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on, IEEE, 2005, pp. 217-228.
[2] R. DEWRI, I. RAY and D. WHITLEY, On the Optimal Selection of k in the k-Anonymity Problem, Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, IEEE, 2008, pp. 1364-1366.
[3] B. FUNG, K. WANG, R. CHEN and P. S. YU, Privacy-preserving data publishing: A survey of recent developments, ACM Computing Surveys (CSUR), 42 (2010), pp. 14.
[4] S. C. HUANG, P.-J. WAN, X. JIA, H. DU and W. SHANG, Minimum-latency broadcast scheduling in wireless ad hoc networks, INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE, IEEE, 2007, pp. 733-739.
[5] S. KIYOMOTO and Y. MIYAKE, How to Find an Appropriate K for K-Anonymization, Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2014 Eighth International Conference on, IEEE, 2014, pp. 273-279.
[6] F. KOHLMAYER, F. PRASSER, C. ECKERT, A. KEMPER and K. KUHN, Flash: efficient, stable and optimal k-anonymity, Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), IEEE, 2012, pp. 708-717.
[7] K. LEFEVRE, D. J. DEWITT and R. RAMAKRISHNAN, Incognito: Efficient full-domain k-anonymity, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, ACM, 2005, pp. 49-60.
[8] P. SAMARATI, Protecting respondents' identities in microdata release, Knowledge and Data Engineering, IEEE Transactions on, 13 (2001), pp. 1010-1027.
[9] P. SAMARATI and L. SWEENEY, Generalizing data to provide anonymity when disclosing information, PODS, 1998, pp. 188.
[10] P. SAMARATI and L. SWEENEY, Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression, Technical report, SRI International, 1998.
[11] X. SUN, M. LI, H. WANG and A. PLANK, An efficient hash-based algorithm for minimal k-anonymity, Proceedings of the thirty-first Australasian conference on Computer science-Volume 74, Australian Computer Society, Inc., 2008, pp. 101-107.
[12] L. SWEENEY, Achieving k-anonymity privacy protection using generalization and suppression, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10 (2002), pp. 571-588.
[13] L. SWEENEY, k-anonymity: A model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10 (2002), pp. 557-570.