Utilizing Transformer-based Architecture to Understand the Description to Behavior Fidelity Level in Android Applications

Author:	李娜 Shubhranshi Kapoor
Thesis title:	基於Transformer架構來理解Android 應用程序中對行為保真度級別之描述 Utilizing Transformer-based Architecture to Understand the Description to Behavior Fidelity Level in Android Applications
Advisor:	孫宏民 Sun, Hung-Min
Oral defense committee:	許富皓 Hsu, Fu-Hau; 黃育綸 Huang, Yu-Lun
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Security
Year of publication:	2022
Academic year of graduation:	110 (ROC calendar)
Language:	English
Number of pages:	64
Keywords:	Multi-label Text Classification, Mobile Application, Android Security, Dangerous Permissions, Transformer Architecture, Deep Learning

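As smartphone use has grown substantially over the past decade, it has become important for users to understand what data their mobile applications expose. Information about an application's functionality and about the user data it collects, uses, and shares is stated in its metadata fields. Because of limited privacy awareness or limited knowledge of the field, most users do not read this information before downloading an application. Users also lack an effective way to find secure applications within a given category. Previous research shows that analyzing a mobile application's metadata helps evaluate its permission usage. This study analyzes mobile application metadata to compute the description-to-behavior fidelity level, in order to determine the gap between the developer's description and the permissions the application requests. Metadata for 126,279 mobile applications was collected from the Google Play Store, and the Transformer-based BERT and XLNet models were trained to predict dangerous permission usage. The XLNet and BERT models reach overall accuracies of 92.4% and 91.0%, respectively. For this multi-label text classification task, a deep-learning-based Text CNN model serves as the baseline and reaches an accuracy of 90.6%. The correlation between predicted and actual permissions is used to determine the fidelity level.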

With smartphones and applications now present in every aspect of daily life, it is crucial to make users aware of the data they expose through mobile applications. Information about an application's functionality and the user data it collects, uses, and shares is stated in the app's metadata fields. Most users do not read this information before downloading an application, owing to a lack of privacy awareness or limited technical knowledge. Users also lack efficient methods for finding secure apps in a given category. Previous studies found that analyzing mobile apps' metadata can help evaluate their permission usage. This research calculates the description-to-behavior fidelity level by analyzing the metadata of mobile apps to determine the gap between the developer's description and the permissions requested by the application. Metadata of 126,279 mobile applications is collected from the Google Play Store, and the Transformer-based BERT and XLNet models are trained to predict dangerous permission usage. The XLNet and BERT models give overall accuracies of 92.4% and 91.0%, respectively. For this multi-label text classification task, a deep-learning-based Text CNN model is used as the baseline, giving an accuracy of 90.6%. The correlation between the predicted permissions and the actual permissions is used to determine the fidelity level.
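
The abstract says the fidelity level is derived from the correlation between predicted and actual permissions, but this record does not give the formula. As a rough sketch only, the snippet below assumes each app is described by per-permission model outputs and a binary vector of the permissions it actually requests, and it uses the Pearson correlation between the thresholded predictions and the requested permissions as a stand-in score. The permission list, the 0.5 threshold, and the function names are illustrative assumptions, not the thesis's method.

```python
from math import sqrt

# Hypothetical label set for illustration; the thesis's actual label list
# is not given in this record.
PERMISSIONS = ["CAMERA", "LOCATION", "MICROPHONE", "CONTACTS", "STORAGE", "SMS"]


def binarize(probabilities, threshold=0.5):
    """Turn per-permission model outputs into 0/1 predictions (assumed threshold)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def pearson(xs, ys):
    """Pearson correlation of two equal-length vectors.

    Returns 0.0 when either vector is constant, since the correlation
    is undefined in that case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def fidelity(predicted_probs, requested):
    """Stand-in fidelity score in [-1, 1] for a single app.

    predicted_probs: model outputs inferred from the app description.
    requested: 0/1 flags for the permissions the app actually requests.
    """
    return pearson(binarize(predicted_probs), requested)


# Example: the description implies camera, location and storage access,
# and the app requests exactly those permissions.
probs = [0.9, 0.8, 0.1, 0.2, 0.7, 0.1]
requested = [1, 1, 0, 0, 1, 0]
score = fidelity(probs, requested)
```

Under these assumptions, a score near 1 means the requested permissions match what the description implies, while a low or negative score flags an app whose requests diverge from its description.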

Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Introduction
Background
  1 Android Dangerous Permissions
  2 CNN for Text Classification
  3 Transformer based models
    3.1 Bidirectional Encoder Representations from Transformers (BERT)
    3.2 XLNet
Related Work
  1 Android App Metadata
  2 Description to Behaviour Fidelity
  3 Deep Learning Techniques
  4 Class Imbalance
Proposed Methodology
  1 Data Source
  2 Data Preprocessing
    2.1 Description Preprocessing
    2.2 Permissions Preprocessing
  3 Tokenization
    3.1 Tokenizer with GloVe word embeddings
    3.2 BERT Tokenizer
    3.3 XLNet Tokenizer
  4 Training the Models
    4.1 Features & Labels
    4.2 Text CNN Model
    4.3 BERT Model
    4.4 XLNet Model
    4.5 Handling Class Imbalance
  5 Fidelity Calculation
Experiment result
  1 Experiment Setting
  2 Performance Metric
  3 Evaluation on Test data set
    3.1 Text CNN
    3.2 BERT
    3.3 XLNet
    3.4 Model Comparison
  4 Evaluation on AC-Net data set
  5 Fidelity Level
Conclusion
  1 Limitations & Challenges
  2 Future Work
Bibliography
                                

