
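Graduate student: 李 娜 (Shubhranshi Kapoor)
Thesis title: Utilizing Transformer-based Architecture to Understand the Description to Behavior Fidelity Level in Android Applications
Chinese title: 基於Transformer架構來理解Android 應用程序中對行為保真度級別之描述
Advisor: Sun, Hung-Min (孫宏民)
Oral examination committee: Hsu, Fu-Hau (許富皓); Huang, Yu-Lun (黃育綸)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Information Security
Publication year: 2022
Graduation academic year: 110 (ROC calendar; academic year 2021-2022)
Language: English
Number of pages: 64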
Keywords: Multi-label Text Classification, Mobile Application, Android Security, Dangerous Permissions, Transformer Architecture, Deep Learning
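  • As smartphone use has grown sharply over the past decade, it has become
    important for users to understand what data of theirs mobile applications
    expose. Information about an application's functionality and about the user
    data the app collects, uses, and shares is given in its metadata fields.
    Because of a lack of privacy awareness or limited knowledge of the field,
    most users do not read this information before downloading an app. In
    addition, users lack an efficient way to find secure apps in a given
    category. Previous studies have shown that analyzing mobile apps' metadata
    helps evaluate their permission usage. This study analyzes mobile apps'
    metadata to calculate the description-to-behavior fidelity level, that is,
    to determine the gap between the developer's description of an application
    and the permissions that application requests. Metadata of 126,279 mobile
    apps was collected from the Google Play Store, and Transformer-based BERT
    and XLNet models were trained to predict dangerous permission usage. The
    XLNet and BERT models achieve overall accuracies of 92.4% and 91.0%,
    respectively. For this multi-label text classification task, a
    deep-learning-based Text CNN model is used as the baseline and reaches an
    accuracy of 90.6%. The correlation between the predicted and actual
    permissions is used to determine the fidelity level.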
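The abstract reports a single "accuracy" for a multi-label task without stating how it is computed. The sketch below is illustrative only and assumes things this page does not confirm: four hypothetical dangerous-permission labels, made-up logits for three apps, and an independent sigmoid per label with a 0.5 threshold (a common convention for multi-label classifiers). It shows that subset accuracy, per-label accuracy (1 minus Hamming loss), and micro-averaged F1 can differ noticeably on the same predictions. scikit-learn's `accuracy_score`, `hamming_loss`, and `f1_score(average="micro")` compute the same three quantities on multi-hot arrays; plain Python is used here to keep the example dependency-free.

```python
import math

# Hypothetical label set: four Android dangerous permissions chosen for
# illustration. The thesis's actual label set is not listed on this page.
PERMISSIONS = ["ACCESS_FINE_LOCATION", "CAMERA", "READ_CONTACTS", "RECORD_AUDIO"]


def encode(perms):
    """Multi-hot encode a set of permission names in PERMISSIONS order."""
    return [1 if p in perms else 0 for p in PERMISSIONS]


def predict(logits, threshold=0.5):
    """Apply an independent sigmoid to each label's logit, then threshold it.

    Each permission gets its own yes/no decision, which is what makes the
    task multi-label rather than multi-class (no softmax across labels).
    """
    return [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in logits]


def subset_accuracy(y_true, y_pred):
    """Share of apps whose entire predicted label vector is exactly right."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def label_accuracy(y_true, y_pred):
    """Share of individual (app, permission) decisions that are right (1 - Hamming loss)."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(a == b for a, b in pairs) / len(pairs)


def micro_f1(y_true, y_pred):
    """F1 from true/false positives and false negatives pooled over all labels."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        for a, b in zip(t, p):
            tp += int(a == 1 and b == 1)
            fp += int(a == 0 and b == 1)
            fn += int(a == 1 and b == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up ground truth and model logits for three apps.
y_true = [
    encode({"CAMERA"}),
    encode({"ACCESS_FINE_LOCATION", "READ_CONTACTS"}),
    encode({"RECORD_AUDIO"}),
]
logits = [
    [-3.0, 2.5, -1.0, -2.0],
    [1.2, -0.5, -0.3, -4.0],  # misses READ_CONTACTS
    [-2.0, -1.0, -3.0, 0.8],
]
y_pred = [predict(z) for z in logits]

print(f"subset accuracy: {subset_accuracy(y_true, y_pred):.3f}")  # 0.667
print(f"label accuracy:  {label_accuracy(y_true, y_pred):.3f}")   # 0.917
print(f"micro F1:        {micro_f1(y_true, y_pred):.3f}")         # 0.857
```

A single missed label drops subset accuracy to 2/3 while per-label accuracy stays above 0.9, so which definition a reported figure uses matters when comparing models.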


    With the prevalence of smartphones and applications in all aspects of daily
    life, it is crucial to make users aware of the data they expose through
    mobile applications. Information about an application's functionality and
    about the user data the app collects, uses, and shares is given in the app's
    metadata fields. Most users do not read this information before downloading
    an application, owing to a lack of privacy awareness or limited technical
    knowledge, and there is no efficient method for users to find secure apps
    in a given category. Previous studies found that analyzing mobile apps'
    metadata can help evaluate their permission usage. This research calculates
    the description-to-behavior fidelity level by analyzing app metadata to
    determine the gap between a developer's description of an application and
    the permissions that application requests. Metadata of 126,279 mobile
    applications was collected from the Google Play Store, and Transformer-based
    BERT and XLNet models were trained to predict dangerous permission usage.
    The XLNet and BERT models achieve overall accuracies of 92.4% and 91.0%,
    respectively. For this multi-label text classification task, a
    deep-learning-based Text CNN model serves as the baseline, with an accuracy
    of 90.6%. The correlation between the predicted and actual permissions is
    used to determine the fidelity level.
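The abstract states only that the correlation between predicted and actual permissions determines the fidelity level; the exact measure belongs to the thesis's "Fidelity Calculation" section and is not reproduced on this page. The sketch below is a stand-in, not the thesis's formula. It assumes a trained classifier that outputs one probability per permission, a 0.5 threshold, and Jaccard similarity between the permissions the description implies and the permissions the app requests; it also lists the requested permissions the description does not account for, which is the "gap" the abstract refers to. The permission names, probabilities, and helper names (`predicted_set`, `fidelity`) are hypothetical.

```python
# Hypothetical permission labels, in the order a model would output them.
PERMISSIONS = ["ACCESS_FINE_LOCATION", "CAMERA", "READ_CONTACTS", "RECORD_AUDIO"]


def predicted_set(probs, labels=PERMISSIONS, threshold=0.5):
    """Permissions the description implies: labels whose probability clears the threshold."""
    return {label for label, p in zip(labels, probs) if p >= threshold}


def fidelity(described, requested):
    """Jaccard overlap between described (predicted) and requested permissions.

    Returns (score in [0, 1], sorted list of permissions the app requests but
    its description does not imply). Two empty sets count as full agreement.
    """
    described, requested = set(described), set(requested)
    union = described | requested
    score = len(described & requested) / len(union) if union else 1.0
    return score, sorted(requested - described)


# Made-up model output for one app's description, and the dangerous
# permissions the same app requests (e.g., as declared in its manifest).
probs = [0.10, 0.93, 0.20, 0.71]
requested = {"CAMERA", "RECORD_AUDIO", "READ_CONTACTS", "ACCESS_FINE_LOCATION"}

described = predicted_set(probs)
score, undescribed = fidelity(described, requested)
print(score, undescribed)  # 0.5 ['ACCESS_FINE_LOCATION', 'READ_CONTACTS']
```

Jaccard is symmetric, so it also penalizes permissions the description implies but the app never requests. If only over-privileged apps matter, the `undescribed` list, or the share of requested permissions that the description covers, is the more direct signal.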

    Table of contents:
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Background
      2.1 Android Dangerous Permissions
      2.2 CNN for Text Classification
      2.3 Transformer based models
        2.3.1 Bidirectional Encoder Representations from Transformers (BERT)
        2.3.2 XLNet
    3 Related Work
      3.1 Android App Metadata
      3.2 Description to Behaviour Fidelity
      3.3 Deep Learning Techniques
      3.4 Class Imbalance
    4 Proposed Methodology
      4.1 Data Source
      4.2 Data Preprocessing
        4.2.1 Description Preprocessing
        4.2.2 Permissions Preprocessing
      4.3 Tokenization
        4.3.1 Tokenizer with GloVe word embeddings
        4.3.2 BERT Tokenizer
        4.3.3 XLNet Tokenizer
      4.4 Training the Models
        4.4.1 Features & Labels
        4.4.2 Text CNN Model
        4.4.3 BERT Model
        4.4.4 XLNet Model
        4.4.5 Handling Class Imbalance
      4.5 Fidelity Calculation
    5 Experiment result
      5.1 Experiment Setting
      5.2 Performance Metric
      5.3 Evaluation on Test data set
        5.3.1 Text CNN
        5.3.2 BERT
        5.3.3 XLNet
        5.3.4 Model Comparison
      5.4 Evaluation on AC-Net data set
      5.5 Fidelity Level
    6 Conclusion
      6.1 Limitations & Challenges
      6.2 Future Work
    Bibliography
