Graduate student: 莊筑鈞
Chuang, Chu-Chun
Thesis title: 基於單邊元三元組損失函數的多任務架構應用於泛化人臉防偽辨識
Multi-Task Framework for Generalized Face Anti-Spoofing with One-Side Meta Triplet Loss
Advisor: 賴尚宏
Lai, Shang-Hong
Committee members: 林嘉文
Lin, Chia-Wen
許秋婷
Hsu, Chiu-Ting
黃思皓
Huang, Szu-Hao
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 40
Keywords: Computer vision, Deep Learning, Face anti-spoofing, Multi-task, Meta learning, Domain generalization
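  • As presentation attacks grow more varied, model generalization has become an
    essential challenge for face anti-spoofing, yet many previously proposed methods
    generalize poorly. This thesis improves the generalization ability of face
    anti-spoofing from two angles. First, face parsing information is used in the
    network so that it can focus on the face region and understand the distribution of
    different face regions. Second, a one-side meta triplet loss works together with
    the meta-learning process. The thesis proposes a novel multi-task framework for
    generalized face anti-spoofing comprising three tasks: depth estimation, face
    parsing, and spoof classification. Pixel-level supervision in the face parsing and
    depth estimation tasks regularizes the learned features so that attacked faces are
    distinguished more accurately. In addition, the proposed one-side meta triplet loss
    uses a margin that is enlarged in two stages and is combined with the simulated
    domain shift of meta-learning to strengthen generalization. The framework consists
    of a feature extractor, a depth estimator, a U-net based face parser, and a meta
    learner responsible for meta-learning and classification. The U-net based face
    parser contains a U-net that predicts the semantic face image and an
    attention-based connection that integrates facial semantic information across
    different dimensions. Experiments testing generalization on four public datasets
    confirm that the proposed multi-task framework and training method generalize
    better than previous methods and achieve strong results on unseen data. In some
    domain generalization benchmark experiments for face anti-spoofing, AUC improves by
    more than 6% over the reference method, and HTER also improves considerably over
    earlier methods.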
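The two-stage margin idea described above can be illustrated with a small sketch. This record does not give the loss formula, so everything here is an assumption made for illustration: that "one-side" means only live faces act as anchors and positives while spoof faces act as negatives, that the loss is a standard hinge triplet loss on Euclidean distances, and that the two stages simply use a smaller and then a larger margin. The function name and the margin values are hypothetical, not taken from the thesis.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors given as lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_side_triplet_loss(features, labels, margin):
    """Hinge triplet loss with live-only anchors (ASSUMED reading of 'one-side').

    features: list of feature vectors
    labels:   1 = live, 0 = spoof
    margin:   required gap between anchor-negative and anchor-positive distance
    """
    live = [f for f, y in zip(features, labels) if y == 1]
    spoof = [f for f, y in zip(features, labels) if y == 0]
    losses = []
    for i, anchor in enumerate(live):
        for j, positive in enumerate(live):
            if i == j:
                continue  # an anchor cannot be its own positive
            for negative in spoof:
                d_ap = euclidean(anchor, positive)
                d_an = euclidean(anchor, negative)
                losses.append(max(0.0, d_ap - d_an + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical margins for the two stages (values chosen only for the demo).
MARGIN_STAGE_1 = 0.1
MARGIN_STAGE_2 = 1.0

feats = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0]]
labs = [1, 1, 0]
loss_stage_1 = one_side_triplet_loss(feats, labs, MARGIN_STAGE_1)
loss_stage_2 = one_side_triplet_loss(feats, labs, MARGIN_STAGE_2)
```

With the small margin the toy batch already satisfies the constraint, while the enlarged margin makes the same batch incur a penalty, which is the sense in which a larger second-stage margin demands a wider separation between live and spoof features.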


    Due to the increasing variety of presentation attacks, model generalization has
    become an essential challenge for face anti-spoofing, and many previous works do
    not generalize well. This paper improves the generalization ability of face
    anti-spoofing in two ways. First, face parsing information encourages the network
    to focus on face regions and to learn the distributions of different face parts.
    Second, a one-side triplet loss is incorporated into the network to work together
    with the meta-learning process. This paper proposes a novel multi-task face
    anti-spoofing framework with three tasks: depth estimation, face parsing, and
    live/spoof classification. With pixel-wise supervision from the face parsing and
    depth estimation tasks, the regularized features can better distinguish spoof
    faces. While domain shift is simulated with meta-learning techniques, the proposed
    one-side triplet loss further improves generalization through a two-stage margin
    setting. Our framework consists of a feature extractor, a depth estimator, a U-net
    based face parsing module, and a meta learner that conducts meta-learning and
    classification. The proposed U-net based face parsing module contains a U-net for
    predicting the semantic face image and an attention-based skip connection for
    aggregating the semantic information of different channels. Extensive experiments
    on four public datasets demonstrate that the proposed framework and training
    strategies are more effective than previous works for model generalization to
    unseen domains. In some experiments on the domain generalization benchmark for face
    anti-spoofing, AUC improves by over 6% compared to the baseline, and HTER also
    improves significantly over previous methods.
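For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how HTER and AUC are commonly computed for a live/spoof classifier. It assumes that a higher score means "more likely live" and that the decision threshold is supplied by the caller; the function names and the toy scores are illustrative, and the thesis's own threshold selection protocol is not described in this record.

```python
def hter(scores, labels, threshold):
    """Half Total Error Rate: the mean of FAR and FRR at a fixed threshold.

    scores: higher means more likely live (assumed convention)
    labels: 1 = live, 0 = spoof
    """
    live = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    frr = sum(s < threshold for s in live) / len(live)     # live rejected
    far = sum(s >= threshold for s in spoof) / len(spoof)  # spoof accepted
    return (far + frr) / 2


def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    live = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((l > s) + 0.5 * (l == s) for l in live for s in spoof)
    return wins / (len(live) * len(spoof))


# Toy example: three live and three spoof samples.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_hter = hter(toy_scores, toy_labels, threshold=0.5)
toy_auc = auc(toy_scores, toy_labels)
```

In the toy example one live sample falls below the threshold and one spoof sample rises above it, so both error rates are 1/3 and HTER is 1/3; eight of the nine live/spoof pairs are ranked correctly, giving an AUC of 8/9. Lower HTER and higher AUC are better.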

    1 Introduction
      1.1 Motivation
      1.2 Problem Statement
      1.3 Contributions
      1.4 Thesis Organization
    2 Related Work
      2.1 Face Anti-spoofing
        2.1.1 Temporal-based Methods
        2.1.2 Appearance-based Methods
      2.2 Domain Generalization
        2.2.1 Meta Learning for Domain Generalization
    3 Proposed Method
      3.1 Overview
      3.2 Multi-Task Meta Learning
      3.3 U-net Based Face Parsing Module
        3.3.1 Face Parsing U-net
        3.3.2 Attention-Based Skip Connection
      3.4 One-Side Triplet Loss
      3.5 Objective Function
        3.5.1 Classification Loss
        3.5.2 One-Side Triplet Loss
        3.5.3 Segmentation Loss
        3.5.4 Depth Loss
        3.5.5 Overall Loss
      3.6 Network Structure
    4 Experiments
      4.1 Experimental Setting
        4.1.1 Datasets
        4.1.2 Implementation Details
        4.1.3 Evaluation Metrics
      4.2 Experimental Comparisons
      4.3 Face Parsing Results
      4.4 Ablation Study
        4.4.1 U-net Based Face Parsing Module
        4.4.2 One-Side Triplet Loss with Meta Learning
      4.5 Visualization
        4.5.1 Grad-CAM Visualization
        4.5.2 t-SNE Visualization
        4.5.3 Effect of Attention-Based Skip Connection for Face Parsing
    5 Conclusions
    References
