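Graduate student: | 陳殷盈 Yan-Ying Chen |
---|---|
Thesis title: | 聲音和影像資訊在生物認證系統上的融合 Audio-Visual Information Fusion for Biometric Verification System |
Advisor: | 賴尚宏 Shang-Hong Lai |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2004 |
Academic year of graduation: | 92 (ROC calendar) |
Language: | Chinese |
Number of pages: | 52 |
Keywords (Chinese): | 聲音 (voice), 人臉 (face), 生物認證 (biometric verification), 多模式 (multi-modal) |
Keywords (English): | biometric, verification, multi-modal, information fusion |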
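Verification systems are now widely used in daily life, and their importance grows by the day; examples include automatic teller machines and access control systems. Biometric traits are unique, so they can reliably distinguish one user from another, and they cannot be lost. They can therefore replace ordinary passwords and ID cards as the basis of a verification system. Multi-modal biometric verification built on several biometric traits further strengthens the reliability and security of the system.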
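This thesis aims to develop a reliable multi-modal biometric verification system that uses voice and face information, drawing on features from different modalities to make the system more robust. In such a system, the features provided by the different modalities need a sound fusion scheme to raise the recognition rate. We therefore build three data fusion methods from two fusion strategies, opinion fusion and concatenation fusion, and two classifiers, Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs), and compare how the choice of fusion method affects the recognition rate.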
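The first method uses SVMs as the classifier and fuses the data by concatenation fusion: the features provided by the different modalities are first combined into a single new feature vector, which an SVM then classifies. We also propose a new way to compute a confidence index, using the distance between a test sample and the SVM hyper-plane as the confidence level of that classification. The same rule carries over to the second method, SVM-based opinion fusion, where it likewise raises the recognition rate. As a baseline, a third multi-modal verification system is built from conventional GMMs with opinion fusion. The thesis closes with experiments that use the same database for training and testing data to compare the verification accuracy of the three methods.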
Personal verification systems are already in use in everyday life and are becoming increasingly important. They have been employed in many practical applications, including automatic teller machines and access control systems. Biometric traits are unique to each person, so they can reliably distinguish between different users. Hence, biometrics can replace passwords or ID cards as the basis of a verification system. A multi-modal biometric verification system that combines multiple experts can be more robust and more accurate than a single-modal one.
The purpose of this thesis is to develop a reliable multi-modal biometric verification system based on speech and face information, and to make the system more robust by drawing on features extracted from different experts.
For multi-modal biometric verification systems, the features extracted from different modules need to be fused intelligently to improve the verification rate. We therefore consider two fusion methods, opinion fusion and concatenation fusion, and two classifiers, Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs). Three audio-visual biometric verification systems formed from different combinations of fusion method and classifier are discussed in detail, and their verification rates are compared experimentally.
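The difference between the two fusion methods can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the equal default weighting, and the zero decision threshold are all assumptions made for illustration. Concatenation fusion joins the audio and face feature vectors before classification, whereas opinion fusion combines the scores (opinions) that separate per-modality classifiers have already produced.

```python
from typing import List, Sequence


def concatenation_fusion(audio: Sequence[float], face: Sequence[float]) -> List[float]:
    """Feature-level fusion: join both feature vectors into one longer vector.

    The fused vector would then be passed to a single classifier.
    """
    return list(audio) + list(face)


def opinion_fusion(audio_opinion: float, face_opinion: float,
                   audio_weight: float = 0.5) -> float:
    """Score-level fusion: weighted sum of two per-modality opinions.

    The weighted-sum rule and the default weight are illustrative assumptions.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_opinion + (1.0 - audio_weight) * face_opinion


def accept(fused_opinion: float, threshold: float = 0.0) -> bool:
    """Accept the identity claim when the fused opinion reaches the threshold."""
    return fused_opinion >= threshold


if __name__ == "__main__":
    # Hypothetical feature vectors and opinions, for demonstration only.
    print(concatenation_fusion([0.2, 0.7], [0.1, 0.9, 0.4]))
    print(accept(opinion_fusion(0.8, -0.3, audio_weight=0.6)))
```

In this sketch the choice between the two functions decides where the modalities meet: before the classifier (one model over a longer vector) or after it (one model per modality, then a combination rule).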
In this thesis, we propose two SVM-based multi-modal biometric verification systems. The first is based on concatenation fusion with an SVM classifier. This method concatenates the features provided by each expert to form a new feature vector. An SVM classifier is then trained on the concatenated feature vectors for each person and later used for verification. Because many audio-visual feature pairs can be formed, verification yields several SVM classification results; to combine them, we present a new scheme that computes a confidence weight from the distance between each feature vector and the hyper-plane of the associated SVM model. The final verification decision is determined from the weighted sum of all the SVM classification results. The same confidence computation can be applied to SVM-based opinion fusion, the second proposed system, to enhance its verification rate. Finally, experimental results on the same audio-visual database are shown for the three biometric verification systems, and their verification rates are compared. We show that the proposed SVM-based fusion systems outperform the traditional GMM-based opinion fusion system.
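The confidence-weighted decision can be sketched as follows. This is one possible reading of the scheme, not the thesis's exact formula: it assumes a linear SVM with a known weight vector `w` and bias `b`, takes the geometric distance |w·x + b| / ||w|| as the confidence weight, and accepts the claim when the weighted sum of the ±1 classification results is positive. All names and sample values are hypothetical.

```python
import math
from typing import Iterable, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear SVM decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hyperplane_distance(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Geometric distance from x to the hyper-plane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("w must be a non-zero vector")
    return abs(decision_value(w, b, x)) / norm


def weighted_vote(w: Sequence[float], b: float,
                  samples: Iterable[Sequence[float]]) -> float:
    """Sum of +1/-1 classification results, each weighted by its distance."""
    total = 0.0
    for x in samples:
        label = 1.0 if decision_value(w, b, x) >= 0.0 else -1.0
        total += hyperplane_distance(w, b, x) * label
    return total


def verify(w: Sequence[float], b: float,
           samples: Iterable[Sequence[float]]) -> bool:
    """Accept the claim when the confidence-weighted sum is positive."""
    return weighted_vote(w, b, samples) > 0.0


if __name__ == "__main__":
    # Hypothetical model and concatenated feature vectors.
    w, b = [3.0, 4.0], 0.0
    samples = [[1.0, 1.0], [-0.1, 0.0], [0.0, -0.1]]
    print(weighted_vote(w, b, samples), verify(w, b, samples))
```

In the demonstration data, two of the three samples fall on the negative side but lie very close to the hyper-plane, so the single confident positive sample dominates the weighted sum. An unweighted majority vote would reach the opposite decision, which is the behavior the confidence weight is meant to change.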