Graduate student: | 廖家德 Liao, Chia-Te |
---|---|
Thesis title: | Novel Robust Kernels for Visual Learning Problems 新穎強健的核心方法在視覺學習問題之應用 |
Advisor: | 賴尚宏 Lai, Shang-Hong |
Oral defense committee: | |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2010 |
Academic year of graduation: | 99 |
Language: | English |
Number of pages: | 114 |
Chinese keywords: | 核心方法, 強健學習, 影像分類, 表情分析 |
Foreign-language keywords: | Kernel methods, Robust learning, Image classification, Facial expression analysis |
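Because collected data often exhibit varying degrees of variation and noise, the robustness of machine learning algorithms has long been an important issue. To make an algorithm more robust, one often needs to collect a large amount of training data to cover all possible data variations. In the original data space, the data distribution is usually nonlinear, so applying nonlinear algorithms can yield better results. For visual learning problems, kernel methods have made considerable progress in recent years. This dissertation develops five robust kernel functions for use with kernel algorithms, aiming to improve the robustness of those algorithms on a range of image-related problems. Through special design, the kernel functions we define can reasonably estimate image similarity in highly noisy environments. The first kernel is a weighted combination of a function based on the ρ-function and an RBF function. By estimating an image model, the second kernel reduces the influence of implausible data elements. The third kernel combines tangent distance with the ρ-function, making it robust to both image deformation and noise. The fourth kernel is designed specifically for pedestrian identification in video surveillance. It allows an image to be described by an unordered set of features, which is further organized into a hierarchical structure. Finally, we propose an optimization method for learning kernel functions and apply it to two problems: facial expression intensity estimation and facial expression recognition. From a theoretical point of view, all of these kernel functions satisfy Mercer's condition, so they can be used in many kernel algorithms to improve robustness. From a practical point of view, experimental results across many different applications confirm that these kernel functions do improve the robustness of kernel algorithms.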
Robustness, the ability of a learning algorithm to resist data disturbance and irrelevant data variations, is critical for most visual learning systems. One usually has to collect a large number of examples in order to train a model that is robust against different kinds of data disturbance or variation. On the other hand, because the distribution of data is often highly nonlinear in the input space, a robust learning solution can be achieved by using nonlinear learning methods. Kernel methods have been extensively applied to many visual learning applications. In this dissertation, we develop five different robust kernels that can be used in conjunction with kernel learning machines and consequently improve robustness in solving several image-related problems. Through careful kernel design, the proposed kernels can robustly measure image similarity in noisy environments. The first kernel is a weighted linear combination of a robust ρ-function and a Radial Basis Function (RBF). For the second kernel, we incorporate a learned image appearance model with a robust ρ-function and design a kernel that suppresses the influence of data elements lying too far from a regular appearance model. The third kernel is designed by incorporating the notions of a robust error function and tangent distance. This kernel is insensitive to some irrelevant data deformations and noise disturbances. The fourth kernel is a robust image kernel designed especially for the pedestrian identification problem in video surveillance. It can be used over unordered feature sets, and it allows us to represent a target image in a hierarchical structure. Finally, we propose a framework for learning novel facial expression kernels, which can be applied to estimate facial expression intensity and to recognize facial expressions. It performs robustly against inter-personal variations by using the intra-person expression flow. The expression kernel involves determining a weighting mask for the facial optical flows by solving a constrained quadratic optimization problem for each expression. From a theoretical point of view, these kernels are proved to satisfy Mercer's condition, so they are valid for use in a class of kernel-based learning algorithms to enhance their robustness. From a practical point of view, these kernels are shown to significantly improve the robustness of machine learning algorithms in many visual learning applications.
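As a rough illustration of the first kernel described in the abstract, the sketch below forms a weighted combination of a ρ-function-based similarity and an RBF kernel, then checks numerically that the resulting Gram matrix is positive semidefinite, as Mercer's condition requires. The abstract does not give the dissertation's exact formula, so everything specific here is an assumption made for illustration: the Geman-McClure form of ρ, the per-element averaging, the parameter names `alpha`, `gamma`, and `sigma`, and the helper names.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix: exp(-gamma * ||x - y||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)


def rho_kernel(X, Y, sigma=1.0):
    """Robust per-element similarity, averaged over elements (assumed form).

    Each element contributes 1 - rho(t), where t is the element-wise
    difference and rho(t) = t^2 / (sigma^2 + t^2) is the Geman-McClure
    function. This equals sigma^2 / (sigma^2 + t^2), which is a positive
    definite function of t, so the average over elements is a valid kernel.
    """
    diffs = X[:, None, :] - Y[None, :, :]
    return (sigma**2 / (sigma**2 + diffs**2)).mean(axis=2)


def combined_kernel(X, Y, alpha=0.5, gamma=0.5, sigma=1.0):
    """Weighted combination of the two kernels above.

    With 0 <= alpha <= 1 both weights are nonnegative, so the sum of two
    positive semidefinite kernels stays positive semidefinite.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rho_kernel(X, Y, sigma) + (1.0 - alpha) * rbf_kernel(X, Y, gamma)


def min_gram_eigenvalue(K):
    """Smallest eigenvalue of the symmetrized Gram matrix K."""
    return float(np.linalg.eigvalsh((K + K.T) / 2.0).min())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 16))  # 20 synthetic "images" with 16 elements each
    K = combined_kernel(X, X)
    print("smallest Gram eigenvalue:", min_gram_eigenvalue(K))

    # One heavily corrupted element: the RBF similarity collapses, while the
    # rho-based term still reflects the 15 elements that agree.
    clean = np.zeros((1, 16))
    corrupted = clean.copy()
    corrupted[0, 0] = 100.0
    print("RBF similarity:", rbf_kernel(clean, corrupted)[0, 0])
    print("rho similarity:", rho_kernel(clean, corrupted)[0, 0])
```

The corrupted-element comparison at the end shows why a bounded ρ-function helps: a single outlying element can change the ρ-based similarity by at most 1/d for d elements, whereas it can drive the RBF similarity to almost zero. A Gram matrix built this way could be passed to any kernel machine that accepts a precomputed kernel.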