
Graduate Student: Sun, Hsin Pei (孫昕霈)
Thesis Title: Gaze Estimation under Head Pose Variations (在不同頭部姿勢下的視線估測)
Advisor: Lai, Shang Hong (賴尚宏)
Committee Members: Wang, Sheng Jyh (王聖智); Hsu, Chiou Ting (許秋婷); Chen, Hwann Tzong (陳煥宗)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2016
Graduation Academic Year: 105
Language: English
Number of Pages: 35
Keywords (Chinese): 視線估測、深度學習、人機互動、電腦視覺
Keywords (English): Gaze Estimation, Deep Learning, Human-Computer Interaction, Computer Vision
  • This thesis proposes a gaze estimation system that estimates where a person is looking from eye images, across different users and different head poses. This research facilitates human-computer interaction modes beyond touch or motion-sensing control.
    To make the system usable by different subjects, we train on the UT dataset, which contains many subjects under a wide range of head poses, so that the system learns a variety of head movements. In addition, we build a 3D face model to estimate the head pose and obtain 3D rotation information, so that the entire appearance-based, learning-driven gaze estimation pipeline uses only a single camera, broadening its applicability and generality.
    For the regression problem of gaze estimation, we adopt the recently popular deep learning framework. However, most gaze estimation algorithms assume a fixed head pose and infer the gaze point from the pupil position alone. Such methods do not fit everyday scenarios such as watching television, where moving objects or a change in the viewer's position cause the gaze to shift together with head rotation. To handle the variation in eye appearance caused by head movement, we therefore train separate deep networks for different regions of head pose to estimate the gaze direction.
    Our experiments demonstrate that this approach effectively solves gaze estimation under head pose variations, improving both training time and accuracy.
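The region-specific training described above amounts to a routing step at test time: quantize the estimated head pose into a coarse yaw/pitch bin and dispatch the eye image to the network trained for that bin. A minimal sketch of such a router follows, with hypothetical bin boundaries; the thesis's actual head-pose partition is not specified here.

```python
# Hypothetical head-pose partition: 3 yaw bins x 3 pitch bins (degrees).
YAW_EDGES = [-15.0, 15.0]    # left / frontal / right
PITCH_EDGES = [-10.0, 10.0]  # down / level / up

def _bin(value, edges):
    """Return the index of the interval that `value` falls into."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def select_region(yaw_deg, pitch_deg):
    """Map an estimated head pose (yaw, pitch) to a region id in [0, 8].

    The id would index into a table of region-specific CNN regressors,
    each trained only on eye images whose head pose falls in that bin.
    """
    return _bin(yaw_deg, YAW_EDGES) * 3 + _bin(pitch_deg, PITCH_EDGES)
```

For example, a frontal pose `select_region(0.0, 0.0)` maps to the central bin, while large yaw and pitch values map to the corner bins; each bin's network only ever sees eye shapes consistent with that pose range.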


    In this thesis, we propose a new gaze estimation algorithm that estimates where a user is looking from eye images. The algorithm uses multiple convolutional neural networks (CNNs) as regression networks that estimate gaze angles from eye images. Because it explicitly incorporates head pose information into the estimation framework, the proposed algorithm provides accurate gaze estimation for users with different head poses. To achieve a person-independent system, we train the deep CNN regression networks on the UT Multiview dataset, which contains a large number of subjects with large head pose variations. In addition, we estimate the head pose from a 2D face image and a generic 3D face model, which is why the proposed algorithm can be widely applied to appearance-based gaze estimation in practice. Our experimental results show that the proposed system improves the accuracy of appearance-based gaze estimation under head pose variations compared to previous methods.
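Since the networks regress gaze angles rather than screen coordinates, a predicted (yaw, pitch) pair must ultimately be mapped to a 3D gaze direction in the camera coordinate system. The sketch below shows one common convention for this conversion; the exact axis convention used in the thesis is an assumption here.

```python
import math

def angles_to_vector(yaw, pitch):
    """Convert gaze angles (radians) to a unit 3D direction vector.

    Convention assumed here: yaw rotates about the vertical axis, pitch
    about the horizontal axis; (0, 0) looks straight along the camera's
    -z axis.
    """
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

def vector_to_angles(v):
    """Inverse mapping: recover (yaw, pitch) from a unit direction vector."""
    x, y, z = v
    pitch = math.asin(y)
    yaw = math.atan2(x, -z)
    return (yaw, pitch)
```

The two functions are inverses of each other on the valid angle range, so angular error between a predicted and a ground-truth gaze direction can be measured consistently in either representation.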

    Chapter 1 Introduction
      1.1 Motivation
      1.2 Problem Description
      1.3 Main Contribution
      1.4 Thesis Organization
    Chapter 2 Literature Review
      2.1 Model-based Gaze Estimation
      2.2 Appearance-based Gaze Estimation Method
      2.3 Convolutional Neural Networks
    Chapter 3 Proposed System
      3.1 System Overview
      3.2 Image Patch Normalization
      3.3 Head Pose Estimation
      3.4 Our CNN Architecture
      3.5 Training Phase
      3.6 Estimation Phase
    Chapter 4 Experiment
      4.1 Camera Calibration and Coordinate System Transformation
      4.2 Gaze Angle Procedure
      4.3 Our Data Collection
      4.4 Different Structures of Our Proposed Method
      4.5 Comparison with Baseline Method
      4.6 Cross-subject Experiment in UT Multiview Dataset
    Chapter 5 Conclusion
    References


    Full text availability: not authorized for public release (campus and off-campus networks).