
Student: Liu, Yu-Cheng (劉育誠)
Thesis Title: Visual Attention Based Multiple Objects Discovery (基於視覺焦點模型之視訊物件發掘)
Advisor: Lin, Chia-Wen (林嘉文)
Committee Members:
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2009
Graduation Academic Year: 97
Language: English
Number of Pages: 63
Keywords (Chinese): 視訊物件發掘, 人眼視覺模型, EM演算法
Keywords (English): Video Object Discovery, Visual Attention Model, EM Algorithm
  • This thesis presents a probabilistic framework, based on a visual attention model, for video object discovery. The framework comprises three models: an appearance model that describes, in probabilistic form, the consistency of an object across frames; a spatial model that represents the object's geometric structure; and a motion model that establishes the object's temporal association. For multiple-object discovery in video, we adopt the Perceptual Quality Significance Map (PQSM) as the visual attention model, and the attention regions detected by the PQSM are used to describe each object's appearance, size, and location. Because the scene of a video often changes as the shot changes, we use the probabilistic parameters estimated by the EM algorithm to identify the frames that contain the objects to be discovered, and we compare these parameters to measure the similarity of the objects contained in different shots, thereby discovering objects across multiple shots; this is the key difference from object tracking. The results show that the adopted framework performs very well on multiple-object discovery in video. Finally, since the attention regions provide the object information used by the motion model, and the motion model establishes the temporal association of the data, the framework can also establish the temporal association of the attention regions in the PQSM.
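
    As a rough sketch of how a PQSM could supply each object's location and size, the fragment below thresholds a PQSM-style saliency map and measures its connected attention regions. This is a minimal illustration under our own assumptions: the threshold value, the map format, and the function names are ours, not the thesis's implementation.

    import numpy as np
    from scipy import ndimage

    def attention_regions(pqsm, thresh=0.5):
        # pqsm: 2-D saliency map with values in [0, 1] (assumed input format).
        # Threshold the map and label each connected attention region.
        mask = pqsm >= thresh
        labels, n_regions = ndimage.label(mask)
        regions = []
        for k in range(1, n_regions + 1):
            ys, xs = np.nonzero(labels == k)
            regions.append({
                # Centroid serves as the object's location cue.
                "centroid": (float(ys.mean()), float(xs.mean())),
                # Bounding-box extent serves as the object's size cue.
                "size": (int(ys.max() - ys.min()) + 1,
                         int(xs.max() - xs.min()) + 1),
            })
        return regions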


In this thesis, we present a visual attention based probabilistic framework for video object discovery. The framework consists of three models: the appearance model uses a probabilistic representation to describe the consistency of objects across frames; the spatial model represents the objects' geometric structure; and the motion model establishes the temporal association of objects. To discover multiple objects in video, we adopt the Perceptual Quality Significance Map (PQSM) as the visual attention model. The attention regions extracted from the PQSM can be regarded as objects, and we use these regions to describe the appearance, size, and initial location of the objects. The probabilistic parameters are estimated with the Expectation-Maximization (EM) algorithm. Since the scene of a video may switch between shots, we use these parameters to indicate which frames contain the discovered objects and to measure the similarity of objects in different shots. The main difference from object tracking is that object discovery can find the same object across different shots. Experimental results show that the proposed framework performs very well for discovering multiple objects in video. Moreover, because the attention regions from the PQSMs provide the object information for the motion model, and the motion model establishes the data association, the proposed framework can also establish the temporal association of the attention regions across PQSMs.
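
    The abstract does not spell out the model equations, but the parameter fitting it describes follows the standard pLSA-style EM recipe (frames as documents, visual words as words, objects as topics), and the cross-shot step compares the learned per-object codeword distributions. The sketch below is a generic NumPy version of that recipe together with a simple Bhattacharyya-similarity matcher between shots; the initialization hook, the similarity measure and threshold, and all names are illustrative assumptions rather than the thesis's actual method.

    import numpy as np

    def fit_plsa(counts, n_objects, n_iters=50, init_pw_z=None, seed=0):
        # counts: (n_frames, n_words) visual-word histograms, one row per frame.
        # init_pw_z: optional (n_words, n_objects) codeword distributions,
        #            e.g. seeded from PQSM attention regions (our assumption).
        rng = np.random.default_rng(seed)
        n_frames, n_words = counts.shape
        pw_z = rng.random((n_words, n_objects)) if init_pw_z is None else init_pw_z.copy()
        pw_z /= pw_z.sum(axis=0, keepdims=True) + 1e-12
        pz_d = rng.random((n_frames, n_objects))
        pz_d /= pz_d.sum(axis=1, keepdims=True) + 1e-12
        for _ in range(n_iters):
            # E-step: responsibilities P(z | frame, word).
            post = pz_d[:, None, :] * pw_z[None, :, :]       # (frame, word, object)
            post /= post.sum(axis=2, keepdims=True) + 1e-12
            weighted = counts[:, :, None] * post             # n(d, w) * P(z | d, w)
            # M-step: re-estimate P(word | object) and P(object | frame).
            pw_z = weighted.sum(axis=0)
            pw_z /= pw_z.sum(axis=0, keepdims=True) + 1e-12
            pz_d = weighted.sum(axis=1)
            pz_d /= pz_d.sum(axis=1, keepdims=True) + 1e-12
        return pw_z, pz_d

    def match_objects_across_shots(pw_z_a, pw_z_b, threshold=0.5):
        # Bhattacharyya similarity between every pair of per-object codeword
        # distributions from two shots; pairs above the (assumed) threshold
        # are treated as the same object appearing in both shots.
        sim = np.sqrt(pw_z_a).T @ np.sqrt(pw_z_b)            # (objects_a, objects_b)
        return [(a, int(sim[a].argmax()), float(sim[a].max()))
                for a in range(sim.shape[0]) if sim[a].max() >= threshold]

    In this reading, P(z|d) plays the role the abstract assigns to the fitted parameters: thresholding it indicates which frames contain each discovered object, while comparing P(w|z) between shots links the same object through shot changes.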

    Abstract
    摘要 (Chinese Abstract)
    Contents
    Chapter 1. Introduction
    Chapter 2. Related Work
        2.1 Probabilistic Latent Semantic Analysis (pLSA) Model
        2.2 DISCOV Model
        2.3 Perceptual Quality Significance Map (PQSM)
    Chapter 3. Proposed Method
        3.1 Visual Word Generation
            3.1.1 SIFT Descriptor Generation
        3.2 Finding the Number of Objects
            3.2.1 Subshot Segmentation
            3.2.2 Finding the Number of Objects in Every Shot
        3.3 Appearance Model
            3.3.1 Initializing the Codeword Distribution Using the PQSM
        3.4 Spatial Model
            3.4.1 Coupling the Spatial Model with the Other Models
        3.5 Motion Model
            3.5.1 Predicting the Location and Size of an Object Using the PQSM
            3.5.2 Coupling the Motion Model with the Other Models
        3.6 Model Fitting
        3.7 Object Discovery from Different Shots
    Chapter 4. Experimental Results
    Chapter 5. Conclusion and Future Work
    References


    Full-text availability: Not authorized for public release (campus network)
    Full-text availability: Not authorized for public release (off-campus network)
