
Graduate student: 程芝潔
Chih-Chieh Cheng
Thesis title: 以隱藏式馬可夫模型整合聲音及運動資訊擷取棒球賽事精華
Fusion of Audio and Motion Information on HMM-Based Highlight Extraction for Baseball Games
Advisor: 許秋婷
Chiou-Ting Hsu
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2004
Academic year of graduation: 92
Language: English
Pages: 53
Keywords (Chinese): 精彩畫面擷取、聲音特徵、攝影機運動、物件運動強度、機率模型、隱藏式馬可夫模型
Keywords (English): highlight extraction, audio feature, camera motion, object motion intensity, likelihood model, Hidden Markov model
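  • As multimedia databases grow ever larger, the need to browse and classify multimedia data quickly becomes increasingly urgent. Video data exhibits great diversity mainly because it contains both audio and visual signals; video analysis methods that combine audio and visual information to reach an understanding close to human perception have therefore become a necessary trend. Among video data, sports video is an important category, mainly because sports are a global form of entertainment and attract large audiences. The goal of this thesis is to detect and extract baseball game highlights from integrated audio and motion information. To better describe audio and motion characteristics, we propose a feature representation based on a likelihood model, which computes the "likeliness" that audio and motion feature values belong to a given audio type or motion pattern. Experiments show that the proposed representation does improve the reliability of describing video with audio and motion information. We then combine the audio and motion likelihood models symmetrically to obtain an integrated representation, and use a Hidden Markov Model to detect the temporal transitions of this representation. Experiments on a 12-hour baseball game database demonstrate the effectiveness and reliability of the proposed method.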


    As multimedia databases grow, the demand for efficient browsing and archiving of multimedia data becomes more and more urgent. Video data exhibits great variety through the co-existence of audio and visual signals; it is therefore necessary to incorporate both audio and visual information into video analysis in order to better match human perception. Among video genres, sports video is one of the major categories: it is globally widespread and draws large audiences. This thesis aims to extract baseball game highlights based on integrated audio-motion cues. To better describe the different audio and motion characteristics of baseball game highlights, we propose a novel representation method based on a likelihood model. The proposed likelihood model measures the “likeliness” of low-level audio features and motion features with respect to predefined audio types and motion patterns, respectively. We show that the proposed feature representation improves the reliability of using low-level audio/motion features to interpret highlights. Next, we obtain an integrated feature representation by fusing the audio and motion likelihood models symmetrically. Finally, we employ a Hidden Markov Model (HMM) to model and detect the transitions of the integrated representation in highlight segments. A series of experiments conducted on a 12-hour video database demonstrates the effectiveness of the proposed method and shows that the framework achieves promising results over a variety of baseball game sequences.
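    The pipeline described in the abstract (likelihood representation, symmetric fusion, HMM decoding) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes diagonal Gaussian class models, fusion by simple concatenation, a two-state HMM (non-highlight, highlight) with Gaussian emissions, and hand-picked toy parameters. All function names and numbers below are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal Gaussian evaluated at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def likelihood_vector(x, class_models):
    """Map a low-level feature vector to normalized per-class likelihoods."""
    logs = [gaussian_logpdf(x, mean, var) for mean, var in class_models]
    peak = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]

def fuse(audio_lik, motion_lik):
    """Assumed symmetric fusion: concatenate, weighting neither modality."""
    return audio_lik + motion_lik

def viterbi(observations, log_start, log_trans, log_emit):
    """Return the most likely state path; log_emit(state, obs) is a log density."""
    n_states = len(log_start)
    scores = [log_start[s] + log_emit(s, observations[0]) for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        prev, scores, pointers = scores, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            scores.append(prev[best] + log_trans[best][s] + log_emit(s, obs))
        back.append(pointers)
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy class models as (mean, variance) pairs; all values are hypothetical.
audio_models = [((0.0,), (1.0,)), ((5.0,), (1.0,))]   # quiet, cheering
motion_models = [((0.0,), (1.0,)), ((4.0,), (1.0,))]  # static, fast pan

# One (audio feature, motion feature) pair per video segment.
segments = [((0.1,), (0.2,)), ((0.0,), (-0.1,)), ((5.2,), (4.1,)),
            ((4.8,), (3.9,)), ((0.2,), (0.0,))]
fused = [fuse(likelihood_vector(a, audio_models),
              likelihood_vector(m, motion_models)) for a, m in segments]

# Two hidden states: 0 = non-highlight, 1 = highlight.
state_means = [(0.9, 0.1, 0.9, 0.1), (0.1, 0.9, 0.1, 0.9)]
state_var = (0.05, 0.05, 0.05, 0.05)
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]

def log_emit(state, obs):
    return gaussian_logpdf(obs, state_means[state], state_var)

path = viterbi(fused, log_start, log_trans, log_emit)
print(path)  # [0, 0, 1, 1, 0]
```

    In this toy run, segments with cheering-like audio and fast camera motion are decoded as highlight (state 1). In practice the class models and HMM parameters would be trained from labeled data rather than set by hand.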

    1. INTRODUCTION
    2. RELATED WORKS
    2.1 FEATURE EXTRACTION
    2.1.1 Visual Features-Based Methods
    2.1.2 Audio Features-Based Methods
    2.1.3 Audio-Visual Integration
    2.2 CONTENT EVALUATION
    2.2.1 Deterministic Reasoning
    2.2.2 Probabilistic Inferring
    3. FEATURE EXTRACTION AND REPRESENTATION
    3.1 AUDIO INFORMATION
    3.1.1 Feature Extraction
    3.1.2 Representation using Likelihood Model
    3.2 MOTION INFORMATION
    3.2.1 Camera Motion and Its Likelihood Model
    3.2.2 Object Motion Intensity and Its Likelihood Model
    3.2.3 Shot-cut Detection
    4. HIGHLIGHT EXTRACTION WITH INTEGRATED INFORMATION BASED ON HIDDEN MARKOV MODEL
    4.1 FUSION OF AUDIO AND MOTION INFORMATION
    4.2 HIGHLIGHT EXTRACTION USING HMM
    5. EXPERIMENTAL RESULTS
    5.1 GROUND TRUTH AND EVALUATION
    5.2 RESULTS OF AUDIO-VISUAL INTEGRATED FRAMEWORK
    5.3 PROBABILISTIC INFERRING VS. DETERMINISTIC REASONING
    5.4 RESULTS OF SYMMETRIC AUDIO-VISUAL COMBINATION
    5.5 DISCUSSION
    6. CONCLUSION
    7. REFERENCES

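    Full text release: not authorized for public release (campus network)
    Full text release: not authorized for public release (off-campus network)
    Full text release: not authorized for public release (National Central Library: Taiwan theses and dissertations system)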