| Field | Value |
|---|---|
| Graduate Student | 施皇嘉 Shih, Huang-Chia |
| Thesis Title | 應用於運動影片以其內容為基礎的關注評價及賽程時況標記之研究 (A Study on Content-based Attention Ranking and In-game Stats Tagging for Sports Videos) |
| Advisors | 黃仲陵 Huang, Chung-Lin; 黃正能 Hwang, Jenq-Neng |
| Oral Defense Committee | |
| Degree | Doctoral (博士) |
| Department | 電機資訊學院 - 電機工程學系 (Department of Electrical Engineering) |
| Publication Year | 2008 |
| Graduation Academic Year | 97 |
| Language | English |
| Pages | 104 |
| Keywords (Chinese) | 視訊分析、多媒體搜尋、資料探勘、語意分析、人類關注模組、運動節目 |
| Keywords (English) | Video Analysis, Multimedia Search, Data Mining, Semantic Analysis, Human Attention Model, Sports Program |
With the massive digitization of video data, the volume of digital content grows daily. Today's video and image compression standards allow digital data to be stored with minimal coding capacity, so describing and storing the data is no longer the problem; the most pressing issue is how to let users efficiently find, in the shortest time, the digital content that best matches their needs. If all digital data had to be classified and tagged by hand, enormous resources would be consumed; automatic analysis of digital content has therefore become an important research topic. We attempt to build a bridge across the semantic gap between low-level features in the computer world and semantic information in the human world. The resulting techniques can be applied to real-time web text broadcasting and to various post-production analysis and processing tasks, effectively replacing time-consuming, highly repetitive manual work such as classification and highlight selection. Our research constructs a system that automatically performs in-game stats tagging and semantic analysis on input videos, and ranks sports program segments by attention according to the audience's perceptual characteristics. In recent years, owing to the outstanding performance of many Taiwanese athletes competing abroad, sports programs have become among the most popular video content; furthermore, sports programs consist of highly repetitive, highly correlated shots, which makes them favorable to analyze, so we choose sports programs as the scope of our research.
We propose an Attention Rank (AR) value to represent the likelihood that a video frame attracts viewers. The AR is derived by combining a Visual Attention Model (VAM) and a Contextual Attention Model (CAM), with a Camera Motion Attention Model adjusting the weights. We take an object-based approach in which every object appearing in a frame contributes, on average, to the frame's attractiveness to viewers. The VAM is further divided into spatial, temporal, and facial features. For contextual semantic features, we construct the CAM to model the user's degree of interest in the game situation; the statistics used by the CAM are the key in-game figures obtained from the superimposed caption box (SCB). We propose a method to infer how strongly these statistics attract viewers: the relation between a statistic and viewer interest may be proportional, inversely proportional, or condition-specific, and several contextual mapping matrices are then applied to obtain the AR. In addition, user feedback is taken into account to estimate the attention index each feature carries for the user, which improves the accuracy of video retrieval.
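To make the fusion above concrete, the following sketch combines object-level visual attention scores with a contextual score under a camera-motion weight. The function names, the equal averaging of the three feature maps, and the linear weighting scheme are illustrative assumptions, not the thesis's exact derivation.

```python
# Illustrative sketch only: combine object-based visual attention (VAM)
# with contextual attention (CAM), modulated by a camera-motion weight.
# The averaging and linear fusion below are assumptions for exposition.

def frame_attention(objects, cam_score, camera_motion_weight):
    """objects: per-object spatial/temporal/facial saliency in [0, 1];
    cam_score: contextual attention derived from the SCB statistics;
    camera_motion_weight: weight from the camera motion attention model."""
    if objects:
        # Each object contributes the average of its three feature maps.
        per_object = [(o["spatial"] + o["temporal"] + o["facial"]) / 3.0
                      for o in objects]
        vam_score = sum(per_object) / len(per_object)
    else:
        vam_score = 0.0
    # Camera motion shifts the balance between visual and contextual cues.
    return camera_motion_weight * vam_score + (1.0 - camera_motion_weight) * cam_score

frame_objects = [{"spatial": 0.7, "temporal": 0.5, "facial": 0.9}]
print(frame_attention(frame_objects, cam_score=0.6, camera_motion_weight=0.5))
```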
In sports programs, much important information is embedded in a real-time superimposed caption box (SCB) in a corner of the frame to keep viewers informed of the current game situation; the data on this scoreboard are the most compact yet most important information. The content of the SCB is therefore an indispensable element for analyzing sports programs. Different types of sports programs display different content, and the presentation style, position, and layout also vary across broadcasters. Although more and more researchers have paid attention to scoreboard information, almost all existing extraction methods require a human to specify the positions of the digits and their meanings, i.e., to supply a template. This reduces the task to a disguised optical character recognition (OCR) problem and contributes little to extracting high-level information; moreover, a new template must be constructed for every new input video, which is time-consuming and inflexible. We therefore propose a general algorithm that applies to various sports programs without being restricted to a particular scoreboard style. A scoreboard contains characters and symbols, and detecting and localizing their positions and sizes is not difficult; the difficulty lies in assigning their high-level semantics. For example, a digit object may be recognized as "6" by text localization and character recognition, yet we still cannot tell what it means. From our observations, scoreboards follow several regularities in displaying character data: (1) a digit object can be assigned to only one annotative object; (2) digit objects and annotative objects of the same class follow a specific spatial relationship, such as horizontal or vertical alignment; (3) not every digit object is accompanied by an annotative object. Based on these three regularities, we apply the well-known Relaxation Labeling algorithm for spatial labeling to solve our problem.
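As a sketch of how the three regularities can drive the assignment, the code below runs a synchronous Rosenfeld-Hummel-Zucker relaxation-labeling update over a toy set of digit objects; the alignment-based compatibility function and the label set are hypothetical, chosen only to illustrate the iteration.

```python
# Illustrative relaxation-labeling sketch (Rosenfeld-Hummel-Zucker style)
# for assigning digit objects to annotative labels. The compatibility
# function and label set are hypothetical examples.

def compatibility(a, b, la, lb, pos):
    # Rule 2: digits sharing a label should be horizontally or vertically
    # aligned; misaligned digits with the same label are penalized.
    (xa, ya), (xb, yb) = pos[a], pos[b]
    aligned = abs(ya - yb) < 5 or abs(xa - xb) < 5
    if la == lb:
        return 1.0 if aligned else -1.0
    return 0.0

def relax(digits, labels, pos, iters=20):
    # Start from uniform label probabilities for every digit object.
    p = {d: {l: 1.0 / len(labels) for l in labels} for d in digits}
    for _ in range(iters):
        new_p = {}
        for d in digits:
            # Support for each label from the other digits' current beliefs.
            q = {l: sum(compatibility(d, e, l, m, pos) * p[e][m]
                        for e in digits if e != d for m in labels)
                 for l in labels}
            raw = {l: max(p[d][l] * (1.0 + q[l]), 0.0) for l in labels}
            z = sum(raw.values()) or 1.0
            new_p[d] = {l: v / z for l, v in raw.items()}
        p = new_p
    return {d: max(p[d], key=p[d].get) for d in digits}

positions = {"d1": (10, 20), "d2": (40, 20), "d3": (10, 60)}
print(relax(["d1", "d2", "d3"], ["score", "inning"], positions))
```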
The demand for multimedia applications is increasing even beyond the capabilities of best-effort transmission networks. The trend is therefore toward constructing a content-oriented multimedia server that can handle high volumes of content while fulfilling high-performance and diverse user-preference requirements. Researchers have been trying to integrate context and content for multimedia mining and management, which is crucial for multimedia communication.
Attention analysis of multimedia data is challenging because different models must be constructed for different attention characteristics. Effectively measuring user attention on videos is an important task in many multimedia applications, including multimedia information retrieval, user-content interaction, and multimedia search. This thesis analyzes how viewers respond to the video content they watch and proposes a content-driven attention ranking strategy that enables client users to iteratively browse a video according to their preferences. The proposed attention rank (AR) algorithm, extended from Google's PageRank algorithm that ranks websites by importance, effectively measures the user interest (UI) level of each video frame. The degree of attention is derived by integrating an object-based visual attention model (VAM) with a contextual attention model (CAM), which reliably exploits human perceptual characteristics and effectively identifies the user-attentive video content.
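The sketch below illustrates, under stated assumptions, how a PageRank-style power iteration could propagate attention among frames: each frame's own visual/contextual score serves as the personalization (teleport) vector, and links connect frames belonging to the same event. The graph construction, damping factor, and iteration count are assumptions for exposition, not the thesis's exact formulation.

```python
# Illustrative PageRank-style iteration over frames. intra_ar stands in
# for the per-frame visual/contextual score; links encode relatedness
# (e.g., frames in the same event). All parameters are assumptions.

def attention_rank(links, intra_ar, damping=0.85, iters=50):
    n = len(intra_ar)
    total = sum(intra_ar)
    prior = [s / total for s in intra_ar]  # personalization vector
    ar = prior[:]
    for _ in range(iters):
        new = [(1.0 - damping) * prior[i] for i in range(n)]
        for i, related in enumerate(links):
            # Frames with no related frames spread their score everywhere.
            targets = related if related else range(n)
            share = damping * ar[i] / len(targets)
            for j in targets:
                new[j] += share
        ar = new
    return ar

# Three frames; frames 0 and 1 belong to the same event.
print(attention_rank([[1], [0], []], intra_ar=[0.9, 0.4, 0.2]))
```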
This thesis presents a method to integrate content and context for sports video understanding. On one hand, visual information is the most intuitive feature of the human perception system, and modeling visual attention offers a good path toward a better understanding of the video content; the considered visual features include spatial, temporal, and facial feature maps. On the other hand, the in-game stats information in a sports video is what most subscribers are interested in, and the captions embedded in sports video programs carry the key information of the video content. Taking advantage of prior implicit knowledge about sports videos, we propose an automatic context extraction and interpretation system that tags the in-game stats to provide the on-going game situation to subscribers.
Users' feedback is utilized in a re-ranking procedure to further improve retrieval accuracy. A higher AR represents stronger user interest. The AR is affected by two factors: the intra-AR and the inter-AR. In a frame-based analysis, the intra-AR of each frame is based on its visual and contextual attention characteristics: if a frame contains many high-attention objects together with a high-interest contextual description, it is highly probable that the frame has a high AR. From an event-based analysis viewpoint, the inter-AR of each frame is affected by the relevant key-frames located in the same event.
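As a rough sketch of how user feedback could re-weight the retrieval features, the snippet below applies a Rocchio-style update that pulls each feature weight toward its mean value in the frames the user marked as relevant; the update rule and learning rate are assumptions, not the thesis's actual feedback mechanism.

```python
# Illustrative feedback-driven re-ranking: a Rocchio-style weight update
# followed by re-scoring. The learning rate and update rule are assumed.

def rerank(frames, weights, relevant_ids, lr=0.2):
    """frames: {frame_id: {feature: score}}; weights: {feature: weight};
    relevant_ids: frames the user marked as interesting."""
    for f in weights:
        # Pull each weight toward the feature's mean score among the
        # user-approved frames.
        mean_rel = sum(frames[i][f] for i in relevant_ids) / len(relevant_ids)
        weights[f] = (1 - lr) * weights[f] + lr * mean_rel
    scores = {i: sum(weights[f] * feats[f] for f in weights)
              for i, feats in frames.items()}
    return sorted(scores, key=scores.get, reverse=True), weights

frames = {"f1": {"visual": 0.9, "context": 0.2},
          "f2": {"visual": 0.3, "context": 0.8}}
order, new_w = rerank(frames, {"visual": 0.5, "context": 0.5}, ["f2"])
print(order, new_w)
```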