| Field | Value |
|---|---|
| Graduate Student | 洪國翔 Hong, Guo-Xiang |
| Thesis Title | Optimal Training Set Selection for Video Annotation (對影片註解的最佳化訓練集選擇) |
| Advisor | 黃仲凌 Huang, Chung-Lin |
| Committee Members | |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication | 2009 |
| Graduation Academic Year | 97 (ROC calendar, 2008-2009) |
| Language | English |
| Pages | 57 |
| Chinese Keywords | 影片註解 (video annotation) |
| Foreign Keywords | Video annotation, Training set selection |
More and more multimedia data is being produced in our daily lives, yet appropriate techniques for processing and searching it are still lacking. Many researchers are therefore interested in multimedia technologies such as video annotation and video retrieval. Video annotation, which labels a video with predefined semantic concepts according to its content, is an essential step for video search. Most work focuses on how to bridge the semantic gap between low-level features and high-level concepts.
Most learning-based video semantic analysis methods aim to build a good semantic model, which requires a large training set to achieve good performance. However, annotating a large video collection is labor-intensive, and collecting an effective training set is not easy either. The quality of such a model should depend on the distribution of the training data rather than on the size of the database; nevertheless, most training set selection schemes adopt random selection or simply take part of the video data, neglecting the similarity and coverage characteristics of the training set, and thus fail to produce effective results.
In this thesis, we propose several methods to construct the training set while reducing user involvement: clustering-based, spatial-dispersiveness, temporal-dispersiveness, and sample-based selection. Using these selection schemes, we aim to construct a small but effective training set from the spatial and temporal distribution and the clustering information of the whole video data. If the selected training data represents the characteristics of the whole video data, the classification performance remains good even when the training set is much smaller than the whole collection.
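As a rough illustration of these selection schemes (not the thesis's exact algorithms), the sketch below shows two plausible variants in Python: clustering-based selection, which keeps the shot nearest to each k-means cluster centre, and temporal-dispersiveness selection, which spreads the chosen shots evenly along the video timeline. The feature dimensionality, k-means parameters, and shot counts are illustrative assumptions.

```python
# Hypothetical sketch of two training-set selection schemes; the exact
# algorithms used in the thesis may differ.
import numpy as np
from sklearn.cluster import KMeans

def clustering_based_selection(features: np.ndarray, n_samples: int) -> np.ndarray:
    """Keep the one shot nearest to each k-means cluster centre."""
    km = KMeans(n_clusters=n_samples, n_init=10, random_state=0).fit(features)
    picks = []
    for c in range(n_samples):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        picks.append(members[np.argmin(dists)])
    return np.asarray(picks)

def temporal_dispersiveness_selection(n_shots: int, n_samples: int) -> np.ndarray:
    """Spread the selected shot indices evenly along the video timeline."""
    return np.linspace(0, n_shots - 1, n_samples).round().astype(int)

# Example: 1000 shots, each described by a 64-D colour/texture feature vector
features = np.random.rand(1000, 64)
idx_cluster = clustering_based_selection(features, n_samples=50)
idx_temporal = temporal_dispersiveness_selection(n_shots=1000, n_samples=50)
```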
We can thus choose the most representative samples for training a semantic model and use an SVM to classify each sample. This thesis classifies shots into five semantic categories: person, landscape, cityscape, map, and others. Experimental results show that these methods are effective for training set selection in video annotation and outperform random selection.
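A minimal, self-contained sketch of the classification step is given below, using scikit-learn's SVC as a stand-in for the multi-class SVM; the RBF kernel, parameter values, and synthetic features are assumptions made purely for illustration.

```python
# Minimal sketch of the SVM classification step on synthetic data;
# kernel choice and parameters are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

CATEGORIES = ["person", "landscape", "cityscape", "map", "others"]

rng = np.random.default_rng(0)
train_x = rng.random((50, 64))         # features of the selected training shots
train_y = rng.integers(0, 5, size=50)  # manual annotations, one of the 5 classes
test_x = rng.random((200, 64))         # remaining shots to annotate automatically

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # one-vs-one multi-class by default
clf.fit(train_x, train_y)
predicted = [CATEGORIES[c] for c in clf.predict(test_x)]
```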