| Field | Value |
|---|---|
| Graduate Student | 洪國翔 Hong, Guo-Xiang |
| Thesis Title | Optimal Training Set Selection for Video Annotation (對影片註解的最佳化訓練集選擇) |
| Advisor | 黃仲凌 Huang, Chung-Lin |
| Committee Members | |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication | 2009 |
| Graduation Academic Year | 97 (ROC calendar, 2008-2009) |
| Language | English |
| Pages | 57 |
| Chinese Keywords | 影片註解 (video annotation) |
| Foreign Keywords | Video annotation, Training set selection |
More and more multimedia data is being produced in our daily lives, yet appropriate techniques for processing and searching it are still lacking. Many researchers are therefore interested in multimedia technologies such as video annotation and video retrieval. Video annotation, which labels a video with predefined semantic concepts according to its content, is an essential step for video search. Most work focuses on how to bridge the semantic gap between low-level features and high-level concepts.
Most learning-based video semantic analysis methods aim to build a good semantic model, which requires a large training set to achieve good performance. However, annotating a large video collection is labor-intensive, and collecting an effective training set is not easy either. The quality of such a model should depend on the distribution of the training data rather than on the size of the database; nevertheless, most training set selection schemes adopt random selection or simply take part of the video data, neglecting the similarity and coverage characteristics of the training set, and thus fail to produce effective results.
In this thesis, we propose several methods to construct the training set while reducing user involvement: clustering-based, spatial-dispersiveness, temporal-dispersiveness, and sample-based selection. Using these selection schemes, we aim to construct a small but effective training set from the spatial and temporal distribution and the clustering information of the whole video data. If the selected training data represents the characteristics of the whole video data, the classification performance remains good even when the training set is much smaller than the whole collection.
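As a rough illustration of these selection schemes (not the thesis's exact algorithms), the sketch below shows two plausible variants in Python: clustering-based selection, which keeps the shot nearest to each k-means cluster centre, and temporal-dispersiveness selection, which spreads the chosen shots evenly along the video timeline. The feature dimensionality, k-means parameters, and shot counts are illustrative assumptions.

```python
# Hypothetical sketch of two training-set selection schemes; the exact
# algorithms used in the thesis may differ.
import numpy as np
from sklearn.cluster import KMeans

def clustering_based_selection(features: np.ndarray, n_samples: int) -> np.ndarray:
    """Keep the one shot nearest to each k-means cluster centre."""
    km = KMeans(n_clusters=n_samples, n_init=10, random_state=0).fit(features)
    picks = []
    for c in range(n_samples):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        picks.append(members[np.argmin(dists)])
    return np.asarray(picks)

def temporal_dispersiveness_selection(n_shots: int, n_samples: int) -> np.ndarray:
    """Spread the selected shot indices evenly along the video timeline."""
    return np.linspace(0, n_shots - 1, n_samples).round().astype(int)

# Example: 1000 shots, each described by a 64-D colour/texture feature vector
features = np.random.rand(1000, 64)
idx_cluster = clustering_based_selection(features, n_samples=50)
idx_temporal = temporal_dispersiveness_selection(n_shots=1000, n_samples=50)
```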
We can thus choose the most representative samples for training a semantic model and use an SVM to classify each sample. This thesis classifies shots into five semantic categories: person, landscape, cityscape, map, and others. Experimental results show that these methods are effective for training set selection in video annotation and outperform random selection.
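A minimal, self-contained sketch of the classification step is given below, using scikit-learn's SVC as a stand-in for the multi-class SVM; the RBF kernel, parameter values, and synthetic features are assumptions made purely for illustration.

```python
# Minimal sketch of the SVM classification step on synthetic data;
# kernel choice and parameters are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

CATEGORIES = ["person", "landscape", "cityscape", "map", "others"]

rng = np.random.default_rng(0)
train_x = rng.random((50, 64))         # features of the selected training shots
train_y = rng.integers(0, 5, size=50)  # manual annotations, one of the 5 classes
test_x = rng.random((200, 64))         # remaining shots to annotate automatically

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # one-vs-one multi-class by default
clf.fit(train_x, train_y)
predicted = [CATEGORIES[c] for c in clf.predict(test_x)]
```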