Graduate Student: 王瑞鈞 Wang, Ruei-Jiun
Thesis Title: 基於景移動的顯著物體預測的JND調整影像壓縮法 (Scene Motion based Saliency Prediction for JND Adjusted Video Compression)
Advisor: 邱瀞德 Chiu, Ching-Te
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2010
Graduation Academic Year: 98
Language: English
Number of Pages: 57
Chinese Keywords: H.264, video quality (影像品質), video compression (影像壓縮), visual attention (視覺注意力), saliency (受注目的), perceptual model (感知模組)
English Keywords: JND, saliency, H.264, compression, video quality, visual attention
Abstract (Chinese):
With the growth of multimedia over the Internet, an efficient compression algorithm must not only remove static redundancy in the data being compressed and transmitted, but also allocate processing according to the regions a viewer actually attends to when watching the content, discarding the parts that go unnoticed. Visual attention models and visual perception models are used to determine which content is imperceptible to, or ignored by, the human eye. Most visual attention models concentrate on static feature analysis and mix in only a small amount of temporal information such as motion vectors, and they usually lack a perceptual model. Using the direction of scene motion to identify attended regions is highly effective, because camera movement is usually driven by the object the videographer cares about, which in turn also draws the viewer's attention. In this thesis, we design an algorithm that predicts and tracks salient objects based on camera motion and local object motion. We further use a just-noticeable-difference (JND) model as the criterion for judging whether the compression of a region is excessive or insufficient, and use it to adjust the quantization parameters. To evaluate the algorithm, we designed three experiments that measure the accuracy of salient-object prediction, the video compression rate, and viewers' acceptance of the visual quality. Experimental results show that the algorithm achieves good prediction accuracy and, compared with the H.264 JM 14.0 encoder, reduces the bit rate by 8-73%. In the final experiment, the invited participants could not distinguish the compressed video from the original.
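As a rough sketch of the idea behind the abstract (not the thesis's actual algorithm: the function name, the median-based global-motion estimate, and the normalization are all illustrative assumptions), the following Python code separates global camera motion from local object motion in a block motion-vector field and scores each block by its residual motion:

```python
import numpy as np

def motion_saliency(mv_field: np.ndarray) -> np.ndarray:
    """Score each block by how much its motion deviates from camera motion.

    mv_field: H x W x 2 array of per-block motion vectors (dx, dy).
    Returns an H x W saliency map normalized to [0, 1].
    Illustrative sketch only, not the thesis's actual model.
    """
    # The median motion vector over all blocks approximates the dominant
    # global (camera/scene) motion.
    global_mv = np.median(mv_field.reshape(-1, 2), axis=0)

    # What remains after subtracting the camera motion is local object
    # motion, which the abstract treats as the cue for saliency.
    residual = mv_field - global_mv

    # Saliency = magnitude of local motion, normalized to [0, 1].
    magnitude = np.linalg.norm(residual, axis=-1)
    peak = magnitude.max()
    return magnitude / peak if peak > 0 else magnitude
```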
Abstract (English):
Due to the popularity of online multimedia, an efficient compression algorithm that removes not only statistical redundancy but also psychovisual redundancy without perceptual degradation is important for transmission and storage. Visual attention models and visual sensitivity models have been proposed for removing psychovisual redundancy. Most visual attention models are based on spatial component analysis; only a few adopt motion vectors for temporal component analysis, and these usually lack perceptual information. Scene motion is a powerful feature for identifying regions of interest (ROIs), since it indicates the producer's interest in the scene and can also attract the viewer's attention. In addition to the global scene motion, the motion of the salient object in the video stream facilitates tracing the salient object locally. In this thesis, we propose a visual attention model based on scene motion and saliency motion to effectively trace the movement of salient regions. We propose a framework that incorporates the obtained motion saliency map with the Just Noticeable Difference (JND) as a visual measure to determine the quantization parameters at the macroblock level. To evaluate the performance of the proposed framework, three experiments were conducted to verify the accuracy of saliency motion prediction, the video compression rate, and the visual quality. Experimental results show that the proposed framework has higher saliency prediction accuracy than previous approaches in terms of Receiver Operating Characteristic (ROC) curves. It achieves 8% to 73% bit-rate reduction compared with the H.264 reference encoder JM 14.0, three times the reduction of the previous method. In the visual quality assessment experiments, participants could not distinguish between our compressed videos and the original video streams.
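To make the quantization step concrete, here is a minimal Python sketch of how a motion saliency map and a JND measure could jointly set per-macroblock quantization parameters. The function `adjust_qp`, the `jnd_headroom` input, and the linear scaling rule are illustrative assumptions, not the framework's actual formula:

```python
import numpy as np

H264_QP_MAX = 51  # H.264 quantization parameters range from 0 to 51

def adjust_qp(base_qp: int, saliency: np.ndarray, jnd_headroom: np.ndarray,
              max_delta: int = 6) -> np.ndarray:
    """Build a per-macroblock QP map from a saliency map and JND headroom.

    saliency: values in [0, 1]; 1 = strongly attended macroblock.
    jnd_headroom: values in [0, 1]; normalized estimate of how much extra
    distortion a macroblock can absorb before crossing the JND threshold.
    Both the linear scaling rule and max_delta are illustrative
    assumptions, not the thesis's actual formula.
    """
    # Coarsen quantization most where attention is low and JND headroom is
    # high; salient macroblocks keep approximately the base QP.
    delta = np.rint(max_delta * (1.0 - saliency) * jnd_headroom).astype(int)
    return np.clip(base_qp + delta, 0, H264_QP_MAX)

# Example: a 2 x 2 macroblock grid with one strongly salient block.
qp_map = adjust_qp(26,
                   saliency=np.array([[1.0, 0.1], [0.2, 0.0]]),
                   jnd_headroom=np.array([[0.5, 0.9], [0.8, 1.0]]))
# The salient block keeps the base QP; the others are raised by up to
# max_delta, which is where the bit-rate savings come from.
```

In an encoder integration, such a QP map would be supplied to the rate controller as per-macroblock offsets: salient macroblocks retain the base QP to preserve perceived quality, while non-attended macroblocks absorb the coarser quantization that yields the bit-rate reduction the abstract reports.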