Graduate student: | 徐 葳 Hsu, Wei |
---|---|
Thesis title: | 利用遮蔽效應結合抖動技術以提升音訊在頻域量化後之聽感動態 Combining the Masking Effect and Dithering Techniques to Enhance Auditory Dynamics of Quantized Audio Signals in the Frequency Domain |
Advisor: | 劉奕汶 Liu, Yi-Wen |
Oral defense committee: | 劉靖家 Liou, Jing-Jia; 羅中泉 Lo, Chung-Chuan; 蘇文鈺 Su, Wen-Yu |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2018 |
Academic year of graduation: | 107 |
Language: | Chinese |
Number of pages: | 33 |
Chinese keywords: | 抖動 (dithering), 量化 (quantization), 遮蔽效應 (masking effect) |
English keywords: | dithering, quantize, masking |
In conventional audio processing, prior work has shown that dithering yields better perceived sound quality when a signal is requantized from a high to a low number of bits per sample (bps). The same idea applies in image processing, where dithering makes a low-bps image look closer to its high-bps counterpart, which has a wider dynamic range. Conventional dithering operates on the signal in the time domain. This thesis proposes a dithering method that operates in the time-frequency domain: we treat the spectrogram of an audio signal as an image and analyze and process it with image dithering techniques. At the same time, we take frequency-domain masking and human auditory perception into account, and show that incorporating a psychoacoustic model in the spectral domain improves the subjective auditory dynamics of the audio.

The psychoacoustic model follows the model architecture used in MPEG-1 audio compression. Frequency bands that are masked, and therefore barely audible, are quantized with fewer bits, and dithering is added in an attempt to raise subjective sound quality. The resulting audio files were then evaluated in listening tests by amateur listeners and a professional mixing engineer. The results show that the proposed method improves subjective listening quality even as the bit resolution of the audio is reduced.
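To make the starting point concrete, the following is a minimal sketch of conventional time-domain dithering, the baseline the abstract contrasts with, and not the time-frequency method of the thesis. It assumes a signal normalized to [-1, 1), a uniform rounding quantizer, and triangular (TPDF) dither of ±1 LSB. The function name `requantize`, the 4-bit depth, and the 440 Hz test tone are illustrative choices, not taken from the thesis.

```python
import numpy as np

def requantize(x, bits, dither=False, rng=None):
    """Uniformly requantize x (assumed in [-1, 1)) to `bits` bits per sample.

    With dither=True, TPDF dither spanning +/- 1 LSB is added before rounding.
    Clipping at full scale is not handled in this sketch.
    """
    step = 2.0 / (2 ** bits)  # quantizer step size (1 LSB)
    if dither:
        rng = np.random.default_rng() if rng is None else rng
        # Sum of two independent uniform variables gives a triangular PDF.
        d = (rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)) * step
        x = x + d
    return np.round(x / step) * step

# Test tone whose amplitude (0.05) is below half an LSB at 4 bits (0.0625).
fs = 8000
t = np.arange(fs) / fs
x = 0.05 * np.sin(2 * np.pi * 440 * t)

plain = requantize(x, bits=4)
dithered = requantize(x, bits=4, dither=True, rng=np.random.default_rng(0))

print("non-zero samples without dither:", np.count_nonzero(plain))
print("projection onto input, dithered:", np.dot(dithered, x) / np.dot(x, x))
```

Without dither, the tone stays below the rounding threshold and every output sample is zero, so the signal is lost entirely. With dither, the tone survives in the output as signal plus broadband noise, which is the trade-off that dithering makes at low bit depths.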