Graduate student: | Ssu-Chun Tseng (曾思鈞) |
---|---|
Thesis title: | A Scalable and Robust Scheme Based on the Multiple-Description Coding for Low Bit-Rate Speech Coders |
Advisor: | Yeh-Ching Chung (鍾葉青) |
Committee members: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2005 |
Academic year of graduation: | 93 |
Language: | English |
Number of pages: | 30 |
Keywords (Chinese): | 低位元率聲音編碼器、可延展編碼、多重描述編碼 |
Keywords (English): | low bit-rate speech coder, scalable coding, multiple-description coding |
With the rapid growth of the Internet, voice transmission over packet networks has attracted considerable attention. When real-time data is sent over unreliable networks, packet losses are inevitable, and delayed packets further increase the effective packet loss rate. Lost and delayed packets degrade playback quality, and the degradation is more perceptible when a low bit-rate speech coder is used. Networks are also becoming increasingly heterogeneous, so scalable coding is important in such environments. In this thesis, we propose a robust and scalable scheme based on multiple-description coding (MDC) for low bit-rate speech coders; it makes the coder scalable and able to conceal lost packets. We use a split-band approach that decomposes the residual signal into a low band and a high band, and a different method generates the enhancement layer for each band. Both a sample-based MDC scheme and an LSP-based (line spectral pair) MDC scheme are used to conceal lost packets: the sample-based scheme is applied to the high-band enhancement layer, and the LSP-based scheme to the core layer and the low-band enhancement layer. We have implemented the proposed scheme on the ITU-T G.723.1 standard. The experimental results show that the proposed scheme provides scalability and good playback quality under different bit rates and packet loss rates.
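The sample-based MDC idea named in the abstract can be illustrated with a minimal sketch. It assumes the simplest form of the technique: a frame is split into even-indexed and odd-indexed samples, each sent as its own description, and a lost description is concealed by linear interpolation from the one that arrived. The abstract does not give the thesis's actual procedure, which operates on the high-band enhancement layer rather than on raw samples, so the function names and the interpolation rule below are hypothetical.

```python
def split_descriptions(samples):
    """Split a frame into two descriptions: even-indexed and odd-indexed samples."""
    return samples[0::2], samples[1::2]


def merge_descriptions(even, odd):
    """Interleave both descriptions back into the original frame."""
    merged = []
    for i in range(len(even) + len(odd)):
        source = even if i % 2 == 0 else odd
        merged.append(source[i // 2])
    return merged


def conceal_from_even(even, length):
    """Rebuild a frame of `length` samples from the even description only.

    Assumption: each missing odd sample is the mean of its two even
    neighbours; at the right edge the last even sample is repeated.
    """
    out = []
    for i in range(length):
        k = i // 2
        if i % 2 == 0:
            out.append(even[k])
        elif k + 1 < len(even):
            out.append((even[k] + even[k + 1]) / 2)
        else:
            out.append(even[k])
    return out


frame = [0, 2, 4, 6, 8, 10, 12, 14]
even, odd = split_descriptions(frame)

# Both descriptions arrive: the frame is recovered exactly.
both = merge_descriptions(even, odd)

# The odd description is lost: the frame is approximated from the even one.
concealed = conceal_from_even(even, len(frame))
```

When both descriptions arrive the frame is reconstructed exactly; when one is lost, only the interpolated samples are approximate, which is why this style of MDC degrades gracefully rather than dropping the whole frame.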