Graduate student: | 蔡青紘 Tsai, Morris C.-H. |
---|---|
Thesis title: | 基於排列組合的創新聲音模糊技術之研究 Innovative Recoverable Audio Mosaics Through Permutation-Based Techniques |
Advisor: | 黃之浩 Huang, Scott C. H. |
Oral defense committee: | 鍾偉和 Chung, Wei-Ho; 管延城 Kuan, Yen-Cheng |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Communications Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 113 (ROC calendar) |
Language: | English |
Number of pages: | 70 |
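Keywords (Chinese): | digital signal processing, audio mosaic, signal-to-noise ratio, discrete cosine transform, relative entropy |
Keywords (English): | DSP, Audio Mosaic, SNR, DCT, KLD |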
This thesis introduces a novel audio mosaicing approach based on hierarchical permutations, extending our previous work on image mosaicing. Unlike traditional methods that merely blur the original signal or add noise to it, our approach produces a recoverable mosaic that can be reconstructed with a short key. We establish a mathematical relationship between the signal-to-noise ratio (SNR) and the Kullback-Leibler divergence of the discrete cosine transform (DCT-KLD), which enables quantitative assessment of mosaic quality. By translating between these two metrics, we can determine the degree of signal destructuring needed to keep the concealed utterance unintelligible. This removes the need for subjective human listening tests and provides a useful tool for evaluating audio mosaic methods. In addition, we propose a novel, lightweight audio mosaic scheme based on permutations of linear predictive coding (LPC) coefficients. Given the widespread adoption of LPC in contemporary audio codecs, this approach offers a practical and efficient solution. We conduct a theoretical secrecy analysis that relates the scheme's security to the degree of the original LPC polynomial and to the population of minimum-phase permuted LPC polynomials. We also assess mosaic performance, measured by DCT-KLD, through simulations. Our results show that the proposed LPC-coefficient permutation scheme significantly outperforms existing waveform permutation methods in mosaic quality while requiring a comparable key size.
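The idea of a recoverable, key-driven mosaic can be illustrated with a minimal sketch. This is not the hierarchical or LPC-coefficient scheme of the thesis: it is a flat, single-level permutation of fixed-length waveform segments, and the function names, the integer key, and the use of Python's `random.Random` as a key schedule are all assumptions made for illustration only (a seeded Mersenne Twister is not cryptographically secure). The sketch also includes a plain SNR measure so the degradation caused by scrambling can be quantified.

```python
import math
import random


def keyed_permutation(n, key):
    """Derive a permutation of range(n) from an integer key.

    Hypothetical key schedule: seeding random.Random is convenient for a
    demo but is NOT a secure way to derive a permutation.
    """
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm


def mosaic(samples, seg_len, key):
    """Reorder fixed-length segments; a tail shorter than seg_len stays put."""
    n_seg = len(samples) // seg_len
    perm = keyed_permutation(n_seg, key)
    out = []
    for src in perm:  # output segment i is input segment perm[i]
        out.extend(samples[src * seg_len:(src + 1) * seg_len])
    out.extend(samples[n_seg * seg_len:])
    return out


def recover(scrambled, seg_len, key):
    """Invert mosaic() by regenerating the same permutation from the key."""
    n_seg = len(scrambled) // seg_len
    perm = keyed_permutation(n_seg, key)
    out = [0.0] * len(scrambled)
    for dst, src in enumerate(perm):  # move segment dst back to slot perm[dst]
        out[src * seg_len:(src + 1) * seg_len] = \
            scrambled[dst * seg_len:(dst + 1) * seg_len]
    out[n_seg * seg_len:] = scrambled[n_seg * seg_len:]
    return out


def snr_db(reference, test):
    """SNR in dB, treating (reference - test) as the noise."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, test))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    # Synthetic two-tone test signal (illustrative, not speech data).
    clean = [math.sin(0.05 * t) + 0.5 * math.sin(0.31 * t) for t in range(4000)]
    hidden = mosaic(clean, seg_len=50, key=2024)
    restored = recover(hidden, seg_len=50, key=2024)
    print("recovered exactly:", restored == clean)
    print("SNR of mosaic vs. clean (dB):", snr_db(clean, hidden))
```

In this sketch only the integer key has to be shared, since both sides regenerate the permutation from it. A lower SNR between the clean and scrambled signals indicates stronger destructuring. The thesis goes further by relating SNR to DCT-KLD, which the sketch does not attempt.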