
Graduate Student: 吳明儒 (Ming-Ju Wu)
Thesis Title: Visual Features for Large-scale Learning: Case Studies on Wafer Map and Music Genre Classification
Advisors: 張智星 (Jyh-Shing Roger Jang), 張俊盛
Committee Members: 賴尚宏, 冀泰石, 陳煥宗, 林俊賢
Degree: Doctor
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Publication Year: 2015
Graduation Academic Year: 103 (ROC calendar)
Language: English
Pages: 72
Keywords: Large-scale learning, Visual features, Wafer map failure pattern recognition, Music genre classification
    Increased availability of large-scale datasets has attracted growing academic and industrial attention to large-scale learning. Concurrently, huge growth in demand for smartphones has had a commensurate impact on related industries such as wafer manufacturing and mobile applications. In the wafer manufacturing industry, increased demand has driven efforts to raise production capacity, in part by reducing failure rates. Wafer map failure pattern recognition (WMFPR), an application of machine vision, can automatically classify wafers, thereby assisting engineers in identifying root causes of failure and increasing wafer yield. In the mobile application industry, growing demand for online music distribution has driven interest in music genre classification (MGC), an application of machine hearing that facilitates music organization and recommendation for online music services. However, compact yet discriminative feature representations are still lacking for these two large-scale learning applications. In contrast to conventional approaches, we design visual features separately for wafer maps and for songs. To validate system performance, we collected the world's largest public wafer map dataset (WM-811K) for WMFPR and used the world's largest benchmark dataset (MASD) for MGC. Experimental results show that the proposed visual features considerably improve recognition rates. Furthermore, the proposed WMFPR method has been deployed in production at TSMC, and the proposed MGC method won the MIREX music genre classification contests from 2011 to 2013, both indicating the robustness of the proposed methods.
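
    The abstract only names the proposed features; Section 3.1 of the table of contents below refers to Radon-based features for wafer maps. A minimal sketch of that general idea, assuming a binary failure map and scikit-image's radon transform (the angle count, the resampled length, and the per-angle statistics are illustrative assumptions, not the thesis's exact recipe):

        import numpy as np
        from skimage.transform import radon

        def radon_features(wafer_map, n_angles=180, n_points=20):
            # Fixed-length descriptor of a wafer map's failure-pattern
            # geometry. wafer_map: 2-D array, 1 at failed dies, 0 elsewhere.
            theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
            # Each sinogram column is the projection of the map at one angle.
            sinogram = radon(wafer_map.astype(float), theta=theta, circle=False)
            per_angle_mean = sinogram.mean(axis=0)
            per_angle_std = sinogram.std(axis=0)
            # Resample both curves to n_points so maps of any resolution
            # yield vectors of the same length.
            grid = np.linspace(0.0, 1.0, n_points)
            src = np.linspace(0.0, 1.0, n_angles)
            return np.concatenate([np.interp(grid, src, per_angle_mean),
                                   np.interp(grid, src, per_angle_std)])

    Whatever the map's resolution, the descriptor has length 2 * n_points, so a standard classifier such as an SVM can consume it directly.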

    1 Introduction
    2 Literature Review
      2.1 Wafer Map Failure Pattern Recognition
      2.2 Music Genre Classification
    3 Proposed Visual Features for Wafer Maps
      3.1 Radon-based Features
      3.2 Geometry-based Features
        3.2.1 Regional Attributes
        3.2.2 Statistical Attributes
        3.2.3 Linear Attribute
    4 Proposed Visual Features for Music Songs
      4.1 Spectrogram Computation and Subband Division
      4.2 Gabor Filtering
      4.3 Beat Tracking
      4.4 IBI Texture Representation
      4.5 Heterogeneity Measure of IBIs
      4.6 Feature Vector Concatenation
    5 System Design
      5.1 Wafer Map Failure Pattern Recognition
      5.2 Music Genre Classification
        5.2.1 Early Fusion
        5.2.2 Proposed Confidence-based Late Fusion
    6 Performance Evaluation
      6.1 Wafer Map Failure Pattern Recognition
        6.1.1 Data Collection
        6.1.2 Experimental Settings
        6.1.3 Experimental Results
      6.2 Music Genre Classification
        6.2.1 Datasets
        6.2.2 Experimental Settings
        6.2.3 Visual Feature Comparison
        6.2.4 Fusion and Nonfusion Comparison
        6.2.5 Comparison with Other Approaches
        6.2.6 MIREX Contest
    7 Conclusion
    Appendix
      A Gaussian Super Vector (GSV)
      B Wafer Map Similarity Ranking
      C Visual Feature Combinations
      D MIREX 2013 Music Mood Classification Contest Result
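
    Section 5.2.2 above names a confidence-based late-fusion scheme, which this record does not spell out. A minimal sketch of the general idea, assuming each component classifier (e.g., one trained on acoustic features, one on visual features) emits class probabilities; the margin between the top two probabilities serves here as a hypothetical confidence weight, not the thesis's actual measure:

        import numpy as np

        def late_fusion(prob_lists):
            # prob_lists: one 1-D probability array per classifier, all over
            # the same ordered set of genres.
            fused = np.zeros_like(prob_lists[0])
            for probs in prob_lists:
                top2 = np.sort(probs)[-2:]
                confidence = top2[1] - top2[0]  # margin of the top prediction
                fused += confidence * probs     # confident classifiers weigh more
            return int(np.argmax(fused))

        # Example: a confident acoustic model outweighs an uncertain visual one.
        acoustic = np.array([0.70, 0.20, 0.10])
        visual = np.array([0.34, 0.33, 0.33])
        print(late_fusion([acoustic, visual]))  # prints 0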

    Full Text Availability: Not authorized for public access (on-campus network)
    Full Text Availability: Not authorized for public access (off-campus network)