
Graduate Student: Shih, Samuel (施宣酉)
Thesis Title: Develop a Semantic-based Trademark Logo Image Retrieval System Using Transfer Deep Learning Approach
(Chinese title: 應用深度遷移式學習建構以語意相似度為基之商標圖像檢索系統)
Advisors: Trappey, Amy J. C. (張瑞芬); Trappey, Charles V. (張力元)
Committee Members: Wang, Chien-Chih (王建智); Lin, Yu-Hsun (林裕訓)
Degree: Master
Department: Department of Industrial Engineering and Engineering Management, College of Engineering
Year of Publication: 2021
Graduation Academic Year: 109
Language: English
Pages: 47
Keywords: Convolutional neural network, Triplet network, Trademark infringement, Content-based image retrieval


    Image retrieval (IR) technology has made breakthrough progress in recent years, driven by the growth of deep learning. Through features extracted by deep neural network models, a machine can learn semantic visual features rather than only traditional low-level features. A review of IR patents and the non-patent literature shows that many studies focus on reducing the gap between machine retrieval results and human visual understanding. Because deeper networks can extract more semantic features, this research uses a transfer deep learning approach to construct a trademark (TM) retrieval system named LogoSimNet. In the era of digital transformation, a huge number of logos circulates worldwide, raising several legal issues. The TM office reviews duplicate or similar TM patterns under the Vienna Classification rules, in which human-labeled Vienna codes are used for similarity analysis and image retrieval. The growing number of applications increases the difficulty and duration of examination during the initial stage of TM review and registration. Moreover, because users can easily download images from the internet and imitate TM graphic designs, careless use is prone to copyright and TM infringement. These controversial issues highlight the importance of developing an automatic, intelligent logo IR methodology. Given the complexity of TM visual semantics, implementing the manual similarity examination in computer-vision retrieval becomes a key challenge. Furthermore, this research analyzes patent trends in the IR field using a Technology Function Matrix. Although most published patents apply traditional machine learning methods, deep learning remains an important and growing area of IR. Accordingly, this research develops a logo image similarity analysis method based on a triplet network architecture.
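The triplet objective behind this kind of training can be illustrated with a minimal sketch. The NumPy version below is only a toy illustration, not the thesis implementation; the margin of 0.2 is an assumed hyperparameter, and the three-dimensional embeddings are hypothetical.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: the anchor should sit closer to the
    positive (a similar logo) than to the negative (a dissimilar logo)
    by at least `margin` in the embedding space."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to the similar logo
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to the dissimilar logo
    return max(d_pos - d_neg + margin, 0.0)

# Hypothetical 3-D embeddings: the positive lies near the anchor.
a = np.array([1.0, 0.0, 0.0])
p = np.array([0.9, 0.1, 0.0])
n = np.array([0.0, 1.0, 0.0])
print(triplet_loss(a, p, n))  # 0.0: this triplet already satisfies the margin
```

During training, this loss is minimized over many sampled (anchor, positive, negative) triples, which is where the triplet sampling strategy of Chapter 3 comes in.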
This research uses a logo image training set (more than 26 thousand images) and testing set (more than 9 thousand images) drawn from the Logo-2K+ database for ResNet50V2 fine-tuning and verification. The results show that the LogoSimNet model can retrieve logos according to multiple visual semantics, and model verification shows excellent results, with Recall@16 exceeding 95%.
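The Recall@16 metric reported above can be computed as sketched below. This is a minimal NumPy illustration assuming plain Euclidean distance between embeddings, with hypothetical toy data; it is not the thesis's evaluation code.

```python
import numpy as np

def recall_at_k(embeddings, labels, k=16):
    """Share of queries whose k nearest neighbours (self excluded)
    contain at least one image carrying the same label."""
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distance matrix.
    dist = np.sum((emb[:, None, :] - emb[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a query never matches itself
    hits = sum(
        np.any(labels[np.argsort(dist[i])[:k]] == labels[i])
        for i in range(len(labels))
    )
    return hits / len(labels)

# Hypothetical toy embeddings: two well-separated logo classes.
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
lab = [0, 0, 1, 1]
print(recall_at_k(emb, lab, k=1))  # 1.0: every query's nearest neighbour shares its class
```

Recall@16 as used for model verification is the same computation with k=16 over the embedded test images.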

    Abstract (Chinese) .................................. I
    Abstract ............................................ II
    Acknowledgements .................................... IV
    Table of Contents ................................... V
    List of Figures ..................................... VI
    List of Tables ...................................... VII
    List of Equations ................................... VIII
    1. Introduction ..................................... 1
      1.1 Research background ........................... 1
      1.2 Research purpose .............................. 2
      1.3 Research framework ............................ 3
    2. Literature review ................................ 6
      2.1 History of image retrieval technology ......... 6
      2.2 Traditional image retrieval technology ........ 7
      2.3 Deep learning image retrieval technology ...... 9
      2.4 Related research .............................. 11
      2.5 Patent trend analysis based on the image retrieval technology ... 14
        2.5.1 Collect patent data in the field of IR technology ... 15
        2.5.2 Main applications of IR patents ........... 16
        2.5.3 Main technologies of IR patents ........... 17
        2.5.4 TFM analysis results ...................... 19
    3. LogoSimNet model training ........................ 21
      3.1 The pre-processing stage ...................... 22
      3.2 Training stage of LogoSimNet model ............ 24
      3.3 Triplet sampling strategy ..................... 26
      3.4 LogoSimNet model fine-tuning .................. 29
    4. Experiment ....................................... 31
      4.1 Parameter settings ............................ 31
      4.2 Verify different backbones of LogoSimNet performance ... 32
      4.3 Comparison of different classifier layers and dimensions ... 33
      4.4 Compare different sampling strategies ......... 34
      4.5 Analyze model retrieval results ............... 36
    5. Conclusions ...................................... 42
    6. References ....................................... 43

