
Graduate Student: 郭達人 (Kuo, Da Ren)
Thesis Title (Chinese): 透過深度類神經網路結合細部外觀變化以及局部組態達到商品辨識之目的
Thesis Title (English): Mobile Product Recognition by Involving Fine-Grained Appearance and Part Configuration into Deep Neural Network
Advisor: 許秋婷 (Hsu, Chiou Ting)
Committee Members: 孫民, 王聖智
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of Publication: 2016
Academic Year of Graduation: 104 (2015-2016)
Language: English
Pages: 37
Chinese Keywords: 商品辨識, 深度學習 (product recognition, deep learning)
English Keywords: product recognition, deep learning
    Product recognition aims, given a product image supplied by a user, to retrieve several similar images from a database and use them to identify which product the query image depicts. Recognition accuracy in this field is mainly limited by two challenges: the high similarity between images of different products, and the variation among images of the same product caused by external factors. In this thesis, we first propose a multi-stage architecture that uses several neural networks to address these two challenges step by step. We then refine this design into a two-stage architecture: one neural network first locates the product in the image and roughly determines which categories it may belong to, and in the final recognition stage another neural network combines fine-grained appearance variation with part configuration to compute the similarity between the query image and the database images of those candidate categories. We design two different neural networks for this final recognition stage. Compared with the multi-stage architecture, the two-stage architecture handles product recognition more efficiently without sacrificing accuracy. We also photographed and collected images of one hundred products to build a new product-recognition dataset. Experimental results show that our method substantially outperforms existing product recognition methods on both this dataset and another benchmark dataset.


    Mobile product recognition aims to recognize a product image by retrieving similar images from a database. Recognition accuracy is largely affected by two challenging issues: inter-product similarity and intra-product variation. In this work, we first introduce a multi-stage method that tackles the two issues through multiple convolutional neural networks. After validating its effectiveness, we further propose to simplify the repetitive convolutional operations involved in the different stages. In the second proposed method, the two-stage method, we design a deep neural network that jointly solves the two issues. We first adopt Faster RCNN to simultaneously locate the product and roughly categorize the query image. We then re-use the feature representation learned in the earlier layers and incorporate the part configuration and fine-grained appearance into a deep neural network for the final product recognition stage. For this final stage, we design two kinds of deep neural networks that are rotation-invariant and measure similarity accurately. The two-stage method tackles the product recognition task in a more elegant and efficient way without compromising performance. In addition, we collect a new, publicly available dataset, PRODUCT-100, which contains 100 products photographed under real-world conditions. Our experiments demonstrate that our method achieves promising results and outperforms existing methods on both the PRODUCT-100 and SHORT datasets.
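    The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: every function, feature vector, and database entry below is a hypothetical stub (the thesis uses a fine-tuned Faster RCNN for stage one and custom rotation-invariant deep networks for stage two).

    ```python
    import math

    def stage_one_detect(image):
        """Stage 1 (stub): jointly localize the product and propose coarse
        candidate categories. In the thesis this is a fine-tuned Faster RCNN;
        here we return a fixed box and candidate list for illustration."""
        box = (10, 10, 100, 100)          # (x, y, w, h) of the product region
        candidates = ["snack", "drink"]   # coarse category hypotheses
        return box, candidates

    def embed(image, box):
        """Stage 2a (stub): extract a feature vector for the detected region.
        The thesis re-uses convolutional features learned in stage one and
        combines fine-grained appearance with part configuration."""
        return [0.9, 0.1, 0.3]

    def cosine(u, v):
        """Similarity between two feature vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    def recognize(query_image, database):
        """Stage 2b: score only database entries belonging to the candidate
        categories from stage one, and return the best match."""
        box, candidates = stage_one_detect(query_image)
        q = embed(query_image, box)
        scored = [(name, cosine(q, feat))
                  for name, category, feat in database if category in candidates]
        return max(scored, key=lambda s: s[1])

    # Toy database: (product name, coarse category, feature vector).
    db = [("cola", "drink", [0.8, 0.2, 0.4]),
          ("chips", "snack", [0.1, 0.9, 0.2]),
          ("soap", "household", [0.5, 0.5, 0.5])]

    best = recognize("query.jpg", db)
    print(best[0])  # prints "cola": the closest product within candidate categories
    ```

    Restricting the final similarity search to the coarse categories proposed in stage one is what lets the two-stage design avoid the repeated convolutional passes of the multi-stage method.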

    Table of Contents:
    Chinese Abstract
    Abstract
    1. Introduction
    2. Related Work
       2.1 Mobile product recognition
       2.2 Fine-grained appearance
       2.3 Deep network for image pair recognition
    3. Proposed Method
       3.1 Multi-stage Method for Product Recognition
           3.1.1 Product Localization by Fine-Tuned Faster RCNN
           3.1.2 Image Alignment by RotationNet
           3.1.3 Similarity Measurement
       3.2 The Two-stage Product Recognition Method
           3.2.1 Product Location and Category Determination
           3.2.2 Feature Extraction
           3.2.3 Product Classification
    4. Experimental Results
       4.1 Implementation Details
       4.2 Datasets
       4.3 Evaluation of the Proposed Method
       4.4 Comparison with Existing Methods
    5. Discussion
    6. Conclusions
    7. References


    Full-text release date: full text not authorized for public release (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
