| Student | 陳浩雲 Chen, Hao-Yun |
|---|---|
| Thesis title | 基於分層式互補目標的神經網絡訓練 Learning with Hierarchical Complement Objective |
| Advisor | 張世杰 Chang, Shih-Chieh |
| Oral defense committee | 王鈺強 Wang, Yu-Chiang; 吳凱強 Wu, Kai-Chiang |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science - Department of Computer Science |
| Year of publication | 2020 |
| Academic year of graduation | 108 (ROC calendar) |
| Language | English |
| Number of pages | 32 |
| Keywords | category hierarchy, optimization, entropy, deep learning, image recognition, semantic segmentation |
Label hierarchies exist widely in vision-related problems, ranging from the explicit label hierarchies in image classification to the latent label hierarchies in semantic segmentation. Nevertheless, state-of-the-art methods often use the cross-entropy loss, which implicitly assumes that class labels are mutually exclusive and thus independent of each other. Motivated by the fact that classes under the same parent category usually share certain similarities, we design a new training paradigm called Hierarchical Complement Objective Training (HCOT) that leverages the information in the label hierarchy. HCOT maximizes the probability of the ground-truth class and, at the same time, neutralizes the probabilities of the remaining classes in a hierarchical fashion, so that the model explicitly takes advantage of the label hierarchy. The proposed HCOT is evaluated on both image classification and semantic segmentation tasks. Experimental results confirm that HCOT outperforms state-of-the-art models on CIFAR-100, ImageNet-2012, and PASCAL-Context. The study further demonstrates that HCOT can be applied to tasks with latent label hierarchies, a common characteristic of many machine learning tasks.
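The abstract describes HCOT only at a high level. The NumPy sketch below illustrates the idea of "neutralizing the remaining probabilities in a hierarchical fashion." It rests on several assumptions that do not come from the thesis:

- It uses a two-level label hierarchy.
- It uses a COT-style complement entropy, meaning the entropy of the renormalized non-ground-truth probabilities.
- It splits that entropy into a sibling term and a coarse-group term.
- It combines the terms into a single weighted objective.

The function names, the sibling/coarse split and the weighting are all hypothetical. Treat this as an illustration, not the thesis's exact formulation. The sketch computes values only; actual training would need an autodiff framework to backpropagate through it.

```python
import numpy as np

EPS = 1e-12  # guards log(0) and division by zero


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-probability of the ground-truth classes."""
    rows = np.arange(len(labels))
    return float(-np.mean(np.log(probs[rows, labels] + EPS)))


def masked_complement_entropy(probs: np.ndarray, labels: np.ndarray,
                              allowed: np.ndarray) -> float:
    """COT-style complement entropy restricted to `allowed` classes.

    The ground-truth class is always excluded; the remaining allowed
    probabilities are renormalized to sum to one before taking entropy.
    """
    rows = np.arange(len(labels))
    mask = allowed.copy()
    mask[rows, labels] = False                      # never count the ground truth
    masked = np.where(mask, probs, 0.0)
    comp = masked / (masked.sum(axis=1, keepdims=True) + EPS)
    ent = -np.sum(np.where(mask, comp * np.log(comp + EPS), 0.0), axis=1)
    return float(np.mean(ent))


def hierarchical_complement_entropy(probs: np.ndarray, labels: np.ndarray,
                                    parent: np.ndarray) -> float:
    """Hypothetical two-level variant (an assumption, not the thesis's formula).

    parent[k] is the coarse group of fine class k. The score is high when the
    leftover mass is spread evenly (a) over the ground truth's sibling classes
    and (b) over the coarse groups other than the ground truth's group.
    """
    gt_group = parent[labels]                                 # (N,)
    siblings = parent[None, :] == gt_group[:, None]           # (N, K) bool mask
    fine_term = masked_complement_entropy(probs, labels, siblings)

    n_groups = int(parent.max()) + 1
    coarse = probs @ np.eye(n_groups)[parent]                 # (N, G) mass per group
    all_groups = np.ones_like(coarse, dtype=bool)
    coarse_term = masked_complement_entropy(coarse, gt_group, all_groups)
    return fine_term + coarse_term


def hcot_objective(probs, labels, parent, weight: float = 1.0) -> float:
    """Single weighted objective to minimize (assumption: COT itself
    alternates a cross-entropy step with a complement-entropy step)."""
    return cross_entropy(probs, labels) - weight * hierarchical_complement_entropy(
        probs, labels, parent)


# Toy example: 9 fine classes in 3 coarse groups of 3; ground truth is class 0.
parent = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labels = np.array([0])
spread = np.array([[0.5, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]])
peaked = np.array([[0.5, 0.2, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0]])

print(hierarchical_complement_entropy(spread, labels, parent))  # about 1.386 (2 * ln 2)
print(hierarchical_complement_entropy(peaked, labels, parent))  # about 0
```

Both rows give the ground-truth class the same probability (0.5), so their cross-entropy terms are identical. The objective therefore prefers `spread`, where the leftover mass is flat both within the sibling group and across the other coarse groups. In `peaked`, the leftover mass is concentrated on one sibling and one foreign group.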