Author: 孔啟熙 Kung, Chi-Hsi
Thesis title: 自動搜尋於影像語意分割的動態推論模型架構 Auto-Dynamic-DeepLab: A Fine-grained Dynamic Inference Architecture for Semantic Image Segmentation
Advisor: 李哲榮 Lee, Che-Rung
Committee members: 陳煥宗 Chen, Hwann-Tzong; 黃稚存 Huang, Chih-Tsun
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2020
Academic year of graduation: 108
Language: English
Number of pages: 26
Keywords (Chinese): 動態推論、影像語意分割、神經架構搜索、高效推論、電腦視覺、自動機器學習
Keywords (English): Dynamic Inference, Semantic Image Segmentation, Neural Architecture Search, Efficient Inference, Computer Vision, AutoML
We propose applying the architecture search of automated machine learning to find better model designs for dynamic inference. Existing dynamic-inference methods insert additional classifiers at the granularity of network blocks; because we are concerned about how well such designs can be optimized, we instead propose performing dynamic inference at the level of the cells found by automated architecture search. Cells, being smaller units, allow a multi-classifier model with finer granularity. Our proposed architecture includes (1) densely connected cells that replace the cells of the original model, (2) re-running automated architecture search to find a better-optimized network structure, and (3) an earlier decision-maker to accelerate inference. The experiments are conducted on semantic image segmentation. The results show a 1.6x speedup while maintaining accuracy close to that of the general-purpose model, and in fast mode a 2.15x speedup with only a 2% drop in accuracy.
Dynamic inference, which adaptively skips parts of model execution based on the complexity of the input data, can effectively reduce the computation cost of deep learning models during inference. However, current architectures for dynamic inference only place exits at the block level, which may not be suitable for different applications. In this paper, we present Auto-Dynamic-DeepLab (ADD), a network architecture that enables fine-grained dynamic inference for semantic image segmentation. To allow exit points at the cell level, ADD utilizes Neural Architecture Search (NAS), supported by the framework of Auto-DeepLab, to seek the optimal network structure. In addition, ADD replaces the cells in Auto-DeepLab with densely connected cells to ease the interference among multiple classifiers and employs an earlier decision-maker to further optimize the performance. Experimental results show that ADD can achieve accuracy (mIoU) similar to Auto-DeepLab's with a 1.6x speedup. In fast mode, ADD achieves a 2.15x speedup with only a 2% accuracy drop compared to Auto-DeepLab.
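The abstract describes attaching classifiers (exits) to searched cells and letting a decision-maker stop computation once a prediction is confident enough. The following is a minimal sketch of that idea, assuming hypothetical names (EarlyExitSegmenter, cells, exit_heads, final_head, threshold) and a mean max-softmax confidence as the exit criterion; it is an illustration under these assumptions, not the thesis's actual implementation.

```python
# Minimal sketch of cell-level early-exit inference for segmentation.
# All names here (EarlyExitSegmenter, cells, exit_heads, final_head, threshold)
# are illustrative assumptions, not the thesis's actual code.
import torch.nn as nn
import torch.nn.functional as F


class EarlyExitSegmenter(nn.Module):
    def __init__(self, cells, exit_heads, final_head, threshold=0.9):
        super().__init__()
        self.cells = nn.ModuleList(cells)            # searched cells, in execution order
        self.exit_heads = nn.ModuleDict(exit_heads)  # maps cell index (as str) to a 1x1-conv classifier
        self.final_head = final_head                 # classifier applied after the last cell
        self.threshold = threshold                   # mean confidence required to exit early

    def forward(self, x):
        feat = x
        for i, cell in enumerate(self.cells):
            feat = cell(feat)
            key = str(i)
            if key not in self.exit_heads:
                continue                             # no auxiliary classifier attached to this cell
            logits = self.exit_heads[key](feat)      # (N, C, H, W) class scores
            # Decision rule: average per-pixel max-softmax probability over the image.
            conf = F.softmax(logits, dim=1).max(dim=1).values.mean()
            if conf.item() >= self.threshold:
                return logits, i                     # exit early with this prediction
        return self.final_head(feat), len(self.cells)  # fall through to the final classifier
```

In practice the returned logits would be upsampled to the input resolution before producing the segmentation map, and the threshold trades accuracy for speed: a lower threshold exits earlier and runs faster, which is the kind of fast-mode operating point the abstract reports.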