Author: 張芸綺 (Chang, Yun-Chi)
Thesis Title: 運用於圖像語意分割的決策神經網路之高效率邊緣計算
A decision network design for efficient semantic segmentation computation in edge computing
Advisor: 李濬屹 (Lee, Chun-Yi)
Committee Members: 黃稚存 (Huang, Chih-Tsun), 周志遠 (Chou, Jerry)
Degree: Master
Department:
Year of Publication: 2018
Academic Year of Graduation: 106
Language: English
Pages: 40
Keywords (Chinese): 語意分割, 深度學習, 邊際運算
Keywords (English): semantic segmentation, deep learning, edge computing
    Abstract (translated from the Chinese): Applying deep neural networks in edge computing is not easy, because deep neural networks require a large amount of computation. Since edge devices are limited in both computing power and energy, simplifying deep neural networks has become an important challenge. Much related research therefore targets network compression; knowledge distillation is one such method. The idea of knowledge distillation is to train a smaller network to mimic the original network, learning to produce predictions close to those of the original network so that it can replace it. However, this approach is better suited to image classification than to semantic segmentation. Because semantic segmentation requires a prediction for every pixel, it needs far more information than classification, and the original network cannot provide more supervision than the pixel-wise ground truth for training a smaller network, so knowledge distillation is hard to apply to segmentation. We therefore shift the problem from directly compressing the network to predicting semantic segmentation more efficiently. The idea of this thesis is to predict early whether an input image is easy to segment: easy images stay on the edge device and are processed by a simpler network, while the rest are sent to a remote server and processed by a more complex network. We propose a "Decision Network" that predicts whether an input image is easy to segment, thereby reducing computation on the edge device while keeping accuracy above an acceptable level. With our Decision Network, we save 25.1% of the computation on average, with an average accuracy drop of only 5.77%.


    Applying a deep convolutional neural network (DCNN) in edge computing is not an easy task, since the huge computational load of a DCNN makes it hard to run on an edge device. Although there is much research on model compression, such as knowledge distillation, which replaces a large model with a smaller one, most of it focuses on classification. Knowledge distillation is hard to apply to semantic segmentation, because the teacher model cannot provide more information than the pixel-wise ground-truth labels. We therefore shift our goal from replacing the original network to predicting semantic segmentation more efficiently. The idea of this thesis is to run a large network on a remote server to segment images that require a more complicated model, and a smaller network on the edge device to handle the rest. We propose the "Decision Network", which decides whether an input image should be processed on the edge device or the remote server by predicting the per-class intersection-over-union (IoU) of the image. We not only reduce the computation on the local device but also keep the mean IoU (mIoU) very close to that obtained by running the large model alone. Applying the Decision Network to the PASCAL VOC 2012 validation set, with DeepLab-VGGNet as the small network and DeepLab-ResNet-101 as the large network, we save 25.1% of the overall computational load with only a 5.77% mIoU drop on average.
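The routing described above can be sketched as a simple control-unit function. This is an illustrative sketch only: the threshold value, the function names, and the assumption that the decision network returns one predicted IoU per class are ours, not the thesis's actual API.

```python
import numpy as np

IOU_THRESHOLD = 0.7  # assumed cutoff; in the thesis the control unit tunes this trade-off

def control_unit(image, decision_net, small_net, send_to_server):
    """Route an image to the edge (small) or server (large) network.

    decision_net is assumed to return an array of predicted per-class IoUs;
    small_net runs locally, send_to_server offloads to the remote large model.
    """
    predicted_ious = decision_net(image)           # per-class IoU estimates
    if np.mean(predicted_ious) >= IOU_THRESHOLD:   # "easy" image: segment locally
        return small_net(image)
    return send_to_server(image)                   # "hard" image: offload
```

Raising the threshold sends more images to the server (higher mIoU, less computation saved); lowering it keeps more work on the edge device.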

    Table of Contents

    Acknowledgements (Chinese and English)
    Abstract (Chinese and English)
    1 Introduction
    2 Related Work
        2.1 Deep Convolutional Neural Network
        2.2 Object Detection
        2.3 Semantic Segmentation
        2.4 Knowledge Distillation
    3 Proposed Architecture and Training Methodology
        3.1 Decision Network and the Control Unit
        3.2 Base Network
        3.3 Training Methodology
        3.4 Evaluation Criteria
    4 Experimental Results
        4.1 Experimental Setup
        4.2 Experiments on Decision Network
        4.3 Computation Reduction
        4.4 Decision Network Variants
        4.5 Control Unit
    5 Conclusion
    6 Future Work
    References

