
Author: Hsieh, Kuan-Hsian (謝冠賢)
Thesis title: FOCAM: Faster Octave Convolution Using Attention and Multi-scaling
Advisor: Lee, Che-Rung (李哲榮)
Committee members: Chen, Hwann-Tzong (陳煥宗); Wang, Sheng-Jyh (王聖智)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2021
Graduating academic year: 109
Language: English
Pages: 29
Chinese keywords: Attention, Multi-scaling, Octave Convolution
Foreign keywords: FOCAM, Octave Convolution, Multi-scaling
Access counts: 99 views, 0 downloads
  • Octave convolution separates the convolution operations for features of different resolutions in the feature maps; this not only effectively reduces spatial and computational redundancy, but also improves the accuracy of convolutional networks. In this thesis, we propose FOCAM, a faster octave convolution that further reduces the computational cost of octave convolution in convolutional networks. Like octave convolution, FOCAM is a simple, generic, plug-and-play convolution unit, and both split the input and output feature maps into different resolutions for computation and storage. However, FOCAM replaces the cross-resolution information exchange of octave convolution with an attention mechanism, replaces the update of the high-resolution feature maps with multi-scaling, and uses convolution factorization to further reduce the computational cost. In experiments on the ImageNet dataset, FOCAM reduces the computational cost by 42% to 54% compared with ordinary convolutional networks, and by 24% to 38% compared with octave convolution, at similar top-1 and top-5 accuracy.


    Octave convolution, which separates the operations for different resolutions, is an effective method to reduce spatial redundancy and improve accuracy in Convolutional Neural Networks (CNNs). In this paper, we propose a faster version of octave convolution, FOCAM, which can further reduce the computational cost of CNNs. Similar to octave convolution, FOCAM is a single, generic, plug-and-play convolution unit that divides the input and output feature maps into domains of different resolutions, but without explicit information exchange among them. Instead, FOCAM employs an attention mechanism to collect the information conveyed at different resolutions. In addition, multi-scaled convolution kernels are utilized to learn spatial features of different sizes, and factorized convolutions further reduce the computational cost. Experiments on ResNets of various depths with the ImageNet dataset show that FOCAM reduces the operations of the original models by 42% to 54%, and saves 24% to 38% of the FLOPs of models using octave convolutions, with similar top-1 and top-5 accuracy.
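    To see why separating resolutions saves computation, the standard cost accounting for octave-style convolution can be sketched in a few lines of Python. The sketch below assumes the usual setup: a fraction alpha of the channels is kept as low-frequency maps at half the spatial resolution, giving four convolution paths (high-to-high, high-to-low, low-to-high, low-to-low). The function names, the alpha=0.5 setting, and the 56x56/256-channel example are illustrative assumptions, not values taken from the thesis.

    ```python
    def conv_flops(h, w, c_in, c_out, k):
        """Multiply-accumulate count of a standard k x k convolution
        on an h x w feature map (stride 1, 'same' padding)."""
        return h * w * c_in * c_out * k * k

    def octave_conv_flops(h, w, c_in, c_out, k, alpha=0.5):
        """Total cost of the four octave-convolution paths, assuming an
        alpha fraction of channels lives at half spatial resolution."""
        ch_hi_in, ch_lo_in = round((1 - alpha) * c_in), round(alpha * c_in)
        ch_hi_out, ch_lo_out = round((1 - alpha) * c_out), round(alpha * c_out)
        hh = conv_flops(h, w, ch_hi_in, ch_hi_out, k)            # high -> high
        hl = conv_flops(h // 2, w // 2, ch_hi_in, ch_lo_out, k)  # high -> low (after pooling)
        lh = conv_flops(h // 2, w // 2, ch_lo_in, ch_hi_out, k)  # low -> high (before upsampling)
        ll = conv_flops(h // 2, w // 2, ch_lo_in, ch_lo_out, k)  # low -> low
        return hh + hl + lh + ll

    # Illustrative example: a 3x3 layer with 256 channels on a 56x56 map.
    full = conv_flops(56, 56, 256, 256, 3)
    octave = octave_conv_flops(56, 56, 256, 256, 3, alpha=0.5)
    print(f"octave / full cost ratio: {octave / full:.4f}")  # -> 0.4375
    ```

    At alpha = 0.5 this accounting already removes more than half of the multiply-accumulates of a plain convolution; the savings quoted in the abstract come on top of the octave baseline, via attention, multi-scaling, and factorization rather than this channel split alone.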

    Chinese Abstract
    Abstract
    List of Figures
    List of Tables
    1. Introduction
    2. Related Work
       2.1 Multiple Resolution
       2.2 Inception and Multi-Scaling
       2.3 Attention Mechanism
    3. Architecture
       3.1 Octave Convolution
       3.2 Faster Octave Convolution using Attention (FOCA)
       3.3 Faster Octave Convolution using Multi-scaling (FOCM)
       3.4 Faster Octave Convolution using Attention and Multi-scaling (FOCAM)
    4. Experiments
       4.1 Performance Evaluations
       4.2 Ablation Study on Attention
       4.3 Ablation Study on Multi-scaling
    5. Conclusion
    6. References
    7. Appendix

