| Graduate Student: | 鄭安傑 Cheng, An-Chieh |
|---|---|
| Thesis Title: | 實例感知神經架構搜索 InstaNAS: Instance-aware Neural Architecture Search |
| Advisor: | 孫民 Sun, Min |
| Committee Members: | 邱維辰 Chiu, Wei-Chen; 胡敏君 Hu, Min-Chun |
| Degree: | Master |
| Department: | 電機資訊學院 College of Electrical Engineering and Computer Science - 資訊系統與應用研究所 Institute of Information Systems and Applications |
| Publication Year: | 2021 |
| Graduation Academic Year: | 109 (2020-2021) |
| Language: | English |
| Number of Pages: | 32 |
| Chinese Keywords: | 神經架構搜索 (Neural Architecture Search) |
| Foreign Keywords: | Neural Architecture Search |
Chinese Abstract:
Conventional Neural Architecture Search (NAS) aims to find, within a search space, a single network architecture that performs best on the target task, for example the one with the highest accuracy. However, a single architecture is often insufficient for datasets with high ambiguity or diversity. When optimizing multiple objectives simultaneously, the different domains within a dataset are better handled by expert architectures that specialize in the features of each domain. We therefore propose instance-aware neural architecture search, which trains a controller to search for a distribution of architectures over the search space, so that the model can apply more sophisticated architectures to difficult instances and simpler architectures to ordinary ones. At inference time, the controller assigns each individual instance to the expert architecture tailored to it, achieving high accuracy at low latency. We design a search space based on MobileNetV2 and evaluate our method on several datasets; without sacrificing accuracy, latency is reduced by up to 48.8%. We also demonstrate one possible application of our method: using different expert architectures to satisfy hardware Quality-of-Service (QoS) requirements under different scenarios.
English Abstract:
Conventional Neural Architecture Search (NAS) aims to find a single architecture that achieves the best performance, usually by optimizing task-related objectives such as accuracy. However, a single architecture may not be representative enough for an entire dataset with high diversity and variety. Intuitively, electing domain-expert architectures that are proficient in domain-specific features can further benefit architecture-related objectives such as latency. We propose InstaNAS, an instance-aware NAS framework that employs a controller trained to search for a "distribution of architectures" instead of a single architecture. This allows the model to use sophisticated architectures, which usually come with high architecture-related cost, for difficult samples, and shallow architectures for easy samples. During the inference phase, the controller assigns each unseen input sample a domain-expert architecture that achieves high accuracy at a customized inference cost. Experiments within a search space inspired by MobileNetV2 show that InstaNAS reduces latency by up to 48.8% against MobileNetV2 on a series of datasets without compromising accuracy. We also present a possible application of our approach: satisfying different levels of Quality of Service (QoS) requirements.
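To make the instance-aware dispatch concrete, below is a minimal PyTorch-style sketch of the idea, not the thesis implementation: all class and parameter names (`Controller`, `InstanceAwareNet`, `num_blocks`, `width`) are hypothetical, the gated blocks are plain convolutions rather than the MobileNetV2-based search space, and the controller here merely samples per-block keep/skip decisions for each input, so that different samples in the same batch traverse different sub-architectures of the super-network.

```python
import torch
import torch.nn as nn


class Controller(nn.Module):
    """Tiny CNN mapping an input image to per-block keep/skip decisions.

    Illustrative only: in the thesis the controller is trained so that its
    architecture choices trade off accuracy against inference latency.
    """

    def __init__(self, num_blocks: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.head = nn.Linear(16, num_blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.head(self.features(x)))
        # Sample one binary architecture per instance; at deployment the
        # probabilities could instead be thresholded deterministically.
        return torch.bernoulli(probs)


class InstanceAwareNet(nn.Module):
    """Super-network whose blocks are gated per instance by the controller."""

    def __init__(self, num_blocks: int = 6, width: int = 32, num_classes: int = 10):
        super().__init__()
        self.controller = Controller(num_blocks)
        self.stem = nn.Conv2d(3, width, 3, padding=1)
        # Shape-preserving blocks, so a skipped block reduces to identity.
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
             for _ in range(num_blocks)]
        )
        self.classifier = nn.Linear(width, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        arch = self.controller(x)                 # (batch, num_blocks) in {0, 1}
        h = self.stem(x)
        for i, block in enumerate(self.blocks):
            gate = arch[:, i].view(-1, 1, 1, 1)
            h = gate * block(h) + (1 - gate) * h  # skip the block when gate is 0
        h = h.mean(dim=(2, 3))                    # global average pooling
        return self.classifier(h)


if __name__ == "__main__":
    net = InstanceAwareNet()
    logits = net(torch.randn(4, 3, 32, 32))       # e.g. a CIFAR-sized batch
    print(logits.shape)                           # torch.Size([4, 10])
```

The key design point the sketch captures is that the architecture decision is a function of the input itself: easy samples can take a cheap path through few blocks while hard samples activate more of the super-network, which is what yields the accuracy-versus-latency behavior described in the abstract. How the controller is actually rewarded for those decisions is specified in the thesis, not in this sketch.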