| Graduate Student | 陳泓仁 (Chen, Hung-Jen) |
|---|---|
| Thesis Title | 透過實例感知參數化減少在線持續學習時的災難性遺忘 (Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization) |
| Advisor | 孫民 (Sun, Min) |
| Committee Members | 李濬屹 (Lee, Chun-Yi); 陳祝嵩 (Chen, Chu-Song) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2021 |
| Academic Year | 109 |
| Language | English |
| Pages | 30 |
| Keywords | Deep Learning, Neural Architecture Search, Continual Learning, Online Learning |
Chinese Abstract (translated): Online continual learning is a difficult scenario in which a machine learning model must learn from a continuous stream of data and cannot revisit previously encountered data. The model needs to address forgetting at the task level as well as at the instance level within the same task. To overcome this, we adopt "instance awareness" in the neural network: each data instance is predicted through a network path searched from a meta-graph by a controller. Furthermore, to preserve the knowledge learned from past instances, we propose a protection mechanism: if an incoming instance is dissimilar to previous ones, its gradient updates are restricted to prevent it from overwriting the paths taken by those dissimilar instances; conversely, if the incoming instance is similar to previous instances, fine-tuning the paths of those similar instances is encouraged. The path-selection mechanism is decided by the controller according to instance similarity. Experimental results show that, in the online continual learning setting, the proposed method outperforms the current best-performing methods on CIFAR-10, CIFAR-100, and Tiny-ImageNet. The method is also tested in a more realistic setting where task boundaries are blurred, and it again outperforms the best-performing methods.
English Abstract: Online continual learning is a challenging scenario in which a model must learn from a continuous stream of data without revisiting any previously encountered data instances. Catastrophic forgetting is exacerbated because the model must address forgetting not only at the task level but also at the instance level within the same task. To mitigate this, we leverage the concept of "instance awareness" in the neural network, where each data instance is classified by a network path searched from a meta-graph by a controller. To preserve the knowledge learned from previous instances, we propose a method that protects a path by preventing the gradient updates of one instance from overriding past updates computed from dissimilar instances; conversely, it encourages fine-tuning a path when the incoming instance is similar to previous instances. The mechanism of selecting paths according to instance similarity is determined naturally by the controller, which is compact and updated online. Experimental results show that the proposed method outperforms state-of-the-art methods in online continual learning on CIFAR-10, CIFAR-100, and Tiny-ImageNet. Furthermore, the proposed method is evaluated in a more realistic setting where the boundaries between tasks are blurred, and it again outperforms the state-of-the-art.
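To make the path-selection and protection mechanisms concrete, here is a minimal sketch in PyTorch. It is an illustration under simplifying assumptions, not the thesis implementation: the names `Controller`, `MetaGraph`, and `protected_step`, the prototype buffer used as the similarity signal, and the threshold `tau` are all hypothetical stand-ins for the controller, meta-graph, and gradient-protection mechanism described in the abstract.

```python
# A minimal sketch (not the thesis implementation) of instance-aware path
# selection with similarity-gated updates. Controller, MetaGraph,
# protected_step, the prototype buffer, and tau are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Controller(nn.Module):
    """Maps an instance embedding to a binary path mask over blocks."""
    def __init__(self, feat_dim: int, num_blocks: int):
        super().__init__()
        self.policy = nn.Linear(feat_dim, num_blocks)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.policy(feat))
        # Hard {0,1} mask with a straight-through estimator for gradients.
        hard = (probs > 0.5).float()
        return hard + probs - probs.detach()

class MetaGraph(nn.Module):
    """A chain of residual blocks; the mask decides which blocks execute."""
    def __init__(self, dim: int, num_blocks: int, num_classes: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)]
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            x = x + mask[:, i:i + 1] * block(x)  # skip block where mask is 0
        return self.head(x)

def protected_step(model, controller, optimizer, x, y, prototypes, tau=0.7):
    """One online update: shrink the learning signal for instances that are
    dissimilar to previously seen ones, so their gradients do not override
    paths tuned for past instances."""
    feat = x  # in practice a feature extractor would embed the raw input
    mask = controller(feat)
    logits = model(feat, mask)
    loss = F.cross_entropy(logits, y)

    # Cosine similarity to stored prototypes of past instances (assumption:
    # a small prototype buffer stands in for the thesis's similarity signal).
    if prototypes:
        sims = torch.stack([F.cosine_similarity(feat, p, dim=1) for p in prototypes])
        gate = (sims.max(dim=0).values.mean() > tau).float()
    else:
        gate = torch.tensor(1.0)

    optimizer.zero_grad()
    # Full update for similar instances, restricted update for dissimilar ones.
    (gate * loss + (1 - gate) * 0.1 * loss).backward()
    optimizer.step()
    prototypes.append(feat.detach().mean(dim=0, keepdim=True))
    return loss.item()

if __name__ == "__main__":
    dim, blocks, classes = 32, 4, 10
    model = MetaGraph(dim, blocks, classes)
    ctrl = Controller(dim, blocks)
    opt = torch.optim.SGD(list(model.parameters()) + list(ctrl.parameters()), lr=0.01)
    protos = []
    for _ in range(5):  # simulate a small online stream
        x, y = torch.randn(8, dim), torch.randint(0, classes, (8,))
        print(protected_step(model, ctrl, opt, x, y, protos))
```

The design point the sketch tries to capture is that similarity gating and path selection share the same instance features, so restricting updates for dissimilar instances naturally protects the paths those instances do not traverse.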