Graduate Student: 江伯耕 (Chiang, Po-Keng)
Thesis Title: 深度學習模型視覺化的統計觀點 (Statistical Viewpoints for Deep Learning Model Visualization)
Advisors: 陳素雲 (Huang, Su-Yun); 謝文萍 (Hsieh, Wen-Ping)
Committee Members: 王紹宣 (Wang, Shao-Hsuan); 洪弘 (Hung, Hung)
Degree: Master
Department: Institute of Statistics, College of Science (理學院 - 統計學研究所)
Publication Year: 2021
Graduation Academic Year: 109
Language: English
Pages: 36
Chinese Keywords: backpropagation (反向傳播), deep learning (深度學習), visualization (視覺化), heat map (熱度圖)
Foreign-Language Keywords: Gradient map, Model visualization, Saliency map
In recent years, machine learning and deep learning have benefited from advances in hardware and algorithms and have made significant progress on problems that traditional statistical models struggled with, such as speech and image recognition. However, these highly complex architectures also make the resulting decisions difficult to interpret. In application domains such as medical care, the inability to give a clear explanation raises problems of accountability and credibility. Researchers, for their part, hope to use model interpretation to diagnose and improve model performance.
Against this background, several interpretation methods for visualizing image recognition models have been proposed. However, these methods offer differing interpretations, there is no standard criterion for interpretation and no rigorous justification, and each method simply proposes the scheme its authors consider feasible.
In this thesis, we select several currently popular interpretation methods and, starting from statistical models that are more tractable to analyze, derive the statistical meaning of each method and examine its soundness. In addition, building on these derivations, we compare the methods on the MNIST and Chest X-ray data sets and verify the derived results.
The results show that, although the methods are explanatory to some extent, most of them still frequently produce uninterpretable results on more complex models and more heterogeneous data. The real-data examples show that the quality of an explanation depends heavily on the choice of model, the data being explained, and the particular images selected for presentation.
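As a concrete illustration of the gradient-based visualization the keywords refer to, the following is a minimal sketch of a vanilla gradient saliency map: the per-pixel magnitude of the gradient of a class score with respect to the input image. The names `model`, `image`, and `saliency_map` are hypothetical placeholders, not code from the thesis; any PyTorch image classifier that returns class scores would do.

```python
import torch

def saliency_map(model, image, target_class):
    """Vanilla gradient saliency: |d(class score) / d(input pixels)|,
    collapsed to an H x W heat map."""
    model.eval()
    x = image.clone().unsqueeze(0).requires_grad_(True)  # add a batch dimension
    score = model(x)[0, target_class]                    # pre-softmax score of the target class
    score.backward()                                     # backpropagate to fill x.grad
    # Collapse the channel axis by taking the maximum absolute gradient per pixel.
    return x.grad.abs().squeeze(0).max(dim=0).values
```

For an MNIST-style classifier this returns a 28 x 28 map that can be displayed directly as a heat map; how such maps vary with the choice of model and data is the kind of behavior the comparisons in the thesis examine.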