
Graduate Student: Ou Yang, Fei-Hong
Thesis Title: Depth image optimization based on temporal coherence and semantic boundary enhancement to synthesize weather effects in street-view video
Advisor: Chu, Hung-Kuo
Committee Members: Yao, Jhih-Yuan; Li, Run-Rong; Hu, Min-Jun
Degree: Master
Department: Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 108 (ROC calendar; 2019-2020)
Language: Chinese
Number of Pages: 35
Keywords (Chinese): depth image optimization; street-view video weather synthesis; temporal coherence and semantic boundary enhancement
Keywords (English): depth optimization; street-view video; weather effect synthesis; edge-aware
In recent years, thanks to the rapid development of deep learning in computer vision, many complex problems that used to be difficult to describe with equations can now be handled well by convolutional neural networks; depth prediction, semantic segmentation, automatic colorization, and object detection all perform well. This study explores how to synthesize different weather effects onto existing street-view videos in order to generate test data under various weather conditions. The most important factor in this task is depth, because the synthesis of both rain and fog depends on depth estimation.

However, applying the current state-of-the-art methods to street-view images of Taiwan produces all kinds of problems, so our synthesized results are not ideal. The biggest problems usually fall into three categories: (1) the predicted depth of the ground is not flat; (2) the depth of regions close to the ground is confused with that of the ground; and (3) the predicted depth lacks temporal consistency, which causes flickering.

This study therefore proposes a system that takes the depth and semantic segmentation predicted by deep learning as input and solves a two-class foreground/background labeling problem with the max-flow min-cut theorem to separate the ground precisely. It then considers the motion of each pixel from the previous frame, projects the pixels of the previous frame onto the current frame, and uses the resulting color differences to adjust the depth, which resolves the lack of temporal consistency. It further uses the boundaries of the color image to decide the direction of depth interpolation. All of these energy functions can be written in quadratic form, so we can optimize the depth map with the least squares method. Finally, the optimized depth map is used to synthesize weather effects on street-view videos with better quality.
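As a sketch of the temporal step described above, the previous frame and its depth can be warped into the current frame with dense optical flow, and the per-pixel color difference then indicates how much the warped depth should be trusted. The following minimal Python illustration uses OpenCV's DIS optical flow; the function warp_previous and its inputs are hypothetical stand-ins, not the thesis's actual implementation.

```python
import cv2
import numpy as np

def warp_previous(prev_bgr, cur_bgr, prev_depth):
    """Warp the previous frame and its depth into the current frame,
    then measure the per-pixel color difference (a sketch only)."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_bgr, cv2.COLOR_BGR2GRAY)

    # Dense inverse search (DIS) optical flow from current to previous,
    # so every current pixel knows where it came from.
    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_MEDIUM)
    flow = dis.calc(cur_gray, prev_gray, None)

    # Sampling maps: location in the previous frame for each current pixel.
    h, w = cur_gray.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    map_x, map_y = xs + flow[..., 0], ys + flow[..., 1]

    warped_prev = cv2.remap(prev_bgr, map_x, map_y, cv2.INTER_LINEAR)
    warped_depth = cv2.remap(prev_depth, map_x, map_y, cv2.INTER_LINEAR)

    # Large color differences flag occlusions or bad flow, where the
    # warped depth should carry little weight in the optimization.
    color_diff = np.linalg.norm(
        warped_prev.astype(np.float32) - cur_bgr.astype(np.float32), axis=2)
    return warped_depth, color_diff
```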


In recent years, with the rapid development of deep learning in the field of computer vision, many complex problems that were difficult to describe with equations in the past can be solved well by convolutional neural networks, such as depth map prediction, semantic segmentation, colorization, and object detection.

The purpose of this study is to explore how to synthesize different weather effects on street-view video in order to generate testing data under different weather conditions. The most important factor in this task is depth, because the synthesis of rain and fog must rely on the depth map.
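To make the dependence on depth concrete: fog is commonly synthesized with the standard atmospheric scattering model shown below. Whether the thesis uses exactly this formulation is an assumption, but it illustrates why a per-pixel depth d(x) is needed.

```latex
I(x) = J(x)\,e^{-\beta d(x)} + A\left(1 - e^{-\beta d(x)}\right)
```

Here J(x) is the clear pixel, I(x) the foggy pixel, A the airlight color, and \beta the scattering coefficient. Any error in d(x) changes the blending weight directly, so flat, temporally stable depth is essential for convincing results.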

However, the current state of the art in depth prediction often produces poor results on Taiwan street-view images, so the results of our synthesis are not ideal.
There are three main causes of wrong synthetic results: (1) the predicted depth of the ground is not flat; (2) the predicted depth near the ground is often similar to that of the ground, causing confusion; (3) the depth predictions are not temporally consistent, resulting in flickering.

Therefore, this study proposes a system that uses the depth and semantic segmentation predicted by deep learning as input and solves the two-class foreground/background labeling problem with the max-flow min-cut theorem to separate the ground. Moreover, by considering the motion of each pixel between the previous frame and the current frame, we project the pixels of the previous frame onto the current frame and adjust the depth according to the differences in color. Finally, we determine the diffusion direction from the boundaries of the color image.
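The ground-separation step can be sketched with the PyMaxflow library, which solves exactly this kind of two-label min-cut problem on an image grid. The unary cost maps below (e.g., derived from the semantic-segmentation probabilities) and the uniform smoothness weight are illustrative assumptions, not the thesis's actual energy.

```python
import maxflow  # PyMaxflow: min-cut / max-flow on grid graphs
import numpy as np

def separate_ground(unary_fg, unary_bg, smooth_weight=1.0):
    """Binary ground (foreground) vs. background labeling via graph cut.

    unary_fg / unary_bg: HxW cost maps for assigning each pixel the
    foreground or background label (hypothetical inputs).
    """
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(unary_fg.shape)

    # Pairwise smoothness on the 4-connected grid discourages
    # neighboring pixels from taking different labels.
    g.add_grid_edges(nodes, smooth_weight)

    # Terminal edges encode the unary costs: a pixel that ends up on
    # the sink side of the cut pays its source capacity (unary_fg).
    g.add_grid_tedges(nodes, unary_fg, unary_bg)

    g.maxflow()
    # get_grid_segments returns True for sink-side nodes, i.e. pixels
    # labeled foreground (ground) under this construction.
    return g.get_grid_segments(nodes)
```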
All of the above energy functions can be expressed in quadratic form, so we use the least squares method to optimize the whole model. Finally, the optimized depth map can be used to synthesize weather effects on street views and obtain better composite quality.
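As a sketch of how such quadratic energies are minimized: the data, temporal, and edge-aware smoothness terms can be stacked as rows of one sparse linear system and solved in the least-squares sense. The exponential edge weighting and the specific term weights below are common choices assumed for illustration, not taken from the thesis.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def optimize_depth(d0, d_warp, w_temporal, gray, lam=1.0, sigma=10.0):
    """Least-squares depth refinement (sketch).

    d0:         HxW initial predicted depth (data term target)
    d_warp:     HxW depth warped from the previous frame
    w_temporal: HxW confidence derived from warped color differences
    gray:       HxW grayscale image for edge-aware weights
    """
    h, w = d0.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, targets = [], []

    # Data term: stay close to the initial depth prediction.
    rows.append(sp.identity(n))
    targets.append(d0.ravel())

    # Temporal term: stay close to the warped previous depth,
    # weighted by the per-pixel color-difference confidence.
    rows.append(sp.diags(w_temporal.ravel()))
    targets.append(w_temporal.ravel() * d_warp.ravel())

    # Edge-aware horizontal smoothness: w_ij * (d_i - d_j) = 0, with
    # w_ij decaying across strong image edges so depth may jump there
    # (the vertical term is analogous and omitted for brevity).
    gi = gray.astype(np.float64)
    w_h = lam * np.exp(-np.abs(gi[:, 1:] - gi[:, :-1]) / sigma)
    i, j = idx[:, :-1].ravel(), idx[:, 1:].ravel()
    m = i.size
    D = sp.coo_matrix(
        (np.concatenate([w_h.ravel(), -w_h.ravel()]),
         (np.concatenate([np.arange(m), np.arange(m)]),
          np.concatenate([i, j]))),
        shape=(m, n))
    rows.append(D)
    targets.append(np.zeros(m))

    A = sp.vstack(rows).tocsr()
    b = np.concatenate(targets)
    return lsqr(A, b)[0].reshape(h, w)
```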

Table of Contents
Chinese Abstract
Abstract
Contents
List of Figures
1 Introduction
2 Related Work
3 System Overview
4 Preprocessing
  4.1 Video
  4.2 Depth Prediction
  4.3 Semantic Segmentation Prediction
5 Guide Mask Generation
  5.1 Color Classification
  5.2 Smoothness Term
  5.3 Shadow Removal
    5.3.1 Shadow Interval
    5.3.2 Adjusting the Energy Function
6 Temporal Smoothing and Boundary Enhancement
  6.1 Temporal Smoothing
  6.2 Edge Awareness
7 Comparison of Results and Weather Effect Synthesis
  7.1 Results Using Semantic Segmentation and the Guide Mask
  7.2 Temporal Changes
  7.3 Effect of Boundary Enhancement
  7.4 Adjusting Different Shadow Parameters
  7.5 Weather Synthesis
  7.6 Limitations
8 Conclusion
A More Results
Bibliography

