| Graduate Student: | 何元植 Ho, Yuan-Chih |
|---|---|
| Thesis Title: | 基於殘差學習優化自動化工廠鳥瞰圖轉換點雲物體偵測 (Optimizing Bird's-Eye-View Converted Point Cloud Object Detection Based on Residual Learning in an Automated Factory) |
| Advisor: | 馬席彬 Ma, Hsi-Pin |
| Oral Examination Committee: | 蔡佩芸 Tsai, Pei-Yun; 朱宏國 Chu, Hung-Kuo; 胡敏君 Hu, Min-Chun |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2022 |
| Academic Year of Graduation: | 111 (ROC calendar) |
| Language: | Chinese |
| Pages: | 64 |
| Keywords: | machine learning, deep learning, object detection, point cloud, LiDAR, automated factory |
In automated factories, the methods that mobile robots commonly use to carry out tasks involve image recognition. Traditional image recognition relies on RGB cameras and trains neural network models to reach high accuracy, but it typically has to process large numbers of images with relatively complex algorithms, which delays real-time recognition; a laser scanner is therefore used to assist recognition and mitigate this problem. This thesis uses the 2D point clouds collected by a laser scanner to recognize specific objects inside the factory, shortening the time recognition requires. In addition, RGB images reveal a great deal of information about a factory, such as machine models, mobile robot models, and details of various instruments; collecting camera data on objects that each factory must keep as trade secrets raises security and confidentiality concerns. A laser scanner is therefore chosen to collect 2D point cloud data, and the neural network proposed in this thesis, Complex-ResYOLO, performs object detection, strengthening information security.
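The bird's-eye-view (BEV) conversion in the title means rasterizing scanner points into an image-like grid that a YOLO-style detector can consume. As a rough illustration only (the thesis's actual preprocessing is defined in its body, not here), the Python sketch below converts a planar 2D scan into a single-channel BEV occupancy grid; the grid size, cell resolution, and range limit are assumed values.

```python
import numpy as np

def scan_to_bev(ranges, angles, grid_size=608, cell_m=0.02, max_range=6.0):
    """Rasterize a planar 2D LiDAR scan into a BEV occupancy grid.

    ranges, angles: polar measurements from the scanner (metres, radians).
    grid_size, cell_m, max_range: illustrative values, not the thesis's.
    """
    # Polar -> Cartesian, keeping only returns inside the sensing range.
    valid = (ranges > 0) & (ranges < max_range)
    x = ranges[valid] * np.cos(angles[valid])
    y = ranges[valid] * np.sin(angles[valid])

    # Map metric coordinates to integer grid cells, sensor at the centre.
    col = ((x / cell_m) + grid_size // 2).astype(int)
    row = ((y / cell_m) + grid_size // 2).astype(int)
    inside = (col >= 0) & (col < grid_size) & (row >= 0) & (row < grid_size)

    # Occupancy channel: 1 where at least one point falls in the cell.
    bev = np.zeros((grid_size, grid_size), dtype=np.float32)
    bev[row[inside], col[inside]] = 1.0
    return bev
```

A density or height channel can be stacked the same way; Complex-YOLO itself uses a multi-channel BEV map, so the single occupancy channel here is a deliberate simplification for a planar scanner.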
Building on the Complex-YOLO architecture, this thesis collects point cloud data of factory objects with a laser scanner, combines it with KITTI, the dataset on which the original network was trained, and labels the data to form a new dataset for detecting pedestrians, mobile robots, shelves, charging stations, and carts in an automated factory. Three contributions are summarized: 1. A new factory dataset is built with a laser scanner, a low-cost transceiver that acquires data quickly. 2. Combining ResNet with Complex-YOLO increases computational complexity by 14.2% while improving the average precision of 2D point cloud object detection in the factory by up to 4.88%. 3. Data augmentation saves data collection time and mitigates overfitting.
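Contribution 2 combines residual learning from ResNet with the Complex-YOLO backbone. The abstract does not spell out the Complex-ResYOLO layer layout, so the PyTorch block below is only a generic sketch of the identity-shortcut residual block that could replace a plain convolutional stage in a YOLO-style backbone; the channel counts and LeakyReLU slope are assumptions.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic two-convolution residual block in the spirit of ResNet.

    Channel counts and placement inside the Complex-YOLO backbone are
    illustrative; the thesis body defines the actual Complex-ResYOLO layout.
    """
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.1)  # YOLO backbones typically use LeakyReLU

    def forward(self, x):
        # The block learns a residual F(x) and outputs F(x) + x,
        # so gradients flow through the identity shortcut.
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + x)
```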
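Contribution 3 relies on data augmentation to stretch a small factory dataset and curb overfitting. The specific transforms are not listed in the abstract; the sketch below shows two augmentations commonly applied to BEV point cloud data, random rotation about the sensor and mirror flipping, with the oriented-box labels updated to match. The box format [cx, cy, w, l, yaw] is an assumption.

```python
import numpy as np

def augment_scan(points, boxes, rng=np.random.default_rng()):
    """Randomly rotate and mirror a 2D scan and its box labels.

    points: (N, 2) Cartesian points; boxes: (M, 5) rows of [cx, cy, w, l, yaw].
    The transforms and their ranges are illustrative assumptions.
    """
    # Random rotation about the scanner origin.
    theta = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    points = points @ rot.T
    boxes = boxes.copy()
    boxes[:, :2] = boxes[:, :2] @ rot.T
    boxes[:, 4] += theta  # box heading rotates with the scene

    # Random mirror across the x-axis: flip y and negate the yaw.
    if rng.random() < 0.5:
        points[:, 1] *= -1
        boxes[:, 1] *= -1
        boxes[:, 4] *= -1
    return points, boxes
```

Each pass over the training set then sees a geometrically distinct copy of every scan, which is how augmentation substitutes for extra collection time.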
[1] M. Simon, S. Milz, K. Amende, and H.-M. Gross, “Complex-YOLO: An Euler-region-proposal for real-time 3D object detection on point clouds,” in Proc. Eur. Conf. Computer Vision, 2018.
[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[3] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
[4] B. Prabin and N. Ko, “Scale-hierarchical 3D object recognition in cluttered scenes,” Sensors, vol. 21, no. 7, p. 2445, 2021.
[5] X. Chen, K. Kundu, Z. Zhang, H. Ma, S. Fidler, and R. Urtasun, “Monocular 3D object detection for autonomous driving,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 2147–2156.
[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[7] Y. Zhou and O. Tuzel, “VoxelNet: End-to-end learning for point cloud based 3D object detection,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 4490–4499.
[8] “Princeton ModelNet.” [Online]. Available: https://modelnet.cs.princeton.edu/
[9] “ShapeNet.” [Online]. Available: https://shapenet.org/
[10] “ScanNet: Richly-annotated 3D reconstructions of indoor scenes.” [Online]. Available: http://www.scan-net.org/
[11] “Semantic3D classification.” [Online]. Available: http://www.semantic3d.net/
[12] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view convolutional neural networks for 3D shape recognition,” in Proc. IEEE Int. Conf. Computer Vision, 2015, pp. 945–953.
[13] A. Kanezaki, Y. Matsushita, and Y. Nishida, “RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 5010–5019.
[14] D. Maturana and S. Scherer, “VoxNet: A 3D convolutional neural network for real-time object recognition,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2015, pp. 922–928.
[15] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D ShapeNets: A deep representation for volumetric shapes,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2015, pp. 1912–1920.
[16] G. Riegler, A. Osman Ulusoy, and A. Geiger, “OctNet: Learning deep 3D representations at high resolutions,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 3577–3586.
[17] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 652–660.
[18] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep hierarchical feature learning on point sets in a metric space,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[19] A. Boulch, “ConvPoint: Continuous convolutions for point cloud processing,” Computers & Graphics, vol. 88, pp. 24–34, 2020.
[20] M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 3693–3702.
[21] W. Zeng and T. Gevers, “3DContextNet: K-d tree guided hierarchical learning of point clouds using local and global contextual cues,” in Proc. Eur. Conf. Computer Vision, 2018.
[22] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view 3D object detection network for autonomous driving,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 1907–1915.
[23] J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. L. Waslander, “Joint 3D proposal generation and object detection from view aggregation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2018, pp. 1–8.
[24] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, “Frustum PointNets for 3D object detection from RGB-D data,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 918–927.
[25] B. Yang, W. Luo, and R. Urtasun, “PIXOR: Real-time 3D object detection from point clouds,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 7652–7660.
[26] B. Li, T. Zhang, and T. Xia, “Vehicle detection from 3D lidar using fully convolutional network,” arXiv preprint arXiv:1608.07916, 2016.
[27] J. Beltrán, C. Guindel, F. M. Moreno, D. Cruzado, F. Garcia, and A. De La Escalera, “BirdNet: A 3D object detection framework from lidar information,” in Proc. 21st Int. Conf. Intell. Transp. Syst., 2018, pp. 3517–3523.
[28] Z. Wang and F. Lu, “VoxSegNet: Volumetric CNNs for semantic part segmentation of 3D shapes,” IEEE Trans. Visualization and Computer Graphics, vol. 26, no. 9, pp. 2919–2930, 2019.
[29] “PASCAL VOC dataset.” [Online]. Available: https://paperswithcode.com/dataset/pascal-voc
[30] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 7263–7271.
[31] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2012, pp. 3354–3361.
[33] “The CIFAR-10 dataset.” [Online]. Available: https://www.cs.toronto.edu/~kriz/cifar.html
[34] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[35] M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013.
[36] A. Krizhevsky, “Learning multiple layers of features from tiny images,” 2009.
[37] A. D. Pon, J. Ku, C. Li, and S. L. Waslander, “Object-centric stereo matching for 3D object detection,” in Proc. IEEE Int. Conf. Robot. Automat., 2020, pp. 8383–8389.
[38] X. Guo, S. Shi, X. Wang, and H. Li, “LIGA-Stereo: Learning lidar geometry aware representations for stereo-based 3D detector,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2021, pp. 3153–3163.
[39] “Point Cloud Library.” [Online]. Available: https://pointclouds.org/
[40] “labelCloud 0.7.8.” [Online]. Available: https://pypi.org/project/labelCloud/
[41] “pypcd.” [Online]. Available: https://pypi.org/project/pypcd/
[42] “Laser scanner.” [Online]. Available: https://www.hokuyo-aut.jp/search/single.php?serial=174
[43] “CUDA Toolkit documentation v11.6.0.” [Online]. Available: https://docs.nvidia.com/cuda/archive/11.6.0/
[44] “PyTorch.” [Online]. Available: https://pytorch.org/
[45] “Torchvision.” [Online]. Available: https://pytorch.org/vision/stable/index.html
[46] “TensorBoard.” [Online]. Available: https://www.tensorflow.org/tensorboard/get_started
[47] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[48] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama et al., “Speed/accuracy trade-offs for modern convolutional object detectors,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 7310–7311.
[49] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Proc. 31st AAAI Conf. Artif. Intell., 2017.
[50] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” Advances in Neural Information Processing Systems, vol. 28, 2015.
[51] L. Bai, Y. Lyu, X. Xu, and X. Huang, “PointNet on FPGA for real-time lidar point cloud processing,” in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS), 2020, pp. 1–5.