Graduate Student: 蕭翔 (Hsiao, Hsiang)
Thesis Title: 基於優化版YOLO架構之輕量化物件檢測系統設計與實現 (Design and Implementation of a Lightweight Object Detection System Based on an Optimized YOLO Architecture)
Advisor: 馬席彬 (Ma, Hsi-Pin)
Committee Members: 黃稚存 (Huang, Chih-Tsun), 蔡佩芸 (Tsai, Pei-Yun)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2025
Academic Year of Graduation: 113
Language: Chinese
Pages: 89
Keywords (Chinese): 物件偵測、人工智慧、神經網路
Keywords (English): Object Detection, Artificial Intelligence, Neural Network
With the advent of Industry 4.0, the automated production lines of smart factories place ever-higher demands on the real-time performance and accuracy of machine vision systems, and in robotic-arm applications, recognizing target objects promptly and accurately is critical. In practice, however, existing object detection systems often struggle to deliver stable performance because of limited computational resources, changing lighting conditions, occlusion, and reflective surfaces.
Building on the lightweight object detection model YOLOv7-Tiny, this study proposes a deeply streamlined and performance-enhanced detection method. Depthwise separable convolutions reduce computational complexity, and a squeeze-and-excitation attention mechanism strengthens feature extraction; the resulting model is named DW-SE-YOLO, an improved YOLO model that combines depthwise separable convolutions with squeeze-and-excitation attention. Diverse data augmentation strategies are also adopted so that the model maintains stable detection performance under varying lighting and noise conditions.
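To make the two building blocks named above concrete, the following is a minimal PyTorch sketch of a depthwise separable convolution and a squeeze-and-excitation (SE) channel-attention module. The channel counts, kernel size, reduction ratio, and SiLU activation are illustrative assumptions, not the exact configuration used in DW-SE-YOLO.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """A 3x3 depthwise convolution followed by a 1x1 pointwise convolution."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class SEBlock(nn.Module):
    """Squeeze-and-excitation: global average pool -> bottleneck MLP -> channel gates."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight each channel of the input feature map


x = torch.randn(1, 64, 80, 80)                       # dummy feature map
y = SEBlock(128)(DepthwiseSeparableConv(64, 128)(x))
print(y.shape)                                       # torch.Size([1, 128, 80, 80])
```

Replacing a standard 3x3 convolution with the depthwise-plus-pointwise pair is what cuts the multiply-accumulate count, while the SE gate recalibrates channel responses at little extra cost.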
Experimental results from four-fold cross-validation and ablation studies confirm the effectiveness of the depthwise separable convolutions and the attention mechanism. The model reaches a mean average precision of 0.976 at an IoU threshold of 0.5, runs at approximately 109.5 frames per second, and reduces the parameter count to roughly 5.42 million. In addition, a cloud deployment architecture built on a lightweight web framework combines the detection system with a robotic arm to enable real-time object detection and manipulation control.
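The reported 0.976 is mean average precision with a detection counted as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The helper below shows the IoU computation for axis-aligned (x1, y1, x2, y2) boxes; it is a generic sketch of the standard metric, not code from the thesis.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two heavily overlapping boxes clear the 0.5 threshold and count as a match.
print(iou((0, 0, 10, 10), (1, 1, 11, 11)))   # ~0.68
```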
Future work will focus on optimizing the network structure, exploring more efficient feature extraction methods, and integrating multi-modal sensing to meet the more diverse application needs of industrial settings, offering new directions for making industrial vision systems lighter and more practical.
With the advent of Industry 4.0, smart factory production lines have placed increasingly stringent demands on the real-time performance and accuracy of machine vision systems, particularly in robotic arm applications. However, existing detection frameworks often exhibit unstable performance due to limited computational resources, varying lighting conditions, and object occlusions or reflections.
This study proposes an enhanced lightweight object detection method based on YOLOv7-Tiny. The model, named DW-SE-YOLO, incorporates depthwise separable convolutions to reduce computational complexity and integrates a squeeze-and-excitation attention mechanism to enhance feature extraction efficiency. Diverse data augmentation strategies are employed to maintain stable detection performance under various environmental conditions.
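The abstract does not enumerate the augmentation transforms, so the sketch below shows two common photometric augmentations of the kind such a pipeline typically includes, random brightness jitter and additive Gaussian noise, which directly target the lighting and noise variation described above. The parameter ranges and application probabilities are illustrative assumptions.

```python
import numpy as np


def random_brightness(img, max_delta=0.3):
    """img: float32 array in [0, 1], shape (H, W, 3); shift overall brightness."""
    delta = np.random.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)


def add_gaussian_noise(img, sigma=0.02):
    """Add zero-mean Gaussian pixel noise to simulate sensor noise."""
    noise = np.random.normal(0.0, sigma, img.shape).astype(np.float32)
    return np.clip(img + noise, 0.0, 1.0)


# During training, each transform is applied with some probability.
img = np.random.rand(480, 640, 3).astype(np.float32)   # dummy image
if np.random.rand() < 0.5:
    img = random_brightness(img)
if np.random.rand() < 0.5:
    img = add_gaussian_noise(img)
```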
Four-fold cross-validation and ablation studies validate the effectiveness of the proposed components. The model achieves a mean average precision of 0.976 at an IoU threshold of 0.5, runs at approximately 109.5 frames per second, and reduces the parameter count to 5.42 million. A lightweight web-based deployment architecture integrates the detection system with a robotic arm for real-time object detection and control.
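As a rough illustration of such a web-based deployment, the sketch below exposes the detector behind a Flask-style REST endpoint and forwards the top detection to a robot-control hook. The route name, payload fields, and the detect()/move_robot_to() helpers are hypothetical placeholders, not the thesis's actual API.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)


def detect(image_bytes):
    """Placeholder: run the detector on the image, return (label, confidence, bbox) tuples."""
    return [("workpiece", 0.97, [120, 80, 260, 210])]   # dummy result for illustration


def move_robot_to(bbox):
    """Placeholder: translate a detected box into a robot-arm motion command."""
    print(f"moving arm toward bbox {bbox}")


@app.route("/detect", methods=["POST"])
def detect_endpoint():
    # Client uploads an image; the server runs detection and acts on the best box.
    detections = detect(request.files["image"].read())
    if detections:
        move_robot_to(detections[0][2])
    return jsonify([
        {"label": lbl, "confidence": conf, "bbox": bbox}
        for lbl, conf, bbox in detections
    ])


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```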
Future research will focus on optimizing the network architecture and exploring more efficient feature extraction methods to adapt to diverse industrial applications, providing new insights for the development of industrial vision systems.