
Graduate Student: 石孟立
Shih, Meng-Li
Thesis Title: 基於深度學習與震動反饋的可穿戴電腦視覺系統於視障者輔助之應用
Deep Learning-based Wearable Vision-system with Vibrotactile-feedback for Visually Impaired People to Reach Objects
Advisor: 孫民
Sun, Min
Oral Defense Committee: 張永儒
Chang, Yung-Ju
林嘉文
Lin, Chia-Wen
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Thesis Publication Year: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 36
Chinese Keywords: 視障輔助、物體偵測、影像辨識、深度學習、互動設計、即時系統
English Keywords: Blind and Visually Impaired assistance, Object detection, Image recognition, Deep learning, Interaction design, Real-time system
摘要 (Chinese abstract): We develop a wearable computer-vision system based on deep learning and vibrotactile feedback to guide blind and visually impaired people in reaching objects. The system uses a learning-based 2.5-D detector and a 3-D object tracker to achieve accurate object detection and localization in 3-D space. In addition, by integrating the HTC Vive Tracker into the training procedure of the vision module, we obtain correctly labeled training data with almost no manual annotation. To validate the system's performance, we conducted a thorough user study with 12 blind and visually impaired participants; our system outperforms unassisted guidance in both search time and the number of irrelevant objects touched. Finally, we collected feedback from the blind and visually impaired users, which indicates that our assistive system effectively makes the process of reaching objects smoother. In summary, our contributions are threefold. First, we use learning-based methods to build a high-performance vision module. Second, we design a data-collection procedure based on the HTC Vive Tracker that requires almost no manual labeling. Third, we conduct a thorough experiment to validate the performance of our system.


Abstract: We develop a Deep Learning-based Wearable Vision system with Vibrotactile feedback (DLWV2) to guide Blind and Visually Impaired (BVI) people to reach objects. The system achieves high-performance object detection and localization with a learning-based 2.5-D object detector and a 3-D object tracker. Furthermore, by incorporating the HTC Vive Tracker into the training procedure of these learning-based perceptual modules, we obtain an almost labeling-free, large-scale annotated dataset. The dataset includes a huge number of images with 2.5-D object ground truth (i.e., 2-D object bounding boxes and the distance from the camera to each object). To validate the efficacy of our system, we conduct a thorough user study with 12 BVI people in new environments, with object instances unseen during training. Our system outperforms the non-assistive guiding strategy with statistical significance in both time and the number of contacts with irrelevant objects. Finally, interviews with the BVI users confirm that they can reach target objects more easily with the aid of our system. To conclude, our contribution lies in three aspects. First, we leverage learning-based methods to build a high-performance perceptual module. Second, we propose a technique to collect large-scale, labeling-free data with the aid of the HTC Vive Tracker. Third, we conduct a thorough experiment to validate the efficacy of our system.
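The dataset described above pairs each image with a 2-D bounding box and the camera-to-object distance, generated almost without manual labeling from HTC Vive Tracker poses. As a rough illustration only, the Python sketch below shows one way such a 2.5-D label could be computed by projecting a tracked object centre into the camera image; the function name, the fixed object radius, and the assumption that the object position and camera pose are expressed in the same tracking frame are illustrative choices, not the thesis's actual pipeline.

```python
import numpy as np

def auto_label_2p5d(obj_pos_world, cam_pose_world, K, obj_radius=0.06):
    """Illustrative 2.5-D auto-labeling from tracker poses (hypothetical helper).

    obj_pos_world : (3,) tracked object centre in the tracking (world) frame [m]
    cam_pose_world: (4, 4) camera-to-world transform from a head-mounted tracker
    K             : (3, 3) camera intrinsic matrix
    obj_radius    : assumed physical radius used to size the box [m]
    Returns a (x1, y1, x2, y2) box and the camera-to-object distance, or None.
    """
    # Transform the object centre into the camera frame.
    world_to_cam = np.linalg.inv(cam_pose_world)
    p_cam = world_to_cam @ np.append(np.asarray(obj_pos_world, dtype=float), 1.0)
    x, y, z = p_cam[:3]
    if z <= 0:  # object is behind the camera: no label for this frame
        return None

    # Perspective projection of the centre point onto the image plane.
    u, v, w = K @ np.array([x, y, z])
    cx, cy = u / w, v / w

    # Approximate box half-size in pixels from the physical radius and depth.
    half = K[0, 0] * obj_radius / z
    bbox = (cx - half, cy - half, cx + half, cy + half)

    # Camera-to-object distance, i.e. the "2.5-D" depth component of the label.
    distance = float(np.linalg.norm(p_cam[:3]))
    return bbox, distance
```

A per-frame loop over the recorded tracker poses could then write these boxes and distances out as annotations for training a 2.5-D detector, which is the kind of labeling-free data collection the abstract refers to.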

Table of Contents:
摘要 (Chinese Abstract)  v
Abstract  vii
1 Introduction  1
  1.1 Motivations  1
  1.2 Main Contributions  2
  1.3 Related work  3
    1.3.1 Wearable assistive system for BVI people  3
    1.3.2 Deep-Learning based object detection and visual odometry  4
2 Approach  7
  2.1 System Overview  7
    2.1.1 Perception  8
    2.1.2 Guidance  12
  2.2 Hardware Component  14
3 Dataset  17
4 Experiments  19
  4.1 Perception Module Validation  19
    4.1.1 Accuracy of 2.5-D Object Detector  19
    4.1.2 Accuracy of 3-D Object Tracker  20
  4.2 User Studies  21
    4.2.1 Experimental Setup  21
    4.2.2 Time and Superfluous contacts  23
    4.2.3 Hand Search Space and Hand Moving Trajectory  25
    4.2.4 Object Distance Effect  26
    4.2.5 Object Tracking Effect  27
    4.2.6 Failure Case  28
    4.2.7 Post-study Interview  28
5 Conclusion  31
References  33

