
Graduate Student: 石孟立
Shih, Meng-Li
Thesis Title: 基於深度學習與震動反饋的可穿戴電腦視覺系統於視障者輔助之應用
Deep Learning-based Wearable Vision-system with Vibrotactile-feedback for Visually Impaired People to Reach Objects
Advisor: 孫民
Sun, Min
Oral Defense Committee: 張永儒
Chang, Yung-Ju
林嘉文
Lin, Chia-Wen
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Thesis Publication Year: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 36
Chinese Keywords: 視障輔助、物體偵測、影像辨識、深度學習、互動設計、即時系統
English Keywords: Blind and Visually Impaired assistance, Object detection, Image recognition, Deep learning, Interaction design, Real-time system
摘要 (Chinese abstract): We develop a wearable computer-vision system based on deep learning and vibrotactile feedback to guide blind and visually impaired people in reaching objects. The system uses a learning-based 2.5-D detector and a 3-D object tracker to achieve accurate object detection and localization in 3-D space. In addition, by integrating the HTC Vive Tracker into the training procedure of the vision module, we obtain correctly labeled training data with almost no manual annotation. To validate the system's performance, we conducted a thorough user study with 12 blind and visually impaired participants; our system outperforms unassisted guidance in both search time and the number of irrelevant objects touched. Finally, we collected feedback from the blind and visually impaired users, which indicates that our assistive system effectively makes the process of reaching objects smoother. In summary, our contributions are threefold. First, we use learning-based methods to build a high-performance vision module. Second, we design a data-collection procedure based on the HTC Vive Tracker that requires almost no manual labeling. Third, we conduct a thorough experiment to validate the performance of our system.


Abstract: We develop a Deep Learning-based Wearable Vision system with Vibrotactile feedback (DLWV2) to guide Blind and Visually Impaired (BVI) people to reach objects. The system achieves high-performance object detection and localization with a learning-based 2.5-D object detector and a 3-D object tracker. Furthermore, by incorporating the HTC Vive Tracker into the training procedure of these learning-based perceptual modules, we obtain an almost labeling-free, large-scale annotated dataset. The dataset includes a huge number of images with 2.5-D object ground truth (i.e., 2-D object bounding boxes and the distance from the camera to each object). To validate the efficacy of our system, we conduct a thorough user study with 12 BVI people in new environments, with object instances unseen during training. Our system outperforms the non-assistive guiding strategy with statistical significance in both time and the number of contacts with irrelevant objects. Finally, interviews with the BVI users confirm that they can reach target objects more easily with the aid of our system. To conclude, our contribution lies in three aspects. First, we leverage learning-based methods to build a high-performance perceptual module. Second, we propose a technique to collect large-scale, labeling-free data with the aid of the HTC Vive Tracker. Third, we conduct a thorough experiment to validate the efficacy of our system.
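The dataset described above pairs each image with a 2-D bounding box and the camera-to-object distance, generated almost without manual labeling from HTC Vive Tracker poses. As a rough illustration only, the Python sketch below shows one way such a 2.5-D label could be computed by projecting a tracked object centre into the camera image; the function name, the fixed object radius, and the assumption that the object position and camera pose are expressed in the same tracking frame are illustrative choices, not the thesis's actual pipeline.

```python
import numpy as np

def auto_label_2p5d(obj_pos_world, cam_pose_world, K, obj_radius=0.06):
    """Illustrative 2.5-D auto-labeling from tracker poses (hypothetical helper).

    obj_pos_world : (3,) tracked object centre in the tracking (world) frame [m]
    cam_pose_world: (4, 4) camera-to-world transform from a head-mounted tracker
    K             : (3, 3) camera intrinsic matrix
    obj_radius    : assumed physical radius used to size the box [m]
    Returns a (x1, y1, x2, y2) box and the camera-to-object distance, or None.
    """
    # Transform the object centre into the camera frame.
    world_to_cam = np.linalg.inv(cam_pose_world)
    p_cam = world_to_cam @ np.append(np.asarray(obj_pos_world, dtype=float), 1.0)
    x, y, z = p_cam[:3]
    if z <= 0:  # object is behind the camera: no label for this frame
        return None

    # Perspective projection of the centre point onto the image plane.
    u, v, w = K @ np.array([x, y, z])
    cx, cy = u / w, v / w

    # Approximate box half-size in pixels from the physical radius and depth.
    half = K[0, 0] * obj_radius / z
    bbox = (cx - half, cy - half, cx + half, cy + half)

    # Camera-to-object distance, i.e. the "2.5-D" depth component of the label.
    distance = float(np.linalg.norm(p_cam[:3]))
    return bbox, distance
```

A per-frame loop over the recorded tracker poses could then write these boxes and distances out as annotations for training a 2.5-D detector, which is the kind of labeling-free data collection the abstract refers to.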

Table of Contents:
摘要 (Chinese Abstract)  v
Abstract  vii
1 Introduction  1
  1.1 Motivations  1
  1.2 Main Contributions  2
  1.3 Related work  3
    1.3.1 Wearable assistive system for BVI people  3
    1.3.2 Deep-Learning based object detection and visual odometry  4
2 Approach  7
  2.1 System Overview  7
    2.1.1 Perception  8
    2.1.2 Guidance  12
  2.2 Hardware Component  14
3 Dataset  17
4 Experiments  19
  4.1 Perception Module Validation  19
    4.1.1 Accuracy of 2.5-D Object Detector  19
    4.1.2 Accuracy of 3-D Object Tracker  20
  4.2 User Studies  21
    4.2.1 Experimental Setup  21
    4.2.2 Time and Superfluous contacts  23
    4.2.3 Hand Search Space and Hand Moving Trajectory  25
    4.2.4 Object Distance Effect  26
    4.2.5 Object Tracking Effect  27
    4.2.6 Failure Case  28
    4.2.7 Post-study Interview  28
5 Conclusion  31
References  33

