Graduate Student: | 董家瑀 Tung, Chia-Yu |
Thesis Title: | 基於手指穿戴式相機和聽覺回饋於室內空間之視障者尋物導引系統 / RingGuardian: A Finger-worn Camera-based System for Blind and Visually Impaired Users to Perform Room-level Search of Objects with Audio Guidance |
Advisor: | 孫民 Sun, Min |
Committee Members: | 詹力韋 Chan, Liwei; 張永儒 Chang, Yung-Ju |
Degree: | Master (碩士) |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of Publication: | 2019 |
Graduation Academic Year: | 108 |
Language: | English |
Number of Pages: | 45 |
Keywords (Chinese): | 視障輔助、物體偵測、影像辨識、深度學習、互動設計、即時系統 |
Keywords (English): | Blind and Visually Impaired assistance, Object detection, Image recognition, Deep learning, Interaction design, Real-time system |
Chinese Abstract (translated): |
We developed a wearable system to help blind and visually impaired people search for objects in indoor spaces. Existing systems, by contrast, are limited in either search space or wearability. Our system captures images from a small finger-worn camera to detect the target object and estimate its distance from the user, and then uses a bone-conduction stereo headphone to convey the target's category, direction, and distance in order to guide the user toward it. In particular, we combine a deep-learning-based object detection model with a template-matching-based object tracking method, so that a reliable target position is available even when the detector misses a detection. On our test set, the trained model achieves high precision (>85% per item) at detecting furniture (e.g., tables, cabinets) and daily necessities (e.g., wallets, keys).
We recruited 12 blind or visually impaired participants for a user study and evaluated the system's performance in a real-time interactive experiment. Neither the environments nor the target objects in this experiment had appeared during model training.
We found no statistically significant difference in task success rate between our system and human-assisted guidance, which demonstrates the strong performance of our system.
Finally, in the post-experiment interviews, the visually impaired participants indicated that our system helped them understand the furniture arrangement in the space, narrow down the area they needed to search, and improve their efficiency on sub-tasks.
English Abstract: |
We introduce RingGuardian, a portable wearable system that supports blind and visually impaired (BVI) users in performing room-level search of objects. In contrast, most previous methods are limited in search space and/or portability.
RingGuardian captures images from a small finger-worn camera to detect objects and estimate their distance. A stereo headphone then guides the BVI user by conveying the object's category, direction, and distance.
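As a rough illustration of this pipeline (not the thesis implementation), the sketch below turns a single detection into a spoken cue of category, direction, and distance. The class-size prior, the pinhole-model distance estimate, and all names are assumptions made only for this example.

```python
# Hypothetical sketch: convert one detection into a "category, direction,
# distance" cue. Not the thesis implementation; the size prior and the
# pinhole distance estimate are illustrative assumptions.

# Assumed prior: approximate physical widths of object classes, in meters.
KNOWN_WIDTH_M = {"wallet": 0.10, "key": 0.06, "table": 1.20, "cabinet": 0.80}

def describe_detection(label, box, image_width_px, focal_length_px):
    """box = (x_min, y_min, x_max, y_max) in pixels."""
    x_min, _, x_max, _ = box
    box_width_px = max(x_max - x_min, 1)

    # Direction: compare the box center with the image center.
    center_x = (x_min + x_max) / 2.0
    offset = (center_x - image_width_px / 2.0) / (image_width_px / 2.0)
    if offset < -0.2:
        direction = "to your left"
    elif offset > 0.2:
        direction = "to your right"
    else:
        direction = "straight ahead"

    # Distance: pinhole model, distance ~ focal_length * real_width / pixel_width.
    distance_m = focal_length_px * KNOWN_WIDTH_M.get(label, 0.3) / box_width_px

    return f"{label}, {direction}, about {distance_m:.1f} meters away"

# Example: a wallet detected slightly right of center in a 640-pixel-wide frame.
print(describe_detection("wallet", (360, 200, 420, 240), 640, 500.0))
```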
In particular, we combine an extended deep-learning-based object detector with a template-based object tracker to obtain reliable object tracks even under missing detection.
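One way this combination can be sketched is shown below, assuming OpenCV is available. The deep detector is treated as a black box whose per-frame output (or None when it misses) is passed in; only the template-matching fallback is shown, and the confidence threshold is an assumed value.

```python
# Minimal sketch of a detector-plus-template-tracker, assuming OpenCV.
# detection_box is the deep detector's output for the current frame, or None
# when the detector misses; only the fallback tracking step is implemented.
import cv2

class TrackedObject:
    def __init__(self):
        self.template = None   # image patch from the last confident detection
        self.box = None        # (x, y, w, h) of the last known position

    def update(self, frame_gray, detection_box):
        if detection_box is not None:
            # Detector fired: trust it and refresh the template.
            x, y, w, h = detection_box
            self.template = frame_gray[y:y + h, x:x + w].copy()
            self.box = detection_box
        elif self.template is not None:
            # Missed detection: fall back to normalized cross-correlation
            # template matching against the current frame.
            result = cv2.matchTemplate(frame_gray, self.template,
                                       cv2.TM_CCOEFF_NORMED)
            _, max_val, _, max_loc = cv2.minMaxLoc(result)
            if max_val > 0.6:  # assumed matching-confidence threshold
                h, w = self.template.shape[:2]
                self.box = (max_loc[0], max_loc[1], w, h)
        return self.box
```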
On our test set, the detector achieves high precision (>85% per instance) at detecting furniture (e.g., table, cabinet) and daily necessities (e.g., wallet, key).
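For reference, per-class precision on such a test set could be computed roughly as below; this is an assumed, IoU-based evaluation sketch, not the thesis's evaluation code.

```python
# Assumed sketch: a detection counts as a true positive when its IoU with an
# unmatched same-class ground-truth box exceeds a threshold. Detections are
# assumed to be sorted by descending confidence.
def iou(a, b):
    """a, b = (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def precision(detections, ground_truths, thresh=0.5):
    """Greedy matching of one class's detections to its ground-truth boxes."""
    unmatched = list(ground_truths)
    tp = 0
    for d in detections:
        best = max(unmatched, key=lambda g: iou(d, g), default=None)
        if best is not None and iou(d, best) >= thresh:
            tp += 1
            unmatched.remove(best)
    return tp / len(detections) if detections else 0.0
```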
We empirically evaluate the full system through an experiment followed by a real-time interactive user study with 12 BVI participants, conducted in an environment with object instances unseen during model training. We find no statistically significant difference in trial success rate between our system and a human-assisted guiding strategy, which demonstrates the strong performance of the full system.
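The abstract does not state which statistical test was used; one standard way to compare success rates between two conditions is Fisher's exact test on the success/failure counts, sketched below with made-up placeholder numbers.

```python
# Hedged illustration (not from the thesis): Fisher's exact test on a 2x2
# table of success/failure counts for the two guidance conditions.
from scipy.stats import fisher_exact

ring_success, ring_fail = 20, 4      # hypothetical RingGuardian trials
human_success, human_fail = 22, 2    # hypothetical human-assisted trials

_, p_value = fisher_exact([[ring_success, ring_fail],
                           [human_success, human_fail]])
print(f"p = {p_value:.3f}")  # p > 0.05 would indicate no significant difference
```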
Finally, in the interview session, BVI users indicated that RingGuardian allowed them to learn the arrangement of furniture in the environment, narrow down the search space, and increase the efficiency of the task.