
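Graduate student: 方同德
Fang, Tung-Te
Thesis title: 討論文本與知識地圖之知識節點對映方法
Mapping of Discussion Text to Knowledge Nodes of Knowledge Map
Advisor: 黃能富
Huang, Nen-Fu
Oral defense committee: 陳俊良
Chen, Jun-Liang
許建平
Sheu, Jang-Ping
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: English
Number of pages: 49
Chinese keywords: 深度學習 (Deep Learning), 知識地圖 (Knowledge Map), LSTM, 磨課師 (MOOCs), NLP, 文本分類 (Text Classification)
English keywords: Deep Learning, Knowledge Map, LSTM, MOOCs, NLP, Text Classification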
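  • Many MOOC learning platforms today have course discussion forums, but almost
    none of them classify and manage forum articles. The reasons are simple: first,
    manual classification is time-consuming and labor-intensive; second, there is no
    good basis for classification. Without good classification management, students
    cannot efficiently find discussion threads related to their own questions, and
    cannot find systematic learning information among the large number of articles.
    These problems greatly reduce students' willingness to use the forum and keep
    many good discussion articles from being seen by most students.
    Therefore, this study proposes a deep learning method for classifying forum
    articles. We use Google crawler results, exercises, and old forum articles as
    training data, and take the knowledge nodes of the knowledge map as the
    classification basis. After NLP processing steps, the processed data are used to
    train an LSTM model that predicts which knowledge node a forum article belongs to.
    Finally, we combine the prediction results with the knowledge map so that
    students can find articles of interest more efficiently. We believe that this
    forum article classification system will allow forum articles to be better
    managed and classified in the future.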


    Many MOOC learning platforms today provide course discussion forums, but few
    of these forums classify their articles. Manual classification is time-consuming
    and labor-intensive, and there is no good basis for classification. Without
    good classification management, students cannot use the discussion forum
    efficiently. This greatly reduces students' willingness to use the forum, and
    many good discussion articles go unseen by most students. Therefore, this
    thesis proposes a deep learning method for categorizing discussion forum
    articles. We use Google crawler results, exercises, and old discussion forum
    articles as training data, and use the knowledge nodes of a knowledge map as
    the classification basis. After NLP preprocessing steps, the processed data are
    used to train an LSTM model that predicts which knowledge node a discussion
    forum article belongs to. Finally, we combine the predicted results with the
    knowledge map so that students can find articles of interest more efficiently.
    We believe that this discussion forum article classification system will allow
    forum articles to be better managed and classified in the future.
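
    The preprocessing that precedes the LSTM model can be illustrated with a
    minimal sketch. It is not the thesis's implementation: it assumes English
    text split on whitespace (the thesis outline lists word segmentation and
    translation steps that are not reproduced here), a hypothetical stop-word
    list, random oversampling as one possible way to balance classes, and
    made-up knowledge-node labels ("TCP", "IP"). All function names are
    illustrative.

```python
import random
from collections import Counter

# Hypothetical stop-word list (assumption; not taken from the thesis).
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "in", "and", "what", "how"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = [tok.strip(".,?!:;()\"'").lower() for tok in text.split()]
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each token to an integer id; 0 is padding, 1 is unknown."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, _ in counts.most_common():
        vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(tok, 1) for tok in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))


def oversample(samples, seed=0):
    """Random oversampling: duplicate minority-class samples until balanced."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Usage: (article text, knowledge-node label) pairs; labels are made up.
samples = [
    ("What is the TCP three-way handshake?", "TCP"),
    ("How does TCP congestion control work?", "TCP"),
    ("How is an IP address assigned?", "IP"),
]
balanced = oversample(samples)
token_lists = [tokenize(text) for text, _ in balanced]
vocab = build_vocab(token_lists)
encoded = [encode(tokens, vocab, max_len=8) for tokens in token_lists]
```

    Each row of `encoded` is a fixed-length sequence of integer ids, the kind of
    input an embedding layer followed by an LSTM layer and a dense output layer
    would typically consume, with one output class per knowledge node.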

    Abstract
    Chinese Abstract (中文摘要)
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Background and Related Works
        2.1 MOOCs platform
            2.1.1 Coursera
            2.1.2 edX
            2.1.3 ShareCourse
        2.2 Knowledge Map
            2.2.1 Khan Academy
            2.2.2 ShareCourse
        2.3 Deep Learning
            2.3.1 Deep Neural Network
            2.3.2 Long Short Term Memory network
        2.4 Discussion forum classification
            2.4.1 Nature Language Processing
            2.4.2 Text Classification
                2.4.2.1 Text Representation
                2.4.2.2 Classifier Construction
                2.4.2.3 Classifier Evaluation
            2.4.3 Different Discussion Forum Classification
    Chapter 3 System Architecture
        3.1 Data Collection module
        3.2 Data Pre-processing module
        3.3 LSTM module
        3.4 User Interface
    Chapter 4 System Implementation
        4.1 Data Collection module
            4.1.1 Google Crawler
            4.1.2 Exercise
            4.1.3 Discussion Forum Articles
        4.2 Data Pre-processing module
            4.2.1 Data Balance
            4.2.2 Word Segmentation
            4.2.3 Translation
            4.2.4 Remove Stop Words
        4.3 LSTM model
            4.3.1 Embedding Layer
            4.3.2 LSTM layer
            4.3.3 Dense layer
        4.4 User Interface
    Chapter 5 Experiment and Result
        5.1 Experiment Data
        5.2 Accuracy
    Chapter 6 Conclusion and Future Work
        6.1 Conclusions
        6.2 Future Work
    Bibliography

