
Graduate student: 周樂儀
CHOW, YVONNE LORK-YEE
Thesis title: 基於字義和發音預測馬來西亞華裔姓名之年齡
Age Estimation based on Character Meaning and Pronunciation Using Ethnic-Chinese Malaysian Names
Advisor: 陳宜欣
Chen, Yi-Shin
Oral defense committee: 彭文志
Peng, Wen-Chih
賴郁雯
Lai, Yu-Wen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 52
Keywords (Chinese): 年齡預測, 字義, 發音, 馬來西亞華裔姓名
Keywords (English): Age Estimation, Character Meaning, Pronunciation, Ethnic-Chinese, Malaysian names
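  • In daily life, people of different ages tend to have different personalities,
    preferences, and behaviors because their life experiences differ. Unfortunately,
    because of privacy settings, users' age information is hard to collect, so we
    investigate other useful information that may be related to age, such as names.
    One earlier study analyzed the Chinese names of Taiwanese people and predicted
    age from features commonly used in naming in Chinese culture. Because that study
    targeted only one country and one language, the goal of this study is to explore
    how well an age prediction model generalizes across languages and countries. Our
    experimental results show that features based on the character meaning and
    pronunciation of a name can be used to predict the age group of that name.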


    In daily life, people of different ages tend to have different personalities because of their life experiences, and they also differ in preference and behavior. Unfortunately, because of privacy settings, a user's age is difficult to collect, so we look into other useful information that might be related to age, such as the name. Previous research focused on estimating the age interval of Taiwanese names: by observing how names are given in Taiwanese culture, it extracted features from the name to predict age. Because that work covered only one country and one language, the objective of this research is to explore the generalisability of the age prediction model across languages and countries. The experimental results indicate that the name itself carries a lot of meaning, and that this meaning can be used as a feature to predict the age interval associated with a name.
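
    The idea of predicting an age interval from pronunciation features of a name can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the onset/rime split of romanized syllables, the Naive Bayes classifier, the function and class names, and the four training names with their "older"/"younger" labels are all hypothetical stand-ins chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict

VOWELS = set("aeiou")


def syllable_features(given_name):
    """Emit syllable, onset, and rime tokens for a romanized given name.

    Assumption: splitting each syllable at its first vowel is a crude
    proxy for pronunciation; the thesis's actual features may differ.
    """
    feats = []
    for syl in given_name.lower().replace("-", " ").split():
        i = 0
        while i < len(syl) and syl[i] not in VOWELS:
            i += 1
        feats.append("syl=" + syl)
        feats.append("onset=" + syl[:i])  # leading consonants
        feats.append("rime=" + syl[i:])   # first vowel onward
    return feats


class NaiveBayesAgeInterval:
    """Multinomial Naive Bayes with add-one smoothing (a swapped-in classifier)."""

    def __init__(self):
        self.feature_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.label_counts[label] += 1
            for feat in syllable_features(name):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, name):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(self.label_counts[label] / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in syllable_features(name):
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: invented names and arbitrary labels, not thesis data.
train_names = ["Ah Kow", "Ah Moi", "Kah Wai", "Jia Wei"]
train_labels = ["older", "older", "younger", "younger"]
model = NaiveBayesAgeInterval().fit(train_names, train_labels)
```

    A name is then classified with `model.predict("Ah Seng")`. With real data, the feature extractor would be the part to replace first, for instance with character-meaning features alongside the pronunciation ones.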

    1 Introduction
    2 Related Work
      2.1 The Role of Age and Gender
      2.2 Age Prediction
      2.3 Relation beyond Words
      2.4 Word Pronunciation
    3 Methodology
      3.1 Thesis statement
      3.2 Overview
      3.3 Data Pre-processing
        3.3.1 Name Filtering
        3.3.2 Family Name Segmentation
      3.4 Given Name Features Extraction
        3.4.1 Pronunciation Feature
        3.4.2 Word Embedding Feature
        3.4.3 Word Radical Feature
        3.4.4 Fortune-Telling Feature
        3.4.5 Gender Prediction Feature
        3.4.6 Phase I: Age-interval Classifier
        3.4.7 Phase II: Cross-border Learning on Malaysian name
    4 Data Collection
      4.1 Taiwanese Dataset Collection
        4.1.1 Taiwanese Student Name Crowdsourcing
        4.1.2 Collecting Name of Taiwanese Public Figure
      4.2 Malaysia Dataset Collection
        4.2.1 Malaysian Student Name Crowdsourcing
    5 Experiment and Results
      5.1 Experimental Setup
        5.1.1 Datasets
        5.1.2 Evaluation method
      5.2 Experiments
      5.3 Experimental Results
        5.3.1 Gender Feature Classification
        5.3.2 Age-interval Classifier
    6 Conclusion and Future Work
    References

