
Graduate student: 張中議 (Chang, Chung Yi)
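Thesis title: Information gain and wheel based simplified swarm optimization for gene selection from gene expression data
Chinese title: 應用資訊增益、簡化群體演算法及輪式搜尋策略於基因選取之研究 (A Study of Applying Information Gain, Simplified Swarm Optimization, and a Wheel-Based Search Strategy to Gene Selection)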
Advisor: 葉維彰 (Yeh, Wei Chang)
Oral defense committee: 劉淑範, 黃佳玲
Degree: Master
Department: College of Engineering, Department of Industrial Engineering and Engineering Management
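Year of publication: 2015
Graduation academic year: 103 (ROC calendar, i.e., the 2014-2015 academic year)
Language: English
Number of pages: 42
Keywords: simplified swarm optimization, gene selection, gene expression data, information gain, classification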
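  • Chinese abstract (translated): In recent years, feature selection has been widely used. Feature selection is a dimensionality-reduction tool applied to classification problems; its goal is to identify the most discriminative features in the data while improving classification accuracy. Feature selection reduces noise and computational cost, and the benefit is especially large when the volume of data is huge. It is therefore well suited to high-dimensional, noisy real-world data. In medicine, for example, it can be used to screen for important genes that may cause cancer and to improve the accuracy of cancer identification; in this setting, feature selection is called gene selection. Gene selection can help physicians detect and treat cancer early, raising cure rates. This study builds an effective gene selection model on 10 cancer gene expression datasets. The model combines information gain, simplified swarm optimization, and a wheel-based search strategy into a complete gene selection method. First, information gain is used to eliminate redundant genes; next, the soft-computing simplified swarm optimization algorithm and the wheel-based search strategy are applied to the remaining genes to find the small number of genes with genuine discriminative power. During the search, classification accuracy is computed with a support vector machine using leave-one-out cross-validation. To evaluate performance, the proposed algorithm is compared with methods from previous studies; the results show that the proposed gene selection model, combining information gain with simplified swarm optimization and the wheel-based search strategy, achieves higher accuracy while selecting fewer genes.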
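The information-gain filter stage mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the expression values have already been discretized (the discretization scheme and the IG cut-off used in the thesis are not given in this record), and the function names and the default threshold of 0 are hypothetical. Information gain is computed as IG(Y; X) = H(Y) − Σ_v P(X = v) H(Y | X = v), with entropy measured in bits.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for one discretized gene."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def ig_filter(data, labels, threshold=0.0):
    """Return indices of genes (columns of data) whose IG exceeds threshold.

    The threshold is a hypothetical parameter; the thesis's cut-off is not stated here.
    """
    kept = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        if information_gain(column, labels) > threshold:
            kept.append(j)
    return kept


# Toy usage: genes 0 and 1 separate the classes perfectly, gene 2 carries no information.
data = [[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1]]
labels = ["A", "A", "B", "B"]
print(ig_filter(data, labels))  # [0, 1]
```

Real microarray data has thousands of continuous genes, so in practice a binning step would precede this filter; it is left out here to keep the sketch focused on the IG formula itself.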


    Recently, feature selection has become an important issue in data mining. The objective of feature selection is to find the most discriminative features in datasets with an enormous number of features, thereby improving classification accuracy. Feature selection can reduce noise and save researchers considerable time and computational cost, especially when the volume of data is huge. It has wide applications in high-dimensional real-world settings such as cancer research in the medical field. When feature selection is used in cancer research to find cancer-related genes, it is called “gene selection”. With gene selection, doctors can detect the symptoms or signs of cancer at an early stage and improve the survival rate. In this thesis, we develop an effective gene selection model for ten benchmark gene expression datasets. We propose an information gain and wheel-based simplified swarm optimization (IG-WSSO) method to solve the problem. First, information gain (IG) is used to remove irrelevant genes. Then, simplified swarm optimization with the wheel-based search strategy (WSSO) is applied for gene selection. A support vector machine (SVM) with leave-one-out cross-validation (LOOCV) is used to evaluate classification accuracy. We compared IG-WSSO with previous research on the ten benchmark gene expression datasets, which are available at http://www.gems-system.org/. The results show that IG-WSSO achieves higher classification accuracy while selecting fewer genes.
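The wrapper stage described above (simplified swarm optimization searching over gene subsets, each subset scored by leave-one-out accuracy) can be sketched as below. This is a rough illustration under several assumptions, not the IG-WSSO method itself: each particle is a 0/1 gene mask; a 1-nearest-neighbour classifier stands in for the SVM so the sketch stays within the Python standard library; ties in accuracy are broken in favour of fewer genes (an assumed fitness, since the thesis's fitness function is not given here); the values of `cw`, `cp`, `cg` are hypothetical; and the wheel-based search strategy is omitted entirely. The update follows the standard SSO rule: for each bit a random number ρ is drawn, and the bit is kept if ρ < cw, copied from the particle's personal best if cw ≤ ρ < cp, copied from the global best if cp ≤ ρ < cg, and redrawn at random otherwise. All function names are hypothetical.

```python
import random


def loocv_accuracy(data, labels, mask):
    """Leave-one-out accuracy of 1-NN on the genes selected by mask (0/1 flags).

    1-NN is a stand-in for the SVM used in the thesis.
    """
    genes = [j for j, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i, row in enumerate(data):
        best_dist, best_label = None, None
        for k, other in enumerate(data):
            if k == i:
                continue  # hold out sample i
            dist = sum((row[j] - other[j]) ** 2 for j in genes)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[k]
        correct += best_label == labels[i]
    return correct / len(data)


def fitness(data, labels, mask):
    """Higher accuracy first; among equal accuracies, fewer genes (assumed tie-break)."""
    return (loocv_accuracy(data, labels, mask), -sum(mask))


def sso_update(x, pbest, gbest, cw, cp, cg, rng):
    """SSO step: each bit is kept, copied from pbest or gbest, or redrawn at random."""
    new = []
    for j in range(len(x)):
        rho = rng.random()
        if rho < cw:
            new.append(x[j])
        elif rho < cp:
            new.append(pbest[j])
        elif rho < cg:
            new.append(gbest[j])
        else:
            new.append(rng.randint(0, 1))
    return new


def sso_select(data, labels, n_particles=10, n_iter=30, cw=0.1, cp=0.4, cg=0.9, seed=0):
    """Run a plain SSO search over gene masks; returns (best mask, its fitness)."""
    rng = random.Random(seed)
    n_genes = len(data[0])
    swarm = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pfit = [fitness(data, labels, p) for p in swarm]
    g = max(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            swarm[i] = sso_update(swarm[i], pbest[i], gbest, cw, cp, cg, rng)
            f = fitness(data, labels, swarm[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = swarm[i][:], f
                if f > gfit:
                    gbest, gfit = swarm[i][:], f
    return gbest, gfit


# Toy usage: gene 0 separates the two classes; genes 1 and 2 are noise.
data = [[0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 9.0, 4.0],
        [5.0, 4.0, 2.0], [5.1, 0.0, 8.0], [5.2, 7.0, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
print(sso_select(data, labels))
```

Encoding accuracy and subset size as a tuple lets Python's tuple ordering express "accuracy first, then fewer genes" without choosing a weighting constant; a weighted sum is an equally plausible alternative.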

    CONTENTS
    Chinese Abstract
    ABSTRACT
    CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
        1.1 Background and Motivation
        1.2 Framework and Organization
    CHAPTER 2 LITERATURE REVIEW
        2.1 Soft Computing in Feature Selection
        2.2 Gene selection
    CHAPTER 3 METHODOLOGY
        3.1 Information gain
        3.2 Simplified swarm optimization (SSO)
        3.3 Support Vector Machine (SVM)
        3.4 Wheel based search strategy
    CHAPTER 4 THE PROPOSED ALGORITHM
        4.1 Solution Representation
        4.2 Parameter Settings
        4.3 The proposed IG-WSSO
        4.4 Fitness Function
        4.5 Procedures of IG-WSSO
    CHAPTER 5 EXPERIMENTAL RESULTS
        5.1 Experiment data
        5.2 Compared with previous research
        5.3 Compared with MOBBBO
        5.4 Compared with DEFs
    CHAPTER 6 CONCLUSIONS
        6.1 Discussion and conclusion
        6.2 Limitation and future works
    REFERENCES


    Full-text availability: not authorized for public release (on-campus or off-campus network).
