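Graduate student: 廖柏鈞 (Liao, Bo-Jun)
Thesis title: 分群與合併的多維尺度降維法之數學背景探討
Discussion on the Mathematical Background on Split-and-Combine Multidimensional Scaling
Advisors: 鄭經斅 (Cheng, Ching-hsiao)
朱家杰 (Chu, Chia-Chieh Jay)
Oral examination committee: 薛名成 (Shiue, Ming-Cheng)
蘇承芳 (Su, Cheng-Fang)
Degree: Master
Department: College of Science - Department of Mathematics
Year of publication: 2021
Academic year of graduation: 109 (ROC academic year)
Language: Chinese
Number of pages: 20
Keywords (Chinese): 降維 (dimensionality reduction), PCA, MDS, SC-MDS
Keywords (English): Dimensionality reduction, PCA, MDS, SC-MDS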
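  • Data science is becoming an increasingly important discipline. Through it we can carry out tasks such as analysis and prediction, and technologies such as big data and artificial intelligence (AI) are among the most actively developed today.

    In data science, the most fundamental and most important matter is how data are handled. Data handling can be divided into two main stages: preprocessing and computation. Computational methods are, naturally, too numerous to count; the main purpose of preprocessing, however, is to improve the efficiency and results of the computation stage. In an age of information explosion, the data we receive can take all kinds of forms. If the sample data are complete and of a suitable size, nothing could be better. If the data are incomplete, we first need to fill in the missing values. If the data are too scarce, or some part of the data is relatively rare, the results of analysis and prediction will be affected, and we must preprocess with techniques such as sampling and ensembling. Conversely, if the sample data are very large, the load on the computer increases: computation slows down, the data become harder to organize, and noise appears more readily, all of which make large data sets harder to handle. In such cases we can choose "dimensionality reduction" to avoid these problems.

    Dimensionality reduction is an extremely important technique in data science: it lowers the dimension of the data. Intuitively, lowering the dimension reduces the amount of data and hence the difficulty of computation, and it also has other effects that give it importance in various fields. Many dimensionality reduction methods exist today, but the most common remain the two basic ones, principal component analysis (PCA) and multidimensional scaling (MDS). As noted above, the amount of data also affects the running speed of dimensionality reduction, so methods that modify and optimize these basic techniques have gradually appeared. For MDS, a paper published in recent years by Professor 曾正男 (Jengnan Tzeng) of National Chengchi University proposes an MDS-based improvement, split-and-combine multidimensional scaling (SC-MDS). In terms of computation time, it turns MDS, whose cost is proportional to the cube of the number of data points, into SC-MDS, whose cost is proportional to the number of data points.

    The research on SC-MDS by Professor 曾正男 and, later, by 陳珮琦 (Pei-Chi Chen) and other students supervised by Professor 盧鴻興 (Horng-Shing Lu) shows through experimental data the contribution that SC-MDS makes to dimensionality reduction. I therefore wish to take MDS as a foundation, study the theoretical basis of dimensionality reduction and the feasibility of its methods, and, more importantly, understand the SC-MDS algorithm and give a theoretical discussion of how the mathematical background of this method is built up.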


    Data science is an increasingly important subject in the modern era; with it we can carry out analysis and forecasting. AI and big data are popular examples.

    In data science, the most fundamental and important matter is how data are processed. Data processing consists of two phases: preprocessing and computation. The algorithms available for the computation phase are countless. The main purpose of preprocessing, however, is to improve the efficiency and results of those algorithms. In the era of the information explosion, we may receive data of all kinds. The best situation is a data set that is complete and of an appropriate size. If the data set is incomplete, we need to fill in the missing values first. If the data are sparse, or some part of the data is rare, the analysis or prediction will be affected, so we must apply preprocessing techniques such as sampling and ensembling. On the other hand, if the data set is too large, it places a heavier burden on the computer: calculation slows down, the data become harder to organize, and noise is more likely to appear. A huge data set is therefore more troublesome to process. In this situation, we can use dimensionality reduction to avoid these problems.

    Dimensionality reduction is an important technique in data science that reduces the dimension of the data. Intuitively, it decreases the amount of data and the difficulty of the calculation, and it has further effects that make it significant in different fields. There are several distinct methods of dimensionality reduction today. Principal component analysis (PCA) and multidimensional scaling (MDS) are the two common, elementary methods.
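
    As a minimal sketch of what PCA does, the following Python (NumPy) example centers a data matrix and projects it onto its leading principal directions. The function name `pca`, the random test data, and the use of the singular value decomposition are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np


def pca(X, k):
    """Project the rows of X (n samples x d features) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # n x k matrix of component scores


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # hypothetical data: 100 samples in R^5
scores = pca(X, 2)
print(scores.shape)
```

    The columns of `scores` are mutually orthogonal, and the first column carries at least as much variance as the second.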

    As noted above, the speed of dimensionality reduction is also affected by the amount of data. Optimized variants of the elementary methods have therefore gradually appeared. For MDS, one such variant is Split-and-Combine Multidimensional Scaling (SC-MDS), proposed in a recent paper by Professor Jengnan Tzeng of National Chengchi University.

    In terms of calculation speed, SC-MDS is better than MDS: the running time of MDS is proportional to the cube of the amount of data, whereas the running time of SC-MDS is proportional to the amount of data.
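
    To illustrate where the cubic cost comes from, here is a minimal sketch of classical MDS in Python (NumPy), assuming a dense matrix of pairwise Euclidean distances. The names `classical_mds`, `D`, and `k` and the random test points are illustrative assumptions, not the thesis's notation. The dense eigendecomposition of the n x n double-centered matrix is the step that scales cubically with the number of points n.

```python
import numpy as np


def classical_mds(D, k):
    """Embed n points in R^k from an n x n matrix D of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                     # dense eigendecomposition: O(n^3)
    idx = np.argsort(w)[::-1][:k]                # indices of the k largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)           # guard against tiny negative values
    return V[:, idx] * np.sqrt(w_top)            # n x k coordinates


rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))                     # hypothetical points in R^3
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
Y_mds = classical_mds(D, 3)
D_rec = np.linalg.norm(Y_mds[:, None, :] - Y_mds[None, :, :], axis=-1)
print(np.allclose(D, D_rec))
```

    Because the test points lie in R^3 and k = 3, the embedding reproduces the original pairwise distances up to floating-point error; the coordinates themselves are determined only up to rotation and reflection.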

    The research on SC-MDS by Professor Jengnan Tzeng and by Pei-Chi Chen, who was supervised by Professor Horng-Shing Lu, and in particular their experimental data, make the significance of SC-MDS for dimensionality reduction clear. Therefore, I want to study the theory of dimensionality reduction on the basis of MDS, and to study the SC-MDS algorithm by discussing its mathematical background.

    1. Introduction
    1.1 Dimensionality reduction
    1.2 Main perspectives on dimensionality reduction
    1.2.1 Multidimensional Scaling
    1.2.2 Principal component analysis
    1.2.3 Split-and-Combine Multidimensional Scaling
    1.3 Dimensionality reduction methods
    2. Dimensionality reduction algorithms
    2.1 Multidimensional scaling
    2.1.1 Classical MDS
    2.1.2 Non-Classical MDS
    2.1.3 Non-metric MDS
    2.2 Principal component analysis
    3. Improvements to the algorithms
    3.1 Spring Models
    3.2 Chalmers' 1996 Algorithm
    4. Split-and-Combine Multidimensional Scaling
    4.1 Split-and-Combine Multidimensional Scaling (SC-MDS)
    5. Conclusion

    屈太國, 蔡自興 (2016). 快速多維標度算法研究 [Research on fast multidimensional scaling algorithms]. Hunan: College of Computer Science and Technology, Hengyang Normal University; Hunan Provincial Key Laboratory of Intelligent Information Processing and Application.

    陳珮琦 (2008). 分群與合併的多元尺度分析法之最佳分群決策與遺失值問題的討論 [A discussion of optimal grouping decisions and the missing-value problem in split-and-combine multidimensional scaling]. Hsinchu: Institute of Statistics, National Chiao Tung University.

    陳烽威 (2012). 勞工職位特質分析-多元尺度法於大資料分析之應用 [An analysis of labor job characteristics: an application of multidimensional scaling to large-scale data analysis]. Taipei: Graduate Institute of Economics, National Chengchi University.

    A. Mead (1992). Review of the Development of Multidimensional Scaling Methods. The Statistician, 41, 27-39.

    Alistair Morrison, Greg Ross and Matthew Chalmers (2003). Fast Multidimensional Scaling through Sampling, Springs and Interpolation. UK: Department of Computing Science, University of Glasgow.

    Donald A. Jackson (1993). Stopping Rules in Principal Component Analysis: A Comparison of Heuristical and Statistical Approaches. Ecology, Vol. 74, No. 8, 2204-2214.

    Florian Wickelmaier (2003). An Introduction to MDS. Sound Quality Research Unit, Aalborg University, Denmark.

    Gale Young and A. S. Householder (1938). Discussion of a Set of Points in Terms of Their Mutual Distances. Psychometrika, Vol. 3, No. 1.

    Jengnan Tzeng (2008). Multidimensional scaling for large genomic data sets. Taipei: Genomics Research Center, Academia Sinica.

    Mark Steyvers (2001). Multidimensional scaling. Encyclopedia of Cognitive Science, Macmillan Reference Ltd.

    Matthew Chalmers (1996). A Linear Iteration Time Layout Algorithm for Visualising High-Dimensional Data. Switzerland: UBILAB, Union Bank.

    Trevor F. Cox and Michael A. A. Cox (2001). Multidimensional Scaling, Second Edition. Chapman & Hall/CRC.
