
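Graduate student: 林逸凡
Thesis title: RNA-seq 正規化的初探
A preliminary study of RNA-seq normalization strategies
Advisor: 謝文萍
Oral defense committee: 張升懋, 黃冠華
Degree: Master
Department: College of Science - Institute of Statistics
Year of publication: 2011
Academic year of graduation: 99
Language: English
Number of pages: 35
Chinese keywords: RNA-seq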
  • RNA-seq has become an important technology for sequencing-based transcriptome surveys. To reveal biologically important changes in expression level, any measurement of RNA levels should reflect the true relative activities across different genes and different samples. Data normalization is an issue for every kind of high-throughput platform. In this study, we consider two well-known studies that provide interesting normalization ideas for RNA-seq data. The first, by Hansen et al. (2010), proposes a re-weighting scheme that focuses on correcting the bias introduced by library preparation and random priming. The second, by Li et al. (2010), proposes two normalization schemes, one based on Poisson regression and one on multiple additive regression trees (MART). The idea is to model the read count variation using the sequence composition at every position considered. We evaluated the methods on two datasets using several criteria, including the uniformity of the normalized signals and the consistency of the relative expression levels before and after normalization. Our results show that, among the methods compared, the MART model yields the most uniform signals after adjustment while preserving the correct relative expression levels.


    1. Chapter 1 Introduction
    2. Chapter 2 Materials and Methods
       Re-weighting scheme
       Normalization scheme
    3. Chapter 3 Result
       Evaluation of the re-weighting process
       Evaluation of the normalization schemes
    4. Chapter 4 Discussion
    Reference
    Supplementary

    Barski, A., et al. (2007) High-resolution profiling of histone methylations in the human genome, Cell, 129, 823-837.
    Flicek, P. and Birney, E. (2009) Sense from sequence reads: methods for alignment and assembly, Nat Methods, 6, S6-S12.
    Friedman, J.H. (2001) Greedy function approximation: A gradient boosting machine, Ann Stat, 29, 1189-1232.
    Friedman, J.H. (2002) Stochastic gradient boosting, Comput Stat Data An, 38, 367-378.
    Gilad, Y., Pritchard, J.K. and Thornton, K. (2009) Characterizing natural variation using next-generation sequencing technologies, Trends Genet, 25, 463-471.
    Hansen, K.D., Brenner, S.E. and Dudoit, S. (2010) Biases in Illumina transcriptome sequencing caused by random hexamer priming, Nucleic Acids Res, 38, e131.
    Hodges, E., et al. (2009) High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing, Genome Res, 19, 1593-1605.
    Jiang, H. and Wong, W.H. (2009) Statistical inferences for isoform expression in RNA-Seq, Bioinformatics, 25, 1026-1032.
    Laurent, L., et al. (2010) Dynamic changes in the human methylome during differentiation, Genome Res, 20, 320-331.
    Li, J., Jiang, H. and Wong, W.H. (2010) Modeling non-uniformity in short-read rates in RNA-Seq data, Genome Biol, 11, R50.
    Licatalosi, D.D., et al. (2008) HITS-CLIP yields genome-wide insights into brain alternative RNA processing, Nature, 456, 464-469.
    Lister, R., et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences, Nature, 462, 315-322.
    Medvedev, P., Stanciu, M. and Brudno, M. (2009) Computational methods for discovering structural variation with next-generation sequencing, Nat Methods, 6, S13-S20.
    Namba, R., et al. (2004) Molecular characterization of the transition to malignancy in a genetically engineered mouse-based model of ductal carcinoma in situ, Mol Cancer Res, 2, 453-463.
    Okoniewski, M.J. and Miller, C.J. (2006) Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations, BMC Bioinformatics, 7, 276.
    Pepke, S., Wold, B. and Mortazavi, A. (2009) Computation for ChIP-seq and RNA-seq studies, Nat Methods, 6, S22-S32.
    Royce, T.E., Rozowsky, J.S. and Gerstein, M.B. (2007) Toward a universal microarray: prediction of gene expression through nearest-neighbor probe sequence identification, Nucleic Acids Res, 35, e99.
    Schwartz, S., Oren, R. and Ast, G. (2011) Detection and removal of biases in the analysis of next-generation sequencing reads, PLoS ONE, 6, e16685.
    Srivastava, S. and Chen, L. (2010) A two-parameter generalized Poisson model to improve the analysis of RNA-seq data, Nucleic Acids Res, 38, e170.
    Trapnell, C. and Salzberg, S.L. (2009) How to map billions of short reads onto genomes, Nat Biotechnol, 27, 455-457.
    Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics, Nat Rev Genet, 10, 57-63.
    Wilhelm, B.T. and Landry, J.R. (2009) RNA-Seq-quantitative measurement of expression through massively parallel RNA-sequencing, Methods, 48, 249-257.

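    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)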
