字體大小: 字級放大   字級縮小   預設字形  

詳目顯示

以作者查詢圖書館館藏以作者&題名查詢臺灣博碩士以作者查詢全國書目
研究生中文姓名:蕭宇宏
中文論文名稱:完全親緣關係重建問題之啟發式演算法
英文論文名稱:A Heuristic Algorithm for the Full Sibling Relationship Reconstruction Problem
指導教授姓名:陳彥宏
學位類別:碩士
校院名稱:臺北市立教育大學
系所名稱:資訊科學系碩士班
論文出版年:100
畢業學年度:100
語文別:中文
中文關鍵詞:生物資訊分群完全親緣關係重建問題孟德爾遺傳法則4對偶特性圖論連通圖啟發式演算法
英文關鍵字:Bioinformaticspartitionfull sibling reconstruction problemMendelian inheritance rule4-allele conditionheuristic algorithmconnected componentgraph theory
相關次數:
  • 推薦推薦:0
  • 點閱點閱:142
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:0
  • 收藏收藏:0
生物資訊為近年來非常熱門的研究領域,是利用計算機的技術幫生物學家解決生物學上的問題。在眾多的生物學問題中,如何找出一群物種間的親緣關係是非常重要的一個主題。本研究主要是探討一個命名為完全親緣關係重建問題(full sibling reconstruction problem)並設計一個啟發式演算法解決此問題。給定n個物種(elements)的集合U,每個物種i有l個基因座(locus),每個基因座有兩個對偶基因<xij,yij>,0<jl,其中xij 和yij 為物種i在第j基因座的2個對偶基因。根據孟德爾遺傳法則假設,一個親緣關係的群集(a full sibling group)的必要條件為群集內每個物種的每個基因座的所有對偶基因不能超過4種(即4-allele特性)。因此,完全親緣關係重建問題(full sibling reconstruction problem)定義為在不知道物種的父代情形下將n個物種進行最少群組的分群,每個群集S須滿足完全四對偶特性(4-allele),即(  1jl | i  S xij  xij |4 )。當基因座的數目為2(respectively, O(n3))時,除非RP=NP,否則這問題無法設計出1.00014(respectively, 1.0065)倍的近似演算法。本研究將透過圖論的連通圖概念設計一個啟發式演算法來解決完全親緣關係重建問題並進行模擬測試。根據我們的數據顯示我們的啟發式演算法所得的解與最佳解間的倍率最高到2倍。而且我們啟發式演算法所用之時間平均為15至171微秒(ms),低於之前Berger-Wolf等人所提出的方法。
Bioinformatics is an important interdisciplinary field between Computer Science and Biology. The purpose is to develop some algorithms to solve some biological problems. One of these problems is to reconstruct the full sibling groups without parental information from a single generation of individuals and hence many algorithms had been proposed. Let U be a population of n diploid individuals of the same generation at l loci and <xij,yij>, 0<jl, are the two alleles of the individual i at locus j. A group of individuals with the same parents is called as the full sibling group. A group SU of individuals is consistent with the 4-allele property if no more than 4 alleles at any locus (i,e.,  1jl | i  S xij  xij |4). For the full sibling group, the 4-allele property is a necessary condition of the rule of Mendelian inheritance. Hence, the full sibling reconstruction problem is concerned with the determination of a partition from U with minimum number of groups such that each group S is a full sibling (i.e., satisfy the 4-allele property). This problem had been showed to be no 1.0065-approximation algorithm (respectively, 1.00014-approximation algorithm) unless RP=NP, when the number of loci is O(n3) (respectively, 2). In this thesis, we will design a heuristic algorithm for the full sibling reconstruction problem and perform the algorithm on some simulated data. This algorithm is based on the connected component for graph theory. It will show that our algorithms can improve the computational speed of Berger-Wolf’s heuristic algorithm for the full sibling reconstruction problem. It will also show that the number of groups of the heuristic solution is at most 2 times the number of groups of the optimal solution for the full sibling reconstruction problem.
中文摘要....................................................................................................I
英文摘要...................................................................................................II
目次..........................................................................................................III
圖目次......................................................................................................IV
表目次......................................................................................................VI
第一章 緒論...............................................................................................1
    第一節 研究背景與動機..................................................................1
  第二節 問題描述..............................................................................1
第三節 研究目的..............................................................................4
  第四節 研究架構..............................................................................4
第二章 文獻探討......................................................................................5
第一節 分群演算法..........................................................................5
  第二節 親緣關係的重建..................................................................6
第三章 研究方法....................................................................................10
第一節 方法概述............................................................................10
  第二節 演算法流程圖....................................................................11
第四章 實測驗證....................................................................................16
第五章 結論與未來研究方向................................................................36
參考文獻..................................................................................................37
[1] A. Almudevar, C. Field, Estimation of single generation sibling relationships based on DNA markers, Journal of Agricultura, Biological, and Environmental Statistics 4 (1999), 136–165.

[2] A. Archer, R. Rajagopalan, D. B. Shmoys, Lagrangian relaxation for the k-median problem: new insights and continuity properties, in: Proc. 11th Annu. European Symp. on Algorithms (2003), 31–42.

[3] M. V. Ashley, T. Y. Berger-Wolf, W. Chaovalitwongse, B. DasGupta, S. Sheikh, On Approximating An Implicit Cover Problem in Biology, Proceedings of 5th International Conference on Algorithmic Aspects in Information and Management (2009), 43–54.

[4] M. Ashley, T. Berger-Wolf, I. Caballero, W. Chaovalitwongse, B. DasGupta, P. Govindan, S. Sheikh, Full Sibling Reconstructions in Wild Populations From Microsatellite Genetic Markers, accepted for the Computational Biology, New Research, Nova Science Publishers (2009), 231–258.

[5] M. V. Ashley, I. C. Caballero, W. Chaovalitwongse, B. DasGupta, P. Govindan, S. Sheikh, T. Y. Berger-Wolf, KINALYZER, A Computer Program for Reconstructing Sibling Groups, Molecular Ecology Resources (2009), 1127–1131.

[6] A. M. Bagirov, K. Mardaneh, Modified global k-means Algorithm for Clustering in Gene Expression Data Sets, Intelligent systems for Bioinformatics (2006), 23–28.



[7] A. Ben-Dor, Z. Yakhin, Clustering gene expression patterns, Journal of Computational Biology 6 (1999), 281–297.

[8] T. Y. Berger-Wolf, B. DasGupta, W. Chaovalitwongse, M. V. Ashley, Combinatorial reconstruction of sibling relationships, In Proceedings of the 6th International Symposium on Computational Biology and Genome Informatics (2005), 1252–1255.

[9] T. Y. Berger-Wolf, S. I. Sheikh, B. Dasgupta, M. V. A. I. C. Caballero, W. Chaovalitwongse, S. P. Lahari, Reconstructing sibling relationships in wild populations, Bioinformatics (2007), 49–56.

[10] J. Beyer, B. May, A graph-theoretic approach to the partition of individuals into full-sib families, Molecular Ecology 12 (2003), 2243–2250.

[11] W. Chaovalitwongse, C-A Chou, T. Y. Berger-Wolf, B. DasGupta, S. Sheikh, M. V. Ashley, I. C. Caballero, New Optimization Model and Algorithm for Sibling Reconstruction from Genetic Markers, INFORMS Journal of Computing (2010), 180–194.

[12] M. Charikar, S. Guha, E. Tardos, D. B. Shmoys, A constant factor approximation algorithm for the k-median problem, Journal Computer System Sciences 65 (2002), 129–149.

[13] Y. C. Chiou, L. W. Lan, Genetic clustering algorithms, European Journal of Operational Research 135 (2001), 413–427.



[14] S. Dasgupta, P. M. Long, Performance guarantees for hierarchical clustering, Journal of Computer and System Sciences (2005), 555–569.

[15] Z. Drezner, Facility Location, A Survey of Applications and theory, Springer Series in Operations Research, New York, 1995.

[16] J. Fernández, M. A. Toro, A new method to estimate relatedness from molecular markers, Molecular Ecology 15 (2006), 1657–1667.

[17] S. Guha, S. Khuller, Greedy strikes back: Improved facility location algorithms, Journal of algorithms 31 (1999), 228–248.

[18] D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.

[19] J. Han, M. Kamber, Data Mining Concepts and Techniques, Academic Press 2nd edition, San Francisco, 2006.

[20] D. Hochbaum, D.B. Shmoys, A best possible heuristic for the k-center problem, Mathematics of Operations Research 10 (1985), 180–184.

[21] E. Jensen, A dynamic programming algorithm for cluster analysis, Operation Research 12 (1969), 1034–1057.

[22] D. A. Konovalov, C. Manning, M. T. Henshaw, KinGroup, a program for pedigree relationship reconstruction and kin group assignments using genetic markers, Molecular Ecology Notes 4 (2004), 779–782.


[23] S. Olafsson, X. Li, S. Wu, Operations research and data mining, European Journal on Operational Research 187 (2008), 1429–1448.

[24] J. Pinte'r, G. Pesti, Set partition by globally optimized cluster seed points, European Journal of Operational Research 51 (1991), 127–135.

[25] B. Saglam, F.S. Salman, S. Sayin, M. Turkay, A mixed-integer programming approach to the clustering problem with an application in customer segmentation, European Journal of Operational Research 173 (2006), 866–879.

[26] D. Sankoff, J. Kruskal, Time warps, string edits and Macromolecules: the theory and practice of sequence comparison, Addision-Wesley, 1983.

[27] J. Setubal, J. Meidanis, Introduction to Computational Molecular Biology, PWS Publishing Company, 1997.

[28] S. Sheikh, T. Y. Berger-Wolf, W. Chaovalitwongse, B. DasGupta, M. Ashley, Reconstructing Sibling Relationships from Microsatellite Data, European Conf. on Computational Biology (2007).

[29] S. I. Sheikh, T. Y. Berger-Wolf, M. V. Ashley, I. C. Caballero, W. Chaovalitwongse, B. DasGupta, Error Tolerant Sibship Reconstruction for Wild Populations, Proc. of the 7th Annual International Conference on Computational Systems Biology (2008), 273–284.

[30] S. I. Sheikh, T. Y. Berger-Wolf, A. A. Khokhar, B. DasGupta. Consensus methods for reconstruction of sibling relationships from genetic data, In Proceedings of the 4th Workshop on Advances in Preference Handling (2008), 97–102.

[31] S. I. Sheikh, T. Y. Berger-Wolf, A. Khokhar, I. C. Caballero, M. V. Ashley, W. Chaovalitwongse, B. DasGupta, Combinatorial Reconstruction of Half-Sibling Groups, Proceedings of 8th Annual International Conference on Computational Systems Biology (2009), 59–67.

[32] S. I. Sheikh, T. Y. Berger-Wolf, A. Khokhar, I. C. Caballero, M. V. Ashley, W. Chaovalitwongse, C. A. Chou, B. DasGupta, Combinatorial Reconstruction of Half-sibling Groups from Microsatellite DataJournal, Bioinformatics and Computational Biology (2010), 337–56.

[33] S. I. Sheikh, A. A. Khokhar, T. Y. Berger-Wolf, Efficient and scalable parallel reconstruction of sibling relationships from genetic data in wild-populations, Proceedings of Tenth IEEE International Workshop on High Performance Computational Biology held with IPDPS (2010), 1–8.

[34] D. B. Shmoys, approximation algorithms for the clustering problem, In Proceedings of the 12th Annual Conference on Computational Learning Theory (1999), 100–101.

[35] D.B. Shmoys, E. Tardos, K.I. Aardal, Approximation algorithms for facility location problems, In the Proceedings of the 29th Annual ACM Symposium on Theory of Computing (1997), 265–274.

[36] P. N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, 2006.

[37] A. J. Vakharia, J. Mahajan, The Clustering of objects and attributes for manufacturing and marketing applications, European Journal of Operational Research 123 (2000), 640–651.
[38] J. Wang, Sibship reconstruction from genetic data with typing errors, Genetics (2004), 1968–1979.

[39] 曾憲雄、蔡秀滿、蘇東興、曾秋蓉、王慶堯 (民97)。資料探勘 (Data mining)。臺北市:旗標。
(此全文未開放授權)
封面
謝誌
中文摘要
英文摘要
目次
第一章
第二章
第三章
第四章
第五章
參考文獻
校內電子全文開放日期:不公開
校外電子全文開放日期:不公開
 
 
 
 
第一頁 上一頁 下一頁 最後一頁 top
* *