Under CopyrightSchwikowski, B.B.Schwikowski2022-03-0631.07.20021998https://publica.fraunhofer.de/handle/publica/27333810.24406/publica-fhg-273338Betrachtet man die Evolution biologischer Sequenzen, konzentriert man sich häufig auf einen von zwei Aspekten: deren multiples Alignment, das die Buchstaben verschiedener Sequenzen miteinander in Beziehung setzt, oder den evolutionären Baum, der die Abstammungsgeschichte der vollständigen Sequenzen beschreibt. In der Praxis müssen Alignment und Baum meist beide aus den Sequenzen selbst rekonstruiert werden, und beide Aspekte sind eng miteinander korreliert. Deshalb bietet sich die gleichzeitige Berechnung von Alignment und Baum an. Das entsprechende Optimierungsproblem stellt allerdings in theoretischer und in praktischer Hinsicht ein schwieriges Problem dar. Mit dieser Arbeit wird der erste Ansatz vorgestellt, multiple Alignments und evolutionäre Bäume gleichzeitig und für reale Probleminstanzen zu lösen. Er ist motiviert durch eine Analogie zum Steiner-Problem aus der Informatik einerseits, und den in der praktischen Bioinformatik erfolgreichen iterativen Alignmentverfahren andererseits. Dabei vermeidet die neue Datenstruktur des gewichteten Sequenzgraphen typische Nachteile iterativer Alignmentverfahren. Neben dem Verfahren selbst werden auch die im Zuge dieser Arbeit entstandene Implementierung, sowie Experimente auf simulierten und realen Daten besprochen.In understanding the evolution of biological sequences, it is common practice to focus at one of two aspects: either the multiple alignment of the characters that relates the individual characters of the sequences, or the evolutionary tree that describes the evolutionary history of the complete sequences. Usually, alignment and tree must both be reconstructed from the sequences alone, and both aspects are closely interrelated. It is therefore desirable to simultaneously reconstruct alignment and tree from the given sequences. However, this represents a difficult task in theory and in practice. This study presents the first approach to the simultaneous reconstruction of multiple alignments and evolutionary trees for real data. It is motivated by an analogy to the Steiner problem from computer science and to iterative alignment algorithms, which are successful in practical bioinformatics. The data structure of the weighed sequence graph avoids typical disadvantages of previous iterative algorithms. Besides the new algorithm, we describe implementation and experimental results on simulated and real biological data.Synopsis S.iii Kurzfassung (German) S.iii Acknowledgment S.vii Chapter 1. Introduction S.1 - 1.1. Overview S.2 - 1.2. Using Score Functions S.5 Chapter 2. Foundations S.9 - 2.1. Sequence Comparison S.9 - 2.2. Pairwise Sequence Alignment S.9 - 2.3. Multiple Sequence Alignment S.14 - 2.4. Fitch's Method S.20 - 2.5. Evolutionary Trees S.25 - 2.6. The Circular Dependence S.26 Chapter 3. Steiner Trees S.29 - 3.1. Generalized Tree Alignment and the Steiner Problem S.29 - 3.2. Applying Steiner Tree Approaches S.33 - 4.1. Path Heuristics S.37 Chapter 4. Deferred Path Heuristics S.37 - 4.2. Correspondence between Shortest Paths and Optimal Alignments S.39 - 4.3. Implications for our Method S.44 - 4.4. Sequence Graphs S.47 - 4.5. Shortest Paths between Candidate Sets S.48 - 4.6. A New Overall Algorithm: Deferred Path Heuristics S.55 - 4.7. Variations that Maintain a Guaranteed Error Bound of (2 , 2=n ) for n Sequences S.57 Chapter 5. Three Fundamental Enhancements S.63 - 5.1. An Algorithm for A ne Linear Gap Score Functions S.64 - 5.2. A Common Algorithmic Scheme for Tree Alignments S.75 - 5.3. Nonmetric Score Functions and Detours S.80 Chapter 6. Implementation of Deferred Path Heuristics S.101 - 6.1. Object-Oriented Software Design S.101 - 6.2. Implementation of the Time-Critical Core Algorithm S.102 Chapter 7. Experimental Evaluation S.111 - 7.1. General Considerations S.112 - 7.2. Comparison with Hein's Treealign S.116 - 7.3. Detour Graphs - Case Study S.121 - 7.4. Evaluation on Structurally Aligned Proteins S.123 - 7.5. Evaluation on a Random Model of Sequence Evolution S.131 - 7.6. Conclusions from our Experiments S.142 Chapter 8. Conclusion S.145 - 8.1. Weighted Sequence Graphs S.145 - 8.2. Deferred Path Heuristics S.146 - 8.3. Tree Alignment Score S.148 - 8.4. Outlook S.149 Appendix A. Frequently Used Symbols S.151 Appendix B. Experimental Evaluation - Details S.153 - B.1. Rose Parameters S.153 - B.2. Evolutionary Trees for the Sample Tree Alignments S.157 References S.159enmolekulare BioinformatikAlgorithmenmultiples Alignmentevolutionärer Baumphylogenetischer Baumcomputational molecular biologyalgorithmmultiple alignmentevolutionary treephylogenetic treegeneralized tree alignment problem003005006518A new algorithmic approach to the construction of multiple alignments and evolutionary treesdoctoral thesis