RNA in phylogenetic reconstruction
Ribosomal RNA (rRNA) genes are the most widely used data source in phylogenetic reconstruction. They are highly structured, with large parts of the molecules exhibiting very strong conservation of their base-pairing patterns. Because sequence conservation also varies dramatically between different regions of rRNA genes, these data are informative on a wide range of phylogenetic timescales, from recent to ancient splits. However, individual columns of rRNA alignments are not independent: they are coupled by biologically functional secondary structures whose degree of conservation varies widely. Any such correlations within sequence alignments of structurally functional RNA will therefore distort the phylogenetic signal and/or lead to gross overestimates of tree stability. On the other hand, alignment accuracy can be improved substantially by incorporating secondary structure conservation. Maximum likelihood and Bayesian approaches are amenable to RNA-specific substitution models that treat conserved base pairs appropriately, but they require accurate secondary structure models as input. Structure prediction algorithms for single RNA sequences are well known and widely used, but it is not straightforward to apply their predictions for phylogenetic purposes. The main limitation is that the accuracy of thermodynamic folding algorithms declines sharply as the length of the RNA increases. This is due in part to inaccuracies in the thermodynamic folding parameters, in part to the kinetics of the folding process and to tertiary interactions, and even more to the fact that the RNA and protein components of the ribosome are tightly packed and thus mutually influence their folds. The functional rRNA structures therefore cannot reasonably be expected to be identical to the minimum free energy structures of isolated rRNAs computed by thermodynamic folding algorithms.
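The non-independence of alignment columns can be made concrete with a small Python sketch that measures the mutual information between two columns of a toy alignment. The alignment, the function name mutual_information, and the choice to skip gapped rows are all assumptions made for this illustration and are not taken from any method described above. Mutual information is used here only as one simple indicator of covariation between base-paired positions.

```python
from collections import Counter
from math import log2

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Rows with a gap ('-') in either column are skipped (an assumption
    of this sketch, not a general recommendation).
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != '-' and seq[j] != '-']
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                 # counts of (base_i, base_j)
    left = Counter(a for a, _ in pairs)    # counts of bases in column i
    right = Counter(b for _, b in pairs)   # counts of bases in column j
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (left[a] * right[b])
        mi += p_ab * log2(count * n / (left[a] * right[b]))
    return mi

# Hypothetical toy alignment: columns 0 and 5 form a base pair that is
# maintained by compensatory substitutions; column 1 is invariant.
toy = [
    "GAAAAC",
    "CAAAAG",
    "AAAAAU",
    "UAAAAA",
]

paired = mutual_information(toy, 0, 5)     # strongly correlated columns
unpaired = mutual_information(toy, 0, 1)   # invariant column carries no signal
print(paired, unpaired)
```

In this toy case the base-paired columns share the maximum possible information for four nucleotides, while the invariant column shares none. A substitution model that treats all columns as independent would in effect count the paired columns as two separate pieces of evidence.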