A machine-learned knowledge discovery method for associating complex phenotypes with complex genotypes. Application to pain
Background: The association of genotype information with common traits remains an unsatisfactorily solved problem. Pain is one of the most complex traits, and despite its well-established genetic basis, association studies have so far failed to provide reproducible predictions of pain phenotypes from genotypes in the general population. We therefore aimed to develop a method able to predict the pain phenotype from the underlying genotype prospectively and with high accuracy. Methods: Complex phenotypes and genotypes were obtained from experimental pain data comprising four different pain stimuli and genotypes for 30 reportedly pain-relevant variants in 10 genes. The training data set was obtained from 125 healthy volunteers and the independent prospective test data set from 89 subjects. The approach involved supervised machine learning. Results: The phenotype-genotype association was established in three major steps. First, the pain phenotype data were projected and clustered by means of emergent self-organizing map (ESOM) analysis and subsequent U-matrix visualization. Second, pain sub-phenotypes were identified by interpreting the cluster structure using classification and regression tree classifiers. Third, a supervised machine learning algorithm (Unweighted Label Rule generation) was applied to genetic markers reportedly modulating pain to obtain a complex genotype underlying the identified subgroups of subjects with homogeneous pain response. This procedure correctly identified 80% of the subjects as belonging to an extreme pain phenotype in an independently and prospectively assessed cohort. Conclusion: The developed methodology is a suitable basis for complex genotype-phenotype associations in pain. It may support the personalized treatment of complex traits. Owing to its generality, this new method should also be applicable to association tasks other than pain.
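The three-step structure described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: 2-means clustering stands in for ESOM/U-matrix analysis, a one-split decision stump stands in for classification and regression trees, and a single-marker rule stands in for Unweighted Label Rule generation. All data are synthetic, and every function name, the three binary markers, and the 85% marker association are assumptions made for illustration. Only the cohort sizes (125 and 89) and the four stimuli mirror the study design.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: dist2(point, centroids[k]))

def accuracy(pred, truth):
    return sum(a == b for a, b in zip(pred, truth)) / len(truth)

def make_subjects(n, rng):
    """Synthetic subjects (assumption): 4 pain thresholds, 3 binary markers."""
    subjects = []
    for _ in range(n):
        sensitive = rng.random() < 0.5            # hidden pain sub-phenotype
        shift = -2.0 if sensitive else 2.0
        pheno = [shift + rng.gauss(0, 1) for _ in range(4)]
        # Marker 0 follows the hidden group in 85% of subjects (arbitrary);
        # markers 1 and 2 are pure noise.
        m0 = int(sensitive) if rng.random() < 0.85 else 1 - int(sensitive)
        subjects.append((pheno, [m0, rng.randint(0, 1), rng.randint(0, 1)]))
    return subjects

def two_means(points, iters=20):
    """Step 1 stand-in: 2-means clustering in place of ESOM / U-matrix."""
    ordered = sorted(points, key=sum)
    # Cluster 0 starts at the lowest thresholds (most pain-sensitive subject).
    centroids = [ordered[0][:], ordered[-1][:]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def best_stump(points, labels):
    """Step 2 stand-in: one-split stump in place of CART.
    Rule form: cluster 0 if points[feature] <= threshold, else cluster 1."""
    best = (-1.0, None, None)
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            acc = accuracy([0 if p[f] <= t else 1 for p in points], labels)
            if acc > best[0]:
                best = (acc, f, t)
    return best

def apply_rule(genos, marker, carrier_label):
    """Carriers of the marker get carrier_label, non-carriers the other label."""
    return [carrier_label if g[marker] == 1 else 1 - carrier_label for g in genos]

def best_marker_rule(genos, labels):
    """Step 3 stand-in: single-marker rule in place of rule generation."""
    best = (-1.0, None, None)
    for m in range(len(genos[0])):
        for carrier_label in (0, 1):
            acc = accuracy(apply_rule(genos, m, carrier_label), labels)
            if acc > best[0]:
                best = (acc, m, carrier_label)
    return best

rng = random.Random(0)
train = make_subjects(125, rng)   # sizes mirror the training cohort
test = make_subjects(89, rng)     # and the prospective test cohort

# Step 1: cluster phenotypes; Step 2: describe clusters; Step 3: genotype rule.
train_labels, centroids = two_means([p for p, _ in train])
stump_acc, feature, threshold = best_stump([p for p, _ in train], train_labels)
rule_acc, marker, carrier_label = best_marker_rule([g for _, g in train], train_labels)

# Prospective check: predict the phenotype cluster from genotype alone.
test_labels = [nearest(p, centroids) for p, _ in test]
test_pred = apply_rule([g for _, g in test], marker, carrier_label)
test_acc = accuracy(test_pred, test_labels)

print(f"stump: feature {feature} <= {threshold:.2f} (train acc {stump_acc:.2f})")
print(f"rule: marker {marker} -> cluster {carrier_label} (train acc {rule_acc:.2f})")
print(f"test accuracy from genotype alone: {test_acc:.2f}")
```

The key design point carried over from the abstract is the order of operations: the phenotype groups are derived from the pain data first, and the genotype rule is then fitted to those groups and evaluated on a separate cohort. The accuracies printed by this sketch reflect only the synthetic data and say nothing about the study's results.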