2014
Journal Article
Title
Multi-Label-Classification to predict repeated dose toxicity in the context of REACH
Abstract
Repeated dose toxicity is a very complex toxicological endpoint, and several attempts have been made to predict it acceptably well in order to fill data gaps in REACH dossiers without relying on too many animal studies. In the present project, we apply multi-label classification (MLC) to simultaneously predict multiple toxic effects of chemical compounds. The final, validated MLC models are well documented and published as a freely accessible prediction web service. The predicted target values are based on toxicity endpoints from over 2000 repeated dose toxicity studies in rats with sub-acute or sub-chronic duration and oral or inhalation application. The data set was compiled from publicly available repeated dose toxicity data on industrial chemicals (RepDose database) and confidential toxicity data from the European List of Notified Chemical Substances (ELINCS) database. Although this data set, with 1022 compounds and 28 endpoints, is very comprehensive and detailed from a toxicological point of view, it is extremely sparse for statistical analysis: 82% of the experimentally derived endpoint values are missing, because the endpoint was either not considered in the particular study or no measurable effect was observed in any of the investigated dose groups. We discretized the numeric endpoint values into the binary class values 'active' and 'inactive'. To this end, the LOEL values were clustered, taking the study duration into account. We calculated physico-chemical descriptors and structural fragments as input features for the prediction models. The structural fragments were computed by matching the data set compounds against established, pre-defined lists of SMARTS (SMiles ARbitrary Target Specification) patterns. We evaluated various MLC algorithms with a 10-times repeated 10-fold cross-validation.
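For readers unfamiliar with the validation scheme, the following is a minimal sketch of how the splits of a 10-times repeated 10-fold cross-validation over 1022 compounds could be generated. It is not the authors' code: the function name `repeated_kfold`, the fixed seed, and the plain (unstratified) shuffling are assumptions made for illustration only.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    Hypothetical helper: unstratified splitting is an assumption,
    the abstract does not say how the folds were drawn.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # Reshuffle the compound indices once per repetition.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal indices round-robin so fold sizes differ by at most one.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# 1022 is the number of compounds stated in the abstract.
splits = list(repeated_kfold(1022))
```

Each compound is held out exactly once per repetition, so every model is evaluated on 10 different partitions of the same data, which reduces the dependence of the reported performance on one particular random split.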
The selected MLC technique, 'Ensemble of Classifier Chains' (ECC), exploits correlations between endpoints by repeatedly building chains of multiple classifiers in randomized orders. The overall predictive performance of this method is an area under the ROC curve (AUC) of 68% (63% sensitivity, 61% selectivity). We compared the ECC approach with 'Predictive Clustering Trees' (PCT), an alternative MLC method; PCT is slightly less predictive but is applied within this project to create categories. This work is part of the BMBF project 'Strategies to develop chemical categories in the context of REACH'.
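The chaining idea behind ECC can be illustrated with a small, self-contained sketch. This is not the published model: the nearest-centroid base learner, all class names, and the toy data are assumptions chosen to keep the example short. It also assumes a fully observed label matrix, whereas the actual data set has 82% missing values, and it randomizes only the label order, as described in the abstract.

```python
import random


class CentroidClassifier:
    """Stand-in base learner (assumption): predict the class with the nearer mean."""

    def fit(self, X, y):
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, label in zip(X, y) if label == cls]
            if rows:
                self.centroids[cls] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict_one(self, x):
        def dist(center):
            return sum((a - b) ** 2 for a, b in zip(x, center))
        return min(self.centroids, key=lambda cls: dist(self.centroids[cls]))


class ClassifierChain:
    """One binary classifier per label, trained in the given label order."""

    def __init__(self, order):
        self.order = list(order)
        self.models = []

    def fit(self, X, Y):
        self.models = []
        augmented = [list(x) for x in X]
        for label in self.order:
            y = [row[label] for row in Y]
            self.models.append(CentroidClassifier().fit(augmented, y))
            # Later links see the true value of this label as an extra feature.
            augmented = [x + [row[label]] for x, row in zip(augmented, Y)]
        return self

    def predict_one(self, x):
        x = list(x)
        out = {}
        for label, model in zip(self.order, self.models):
            out[label] = model.predict_one(x)
            # At prediction time the predicted label is fed forward instead.
            x = x + [out[label]]
        return [out[i] for i in range(len(self.order))]


class EnsembleOfClassifierChains:
    """Average the votes of several chains built in random label orders."""

    def __init__(self, n_chains=10, seed=0):
        self.n_chains = n_chains
        self.rng = random.Random(seed)
        self.chains = []

    def fit(self, X, Y):
        self.chains = []
        for _ in range(self.n_chains):
            order = list(range(len(Y[0])))
            self.rng.shuffle(order)
            self.chains.append(ClassifierChain(order).fit(X, Y))
        return self

    def predict_proba_one(self, x):
        votes = [chain.predict_one(x) for chain in self.chains]
        return [sum(col) / len(votes) for col in zip(*votes)]

    def predict_one(self, x, threshold=0.5):
        return [int(p >= threshold) for p in self.predict_proba_one(x)]


# Invented toy data: 2 features, 3 correlated binary endpoints.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
Y = [[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 1, 0]]
ecc = EnsembleOfClassifierChains(n_chains=5, seed=1).fit(X, Y)
```

Because each chain passes earlier label predictions on as features to later classifiers, correlated endpoints can inform one another, and averaging over several random orders reduces the influence of any single chain ordering.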
Author(s)