2014
Journal Article
Title
Development of chemical categories by optimized clustering strategies
Abstract
According to the OECD definition, a chemical category is a group of chemicals whose physicochemical and human health and/or ecotoxicological properties and/or environmental fate properties are likely to be similar or to follow a regular pattern. Categories have often been built with conventional structure-based approaches, in which similarity rests on functional groups, common constituents and physicochemical properties only. In the present project we developed an approach in which both toxicological and structural properties contribute to the building of chemical categories for (sub)chronic toxicity. As the data basis we used two databases on repeated-dose toxicity (RepDose and the "ELINCS" database). The toxicological data are organized by organ toxicity, partly split into subgroups according to similarities at the phenotypic and the mechanistic level. For the definition of a category, the following characteristics were considered: organ status (investigated, not investigated, no findings, findings); potency in terms of the no observed adverse effect level (NOAEL); and organ specificity. Several clustering methods were tested in the project; the final version uses multi-label clustering with predictive clustering trees (PCT). Several critical decisions had to be considered carefully during development and refinement of the method; they concerned the structural features and chemical properties on the one hand and the toxicological data set on the other. Decisions had to be taken on: the selection of appropriate features and their SMARTS descriptions; the non-inclusion of physicochemical (PC) parameters; the use of imputation methods to handle missing values; and the level of detail for a consistent representation of toxicological data versus the density of data in the matrix.
During method development, all resulting categories (clusters) were visualized using CheS-Mapper (1) and checked for plausibility by expert judgment. One important decision concerned the stopping criterion for clustering: toxicological variance data are used in combination with a statistical significance test. Developing this new approach required many incremental improvements; the final approach now produces a set of useful and representative clusters. This project is supported by BMBF in the funding focus "Ersatzmethoden zum Tierversuch" (alternative methods to animal testing).
(1) http://opentox.informatik.uni-freiburg.de/ches-mapper/
Author(s)