In silico prediction of DILI: Extraction of histopathology data from preclinical toxicity studies of the eTOX database for new in silico models of hepatotoxicity
The eTOX consortium extracted in vivo data from unpublished preclinical toxicity studies of 13 industry partners. This new database contains high-quality, highly detailed toxicity results from 1,947 drug candidates (8,196 studies), supplemented with 1,286 chemicals from the RepDose database (2,695 studies). Several compilation steps were applied to transform these data into usable training datasets for in silico models. First, all toxicity findings were extracted from study reports (paper/PDF). Then the verbatim terms for all treatment-related hepatotoxicity findings were harmonized using special ontologies. Finally, to obtain model training sets with sufficient compound numbers and chemical space coverage, all primary histopathology terms were combined and grouped into first-level and then second-level clusters of similar toxicity mechanisms: e.g., primary necrosis terms such as "centrilobular," "periportal," etc. were grouped into the first-level cluster "necrosis," then clusters such as "necrosis," "vacuolization," etc. were grouped into the second-level cluster "degenerative lesions." With this approach, various training datasets were compiled depending on the species (rat, dog and monkey), treatment duration (2 weeks-2 years) and administration route. Different modeling approaches were then applied to these datasets, including structural alerts, fragment-based and molecular descriptor-based machine learning approaches (e.g., random forest, decision tree, k-nearest neighbor). Models were validated and optimized, first by internal validation (test set 10%), then by external validation using Sanofi's confidential data. For example, the best external validation results (n=66) were achieved for the first-level cluster rat necrosis models (229 positives, 198 negatives) using a fragment-based approach (Sensitivity: 0.80, Specificity: 0.77) and a molecular descriptor-based decision tree approach (Sensitivity: 0.81, Specificity: 0.88).
These validation results show that, by reasonable clustering of histopathology data from eTOX, it is possible to develop highly predictive in silico models for drug-induced liver injury (DILI).
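
The two-level grouping of harmonized histopathology terms described above can be illustrated with a minimal Python sketch. It is not the eTOX implementation: the term strings, the mapping tables `FIRST_LEVEL` and `SECOND_LEVEL`, the helper names and the compound identifiers are all assumptions made for illustration, and only the cluster names "necrosis," "vacuolization" and "degenerative lesions" come from the text. A compound is labeled positive for a cluster if any of its findings falls into that cluster.

```python
# Illustrative mappings only: the actual eTOX ontology terms and cluster
# definitions are not given in the text, so these entries are assumptions.
FIRST_LEVEL = {
    "centrilobular necrosis": "necrosis",
    "periportal necrosis": "necrosis",
    "vacuolization": "vacuolization",  # hypothetical primary term
}
SECOND_LEVEL = {
    "necrosis": "degenerative lesions",
    "vacuolization": "degenerative lesions",
}


def clusters_for_term(term):
    """Return (first_level, second_level) for a harmonized term, or None if unmapped."""
    first = FIRST_LEVEL.get(term.strip().lower())
    if first is None:
        return None
    return first, SECOND_LEVEL.get(first)


def build_training_labels(findings, cluster, level):
    """Label each compound 1 if any finding belongs to `cluster`, else 0.

    findings: dict mapping a compound id to a list of harmonized primary terms.
    level: 1 for first-level clusters, 2 for second-level clusters.
    """
    idx = 0 if level == 1 else 1
    labels = {}
    for compound, terms in findings.items():
        hits = (clusters_for_term(t) for t in terms)
        labels[compound] = int(any(h is not None and h[idx] == cluster for h in hits))
    return labels


# Hypothetical compounds and findings, for demonstration only.
findings = {
    "cmpd_A": ["Centrilobular necrosis"],
    "cmpd_B": ["vacuolization"],
    "cmpd_C": [],
}
necrosis_labels = build_training_labels(findings, "necrosis", level=1)
degenerative_labels = build_training_labels(findings, "degenerative lesions", level=2)
```

In this toy example, the second-level cluster yields more positives than the first-level one (both `cmpd_A` and `cmpd_B` instead of only `cmpd_A`), which mirrors the stated motivation for clustering: larger training sets with broader chemical space coverage.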