Result: A hierarchical clustering approach for large compound libraries.

Title:
A hierarchical clustering approach for large compound libraries.
Authors:
Böcker A (Johann Wolfgang Goethe-Universität, Institut für Organische Chemie und Chemische Biologie, Marie-Curie-Str. 11, D-60439 Frankfurt, Germany), Derksen S, Schmidt E, Teckentrup A, Schneider G
Source:
Journal of chemical information and modeling [J Chem Inf Model] 2005 Jul-Aug; Vol. 45 (4), pp. 807-15.
Publication Type:
Journal Article; Research Support, Non-U.S. Gov't
Language:
English
Journal Info:
Publisher: American Chemical Society; Country of Publication: United States; NLM ID: 101230060; Publication Model: Print; Cited Medium: Print; ISSN: 1549-9596 (Print); Linking ISSN: 15499596; NLM ISO Abbreviation: J Chem Inf Model; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Washington, D.C. : American Chemical Society, c2005-
Substance Nomenclature:
0 (Ligands)
0 (Macromolecular Substances)
0 (Proteins)
0 (Receptors, G-Protein-Coupled)
Entry Date(s):
Date Created: 20050728; Date Completed: 20060524; Latest Revision: 20061115
Update Code:
20250114
DOI:
10.1021/ci0500029
PMID:
16045274
Database:
MEDLINE

Further Information

A modified version of the k-means clustering algorithm was developed that can analyze large compound libraries. A distance threshold, determined by plotting the sum of radii of leaf clusters, was used as the termination criterion for the clustering process. Hierarchical trees were constructed that can be used to obtain an overview of the data distribution and inherent cluster structure. The approach is also applicable to ligand-based virtual screening with the aim of generating preferred screening collections or focused compound libraries. Retrospective analysis of two activity classes was performed: inhibitors of caspase 1 [interleukin 1 (IL1) cleaving enzyme, ICE] and glucocorticoid receptor ligands. The MDL Drug Data Report (MDDR) and Collection of Bioactive Reference Analogues (COBRA) databases served as the compound pool, for which binary trees were produced. Molecules were encoded by all Molecular Operating Environment 2D descriptors and topological pharmacophore atom types. Individual clusters were assessed for their purity and enrichment of actives belonging to the two ligand classes. Significant enrichment was observed in individual branches of the cluster tree. After clustering a combined database of MDDR, COBRA, and the SPECS catalog, it was possible to retrieve MDDR ICE inhibitors with new scaffolds using COBRA ICE inhibitors as seeds. A Java implementation of the clustering method is available via the Internet (http://www.modlab.de).
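
The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of "binary trees" built by a k-means variant with a radius-based stopping rule. It assumes a divisive scheme: each cluster is split in two with plain 2-means, and splitting stops once a cluster's radius (largest distance from a member to the centroid) is at or below a threshold. The function names (two_means, build_tree, leaves, sum_leaf_radii), the radius definition, and the toy descriptor vectors are all assumptions made for illustration, not the authors' Java implementation.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def radius(points):
    """Assumed radius: largest Euclidean distance from a member to the centroid."""
    c = centroid(points)
    return max(math.dist(p, c) for p in points)


def two_means(points, rng, iterations=20):
    """Split points into two groups with plain 2-means (Lloyd iterations)."""
    centers = rng.sample(points, 2)
    groups = (list(points), [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller keeps the cluster as a leaf
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return groups


def build_tree(points, threshold, rng=None):
    """Recursively bisect until each leaf's radius is at or below threshold."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    node = {"size": len(points), "radius": radius(points), "children": []}
    if len(points) < 2 or node["radius"] <= threshold:
        node["points"] = points
        return node
    left, right = two_means(points, rng)
    if not left or not right:
        node["points"] = points  # could not split further (e.g. duplicates)
        return node
    node["children"] = [
        build_tree(left, threshold, rng),
        build_tree(right, threshold, rng),
    ]
    return node


def leaves(node):
    """Collect the leaf clusters of a tree produced by build_tree."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def sum_leaf_radii(points, threshold):
    """Quantity one could plot against the threshold to choose a cutoff."""
    return sum(leaf["radius"] for leaf in leaves(build_tree(points, threshold)))


if __name__ == "__main__":
    # Hypothetical 2D "descriptor" vectors forming two well-separated groups.
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for t in (0.5, 2.0, 100.0):
        tree = build_tree(toy, t)
        print(t, len(leaves(tree)), round(sum_leaf_radii(toy, t), 3))
```

Scanning a range of thresholds and plotting sum_leaf_radii against them is one way to mimic the threshold selection the abstract mentions; how the authors read the cutoff from that plot is not stated here.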