Dagmar Stumpfe,
Dilyana Dimova and
Jürgen Bajorath*
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany. E-mail: bajorath@bit.uni-bonn.de; Fax: +49-228-2699-341; Tel: +49-228-2699-306
First published on 27th March 2017
The concept of molecular scaffolds is widely applied in medicinal and computational chemistry to represent core structures of compounds and series. A hierarchical organization of compounds has long dominated scaffold design and generation. Recently, so-called ‘analog series-based’ (ASB) scaffolds have been introduced as an alternative category of scaffolds. ASB scaffolds are designed to represent analog series, take reaction information into account, and do not follow a molecular hierarchy. We report a large-scale comparison of ASB scaffolds representing more than 15000 analog series with activity against more than 1200 targets and their corresponding hierarchical scaffolds. Most ASB and conventional hierarchical scaffolds were structurally distinct. However, many ASB scaffolds contained conventional scaffolds as substructures or shared smaller substructures with these scaffolds. Although ASB scaffolds and their corresponding hierarchical scaffolds often shared the same target annotations, ASB scaffolds further distinguished between closely related compound series with different activities that yielded the same conventional scaffolds. Taken together, the findings reported herein reveal that ASB scaffolds further extend current core structure representations for the analysis of structure–activity relationships.
Although scaffolds have been described and represented in different ways,1–3 most popular approaches have applied a hierarchical organization of compounds.5–7 The hierarchy distinguishes ring systems as core structures from substituents and aliphatic linker fragments and also involves molecular decomposition steps to reduce compounds to scaffolds and further abstract from scaffolds.5–7 Hierarchical organization of compounds and scaffolds taking activity annotations into account has enabled systematic SAR exploration and the identification and prioritization of molecular core structures for the design of new active compounds.6,7
The original hierarchical definition of scaffolds, which paved the way for systematic computer-aided exploration of scaffolds and became a mainstay in medicinal chemistry, was introduced by Bemis and Murcko.5 According to this generally applicable definition, scaffolds are extracted from compounds by removing all substituents while retaining ring systems and linker fragments between rings. It follows that so-defined Bemis and Murcko (BM) scaffolds must contain ring structures and that the addition of a ring to a compound generates a new scaffold. During the past two decades, BM scaffolds have become the gold standard for scaffold analysis in medicinal chemistry and chemoinformatics.
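The extraction rule described above (remove all substituents, keep ring systems and the linkers between them) can be illustrated on a bare molecular graph. The following is a minimal sketch, not the implementation used in this work: it assumes a molecule is given only as a list of bonded atom-index pairs, ignores atom and bond types, and uses the hypothetical helper name `bm_scaffold_atoms`. Terminal atoms are peeled off repeatedly until only ring and linker atoms remain.

```python
from collections import defaultdict


def bm_scaffold_atoms(bonds):
    """Return the atom indices left after iteratively removing terminal atoms.

    `bonds` is an iterable of (i, j) atom-index pairs (an assumed, simplified
    input format). Atoms that survive are ring atoms and linker atoms between
    rings; an acyclic graph yields an empty set.
    """
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    neighbors = dict(adjacency)

    # Seed the stack with every atom that currently has at most one neighbor.
    stack = [atom for atom, nbrs in neighbors.items() if len(nbrs) <= 1]
    while stack:
        atom = stack.pop()
        nbrs = neighbors.pop(atom, None)
        if nbrs is None:
            continue  # atom was already removed earlier
        for nbr in nbrs:
            neighbors[nbr].discard(atom)
            # Removing a substituent atom may expose a new terminal atom.
            if len(neighbors[nbr]) <= 1:
                stack.append(nbr)
    return set(neighbors)


# Toy graph: six-membered ring (atoms 0-5) carrying a one-atom substituent (6).
toluene_like = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
print(sorted(bm_scaffold_atoms(toluene_like)))
```

In the toy graph, only the six ring atoms survive, which mirrors the statement that BM scaffolds must contain ring structures. Chemistry toolkits apply additional conventions (for example, for exocyclic double bonds) that this graph-only sketch does not model.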
Recently, a new category of scaffolds has been introduced that is designed to complement the hierarchical view of scaffolds by further increasing their medicinal chemistry focus.8 These analog series-based (ASB) scaffolds are derived from series of analogs, i.e. multiple compounds, whereas BM scaffolds are obtained from individual compounds. In addition, ASB scaffolds are non-hierarchical and account for synthetic relationships between analogs.8 Thus, the design of ASB and BM scaffolds differs fundamentally, and these scaffold categories are conceptually distinct.
In this work, we have systematically analyzed structural and activity relationships between ASB and BM scaffolds derived from a wide spectrum of bioactive compounds. The analysis uncovered a variety of relationships between these scaffold categories. In addition, ASB scaffolds were capable of distinguishing between different activities of closely related analog series, which was not possible on the basis of BM scaffolds. ASB scaffolds also revealed chemical modifications that rendered analogs active against different targets.
From all 51308 compounds yielding ASB scaffolds, Bemis and Murcko (BM) scaffolds were then extracted. All calculations were carried out using in-house Perl and Python scripts with the aid of KNIME14 protocols and the OpenEye chemistry toolkit.15
Different from ASB scaffolds, analog series (AS) can contain one or more BM scaffolds. For 7971 AS producing ASB scaffolds, a single BM scaffold was obtained, resulting in 6771 unique BM scaffolds. Accordingly, in these cases, there was a 1:1 correspondence of ASB and BM scaffolds. Furthermore, for 7654 AS with ASB scaffolds, multiple BM scaffolds were obtained, with two to 34 scaffolds per AS, yielding a total of 17830 unique BM scaffolds. Overall, only 3322 unique BM scaffolds (15.0%) corresponded to more than one ASB scaffold from different AS, indicating that ASB scaffolds captured series-specific chemical information. Otherwise, a higher degree of correspondence between BM and multiple ASB scaffolds would be anticipated.
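The correspondence analysis above amounts to inverting a mapping from analog series to their BM scaffolds and counting how many scaffolds are reached from more than one series. The sketch below illustrates this bookkeeping under stated assumptions: each AS is identified by a made-up ID that stands in for its single ASB scaffold, BM scaffolds are represented by arbitrary string keys, and the toy data and function names are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def bm_to_series(series_to_bm):
    """Invert {series_id: set of BM scaffold keys} into {BM key: set of series_ids}."""
    inverse = defaultdict(set)
    for series_id, scaffolds in series_to_bm.items():
        for scaffold in scaffolds:
            inverse[scaffold].add(series_id)
    return dict(inverse)


def shared_bm_scaffolds(series_to_bm):
    """Return the BM scaffolds that correspond to more than one analog series."""
    return {
        scaffold: ids
        for scaffold, ids in bm_to_series(series_to_bm).items()
        if len(ids) > 1
    }


# Toy data (hypothetical): three series, one BM scaffold shared by two of them.
toy = {
    "AS1": {"c1ccccc1"},
    "AS2": {"c1ccccc1", "c1ccc2ccccc2c1"},
    "AS3": {"c1ccncc1"},
}
shared = shared_bm_scaffolds(toy)
fraction_shared = len(shared) / len(bm_to_series(toy))
print(shared, round(100 * fraction_shared, 1))
```

A low shared fraction, as reported above for the real data, indicates that most BM scaffolds map to a single series and hence to a single ASB scaffold.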
Table 1 reports the results of systematic structural comparisons of ASB and BM scaffolds at the level of individual AS. If an AS produced multiple BM scaffolds, combinations of different relationships were also possible.
No. | Structural relationship(s) | #AS | % |
---|---|---|---|
**1** | **BM is a substructure of ASB** | **5734** | **36.7** |
**2** | **BM and ASB share a smaller substructure** | **5436** | **34.8** |
3 | BM and ASB are identical | 155 | 1.0 |
4 | ASB is a substructure of BM | 220 | 1.4 |
**1 + 2** | **BM is a substructure of ASB, BM and ASB share a smaller substructure** | **3929** | **25.1** |
Other | 1 + 3, 2 + 3, 2 + 4, and 3 + 4 | 151 | 1.0 |

a The distribution of structural relationships between ASB and BM scaffolds according to Fig. 2 is reported. Relationships were detected on the basis of individual analog series (AS). Combinations of different relationships (e.g. 1 + 2) were possible if an AS yielded more than one BM scaffold. Three dominant relationships are shown in bold.
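As a consistency check, the percentages in Table 1 can be recomputed from the reported counts. The short sketch below uses only numbers given in the text and the table; the variable names are illustrative.

```python
# Counts per structural relationship, as reported in Table 1.
table1_counts = {
    "1": 5734,
    "2": 5436,
    "3": 155,
    "4": 220,
    "1 + 2": 3929,
    "Other": 151,
}

total_series = sum(table1_counts.values())

# Percentage of all analog series for each relationship, to one decimal place.
percentages = {
    label: round(100 * count / total_series, 1)
    for label, count in table1_counts.items()
}

# Combined share of the three dominant relationships (1, 2, and 1 + 2).
dominant_share = round(
    100 * sum(table1_counts[k] for k in ("1", "2", "1 + 2")) / total_series, 1
)

print(total_series, percentages, dominant_share)
```

The total of 15625 matches the sum of the 7971 AS with a single BM scaffold and the 7654 AS with multiple BM scaffolds, and the three dominant relationships together account for 96.6% of all series.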
Identical ASB and BM scaffolds were only detected in 155 cases (1.0% of all ASB scaffolds), confirming that most ASB and BM scaffolds derived from the same AS were structurally distinct. However, in 5734 instances (36.7% of all ASB scaffolds), a BM scaffold was a substructure of the ASB scaffold. By contrast, the alternative scenario that an ASB scaffold was a substructure of a BM scaffold was only rarely observed (1.4%). Thus, more than a third of ASB scaffolds contained invariant substituents from AS that were removed when BM scaffold(s) were generated, as illustrated by example 1 in Fig. 2. If compounds comprising an AS contain conserved substituents, the ASB scaffold takes this information into account and – as a consequence – represents a higher degree of chemical exploration than a corresponding BM scaffold.
As a second dominant relationship, 5436 ASB and BM scaffolds (34.8%) shared a smaller substructure. In these cases, the BM scaffold had to contain at least one additional ring that was not conserved within the AS and therefore not contained in the corresponding ASB scaffold, as illustrated by example 2 in Fig. 2. This frequent relationship reflected a conceptual weakness of BM scaffolds for the representation of core structures: because the additional ring was not invariant, it was not part of the common core but rather a substituent distinguishing different analogs within the AS.
For AS yielding more than one BM scaffold, the combination of these two relationships was also frequently observed, with 3929 instances (25.1% of all ASB scaffolds). In these cases, one BM scaffold was a substructure of the ASB scaffold, whereas another BM scaffold shared a smaller substructure with the ASB scaffold. By contrast, combinations of other structural relationships were only rarely detected (Table 1).
Four different activity relationships between annotated ASB and BM scaffolds were examined. First, the ASB and one or more BM scaffolds might have identical target annotations (Fig. 3a). Second, at least one BM scaffold (originating from different AS) might have more target annotations than the ASB scaffold as shown, for example, in Fig. 3c and d. Third, the ASB scaffold might have more target annotations than at least one BM scaffold, as shown in Fig. 3b. Fourth, relationships were considered variable if at least one BM scaffold originating from multiple AS had more target annotations than a corresponding ASB scaffold and one or more other BM scaffolds had fewer annotations than the corresponding ASB scaffold.
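The four activity relationships can be expressed as a small decision rule. The sketch below is one possible reading of the definitions above, not the procedure used in the study: it assumes target annotations are available as sets of target identifiers and that "more" or "fewer" annotations refers to the number of annotated targets. The function name and category labels are hypothetical.

```python
def activity_relationship(asb_targets, bm_target_sets):
    """Classify the relationship between one ASB scaffold and its BM scaffold(s).

    `asb_targets` is the set of targets annotated for the ASB scaffold and
    `bm_target_sets` is a list with one target set per corresponding BM
    scaffold. Annotations are compared by count (an assumption of this sketch).
    """
    n_asb = len(asb_targets)
    bm_has_more = any(len(targets) > n_asb for targets in bm_target_sets)
    bm_has_fewer = any(len(targets) < n_asb for targets in bm_target_sets)

    if bm_has_more and bm_has_fewer:
        return "variable"
    if bm_has_more:
        return "BM more"
    if bm_has_fewer:
        return "ASB more"
    return "identical"


# Toy example: one BM scaffold has fewer targets, another has more.
print(activity_relationship({"T1", "T2"}, [{"T1"}, {"T1", "T2", "T3"}]))
```

Under this reading, an ASB scaffold with a single target annotation can only fall into the "identical" or "BM more" categories, since every annotated BM scaffold has at least one target.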
Table 2 reports the distribution of these activity relationships between ASB and BM scaffolds. The majority of ASB scaffolds (70.0%) and corresponding BM scaffolds had identical target annotations, including 7737 and 3202 ASB scaffolds associated with single- and multi-target activities, respectively. In addition, for 18.7% of all ASB scaffolds (1760 with single- and 1160 with multi-target activities), there was at least one corresponding BM scaffold with additional target annotations from different AS. Furthermore, 8.6% of ASB scaffolds had more target annotations than at least one corresponding BM scaffold. Variable activity relationships with multiple BM scaffolds were only detected for 2.7% of the ASB scaffolds.
No. | Activity relationship | #ASB scaffolds | % |
---|---|---|---|
1 | BM and ASB have identical target annotations | 10939 (7737 single-target AS and 3202 multi-target AS) | 70.0 |
2 | BM with more target annotations than ASB | 2920 (1760 single-target AS and 1160 multi-target AS) | 18.7 |
3 | ASB with more target annotations than BM(s) | 1350 (multi-target AS only) | 8.6 |
4 | Variable | 416 (multi-target AS only) | 2.7 |

a The distribution of activity relationships between ASB and BM scaffolds is reported. Relationships were detected on the basis of individual analog series (AS). For each of four possible relationships, the number of ASB scaffolds and corresponding percentage of all ASB scaffolds are given. Relationships were considered variable if at least one BM scaffold originating from multiple AS had more target annotations than a corresponding ASB scaffold, whereas at least one other BM scaffold had fewer annotations than the corresponding ASB scaffold.
Thus, despite abundant structural differences between ASB and corresponding BM scaffolds, differences in target annotations were limited overall at the level of individual AS, even when annotations for BM scaffolds originating from more than one AS were combined.
In Fig. 3c, two AS with activity against distinct enzymes are displayed that yielded the same benzimidazolidine BM scaffold. The ASB scaffold of each series contained the benzimidazolidine and specific substituents that were characteristic of each AS. These chemically more differentiated ASB scaffolds exclusively represented analogs that were either active against D-amino-acid oxidase or histone deacetylase 1. Similarly, in Fig. 3d, compounds from three AS are shown that were active against distinct targets but yielded the same BM scaffold. The ASB scaffold of each AS contained invariant substituents at different phenyl ring positions and distinguished between compounds of each AS and their specific activities. Hence, in these cases, multi-target SARs were resolved at the level of ASB scaffolds, which represented distinct core structures characteristic of related AS with different activities.
Herein we have presented a large-scale comparison of ASB and BM scaffolds to investigate relationships between scaffolds of different design and their utility for SAR exploration. To enable a direct comparison, corresponding ASB and BM scaffolds were extracted from more than 15000 analog series (AS) with activity against more than 1200 targets. The vast majority of corresponding ASB and BM scaffolds were structurally distinct but formed systematic structural relationships. Specifically, about a third of all ASB scaffolds contained corresponding BM scaffolds as substructures and another third shared smaller substructures with BM scaffolds. These relationships and their combination involved nearly all ASB scaffolds (97%). We also found that the majority of ASB and BM scaffolds shared the same target annotations. However, ASB scaffolds typically provided a higher-resolution view of SARs than BM scaffolds and further differentiated between related AS with different activities sharing the same BM scaffold(s). Distinct ASB scaffolds of related AS exclusively represented compounds having the same activity.
Taken together, the results of our analysis suggest that ASB scaffolds represent an attractive extension of current core structure representations and further increase the utility of scaffolds for SAR exploration.
This journal is © The Royal Society of Chemistry 2017