Fragment-based screening can catalyze drug discovery by identifying novel scaffolds, but this approach is limited by the small chemical libraries studied by biophysical experiments and the challenging optimization process. To expand the explored chemical space, we employ structure-based docking to evaluate orders-of-magnitude larger libraries than those used in traditional fragment screening. We computationally dock a set of 14 million fragments to 8-oxoguanine DNA glycosylase (OGG1), a difficult drug target involved in cancer and inflammation, and evaluate 29 highly ranked compounds experimentally. Four of these bind to OGG1 and X-ray crystallography confirms the binding modes predicted by docking. Furthermore, we show how fragment elaboration using searches among billions of readily synthesizable compounds identifies submicromolar inhibitors with anti-inflammatory and anti-cancer effects in cells. Comparisons of virtual screening strategies to explore a chemical space of 1022 compounds illustrate that fragment-based design enables enumeration of all molecules relevant for inhibitor discovery. Virtual fragment screening is hence a highly efficient strategy for navigating the rapidly growing combinatorial libraries and can serve as a powerful tool to accelerate drug discovery efforts for challenging therapeutic targets.
Drug discovery efforts have generated millions of diverse compounds that are exploited in screening campaigns for novel therapeutic targets. However, high-throughput screening (HTS) of these libraries often yields low hit rates and weakly active compounds that require extensive optimization1. For these reasons, major focus has been put on development of techniques that can expand the accessible chemical space2,3,4. Advances in synthetic chemistry have enabled screens of increasingly larger chemical libraries and >30 billion compounds recently became available in make-on-demand catalogs5. These ultralarge compound collections are four orders of magnitude larger than those tested by HTS, providing new opportunities for drug discovery.
Two central questions for drug discovery are how to efficiently find chemical probes in ultralarge libraries, and to what extent access to these collections improves chemical space coverage. Emerging techniques such as DNA-encoded libraries and large-scale virtual docking have identified potent leads from screens of several billion drug-like compounds6,7,8. However, considering that the total number of drug-like molecules has been estimated to range between 1023 and 1060, these libraries still only cover a very small fraction of chemical space9,10. The fact that the number of possible molecules grows exponentially with molecular size supports the use of fragment-based screening to initiate drug discovery. In fragment-based lead discovery, libraries of low-molecular-weight compounds (typically <250 Da or 9-16 heavy atoms) are first screened for ligands that bind to subpockets of the target protein. Fragment hits bind weakly to the protein, yet form high-quality interactions with the binding site. In the second step, the affinity and selectivity of the fragments are improved by increasing the size and complexity of the molecules11,12,13. As the estimated number of fragment-like compounds is only in the order of 1011, chemical space is more efficiently sampled in fragment screens than will ever be possible using drug-like collections14,15.
Established techniques for fragment-screening, such as NMR, SPR, and X-ray crystallography are limited to collections of hundreds to thousands of molecules and restricted to physically available compounds11. The millions of fragments that are available in make-on-demand libraries are therefore inaccessible to traditional screening approaches. In contrast, structure-based virtual screening could rapidly evaluate large make-on-demand libraries and thereby reach further into chemical space. However, considering the small size and weak affinities of fragments, one potential caveat is that scoring functions may not have the accuracy needed to predict the affinities or binding modes of such compounds16,17,18,19. Fragment-based lead generation can also prove challenging and extensive chemical elaboration is generally required12. In this step, the >30 billion make-on-demand compounds could potentially be a valuable resource by providing access to readily available elaborations of fragments20. However, it is not clear if the number of fragment analogs and the diversity of these will be sufficient to enable successful fragment-to-lead optimization.
In this work, we explore the potential of vast chemical libraries using virtual fragment screening. Our approach is applied to discover inhibitors of 8-oxoguanine DNA glycosylase (OGG1), an enzyme that is part of the DNA damage response pathway. OGG1 recognizes the presence of the oxidized nucleobase 7,8-dihydro-8-oxoguanine (8-oxoG) in DNA and initiates repair by excision of the damaged base21. Recent studies demonstrate that inhibition of OGG1 is a promising strategy for the development of drugs against cancer and inflammation22,23,24. However, DNA-binding proteins are challenging drug targets due to their polar and flexible binding sites25,26,27 and only a few OGG1 inhibitors have been identified to date22,28,29,30. We dock ultralarge compound libraries to the OGG1 active site and evaluate top-ranked compounds experimentally to identify starting points for inhibitor development. Crystal structures of OGG1-fragment complexes combined with docking of tailored chemical libraries enable rapid discovery of potent inhibitors displaying efficacy in cell models of cancer and inflammation. In addition, comparisons of different virtual screening strategies reveal efficient routes to identify chemical probes in vast chemical libraries.
The determination of the crystal structure of mouse OGG1 in complex with a small molecule inhibitor (TH5675) enabled structure-based virtual screens for novel scaffolds (Fig. 1)22. At the time of the study, this was the only available crystal structure of OGG1 in complex with an inhibitor. A subsequently solved structure of inhibitor TH5487 showed that the active sites of the mouse and human OGG1 were close to identical23. TH5675 blocks the binding of the oxidized DNA substrates by occupying both the nucleobase- and furanose-binding regions of the active site. The active site is highly polar and adopts different shapes in the complexes with the inhibitor and DNA, which are properties characteristic of challenging drug targets (Fig. 1a, b)26. The molecular docking performance on the crystal structure was evaluated by redocking TH5675 to the active site and assessing if the scoring function could enrich inhibitors over property-matched decoys31. The docking calculations were performed using DOCK3.7, which successfully reproduced the binding mode of TH5675 and identified the inhibitor scaffold among decoys (Supplementary Fig. S1)32.
a Structure of human OGG1 bound to DNA (PDB accession code: 3KTU). OGG1 is shown as a purple cartoon with key residues depicted as sticks. The oligonucleotide is shown in orange and the fluorinated 8-oxoguanosine is depicted as ball and sticks. Hydrogen bonds are shown as dashed lines. b Structure of mouse OGG1 in complex with inhibitor TH5675 (PDB accession code: 6G3Y). c The ZINC15 libraries (14 million fragments or 235 million lead-like compounds) were docked to the OGG1 active site. d Compounds were evaluated in a thermal shift assay (differential scanning fluorimetry). e Compounds displaying thermal shift of OGG1 were evaluated in an enzyme inhibition assay. In this experiment, OGG1 cleaves 8-oxoadenine, followed by phosphodiester hydrolysis at the abasic site by the enzyme APE1, leading to disentanglement of the fluorophore-containing oligonucleotide. f Confirmation of the predicted binding modes by high-resolution crystal structures of mouse OGG1. The complexes predicted by docking (protein and fragments 1–4 are shown as white cartoons and green sticks, respectively) are depicted together with crystal structures (protein and fragments 1–4 are shown as purple cartoons and yellow sticks, respectively). Selected side chains are shown as sticks, and hydrogen bonds are shown as dashed lines. The accuracy of the predicted binding mode was quantified using the heavy atom root mean square deviation (RMSD) from the crystal pose.
Two ultralarge chemical libraries were docked to the OGG1 structure with the goal of identifying compounds binding to the nucleobase subpocket. The fragment-like (MW < 250 Da) library contained 14 million compounds and the lead-like library was composed of 235 million molecules that were larger in size and complexity (250 ≤ MW < 350 Da). The vast majority of these compounds have never been synthesized and were hence not available for testing by experimental screening methods. Each molecule in the two chemical libraries was represented by multiple conformations, which were scored in thousands of orientations in the active site. In total, 13 trillion fragments and 149 trillion lead-like complexes were evaluated by the docking scoring function. The top-ranked compounds from the screens were buried within the pocket occupied by the 4-iodo-phenylurea group of TH5675. Encouragingly, a large number of molecules containing similar motifs were among the top-ranked compounds. However, as such compounds were expected to bind to this pocket, we excluded all N-acylated six-membered arylamines from the library to bias the screen toward identification of novel chemotypes. For the fragment library, the 10000 top-ranked compounds (corresponding to 0.07% of the library) were clustered by topological similarity to identify a diverse set of candidate molecules. A set of 29 compounds was selected for experimental evaluation from the 500 top-ranked clusters based on visual inspection of the predicted complexes. Similarly, the 100000 top-ranked compounds from the lead-like library (corresponding to 0.05% of the library) were clustered and 36 compounds were selected from the 4000 top-ranked clusters. In the selection step, we inspected the complementarity of the compounds to the binding site and took into account contributions to ligand binding that are poorly described by the docking scoring function, in line with the standard practices of virtual screening33. Compounds were excluded from experimental evaluation based on widely-used criteria such as ligand strain, unsatisfied polar atoms in the binding site or the compound itself, and improbable tautomeric or ionization states34. In the selection of fragments, we focused primarily on compounds in the pocket occupied by the 4-iodo-phenylurea group of TH5675, which was the deepest cavity of the binding site and hence a promising anchoring point for an inhibitor. Whereas the fragments primarily occupied the deep cavity, the lead-like compounds were predicted to bind both in this pocket and other parts of the active site. The 65 selected compounds were available in make-on-demand catalogs and were successfully synthesized in 4–5 weeks (Supplementary Table S1). The compounds were first tested in a thermal shift assay based on differential scanning fluorimetry (DSF) at concentrations of 99 μM for lead-like compounds and 495 μM for fragments, which corresponded to conditions typically used in HTS and fragment screening campaigns, respectively35. Of these, compounds 1 and 2 from the fragment screen induced the largest shifts in thermal stability of OGG1 (2.8 and 1.6 K, respectively, Supplementary Table S2). In contrast, none of the lead-like molecules stabilized OGG1 significantly (∆Tm <1 K). Based on these results, further characterization was focused on the hits from the fragment screen (Supplementary Table S1).
Compounds 1, 2 and four additional fragments showing weaker stabilization of OGG1 (∆Tm ≥ 0.5 K, Supplementary Table S1) were evaluated using protein crystallography, resulting in high-resolution structures of four complexes. Structures of compounds 1 (PDB code: 7QEL), 2 (PDB code: 7ZG3), 3 (PDB code: 8CEX), and 4 (PDB code: 8CEY) bound to mouse OGG1 were determined at resolutions ranging from 2.0 to 2.5 Å (Supplementary Table S3). For each fragment, an unambiguous binding mode could be determined from the electron density (Supplementary Fig. S2). The fragments represented four different chemotypes and formed similar interactions with the binding site. The compounds were anchored in the binding site by heterocyclic rings connected to an amide group, which formed hydrogen bonds to Gly42 and Lys249, and hydrophobic rings were positioned in the pocket at the entrance of the binding site. Two distinct binding site conformations were stabilized by the fragments. Whereas compounds 2 and 3 stabilized a binding site conformation that was similar to the structure used in the docking screen, alternative side chain rotamers for residues Phe319 and His270 were observed in the complexes with compounds 1 and 4. The binding modes obtained by molecular docking agreed remarkably well with the crystallographic structures (Fig. 1f). Three predicted poses (compounds 1, 3, and 4) had root mean square deviations (RMSDs, ligand heavy atoms) to the crystal structure of less than 1 Å and all the key interactions with the binding site were captured in the fourth case (compound 2, RMSD = 2.2 Å).
Our fragment growing approach was based on docking of computationally generated chemical libraries, which were either obtained from searches among billions of make-on-demand compounds or via in silico reactions using commercial building blocks (Fig. 2). As fragment 1 induced the largest thermal shift in the DSF assay, we focused fragment expansion on this scaffold. Fragment 1 also inhibited OGG1 at high concentrations in an enzyme activity assay, but was too weak to determine an IC50 value (Fig. 2a).
a Docking of commercial chemical libraries containing 14 million (14 M) fragments led to identification of fragment 1, a weak OGG1 inhibitor. b Crystal structures of mouse OGG1 bound to inhibitors enabled increasingly complex chemical pattern searches in commercial make-on-demand libraries. By stepwise increasing the size of the fragment, compounds 5 and 7 (IC50 = 58 and 6.6 μM, respectively) were discovered and crystal structures of these were determined (PDB accession codes: 7Z5R and 7ZC7). c Suitable building blocks were retrieved using pattern matching based on the molecular topology of compound 7. A tailored virtual chemical library was constructed based on several coupling reactions, yielding 6720 products. Docking then guided selection of compounds from the virtual library, which were synthesized in-house and led to the discovery of compounds 17 and 23 (IC50 = 600 nM, Table 1). A crystal structure of mouse OGG1 in complex with compound 17 (PDB accession code: 7Z5B) confirmed the computationally designed hydrogen bond to Asp322. Structure-guided elaboration leading to compounds 17 and 23 resulted in a > 165-fold increase of inhibitory potency compared to fragment 1. Source data are provided as a Source Data file.
The potential for elaboration of fragment 1 was first explored by performing substructure searches in make-on-demand libraries for analogs of this compound. However, among the >11 billion available compounds (Enamine REAL database, version November 2019), there were only seven superstructures of fragment 1 and none of these could be docked successfully into the active site due to steric clashes. Based on analysis of the binding mode of fragment 1, we further optimized the search to consider all compounds containing the N-(tetrahydrobenzisoxazole-3-yl)-formamide core, which anchored the compound in the active site. In this case, 42937 make-on-demand molecules matched the search pattern and were docked to OGG1. Elaborations that did not preserve the binding mode of the fragment core were automatically discarded and the remaining top-ranked complexes were inspected. Compounds were selected based on their complementarity with the active site and the potential for growing towards other subpockets. Five iterations of optimization were performed and a total of 62 make-on-demand compounds were experimentally evaluated (Table 1 and Supplementary Table S4). Among the compounds from the first three iterations, 5 and 6 showed improved inhibitory potencies (58 and 36 μM, respectively). A crystal structure of compound 5 in complex with mouse OGG1 corroborated our binding mode predictions (PDB codes: 7Z5R, Fig. 2b), which positioned the core scaffold in the same pocket and orientation as observed for fragment 1. Compounds 5 and 6 both contained a benzyl group that provided a starting point for extending toward other subpockets. Pattern searches in make-on-demand libraries for compounds containing the same topology as 5 and 6 identified 905 compounds. Of the 28 experimentally evaluated compounds from this set, compound 7 showed the largest improvement of potency with an IC50 value of 6.6 μM and a thermal shift of 5.5 K at 99 μM in the DSF assay (Fig. 2b and Supplementary Table S5). A crystal structure of this inhibitor bound to mouse OGG1 (2.3 Å resolution, PDB code: 7ZC7) confirmed that the overall binding mode observed in the complexes with compounds 1 and 7 was maintained. Unexpectedly, the structure showed that the pyrazole ring of compound 7 extended into a subpocket formed by His270, Phe319, Leu323, and Asp322. The pyrazole formed pi-stacking interactions with His270, which disrupted its salt-bridge to Asp322, and this reorganization revealed opportunities for further potency optimization. However, searches for elaborations of compound 7 that could stabilize interactions in this pocket did not identify any suitable molecules in commercial chemical libraries. In total, a > 15-fold improvement of inhibitory potency compared to fragment 1 was achieved by optimization using make-on-demand chemical libraries.
As commercial chemical space lacked elaborations of our most potent inhibitor, further optimization was driven by reaction-based enumeration of virtual libraries using commercial building blocks. Reagents compatible with Chan-Lam, Suzuki, Ullmann, or amide couplings were identified and reacted in silico to afford molecules with similar topologies as compound 7 (Supplementary Fig. S3). The resulting synthetically accessible virtual library contained 6720 molecules that were not available in make-on-demand catalogs (Fig. 2c). Visual inspection of the complexes predicted by docking guided the selection of building blocks required to synthesize the target molecules. As the docking was performed to the active site conformation of OGG1 observed in the complex with inhibitor 7, several compounds were designed to extend into the additional subpocket identified in this structure. In total, 16 compounds (8-23, Supplementary Table S5) were successfully synthesized in-house and detailed synthetic procedures are provided in the Supplementary Information. Twelve compounds showed comparable or better potencies than compound 7 (Supplementary Table S5). Five of these were submicromolar inhibitors (17-20, and 23) and stabilized OGG1 by 7.4-9.0 K at 99 μM (Supplementary Table S5). Crystal structures of complexes were determined for two inhibitors (8 and 17). Compound 17 (IC50 = 600 nM, ∆Tm = 8.1 K at 99 μM) was designed to form an additional hydrogen bond to Asp322 by repositioning of a nitrogen atom in the five-membered aromatic ring and this interaction was confirmed by the crystal structure (2.4 Å resolution, PDB code: 7Z5B, Fig. 2c). The activity displayed by the most potent compounds (17 and 23, IC50 = 600 nM, Fig. 2c) corresponded to a > 165-fold improvement of activity compared to fragment 1. These compounds therefore exhibited inhibitor potencies comparable to TH5487 (IC50 = 340 nM, Fig. 3a). Notably, compounds 17 and 23 (ΔTm = 8.1 and 9.0 K, respectively) also showed greater thermal stabilization in the DSF assay than TH5487 (ΔTm = 4.3 K), which may be due to the inhibitors representing different scaffolds36. Finally, compounds 17 and 23 exhibited physicochemical properties that were closer to the ideal profile of a lead compound, such as lower molecular weight and LogP, and higher lipophilicity ligand efficiency (Supplementary Table S6)37,38,39.
Selectivity for OGG1 was assessed by measuring the inhibition of four other DNA glycosylases and base excision repair enzymes (APE1, NEIL1, MTH1 and SMUG1, see Methods)22,30. Five inhibitors (9, 11, 17, 18, and 23) were evaluated and three of these (9, 11, and 23) did not display any significant inhibition of the four enzymes (IC50 > 99 µM, Fig. 3b). For compound 23, inhibitor interaction with OGG1 in cells was evaluated in two assays. Target engagement was confirmed using a cellular thermal shift assay (CETSA), which indicated a strong thermal stabilization of OGG1 by compound 23 in cells (Fig. 3c). Recruitment of OGG1-GFP to laser-induced DNA damage sites was impaired in U2OS cells treated with compound 23, confirming an intracellular target engagement (Fig. 3d, e). In this assay, compound 23 showed activity comparable to the reference inhibitor TH5487.
a Chemical structures of OGG1 inhibitors TH5487 and compound 23. b Inhibition of DNA repair enzymes (OGG1, SMUG1, APE1, NEIL1, MTH1, see Methods for abbreviations) by five OGG1 inhibitors. Mean IC50 values from three independent experiments are shown. Grey boxes indicate IC50 > 99 μM. c Shift in thermal stabilization of OGG1 in HL60 cells treated with OGG1 inhibitors. Mean values ± SEM from three independent experiments (dot plots) are shown. d Maximum fluorescence intensity of OGG1-GFP accumulation at laser-induced DNA damage sites after indicated treatments. Mean values of 15 cells for each condition from three independent experiments are shown. Statistical significance was determined using One-way Anova (p = 0.0092 for 10 μM compound 23, p < 0.0001 for 50 μM compound 23, and p = 0.0048 for TH5487). e Recruitment kinetics of OGG1-GFP to laser-induced DNA damage sites in U2OS cells after 1 h pre-treatment with compound 23 and TH5487. Results of 15 cells for each condition from three independent experiments are shown ± SEM. f Anti-inflammatory effect of TH5487 and compound 23 in NF-κB activation assay. Mean values ± SD from three independent experiments are shown. g The percentage cell viability of transformed and non-transformed cell lines upon treatment with different OGG1 inhibitors (100 μM). A2780, A549, and HCT116 represent ovarian, lung and colon cancer, respectively, and BJhTERT is a non-transformed control cell line. Mean values from two independent experiments are shown. h Cytotoxicity of TH5487 and compound 11 (Table 1) in A2780 and BJhTERT cell lines. Mean values from two independent experiments are shown. Source data are provided as a Source Data file.
The anti-inflammatory and anti-cancer effects of the OGG1 inhibitors were evaluated in disease-relevant cell models. Inhibition of OGG1 limits binding to 8-oxoG in G-rich promoters of pro-inflammatory genes, which impedes the loading of the nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) transcription factor. By blocking pro-inflammatory gene transcription, downstream inflammatory responses are attenuated21. The anti-inflammatory effect was evaluated in a TNF-α-induced NF-κB activation assay in HEK293T pGF NF-κB cells (Fig. 3f). All the tested compounds (8-12, 14, and 17-23) showed dose-dependent inhibition of the inflammatory effect with EC50 values ranging from 6.3 to 36 μM (Supplementary Table S5), and compound 23 was the most potent compound (EC50 = 6.3 μM).
Cancer cells inherently have elevated levels of DNA damage, which can be attributed to disturbed redox homeostasis or oncogene-induced replication stress. For this reason, cancer cells are heavily reliant on functional DNA damage response and repair pathways40. We determined effects on the cell viability of three cancer cell lines (A2780, A549, and HCT116; representing ovarian, lung and colon cancer, respectively) and one non-transformed control cell line (BJhTERT, immortalized normal fibroblasts) for a subset of our OGG1 inhibitors (compounds 8-12, 14, 17-23). Compound 11 induced the largest loss of viability in A2780 cancer cells (EC50 = 5.4 μM) and was well tolerated (EC50 > 70 μM) in non-transformed cells (Fig. 3g, h).
Compounds 17 and 23 were also evaluated for their in vitro pharmacokinetic properties, with the previously discovered inhibitor TH5487 used as a reference (Supplementary Table S6). Compound 17 had slightly better cell permeability than TH5487 and compound 23. In human liver microsomes, compounds 17 and TH5487 exhibited high metabolic stability (intrinsic clearance CLint = 23.9 and 4.6 μL/min/mg, respectively) whereas compound 23 was moderately stable (CLint = 39.7 μL/min/mg). However, the low CLint value of TH5487 is likely a consequence of the very high protein binding of this compound, as compared to 17 and 23. Previous studies have identified high plasma protein binding and low solubility as a major limitation of TH548722. These observations were confirmed in our ADME assays, which also showed that compounds 17 and 23 have considerably better properties. The thermodynamic solubility of compound 17 (667 μM) was >600-fold higher than for TH5487 (< 1 μM), which would be expected to precipitate at the highest concentrations used in biological experiments. In addition, TH5487 had a very low unbound fraction in the plasma protein binding assay (fu = 0.1%), which was consistent with the reduction of inhibitory potency observed in the presence of serum albumin23. In contrast, 17 and 23 exhibited lower plasma protein binding levels (fu = 1.4 and 4.8%, respectively), i.e. the compound concentration available to inhibit OGG1 could be considerably higher in vivo compared to TH5487 at the same dose.
To assess the efficiency of different virtual screening strategies in more general terms, we explored paths in chemical space to identify our OGG1 inhibitors. The chemical space coverage of the fragment-based approach and virtual screens of multi-billion-scale libraries containing lead-like compounds were compared. As docking of several billion compounds is now feasible41,42, we quantified what coverage of chemical space could be achieved with libraries of this size and assessed the impact of access to structural data during fragment elaboration.
We first analyzed the chemical space containing all molecules up to the size of the lead-like compound 17 (26 heavy atoms) and a fragment representing the core scaffold of the inhibitor (13 heavy atoms) (Fig. 4a). In order to estimate the size of each chemical space, we implemented an open-source version of the Generated Database (GDB) algorithm, which can enumerate all chemically stable molecules composed of H, C, N, O, S, and halogen atoms14. The algorithm was first used to generate all chemically stable molecules containing up to 11 heavy atoms, and this library contained 21.2 million unique compounds, which is in agreement with previous estimates (Supplementary Table S7)14. Due to the combinatorial explosion of chemical space for larger molecules, the size of the libraries with up to 13 and 26 heavy atoms were calculated by extrapolation15, resulting in estimates of 109 and 1022 molecules for the fragment and lead-like space, respectively. A docking screen of one billion compounds could hence evaluate every possible fragment but is restricted to only 10-11% of the lead-like chemical space.
a Strategies to discover compound 17 based on screens of chemical libraries were analyzed. Screens of large chemical libraries with up to 26 heavy atoms (HAs) were compared to a fragment-based design approach using the tetrahydrobenzisoxazole core of compound 17 as a starting point. b Activated substituents can be attached to a core scaffold through five distinct mechanisms: introduction of a single (sand vectors), double (green vectors), or triple bond substituent (purple vectors), fusion of two ring fragments (blue vectors), and the spiro-cyclization of two ring fragments (orange vectors). c Generation and activation of all chemically stable molecules containing up to 11 heavy atoms. Curves represent the number of unique compounds in successive steps of molecule differentiation. d Generation of a superstructure by introducing substituents with a specified numbers of heavy atoms onto an activated scaffold. e Access to protein-ligand complexes enables exclusion of unsuitable (red) growing vectors due to steric hindrance (orange surface). Regions of the binding pocket accessible to suitable (blue) growing vectors are depicted as a purple surface. f Bar chart of estimated sizes of chemical spaces. The theoretical chemical spaces with up to 26 and 13 heavy atoms contain 1022 (orange) and 109 (red) compounds. The core scaffold of compound 17 was estimated to have 1013 elaborations (raspberry), of which 109 were compatible with the binding site (purple). Source data are provided as a Source Data file.
To further explore the fragment-based path to identify lead-like compound 17, we developed the UniverseGenerator tool to generate all possible elaborations of a specific scaffold using our database of enumerated substituents. To construct a chemical space with a common-substructure, the library containing all possible molecules with up to 11 heavy atoms was prepared for connection onto a scaffold using activation-tags (Fig. 4b and Supplementary Fig. S4). Substituents could be attached through five distinct mechanisms: introduction of a single, double, or triple bond substituent, fusion of two ring structures, and the spiro-cyclization of two ring structures. In total, the 21.2 million substituents could be attached to a scaffold in more than 385 million distinct arrangements (Fig. 4c). Analogous to the activation of substituents, connection sites (i.e. growing vectors) on the fragment scaffold were first identified, followed by combinatorial enumerations to find all the 7154 unique configurations in which multiple substituents could be introduced (Supplementary Fig. S5). To finally construct all superstructures, an atom-distributing algorithm determined all possible combinations to attach differently sized substituents over a particular set of growing vectors (Fig. 4d and Supplementary Fig. S6). We then explicitly generated compounds with up to 8 additional heavy atoms and estimated by extrapolation that the fragment has in the order of 1013 superstructures with up to 26 heavy atoms. To estimate the impact of access to structural information, we introduced elaboration constraints based on visual inspection of protein-fragment complexes determined for OGG1. Excluding growing vectors leading to steric clashes with the binding site (Fig. 4e) reduced the number of superstructures by four orders of magnitude, resulting in a database size (109 compounds) that is feasible to evaluate explicitly by molecular docking (Fig. 4f).
By combining the results of our analysis of the theoretical chemical space, we can compare the two virtual screening strategies. What are the odds of identifying our lead-like OGG1 inhibitor? By using a fragment-based approach, all theoretically possible fragment-like molecules up to 13 heavy atoms (one billion compounds) and every relevant elaboration of a single scaffold (an additional one billion compounds) could be evaluated by docking of two billion compounds in two steps. In contrast, a selection of two billion compounds from the lead-like chemical space (1022) cannot be expected to contain any relevant representatives of this scaffold.
We assessed if our approach could rapidly identify promising compounds of other drug targets. Three unrelated protein targets linked to cancer or inflammation (SMYD3, NUDT5, and PHIP) were selected43,44,45. Experimental fragment screening had been conducted for each target, and crystal structures of protein-fragment complexes were available. For each selected drug target, we evaluated if make-on-demand chemical libraries could enable efficient fragment elaboration (Fig. 5a).
a For each of the three targets (SMYD3, NUDT5, and PHIP), a SMARTS chemical pattern was generated based on the bound fragment, followed by searches in make-on-demand libraries containing billions of compounds for superstructures. The subset of matching compounds that was compatible with the accessible vectors was also identified (constrained superstructures). For example, moieties forming key hydrogens in the crystal structure of the protein-fragment complex (highlighted in green) were maintained in the constrained set. Molecular docking of the matching compounds enabled identification of suitable candidates for synthesis and experimental evaluation. b Normalized frequency distributions of molecular docking scores for SMYD3, NUDT5, and PHIP. Docking scores of the fragments are represented by dashed lines whereas scores of random molecules, superstructures, constrained superstructures, and superstructures that maintain the fragment binding mode are represented by blue, green, yellow and red curves, respectively. c Examples of predicted binding modes of elaborated fragments. The experimentally determined protein structures and bound fragments are shown as purple cartoons and yellow sticks, respectively. Key hydrogen bonds between the protein and fragments are indicated as yellow dashed lines. The predicted binding modes of top-scoring superstructures are shown as green sticks.
The chemical structures of the co-crystallized fragments were used to perform substructure searches in the make-on-demand libraries, followed by docking of the matching compounds to the binding sites. Whereas only a small number of analogs was available in stock for each target (3-212 compounds), large sets of readily synthesizable fragment elaborations were identified by searches in make-on-demand libraries (9914-737407 compounds, Supplementary Table S8). Many of these had improved docking scores compared to the initial fragment (Fig. 5b) with maintained overall binding mode (814-45679 viable analogs, Supplementary Table S8). As in the case of OGG1, more efficient retrieval of suitable elaborations could be achieved by tailoring the search pattern through the exclusion of growing vectors incompatible with the binding site (Fig. 5b, c, Supplementary Table S8). For each target, compounds with both improved energy and maintained binding mode were retained while the number of compounds to dock was reduced, e.g. by 145-fold for NUDT5. For comparison, we also docked chemical libraries containing random molecules to each target, and substantially worse docking scores were typically obtained for these sets. The fragment-based approach had 6- to 125-fold higher enrichment of compounds with improved docking scores (Supplementary Table S8). Our approach therefore enables rapid identification of the most promising fragment elaborations for experimental evaluation.
Our efforts to identify enzyme inhibitors by structure-based virtual screening of vast chemical libraries resulted in three main observations. First, molecular docking of 14 million fragments identified four compounds binding to the OGG1 active site and crystal structures of complexes confirmed the computational predictions. Second, fragment elaboration was guided by docking calculations that made effective use of virtual libraries containing billions of readily synthesizable compounds. The combination of molecular docking and crystal structure determination led to selective submicromolar inhibitors with anti-inflammatory and anti-cancer activity. Finally, the results of the prospective virtual screens combined with analyses of the theoretical chemical space demonstrate the efficiency of a fragment-based docking approach.
The number of commercially available compounds continues to increase rapidly and the computational cost to perform virtual screens of these databases is becoming prohibitive. However, it should be noted that the fraction of the libraries that is most subject to inflation is populated by drug-like molecules. In fact, 97% of the >30 billion compounds in the largest make-on-demand library have a molecular weight exceeding 350 Da. In contrast, there are only 50 million fragments in the database (MW < 250 Da), which are feasible to screen with available computational resources. Focusing on fragments will not only lead to substantially better chemical space coverage, but should also improve hit rates because drug-like molecules are less likely to bind due to their intrinsically higher molecular complexity46. In agreement with these ideas, we did not identify any OGG1 inhibitors with relevant activity from our docking screen of 235 million lead-like compounds. This library only contained two lead-like superstructures of the most potent fragment and none of these were top-ranked, illustrating that coverage of specific scaffolds is still limited in these libraries. The fragment docking campaign identified four diverse ligands and each of these represent starting points for development of unique inhibitor scaffolds.
Fragment-based drug discovery relies on efficient elaboration of hits, which can require synthesis of a large number of analogs12. The make-on-demand libraries enable rapid design-make-test cycles, but despite the billions of purchasable compounds, the availability of analogs to screening hits can still be scarce. As only a few elaborations of fragment 1 were commercially available, we used the predicted binding mode to identify the scaffold forming key interactions with the active site. Searches based on these molecular patterns enabled us to navigate to relevant regions of commercial chemical space. However, as the size and complexity of the optimized inhibitors increased, the number of relevant analogs in commercial space dwindled. At this point, chemical space coverage was further increased by generating tailored libraries of analogs made from accessible building blocks and this strategy proved efficient. Only 16 compounds were synthesized in-house to identify submicromolar OGG1 inhibitors, corresponding to a > 165-fold increase of potency compared to fragment 1.
The efficiency of our virtual screening approach is also illustrated by comparing to previous drug discovery efforts for OGG1 based on traditional approaches22. In the campaign by Visnes et al., two inhibitors representing a single scaffold were discovered from an HTS, corresponding to a hit rate of 0.01%. This was followed by extensive in-house medicinal chemistry efforts without any access to enzyme-inhibitor structures, leading to the discovery of the submicromolar inhibitor TH5487. In total, this drug discovery campaign involved the evaluation of >18,100 compounds experimentally. In comparison, we tested only 29 compounds from the fragment docking in experimental assays. As expected, the discovered fragments were less potent than the HTS hits, but represented four distinct scaffolds that each could serve as starting-points for optimization. Only 78 compounds, of which a majority originated from readily available make-on-demand libraries, were then synthesized to identify our submicromolar inhibitors. Our strategy therefore yielded a > 1000-fold higher screening hit rate and required evaluation of >150-fold fewer compounds to identify inhibitors with potencies comparable to TH5487. Furthermore, our inhibitors also had more promising physicochemical and in vitro pharmacokinetic properties, such as considerably higher solubility and lower plasma protein binding. Together, these results highlight the advantages of our virtual screening technique and are consistent with the idea that a fragment-based approach results in lead compounds of higher quality compared to traditional approaches35,37.
One of the central aims of this study was to compare strategies to screen the rapidly growing space of readily synthesizable molecules, which will surpass trillions in the near future. To address this question, we developed the UniverseGenerator, which enabled us to estimate the size of theoretical libraries containing all possible stable compounds. Out of the 1013 possible compounds representing our scaffold, we evaluated only a few thousand computationally and tested less than 100 experimentally, illustrating the limited coverage of commercial catalogs. Our chemical space analysis demonstrates that molecular docking enables evaluation of all theoretically possible fragments up to 13 heavy atoms, whereas virtual screens of libraries with billions of drug-like compounds (e.g. from make-on-demand catalogs) remain limited to sampling infinitesimal fractions of chemical space. By leveraging structural information, it is even possible to computationally evaluate all relevant elaborations of a fragment, which may become accessible for experimental testing as methods to synthesize complex molecules improve.
The use of structure-based virtual screening to identify and optimize fragments has been debated16,17,18,19, and our study reveals strengths and weaknesses of this approach. In contrast to common belief, our results support the use of molecular docking as a tool to identify fragment ligands. Crystal structures revealed that the binding modes predicted by docking were strikingly accurate, and the hit rate was 5 to 10-fold higher than experimental fragment screening27. Similar observations were made in docking screens of fragment libraries for ligands of viral enzymes and G protein-coupled receptors47,48,49,50. Notably, our results from the fragment optimization step highlighted a well-known problem of molecular docking: the lack of binding site flexibility. Several distinct conformations of the active site were observed in crystal structures solved with different fragments. In two of these structures, the subpocket that our most potent inhibitor occupies was blocked due to changes in the side chain conformations. As the docking algorithm did not take protein flexibility into account, access to multiple structures of OGG1 representing different active site conformations was crucial for successful fragment elaboration. Fragment expansion should therefore consider several structures of the target if these are available or be performed using a flexible receptor algorithm51.
The rapid growth of make-on-demand chemical libraries has stimulated the development of several novel virtual screening strategies20,52,53,54,55. For example, the V-SYNTHES technique uses an iterative approach to screen combinatorial compound libraries containing billions of compounds. Instead of screening all the compounds in the library, fragment-sized compounds representing the scaffolds available in the library are first docked to the binding site. In the second step, the top-scoring fragments are identified and libraries of larger molecules representing the same scaffold are docked to identify compounds for experimental evaluation. Our strategy and V-SYNTHES both employ a divide-and-conquer approach, which is a cornerstone of fragment-based drug discovery, and have complementary advantages. In the first step, both methods dock libraries of fragment-like compounds to the binding site. Whereas our approach uses commercially available libraries, V-SYNTHES is based on computationally generated fragments. A key benefit of our strategy is that false positives are identified experimentally at this stage, allowing us to focus on fragments with verified activity. In contrast, V-SYNTHES needs to rely on the accuracy of the docking scoring function. In the second step, V-SYNTHES depends on the predicted binding modes, whereas we iteratively solved structures of protein-ligand complexes. For OGG1, access to multiple experimental structures was essential because the challenging target binding site exhibited considerable induced-fit effects. Finally, V-SYNTHES is constrained to predict compounds that are available in the combinatorial library, which we found was a limitation in the optimization of the OGG1 inhibitors. The creation of tailored chemical libraries was required to obtain submicromolar OGG1 inhibitors, which has also been the case for other challenging targets7. Our strategy may be more suitable for difficult targets with flexible binding sites, but does require access to crystallography and sensitive experimental methods to detect weak binding. V-SYNTHES is preferable for targets with more well-defined binding sites and if structure determination is challenging, such as in the case of G protein-coupled receptors.
Access to chemical probes is essential for understanding biological systems and can enable drug discovery for novel therapeutic targets. The DNA repair enzyme OGG1 has been identified as an interesting drug target, but only a few inhibitors have been identified despite considerable efforts22,28,29,30. Here, we demonstrate how virtual fragment screening can rapidly identify inhibitors of challenging targets and achieve potencies sufficient to demonstrate activity in cell models. Our most advanced inhibitors are attractive leads for further development and can accelerate drug discovery efforts for this emerging target.
Human plasma was collected from two voluntary healthy blood donors (non-smoking, citric acid) as part of routine procedures by Uppsala Academic Hospital. Written informed consent was obtained from both donors at time of collection. Human plasma samples were pooled before use in this study. As all human samples were anonymised, IRB approval at Uppsala University was not required.
A crystal structure (PDB accession code: 6G3Y, chain A) of OGG1 bound to an inhibitor (TH5675) was used in the docking screens22. Crystallographic waters and other solvent molecules were removed from the structure, except for the binding site water molecule with residue number 504. This water molecule was deeply buried and coordinated by several polar protein atoms, suggesting that displacing it would be challenging. Therefore, the water was treated as part of the binding site, enabling docked compounds to form hydrogen bonds with the oxygen atom. The atoms of TH5675, with the exception of the amino-benzimidazolone ring system, were used to generate 45 matching spheres in the deepest subpocket of the active site. DOCK3.7 uses a flexible ligand algorithm that superimposes rigid segments of a molecule’s pre-calculated conformational ensemble on top of the matching spheres32. Histidine protonation states were assigned manually based on visual inspection of local hydrogen bonding networks. For example, His54 was protonated at the Nδ atom because of the hydrogen bonding interactions with the backbone carbonyl of Pro52 and amide proton of Leu16. Histidines 10, 13, 54, 97, 112, 179, 185, 195, 270, 276, and 282 were protonated at the Nδ atom, whereas histidines 119 and 237 were protonated at the Nε atom. The remainder of the enzyme structure was protonated by REDUCE56 and assigned AMBER57 united atom charges. The dipole moments of polar residues involved in recognition of TH5675 were increased to favor interactions with these. This technique is common practice for users of DOCK3.7 to improve docking performance and has been used in previous virtual screens33. The partial atomic charges of the backbone amide of residue Gly42 were increased without changing the net charge of the residue. The atoms of the co-crystallized inhibitor were used to create a set of low protein dielectric spheres within 2 Å of the ligand and located near the protein surface to define the boundary between solute and solvent. Scoring grids were pre-calculated using QNIFFT58 for Poisson-Boltzmann electrostatic energies, SOLVMAP59 for ligand desolvation energies, and CHEMGRID60 for AMBER van der Waals energies. Property-matched decoys of OGG1 ligands were generated using in-house scripts31. The obtained control sets were used to evaluate the performance of the docking grids by means of ligands-over-decoys enrichments. Enrichment values and predicted poses of ligands were used to select the final grid parameters.
DOCK3.7 was used to dock the fragment (MW ≤ 250 Da) and lead-like (250 Da >900 rpm) on a Kisker rotational incubator at 37 °C for 4 h to achieve equilibrium. Prior to LC-MS/MS analysis the plasma and buffer samples were treated with the addition of Methanol (1:3) containing Warfarin as an internal standard to precipitate proteins. The standard curve was created using the plasma standard. The plate was then sealed, and centrifuged and the supernatant was analyzed by liquid chromatography coupled to triple quadrupole mass spectrometry (LC-MS/MS).
Caco-2 cell monolayers (passage 94-105) were grown on permeable filter support and used for transport study on day 21 after seeding. Prior to the experiment a drug solution of 10 µM was prepared and warmed to 37 °C. The Caco-2 filters were washed with pre-warmed HBSS prior to the experiment, and thereafter the experiment was started by applying the donor solution on the apical or basolateral side. The transport experiments were carried out at pH 7.4 in both the apical and basolateral chamber. The experiments were performed at 37 °C and with a stirring rate of 500 rpm. The receiver compartment was sampled at 15, 30, and 60 min, and at 60 min also a final sample from the donor chamber was taken to calculate the mass balance of the compound. The samples (100 µL) were transferred to a 96-well plate containing 100 µL methanol and Warfarin as IS and was sealed until analyzed by liquid chromatography coupled to triple quadrupole mass spectrometry (LC-MS/MS).
The test compounds were optimized on a Waters Acquity UPLC XEVO TQ-S microsystem (Waters Corp.) operating in multiple reaction monitoring (MRM) mode with positive or negative electrospray ionization. Compounds were optimized by using the QuanOptimize software (Waters Corp.). The following MS conditions were used:
Transition m/z
Dwell time (s)
Cone voltage
Collision energy
324.4 > 128.04
0.028
10
60
324.4 > 171.03
0.028
10
50
For chromatographic separation, a C18 BEH 1.7 µm column was used with a general gradient of 5% to 1000% of mobile phase B over a total running time of 2 min. Mobile phase A consisted of 0.1% formic acid in purified water, and mobile phase B of 0.1% formic acid in 100% acetonitrile. The flow rate was set to 0.5 mL/min and 5 µL of the sample was injected.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
The ZINC15 library is available at https://zinc15.docking.org. The PDB entry for the mOGG1 crystal structure used for molecular docking calculations is 6G3Y. All compounds tested are listed in in the Supplementary Information and Source Data File. Chemical identities, purities (LC/MS), yields and spectroscopic analysis (1H,13C NMR) for active compounds are provided in Supplementary Information. The crystallographic data generated in this study have been deposited in the PDB database under accession codes 7QEL, 7ZG3, 8CEX, 8CEY, 7Z5R, 7ZC7, 7Z3Y, and 7Z5B. Source data are provided with this paper.
The UniverseGenerator source code is freely available and can be found on the following GitHub repository (https://github.com/carlssonlab/UniverseGenerator). The original code has deposited on Zenodo (https://zenodo.org/records/14460126). Other scripts to process molecular docking results can be found on the following GitHub repository (https://github.com/carlssonlab/frag2lead).
Macarron, R. et al. Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 10, 188–195 (2011).
Article CAS PubMed Google Scholar
Grygorenko, O. O. et al. Generating multibillion chemical space of readily accessible screening compounds. iScience 23, 101681 (2020).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Goodnow, R. A., Dumelin, C. E. & Keefe, A. D. DNA-encoded chemistry: enabling the deeper sampling of chemical space. Nat. Rev. Drug Discov. 16, 131–147 (2017).
Article CAS MATH PubMed Google Scholar
Hoffmann, T. & Gastreich, M. The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov. Today 24, 1148–1156 (2019).
Article CAS PubMed Google Scholar
Korn, M., Ehrt, C., Ruggiu, F., Gastreich, M. & Rarey, M. Navigating large chemical spaces in early-phase drug discovery. Curr. Opin. Struct. Biol. 80, 102578 (2023).
Article CAS PubMed Google Scholar
Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Luttens, A. et al. Ultralarge virtual screening identifies SARS-CoV-2 main protease inhibitors with broad-spectrum activity against coronaviruses. J. Am. Chem. Soc. 144, 2905–2920 (2022).
Article CAS MATH PubMed PubMed Central Google Scholar
Gironda-Martínez, A., Donckele, E. J., Samain, F. & Neri, D. DNA-encoded chemical libraries: A comprehensive review with succesful stories and future challenges. ACS Pharmacol. Transl. Sci. 4, 1265–1279 (2021).
Article PubMed PubMed Central Google Scholar
Bohacek, R. S., McMartin, C. & Guida, W. C. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 16, 3–50 (1996).
<3::AID-MED1>3.0.CO;2-6" data-track-item_id="10.1002/(SICI)1098-1128(199601)16:1<3::AID-MED1>3.0.CO;2-6" data-track-value="article reference" data-track-action="article reference" href="https://doi.org/10.1002%2F%28SICI%291098-1128%28199601%2916%3A1%3C3%3A%3AAID-MED1%3E3.0.CO%3B2-6" aria-label="Article reference 9" data-doi="10.1002/(SICI)1098-1128(199601)16:1<3::AID-MED1>3.0.CO;2-6">Article CAS MATH PubMed Google Scholar
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26 (2001).
Article CAS PubMed Google Scholar
Erlanson, D. et al. Twenty years on: the impact of fragments on drug discovery. Nat. Rev. Drug Discov. 15, 605–619 (2016).
Article CAS MATH PubMed Google Scholar
Ferenczy, G. G. & Keserű, G. M. How are fragments optimized? A retrospective analysis of 145 fragment optimizations. J. Med. Chem. 56, 2478–2486 (2013).
Article CAS MATH PubMed Google Scholar
Woodhead, A. J. et al. Fragment-to-lead medicinal chemistry publications in 2022. J. Med. Chem. 67, 2287–2304 (2024).
Article CAS MATH PubMed Google Scholar
Ruddigkeit, L., van Deursen, R., Blum, L. C. & Reymond, J. L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).
Article CAS PubMed Google Scholar
Polishchuk, P. G., Madzhidov, T. I. & Varnek, A. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput. Aided Mol. Des. 27, 675–679 (2013).
Article ADS CAS PubMed Google Scholar
Chen, Y. & Shoichet, B. K. Molecular docking and ligand specificity in fragment-based inhibitor discovery. Nat. Chem. Biol. 5, 358–364 (2009).
Article CAS PubMed PubMed Central Google Scholar
Teotico, D. G. et al. Docking for fragment inhibitors of AmpC beta-lactamase. Proc. Natl. Acad. Sci. USA 106, 7455–7460 (2009).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Verdonk, M. L. et al. Docking performance of fragments and druglike compounds. J. Med. Chem. 54, 5422–5431 (2011).
Article CAS MATH PubMed Google Scholar
Hubbard, R. E., Chen, I. & Davis, B. Informatics and modeling challenges in fragment-based drug discovery. Curr. Opin. Drug Discov. Devel. 10, 289–297 (2007).
CAS MATH PubMed Google Scholar
Metz, A. et al. Frag4Lead: growing crystallographic fragment hits by catalog using fragment-guided template docking. Acta Crystallogr. D Struct. Biol. 77, 1168–1182 (2021).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Bruner, S. D., Norman, D. P. & Verdine, G. L. Structural basis for recognition and repair of the endogenous mutagen 8-oxoguanine in DNA. Nature 403, 859–866 (2000).
Article ADS CAS PubMed Google Scholar
Visnes, T. et al. Small-molecule inhibitor of OGG1 suppresses proinflammatory gene expression and inflammation. Science 362, 834–839 (2018).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Visnes, T. et al. Targeting OGG1 arrests cancer cell proliferation by inducing replication stress. Nucleic Acids Res. 48, 12234–12251 (2020).
Article CAS PubMed PubMed Central Google Scholar
Tanner, L. et al. Small-molecule-mediated OGG1 inhibition attenuates pulmonary inflammation and lung fibrosis in a murine lung fibrosis model. Nat. Commun. 14, 643 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Bushweller, J. H. Targeting transcription factors in cancer — from undruggable to reality. Nat. Rev. Cancer 19, 611–624 (2019).
Article CAS MATH PubMed PubMed Central Google Scholar
Radaeva, M., Ton, A.-T., Hsing, M., Ban, F. & Cherkasov, A. Drugging the ‘undruggable’. Therapeutic targeting of protein–DNA interactions with the use of computer-aided drug discovery methods. Drug Discov. Today 26, 2660–2679 (2021).
Article CAS PubMed Google Scholar
Michel, M. et al. Druggability assessment of the NUDIX hydrolase protein family as a workflow for target prioritization. Front. Chem. 8, 443 (2020).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Tahara, Y. K. et al. Potent and selective inhibitors of 8-oxoguanine DNA glycosylase. J. Am. Chem. Soc. 140, 2105–2114 (2018).
Article CAS MATH PubMed PubMed Central Google Scholar
Donley, N. et al. Small molecule inhibitors of 8-oxoguanine DNA glycosylase-1 (OGG1). ACS Chem. Biol. 10, 2334–2343 (2015).
Article CAS MATH PubMed PubMed Central Google Scholar
Wallner, O. et al. Optimization of n-piperidinyl-benzimidazolone derivatives as potent and selective inhibitors of 8-oxo-guanine DNA glycosylase 1. ChemMedChem 18, e202200310 (2023).
Article CAS MATH PubMed Google Scholar
Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594 (2012).
Article CAS PubMed PubMed Central Google Scholar
Coleman, R. G., Carchia, M., Sterling, T., Irwin, J. J. & Shoichet, B. K. Ligand pose and orientational sampling in molecular docking. PLoS One 8, e75992 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Bender, B. J. et al. A practical guide to large-scale docking. Nat. Protoc. 16, 4799–4832 (2021).
Article CAS MATH PubMed PubMed Central Google Scholar
Fischer, A., Smieško, M., Sellner, M. & Lill, M. A. Decision making in structure-based drug discovery: visual inspection of docking results. J. Med. Chem. 64, 2489–2500 (2021).
Article CAS PubMed Google Scholar
Rees, D. C., Congreve, M., Murray, C. W. & Carr, R. Fragment-based lead discovery. Nat. Rev. Drug Discov. 3, 660–672 (2004).
Article CAS PubMed Google Scholar
Gao, K., Oerlemans, R. & Groves, M. R. Theory and applications of differential scanning fluorimetry in early-stage drug discovery. Biophys. Rev. 12, 85–104 (2020).
Article MATH PubMed PubMed Central Google Scholar
Leeson, P. D. & Springthorpe, B. The influence of drug-like concepts on decision-making in medicinal chemistry. Nat. Rev. Drug Discov. 6, 881–890 (2007).
Article CAS MATH PubMed Google Scholar
Hann, M. M. Molecular obesity, potency and other addictions in drug discovery. Med. Chem. Commun. 2, 349–355 (2011).
Article CAS MATH Google Scholar
Teague, S. J., Davis, A. M., Leeson, P. D. & Tudor, O. The design of leadlike combinatorial libraries. Angew. Chem. Int. Ed. 38, 3743–3748 (1999).
<3743::AID-ANIE3743>3.0.CO;2-U" data-track-item_id="10.1002/(SICI)1521-3773(19991216)38:24<3743::AID-ANIE3743>3.0.CO;2-U" data-track-value="article reference" data-track-action="article reference" href="https://doi.org/10.1002%2F%28SICI%291521-3773%2819991216%2938%3A24%3C3743%3A%3AAID-ANIE3743%3E3.0.CO%3B2-U" aria-label="Article reference 39" data-doi="10.1002/(SICI)1521-3773(19991216)38:24<3743::AID-ANIE3743>3.0.CO;2-U">Article CAS MATH Google Scholar
Jackson, S. P. & Bartek, J. The DNA-damage response in human biology and disease. Nature 461, 1071–1078 (2009).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Fink, E. A. et al. Structure-based discovery of nonopioid analgesics acting through the α2A-adrenergic receptor. Science 377, eabn7065 (2022).
Article CAS PubMed PubMed Central Google Scholar
Lyu, J., Irwin, J. J. & Shoichet, B. K. Modeling the expansion of virtual screening libraries. Nat. Chem. Biol. 19, 712–718 (2023).
Article CAS MATH PubMed PubMed Central Google Scholar
Hamamoto, R. et al. SMYD3 encodes a histone methyltransferase involved in the proliferation of cancer cells. Nat. Cell Biol. 6, 731–740 (2004).
Article CAS MATH PubMed Google Scholar
Zhang, L. Q. et al. Lowered nudix type 5 (NUDT5) expression leads to cell cycle retardation in HeLa cells. Mol. Cell. Biochem. 363, 377–384 (2012).
Article CAS PubMed Google Scholar
De Semir, D. et al. Pleckstrin homology domain-interacting protein (PHIP) as a marker and mediator of melanoma metastasis. Proc. Natl Acad. Sci. USA 109, 7067–7072 (2012).
Article ADS MATH PubMed PubMed Central Google Scholar
Hann, M. M., Leach, A. R. & Harper, G. Molecular complexity and its impact on the probability of finding leads for drug discovery. J. Chem. Inf. Comput. Sci. 41, 856–864 (2001).
Article CAS MATH PubMed Google Scholar
Schuller, M. et al. Fragment binding to the Nsp3 macrodomain of SARS-CoV-2 identified through crystallographic screening and computational docking. Sci Adv 7, eabf8711 (2021).
Article ADS MATH PubMed PubMed Central Google Scholar
Gahbauer, S. et al. Iterative computational design and crystallographic screening identifies potent inhibitors targeting the Nsp3 macrodomain of SARS-CoV-2. Proc. Natl. Acad. Sci. USA 120, e2212931120 (2023).
Article CAS PubMed PubMed Central Google Scholar
de Graaf, C. et al. Crystal structure-based virtual screening for fragment-like ligands of the human histamine H1 receptor. J. Med. Chem. 54, 8195–8206 (2011).
Article ADS MATH PubMed PubMed Central Google Scholar
Ranganathan, A. et al. Ligand discovery for a peptide-binding GPCR by structure-based screening of fragment- and lead-like chemical libraries. ACS Chem. Biol. 12, 735–745 (2017).
Article CAS MATH PubMed Google Scholar
Fischer, M., Coleman, R. G., Fraser, J. S. & Shoichet, B. K. Incorporation of protein flexibility and conformational energy penalties in docking screens to improve ligand discovery. Nat. Chem. 6, 575–583 (2014).
Article CAS PubMed PubMed Central Google Scholar
Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601, 452–459 (2022).
Article ADS CAS MATH PubMed Google Scholar
Gentile, F. et al. Deep docking: A deep learning platform for augmentation of structure based drug discovery. ACS Cent. Sci. 6, 939–949 (2020).
Article CAS MATH PubMed PubMed Central Google Scholar
Cheng, C. & Beroza, P. Shape-aware synthon search (SASS) for virtual screening of synthon-based chemical spaces. J. Chem. Inf Model. 64, 1251–1260 (2024).
Article CAS MATH PubMed Google Scholar
Carlsson, J. & Luttens, A. Structure-based virtual screening of vast chemical space as a starting point for drug discovery. Curr. Opin. Struct. Biol. 87, 102829 (2024).
Article CAS MATH PubMed Google Scholar
Word, J. M., Lovell, S. C., Richardson, J. S. & Richardson, D. C. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol. 285, 1735–1747 (1999).
Article CAS MATH PubMed Google Scholar
Weiner, S. J. et al. A new force field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 106, 765–784 (1984).
Article CAS MATH Google Scholar
Gallagher, K. & Sharp, K. Electrostatic contributions to heat capacity changes of DNA-ligand binding. Biophys. J. 75, 769–776 (1998).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Mysinger, M. M. & Shoichet, B. K. Rapid context-dependent ligand desolvation in molecular docking. J. Chem. Inf. Model. 50, 1561–1573 (2010).
Article CAS PubMed Google Scholar
Meng, E. C., Shoichet, B. K. & Kuntz, I. D. Automated docking with grid-based energy evaluation. J. Comput. Chem. 13, 505–524 (1992).
Article CAS Google Scholar
Sterling, T. & Irwin, J. J. ZINC 15–ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).
Article CAS MATH PubMed PubMed Central Google Scholar
Baell, J. B. & Holloway, G. A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 53, 2719–2740 (2010).
Article CAS MATH PubMed Google Scholar
RDKit: Open-source cheminformatics. https://www.rdkit.org
Schomburg, K., Ehrlich, H. C., Stierand, K. & Rarey, M. From structure diagrams to visual chemical patterns. J. Chem. Inf. Model. 50, 1529–1535 (2010).
Article CAS PubMed Google Scholar
Enamine REAL space: https://enamine.net/compound-collections/real-compounds
Begnini, F. et al. Importance of binding site hydration and flexibility revealed when optimizing a macrocyclic inhibitor of the Keap1-Nrf2 protein-protein interaction. J. Med. Chem. 65, 3473–3517 (2022).
Article CAS MATH PubMed PubMed Central Google Scholar
McKay, B. D. & Piperno, A. Practical graph isomorphism, II. J. Symb. Comput. 60, 94–112 (2014).
Article MathSciNet MATH Google Scholar
Sadowski, J. & Gasteiger, J. From atoms and bonds to three-dimensional atomic coordinates - automatic model builders. Chem. Rev. 93, 2567–2581 (1993).
Article CAS MATH Google Scholar
Winter, G., Lobley, C. M. & Prince, S. M. Decision making in xia2. Acta Crystallogr. D Biol. Crystallogr. 69, 1260–1273 (2013).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Parkhurst, J. M. et al. Robust background modelling in DIALS. J. Appl. Crystallogr. 49, 1912–1921 (2016).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Evans, P. Scaling and assessment of data quality. Acta Crystallogr. D Biol. Crystallogr. 62, 72–82 (2006).
Article ADS MATH PubMed Google Scholar
Collaborative Computational Project, Number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D Biol. Crystallogr. 50, 760–763 (1994).
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
Article ADS CAS MATH PubMed PubMed Central Google Scholar
Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60, 2126–2132 (2004).
Article ADS MATH PubMed Google Scholar
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D Biol. Crystallogr. 53, 240–255 (1997).
Article ADS CAS MATH PubMed Google Scholar
Baykov, A. A. et al. Inorganic pyrophosphatase as a label in heterogeneous enzyme immunoassay. Anal. Biochem. 171, 271–276 (1988).
Article CAS MATH PubMed Google Scholar
Hanna, B. M. F., Helleday, T. & Mortusewicz, O. OGG1 inhibitor TH5487 alters OGG1 chromatin dynamics and prevents incisions. Biomolecules 10, 1483 (2020).
Article CAS PubMed PubMed Central Google Scholar
Download references
A.L. was supported by a postdoctoral scholarship of the Knut and Alice Wallenberg Foundation (KAW2022.0347). J.C. received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement: 715052), the Swedish Cancer Society (22 2473 Pj), the Swedish Strategic Research Program eSSENCE, and the Swedish Research Council (2021-03464). This work was supported by grants from the Swedish Research Council (2015-00162) and the Swedish Cancer Society (24 3694 Pj) to T.H. This work was supported by grants from the Swedish Research Council (2022-03681) and the Swedish Cancer Society (20 1287 PjF) to P.S. The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and the Berzelius resource provided by the Knut and Alice Wallenberg Foundation. A.L. and J.C. thank OpenEye Scientific Software for the use of OEToolkits at no cost. E.S., J.D. and S.K. thank the beamline scientists at Diamond Light Source (UK), MAXIV (Sweden) and PETRA3 (Germany) for their support with X-ray diffraction data collection. The authors thank the Uppsala University Drug Optimization and Pharmaceutical Profiling Platform (UDOPP) for the determination of in vitro pharmacokinetic properties. This project was supported by the Compound Center at the Chemical Biology Consortium Sweden (CBCS). CBCS is funded by the Swedish Research Council (2021-00179) and SciLifeLab. This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No 875510 (M.M., E.H., and E.W.). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA and Ontario Institute for Cancer Research, Royal Institution for the Advancement of Learning McGill University, Kungliga Tekniska Högskolan, Diamond Light Source Limited. This article reflects the views of the authors and the JU is not liable for any use that may be made of the information contained herein.
Open access funding provided by Uppsala University.
Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, BMC, Box 596, SE-751 24, Uppsala, Sweden
Andreas Luttens, Duc Duy Vo, Flavio Ballante & Jens Carlsson
Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Andreas Luttens
Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
Andreas Luttens
Department of Biochemistry and Biophysics, Stockholm University, SE-106 91, Stockholm, Sweden
Emma R. Scaletti, Jonathan Davies, Sara Košenina, Geoffrey Masuyer & Pål Stenmark
Science for Life Laboratory, Department of Oncology-Pathology, Karolinska Institute, SE-171 77, Stockholm, Sweden
Elisée Wiita, Ingrid Almlöf, Olov Wallner, Liuzhen Meng, Maeve Long, Oliver Mortusewicz, Maurice Michel, Evert Homan, Martin Scobie, Christina Kalderén, Ulrika Warpman Berglund & Thomas Helleday
Enamine Ltd., 02094, Kyiv, Ukraine
Andrii V. Tarnovskiy, Dmytro S. Radchenko & Yurii S. Moroz
Taras Shevchenko National University of Kyiv, Kyiv, 01601, Ukraine
Yurii S. Moroz
Chemspace LLC, Kyiv, 02094, Ukraine
Yurii S. Moroz
Department of Chemistry-BMC, Uppsala University, SE-751 23, Uppsala, Sweden
Jan Kihlberg
Sheffield Cancer Centre, Department of Oncology and Metabolism, University of Sheffield, Sheffield, UK
Thomas Helleday
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
Search author on:PubMed Google Scholar
A.L. and J.C. designed the study. A.L. performed the molecular docking calculations, wrote code for generating and analyzing chemical spaces, and designed compounds under the supervision of J.C. A.L. and F.B. performed substructure and similarity calculations. D.D.V. performed compound synthesis under the supervision of J.K. E.R.S, J.D., S.K, and G.M. determined crystal structures of inhibitors under the supervision of P.S. Biological assays were performed under the supervision of T.H. E.W. performed and analysed biochemical experiments. I.A. performed and analysed cellular assays. L.M and M.L performed and analysed target engagement assays under the supervision of O.M. O.W., M.S., E.H., M.M, C.K, and U.W.B. contributed to coordination of the project, acquisition of reference compounds for the assays, data analysis, and compound synthesis. A.V.T., D.S.R., and Y.S.M. provided the Enamine REAL space and analytical data for the synthesized compounds. A.L. and J.C. wrote the manuscript with contributions from the other authors.
Correspondence to Jens Carlsson.
J.C. is a founder of DareMe Drug Discovery Consulting. O.W. and T.H. are listed as inventors on U.S. patent no. WO2019166639 A1, covering a different class of OGG1 inhibitors than those described in this work. The patent is fully owned by a non-profit public foundation, the Helleday Foundation, and T.H. is a member of the foundation board. U.W.B., C.K. and M.S. are employees of Oxcia, a company developing OGG1 inhibitors towards the clinic. T.H. is a member of the board of Directors of Oxcia. D.S.R. and Y.S.M. are employed with Enamine Ltd. Y.S.M. serves as a scientific advisor to Chemspace LLC. The remaining authors declare no competing interests.
Nature Communications thanks R Stephen Lloyd, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Reprints and permissions
Luttens, A., Vo, D.D., Scaletti, E.R. et al. Virtual fragment screening for DNA repair inhibitors in vast chemical space. Nat Commun 16, 1741 (2025). https://doi.org/10.1038/s41467-025-56893-9
Download citation
Received: 27 April 2024 Accepted: 28 January 2025 Published: 18 February 2025 Version of record: 18 February 2025 DOI: https://doi.org/10.1038/s41467-025-56893-9
Received: 27 April 2024
Accepted: 28 January 2025
Published: 18 February 2025
Version of record: 18 February 2025
DOI: https://doi.org/10.1038/s41467-025-56893-9
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative
Cheminformatics DNA repair enzymes Structure-based drug design Virtual drug screening X-ray crystallography
SLICE (SMARTS and Logic In ChEmistry): fast generation of molecules using advanced chemical synthesis logic and modern coding style Stefi Nouleho Ilemo Victorien Delannée Nadya I. Tarasova Journal of Cheminformatics (2025) Ultra-large virtual screening unveils potent agonists of the neuromodulatory orphan receptor GPR139 Israel Cabeza de Vaca Boris Trapkov Jens Carlsson Nature Communications (2025)
Stefi Nouleho Ilemo Victorien Delannée Nadya I. Tarasova
Journal of Cheminformatics (2025)
Israel Cabeza de Vaca Boris Trapkov Jens Carlsson
Nature Communications (2025)