Erasmus Mundus Joint Master - ChEMoinformatics+ : Deep Docking: a brief introduction

by: Dina Khasanova Track «Chemoinformatics and Materials Informatics», Bar Ilan-Strasbourg, 2022

Drug discovery is a long and rigorous process: it takes a long time to bring a molecule “from bench to bedside”. Virtual screening can significantly accelerate drug discovery, but conventional docking methods become computationally expensive as the size of available chemical libraries grows exponentially. Several approaches have been developed to address this challenge. One of them is Deep Docking (DD), which, according to its authors, makes it possible to screen billions of molecular structures without a significant loss of potential drug candidates [1].

The protocol consists of eight steps: (1) molecular descriptor calculation, (2) receptor preparation, (3) random sampling of the chemical library, (4) ligand preparation in 3D, (5) molecular docking, (6) statistical model training, (7) model inference and (8) a return to step (3), with the sampling biased toward molecules that are more likely to be active. The procedure can be fully automated on high-performance computing clusters.

In more detail (Figure 1):
1. For each molecule in a chemical library, the molecular descriptors are computed.
2. The raw PDB structures are converted into fully parameterized all-atom models, and the docking calculations are initialized.
3. A dataset is randomly sampled from the chemical library.
4. Each chemical structure from this dataset is prepared in 3D with a physical model.
5. The prepared ligands are docked into the protein target using a conventional docking protocol. The best-scoring molecules are labeled "actives" and the others "inactives".
6. A QSAR model is optimized, trained and validated on this dataset to discriminate between "active" and "inactive" instances based on the molecular descriptors.
7. The resulting deep QSAR model is used to categorize all the molecules in the chemical library as "actives" or "inactives".
8. The algorithm repeats from step (3), with the dataset augmented with compounds that are more likely to be "active" and with a slightly more stringent definition of the categories "actives" and "inactives".
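The loop formed by steps (3) to (8) can be illustrated with a toy sketch. This is not the authors' implementation. All function names are hypothetical, `mock_dock` replaces a real docking program with a simple formula, a nearest-centroid classifier stands in for the deep QSAR model, two random numbers stand in for molecular descriptors, and halving the "active" fraction at each iteration is only an assumed way of making the definition more stringent.

```python
import random


def make_library(n, seed=0):
    """Toy library: each 'molecule' is a pair of made-up descriptors in [0, 1)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def mock_dock(descriptors):
    """Hypothetical stand-in for a docking program (lower score = better)."""
    x, y = descriptors
    return (x - 0.7) ** 2 + (y - 0.2) ** 2


def score_cutoff(scores, top_fraction):
    """Score at or below which a molecule belongs to the best top_fraction."""
    ranked = sorted(scores)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(descriptors, labels):
    """Toy nearest-centroid classifier (assumption: replaces the deep model)."""
    actives = [d for d, is_active in zip(descriptors, labels) if is_active]
    inactives = [d for d, is_active in zip(descriptors, labels) if not is_active]
    return centroid(actives), centroid(inactives)


def predict_active(model, d):
    active_center, inactive_center = model
    dist2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return dist2(d, active_center) < dist2(d, inactive_center)


def deep_docking_sketch(library, n_iterations=3, sample_size=200,
                        top_fraction=0.10, seed=0):
    rng = random.Random(seed)
    pool = list(range(len(library)))  # molecules still predicted "active"
    docked = {}                       # molecule index -> docking score
    for _ in range(n_iterations):
        # Step 3: sample molecules that have not been docked yet.
        candidates = [i for i in pool if i not in docked]
        sample = rng.sample(candidates, min(sample_size, len(candidates)))
        # Steps 4-5: "dock" the sample and label the best scores as actives.
        for i in sample:
            docked[i] = mock_dock(library[i])
        cutoff = score_cutoff(docked.values(), top_fraction)
        indices = list(docked)
        labels = [docked[i] <= cutoff for i in indices]
        # Step 6: train the model on everything docked so far.
        model = train([library[i] for i in indices], labels)
        # Step 7: keep only the molecules predicted to be active.
        pool = [i for i in pool if predict_active(model, library[i])]
        # Step 8: tighten the definition of "active" for the next round.
        top_fraction /= 2
    return pool, docked


library = make_library(5000)
hits, docked = deep_docking_sketch(library)
print(f"docked {len(docked)} of {len(library)} molecules, {len(hits)} predicted hits")
```

The point of the sketch is the economy of the scheme: only the sampled molecules are ever docked, while the inexpensive model is applied to the whole library.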

**DD_pipeline**

Figure 1: Workflow of the DD pipeline adapted from ref. [2]

These steps are repeated up to a maximum number of iterations. At the end, the compounds categorized as "actives" are the hits of the virtual screening [2]. The dataset used to build the QSAR model evolves at each iteration, which improves the performance of the model: it becomes more predictive with each iteration, as suggested by the enrichment values measured on the test datasets.
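As an illustration of what an enrichment value measures, the sketch below uses one common definition: the hit rate among the molecules the model selects, divided by the hit rate in the whole test set. This definition and the function name are assumptions made for the example; the cited papers may compute enrichment differently.

```python
def enrichment_factor(true_labels, predicted_labels):
    """Hit rate among predicted actives divided by the overall hit rate.

    Both arguments are equal-length sequences of 0/1 (or False/True) values,
    where 1 means "active".
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    n_total = len(true_labels)
    n_actives = sum(1 for t in true_labels if t)
    n_selected = sum(1 for p in predicted_labels if p)
    if n_actives == 0 or n_selected == 0:
        raise ValueError("need at least one true active and one predicted active")
    n_hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t and p)
    return (n_hits / n_selected) / (n_actives / n_total)


# Ten test molecules, two of them truly active; the model selects three.
true_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(true_labels, predicted_labels))
```

A value above 1 means the model's selection is richer in actives than a random selection of the same size would be.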

This open-source project is available on GitHub and comes with a graphical user interface, DD-GUI, that simplifies access to the tool. DD-GUI can be installed on Linux, Mac and Windows platforms [3].

References
[1] Gentile, F., Yaacoub, J.C., Gleave, J. et al. Artificial intelligence–enabled virtual screening of ultra-large chemical libraries with deep docking. Nat Protoc 17, 672–697 (2022). https://doi.org/10.1038/s41596-021-00659-2
[2] Gentile, F., Agrawal, V., Hsing, M. et al. Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery. ACS Central Science 6, 939–949 (2020). https://doi.org/10.1021/acscentsci.0c00229
[3] Yaacoub, J.C., Gleave, J., Gentile, F. et al. DD-GUI: A graphical user interface for deep learning-accelerated virtual screening of large chemical libraries (Deep Docking). Bioinformatics 38, 1146–1148 (2022). https://doi.org/10.1093/bioinformatics/btab771