August 2022
Rishal Aggarwal received his MS in Biology. His research work was supervised by Prof. Deva Priyakumar. Here’s a summary of his research work on Ligand Binding Site Detection and Inverse Design of Molecules using Deep Learning:
Drug discovery involves designing molecules that interact well with a given target protein structure in order to modulate its function. With the large-scale availability of 3D protein structures, structure-based drug design has developed as a rational approach to de novo drug design. At its core, it is about developing drug molecules that bind well to the receptor at its binding site and also exhibit ideal drug-like properties for Absorption, Distribution, Metabolism and Excretion (ADME). A prerequisite to this is finding druggable and functionally relevant binding sites on the 3D structure of the protein. Once the binding site is identified, large libraries of molecules are screened against it to identify top candidate molecules, a process commonly known as virtual screening. However, these chemical libraries represent only a minute portion of the larger chemical space where more ideal molecules may reside. To access this larger chemical space, generative models need to be designed that can learn a distribution over it for efficient sampling. This situation calls for inverse molecule design, wherein generative models sample molecules conditioned to have specific properties and chemical structures. The rapid adoption and success of deep learning algorithms in various areas of structural biology motivates their use for important tasks in drug design.
In this thesis, we present three studies, namely DeepPocket, Apobind and MolGPT, that use deep learning to address important tasks in drug design such as binding site detection, molecular generation, and optimization. DeepPocket uses 3D convolutional neural networks to identify and segment druggable binding sites on the surface of the protein. However, DeepPocket was initially trained only on annotated ligand-bound (holo) protein structures, since only such datasets were available for binding site detection. Therefore, Apobind, a dataset of annotated ligand-unbound (apo) structures, is mined to circumvent this limitation and to serve as a useful resource for other tasks such as binding affinity prediction and molecular docking. Finally, MolGPT, a novel molecular generative model based on the attention-based transformer architecture, is discussed, and its efficacy in molecular optimization and inverse molecule design is demonstrated.