
K Ganesh Chandan

K Ganesh Chandan, supervised by Dr. Deva Priyakumar U, received his Master of Science – Dual Degree in Computational Natural Science (CNS). Here’s a summary of his research work on Identifying Latent Biases in Drug Design and Advancing Molecular Structure Generation using Deep Learning Algorithms:

Integrating Artificial Intelligence (AI) into drug design has revolutionized the field at various levels, ranging from simple molecular property prediction tasks to complex de novo drug design tasks. The field is still relatively young and is evolving rapidly. In this thesis, we identify various challenges AI models face due to data constraints and analyze hidden biases in these methods; we improve upon existing molecular generation models to align drug design with desired complex properties; and we propose a novel framework for structure elucidation from molecular spectra. The second chapter explores various biases in protein-ligand datasets that affect the performance of deep learning models in binding affinity prediction and virtual screening. It demonstrates how random data splitting leads to overly optimistic results and introduces more nuanced data splitting methods that account for sequence, pocket structure, and protein-ligand interaction similarities. The research identifies biases in these datasets related to protein-only and ligand-only information, and it proposes improvements in dataset construction and model design to enhance the generalizability and accuracy of AI-based protein-ligand scoring functions. In the third chapter, we develop and refine molecular generative models for more accurate conditional generation, and we propose optimization methods that align the properties of the generated molecules with the desired ones. The fourth chapter introduces the Spectra and Molecule Encoder Network (SMEN) for scoring molecules against target spectra. The model aims to aid library ranking tasks for infrared (IR) spectrum-based structure elucidation. SMEN learns spectral and molecular embeddings, accurately representing both entities in a high-dimensional latent space.
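As a rough illustration of library ranking in a shared latent space, the sketch below scores candidate molecules against a query spectrum by cosine similarity of their embeddings. It is not the SMEN implementation: the function name `rank_library` and the tiny hand-written vectors are hypothetical stand-ins for the outputs of trained spectrum and molecule encoders, and cosine similarity is an assumed scoring choice.

```python
import numpy as np

def rank_library(spectrum_emb, molecule_embs):
    """Rank candidate molecules against one spectrum embedding.

    Returns (order, scores): candidate indices from best to worst match,
    and the cosine similarity of every candidate to the spectrum.
    """
    # L2-normalise both sides so the dot product equals cosine similarity.
    s = spectrum_emb / np.linalg.norm(spectrum_emb)
    m = molecule_embs / np.linalg.norm(molecule_embs, axis=1, keepdims=True)
    scores = m @ s
    order = np.argsort(-scores)  # descending similarity
    return order, scores

# Hypothetical 3-dimensional embeddings; real encoders would produce
# much higher-dimensional vectors.
molecule_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])
spectrum_emb = np.array([0.5, 0.9, 0.1])

order, scores = rank_library(spectrum_emb, molecule_embs)
print(order.tolist())  # candidate indices, best match first
```

In this toy setting the candidate whose embedding points in nearly the same direction as the spectrum embedding is ranked first; a trained model would have to learn embeddings for which that property holds for matching spectrum-molecule pairs.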
Additionally, the SMILES Decoder (SD) generates SMILES strings from the latent space, enabling one-shot predictions from IR spectra to molecular structures and showing promising accuracy for rapid, precise structure elucidation. In summary, this thesis explores the potential of ML models to predict binding affinities and generate molecular structures with desired properties. It begins by identifying and addressing latent biases in popular protein-ligand datasets, which affect the accuracy of binding affinity predictions. By analyzing biases in sequence-based and structure-based datasets, this research introduces more refined data-splitting methods that enhance model generalizability. The thesis then presents a generative model for generating molecules with specific characteristics, such as binding affinity, and further optimizes the model’s ability to produce molecules that meet predefined criteria. Lastly, a contrastive learning framework is developed for generating molecular structures from infrared spectra, demonstrating the model’s versatility in handling various molecular generation tasks. Each chapter addresses foundational issues, setting the stage for future innovations in the field.
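To make the contrast with random splitting concrete, here is a minimal sketch of a cluster-aware split, in which whole similarity clusters are assigned to either the training or the test set so that no cluster appears in both. It assumes cluster labels have already been computed by some similarity criterion; the function `cluster_split` and the labels below are hypothetical and are not taken from the thesis, whose splitting methods account for sequence, pocket structure, and protein-ligand interaction similarities.

```python
import random
from collections import defaultdict

def cluster_split(cluster_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no cluster is shared by train and test."""
    # Group sample indices by their (precomputed) cluster label.
    groups = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        groups[cid].append(idx)

    # Shuffle the clusters, not the individual samples.
    clusters = sorted(groups)
    random.Random(seed).shuffle(clusters)

    target = test_fraction * len(cluster_ids)
    train, test = [], []
    for cid in clusters:
        # Fill the test set with whole clusters until it reaches the
        # target size; every remaining cluster goes to training.
        (test if len(test) < target else train).extend(groups[cid])
    return sorted(train), sorted(test)

# Hypothetical cluster labels for ten protein-ligand complexes.
cluster_ids = ["kinase", "kinase", "protease", "gpcr", "gpcr",
               "gpcr", "protease", "nuclear_receptor", "kinase", "gpcr"]

train_idx, test_idx = cluster_split(cluster_ids)
train_clusters = {cluster_ids[i] for i in train_idx}
test_clusters = {cluster_ids[i] for i in test_idx}
print(train_clusters.isdisjoint(test_clusters))  # no cluster leaks across the split
```

Because clusters are kept whole, the test set can overshoot the requested fraction; that is the usual trade-off of group-wise splitting against an exact split ratio.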

October 2024