Love AI or Hate AI But Biochemists Can’t Ignore AI

Prof. Deva Priyakumar discusses the drug discovery process and sheds light on the role played by computations in making it more efficient.

Before delving into how AI supports the drug discovery process, it’s essential to understand how drug discovery as we know it evolved. There is no better way to start than with the story of one of the earliest known drugs, aspirin. It is also one of the most widely consumed drugs today, with close to 100 billion pills consumed across the world. Ancient written texts reveal that extracts of the bark of the willow tree were used to treat inflammation and fever. Its pain-relieving properties were well known, and Hippocrates too recommended willow for fever, pain and even childbirth.

Much later, in the 19th century, the willow’s anti-inflammatory properties were traced to an active ingredient, a compound called salicylic acid. However, it was used sparingly because long-term use came to be associated with damage to the stomach in the form of internal bleeding. A modified form of the compound was then created synthetically, marking not just the birth of the first official drug known to us but also the genesis of the pharmaceutical industry. It was patented and marketed by Bayer as a non-steroidal anti-inflammatory drug. In fact, in 1918, aspirin became to the Spanish flu what acetaminophen was to the Covid-19 pandemic.

With the introduction of competing analgesics like acetaminophen in 1959 and ibuprofen in 1969, aspirin’s monopoly as the only pain-relieving drug on the market came to an end. However, its fortunes gradually revived during the ’70s and ’80s, when aspirin began to be used as a blood thinner against vascular events like heart attacks and strokes.

Understanding How Aspirin Works
The first step towards understanding the drug came with the discovery that the cyclooxygenase (COX) enzymes are the target of non-steroidal anti-inflammatory drugs. Very simplistically, this means that in addition to binding to one particular protein involved in inflammation, aspirin also binds to another, and that second interaction is linked to gastrointestinal bleeding. The binding locations were discovered by looking at the structures of the proteins and understanding their binding affinity. That explains the discovery of different drugs that could bind to the two different enzymes (COX-1 and COX-2).

To summarize, the early drug discovery process can be pictured in three overall stages. The first was empirical: aspirin began to be used as a pain reliever based on observation and experience. The second brought an understanding of the components of the drug. Incidentally, the market at the time was dominated by the textile and dye industry, which had easy access to chemicals. The third brought an understanding of not only the drug but also its receptors. In-depth exploration began at the level of the organism and moved on to the tissue, the cell and finally the molecule.

The Drug Discovery Process
When there’s a disease that needs to be treated, whether it is an infection or a non-communicable disease, the first step is to identify the pathways involved in it. The next is target identification, the process of identifying molecular targets for compounds to bind to, which is at the heart of the drug discovery process. This is when the actual drug design begins. To test whether the binding takes place, trials are conducted ‘in vitro’ and ‘in vivo’. When the trials are successful, a marketable drug can be produced. The failure rate of this process is extremely high, and so are the costs involved.

AI In Drug Discovery Cycle
Every time you ingest a pill, it travels to the digestive system, where it is absorbed. It is then distributed throughout the body, metabolised (processed by the body into other compounds) and finally excreted. In effect, any drug molecule that is discovered needs to have acceptable absorption, distribution, metabolism and excretion properties. In the drug discovery process, once the target protein molecule has been identified, it comes down to shortlisting molecules that can bind to the target. In the initial stages, this means beginning with a large library of drug-like molecules in which each shortlisted molecule is painstakingly evaluated. This is where computational methods play a big role. It’s more about rejecting bad candidates for binding than about finding the right ones. Physics-based methods to predict molecular properties are time-consuming and very expensive. Because of these limitations, AI algorithms are being used at every stage of the drug discovery cycle. The first HIV drug that targeted the protease owes its success to a significant contribution from computational technologies. Over the last few years, AI has played an increasing role not just in the initial stages of drug discovery but throughout the pipeline, from identifying the target or analysing biological pathways right down to predicting clinical trial outcomes.

The number of startups using AI for drug discovery has grown rapidly over the last few years. However, scientists are still hesitant about embracing the technology because AI is a big black box for them. The hesitancy arises from not knowing how it works. Even so, we have started to explore these methods in the following situations.

When Physics is well known but inefficient:
One of the first steps in the drug discovery pipeline is to take a large database of drug-like molecules and screen them. This essentially means shortlisting the molecules that have acceptable absorption, distribution, metabolism and excretion properties. Machine learning models make this process more efficient. The exercise amounts to choosing the top hits from the library that fulfil the conditions laid down. In this case, we typically start with zero data, or begin evaluations with minimal data generated explicitly to find compounds that are more useful in treating the particular disease. With computational methods, a comparatively small effort yields disproportionately large returns.
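The screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which filter or model is used, so Lipinski's rule of five (a classical rule of thumb for oral drug-likeness, not an ML model) stands in for the property filter, and `predicted_score` is a hypothetical number standing in for the output of a trained model. The `Candidate` class, the molecule names and all descriptor values are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """One library entry with precomputed descriptors (all values hypothetical)."""
    name: str
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors
    predicted_score: float  # stand-in for a trained model's output, higher is better


def rule_of_five_violations(c: Candidate) -> int:
    """Count how many of Lipinski's four thresholds the candidate exceeds."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def top_hits(library: List[Candidate], k: int) -> List[Candidate]:
    """Reject poor candidates first, then rank the survivors by predicted score."""
    survivors = [c for c in library if rule_of_five_violations(c) <= 1]
    return sorted(survivors, key=lambda c: c.predicted_score, reverse=True)[:k]


# Hypothetical four-molecule library; a real one would hold millions of entries.
LIBRARY = [
    Candidate("mol_1", 320.0, 2.1, 2, 5, 0.71),
    Candidate("mol_2", 610.0, 6.3, 1, 4, 0.95),  # two violations: rejected
    Candidate("mol_3", 450.0, 4.8, 3, 8, 0.64),
    Candidate("mol_4", 540.0, 3.0, 2, 9, 0.80),  # one violation: tolerated
]

if __name__ == "__main__":
    for hit in top_hits(LIBRARY, k=2):
        print(hit.name, hit.predicted_score)
```

Note that `mol_2` has the highest score but never reaches the ranking, which mirrors the point that screening is more about rejecting bad candidates than about finding the right ones.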

For problems not yet solved by Physics-based methods: One of the trickiest parts of drug discovery is predicting protein structures, which helps in determining binding affinity. Our understanding of protein folding has come a long way during the last few decades, but the problem remains elusive when approached through physics-based methods alone. ML has provided the best answer yet in the form of AlphaFold.

When the solution is known, but it’s hard to code: It takes an expert to figure out how to make a drug-like molecule starting from available reagents. However, given the large number of possible synthetic methodologies, it is hard to formulate rules that could be written as code. We use reinforcement learning to figure out, given a compound, which pathway to take, that is, which steps to follow to make the same compound in a lab.

Essentially, the ML model is working backwards in this case from a target compound to develop multiple pathways.
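The idea of working backwards from a target can be illustrated with a toy search. This sketch deliberately swaps in a plain depth-limited exhaustive search for the reinforcement learning mentioned above, because it only aims to show the shape of the problem: a target is broken into precursors until everything is purchasable. The reaction table, compound names and the `PURCHASABLE` set are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Step = Tuple[str, Tuple[str, ...]]  # (product, precursors used to make it)

# Hypothetical reaction table: each product maps to alternative precursor sets.
REACTIONS: Dict[str, List[Tuple[str, ...]]] = {
    "target": [
        ("intermediate_1", "reagent_a"),
        ("intermediate_2",),
        ("reagent_b", "reagent_d"),
    ],
    "intermediate_1": [("reagent_b", "reagent_c")],
    "intermediate_2": [("reagent_d", "unavailable_x")],
}

# Hypothetical set of reagents that can simply be bought.
PURCHASABLE: Set[str] = {"reagent_a", "reagent_b", "reagent_c", "reagent_d"}


def find_routes(compound: str, max_depth: int = 5) -> List[List[Step]]:
    """Return every route from purchasable reagents to `compound`.

    Each route is a list of steps ordered from the target backwards.
    A purchasable compound needs no steps, so it yields one empty route.
    """
    if compound in PURCHASABLE:
        return [[]]
    if max_depth == 0:
        return []
    routes: List[List[Step]] = []
    for precursors in REACTIONS.get(compound, []):
        # Start with the step that makes this compound, then extend it with
        # every combination of sub-routes that make each precursor.
        partial: List[List[Step]] = [[(compound, precursors)]]
        for precursor in precursors:
            sub_routes = find_routes(precursor, max_depth - 1)
            partial = [route + sub for route in partial for sub in sub_routes]
        routes.extend(partial)
    return routes


if __name__ == "__main__":
    for number, route in enumerate(find_routes("target"), start=1):
        print(f"Route {number}:")
        for product, precursors in route:
            print(f"  {' + '.join(precursors)} -> {product}")
```

The branch through `intermediate_2` is dropped because one of its precursors can be neither bought nor made. In a realistic setting the reaction table is far too large to enumerate, which is where a learned policy for choosing the next disconnection becomes useful.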

Inverse Problems: Typically in drug discovery, you begin with a molecule and proceed to ask whether it fulfils the four properties (absorption, distribution, metabolism and excretion). In this case, you instead begin with the four properties you want in a molecule and use them as inputs to a machine learning model. You cannot use Physics-based methods to handle these questions. A generative AI model responds by generating molecules based on the criteria used as inputs. We have also used a GPT architecture to generate new molecules with specific properties of interest.
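The difference between the forward and the inverse question can be made concrete with a toy example. This is not a generative model: it enumerates combinations of a few hypothetical fragments and keeps those whose properties match the request, purely to show that the inverse problem takes desired properties as input and returns candidate molecules as output. The fragment names, their mass and donor contributions, and the additive property model are all assumptions made for the illustration.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

# Hypothetical fragments: name -> (mass contribution, hydrogen-bond donors).
FRAGMENTS: Dict[str, Tuple[float, int]] = {
    "frag_A": (80.0, 0),
    "frag_B": (45.0, 1),
    "frag_C": (120.0, 2),
}


def forward_properties(fragments: Tuple[str, ...]) -> Tuple[float, int]:
    """Forward question: given a molecule (as fragments), what are its properties?"""
    mass = sum(FRAGMENTS[name][0] for name in fragments)
    donors = sum(FRAGMENTS[name][1] for name in fragments)
    return mass, donors


def inverse_design(
    mass_range: Tuple[float, float],
    max_donors: int,
    max_fragments: int = 3,
) -> List[Tuple[str, ...]]:
    """Inverse question: given desired properties, which molecules satisfy them?"""
    low, high = mass_range
    hits: List[Tuple[str, ...]] = []
    for size in range(1, max_fragments + 1):
        for combo in combinations_with_replacement(sorted(FRAGMENTS), size):
            mass, donors = forward_properties(combo)
            if low <= mass <= high and donors <= max_donors:
                hits.append(combo)
    return hits


if __name__ == "__main__":
    for molecule in inverse_design(mass_range=(200.0, 250.0), max_donors=2):
        print(molecule, forward_properties(molecule))
```

Enumeration only works here because the toy space has a handful of fragments. For real chemical space, which is far too large to list, a generative model proposes candidates directly from the requested properties instead.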

Summary
AI and ML have established themselves as the fourth paradigm in scientific research, and these methods are proving to be an excellent addition to the chemist’s toolkit. Their use in bridging the human-machine intelligence gap will revolutionise the way we do science. That said, there are bottlenecks in the adoption of AI for drug discovery, such as the lack of datasets, inherent biases, and the limited interpretability and generalizability of models.

This article was initially published in the July edition of TechForward Dispatch

Prof Deva Priyakumar

Prof Deva Priyakumar is a Professor and Head of the Center for Computational Natural Sciences and Bioinformatics at IIIT Hyderabad. He is also the Project Director of IHub-Data, the Technology Innovation Hub under the National Mission on Interdisciplinary Cyber-Physical Systems. His research interests lie in applying computational methods to the study of chemical and biological systems and processes. Recently, his group has made significant contributions in applying modern AI/ML techniques to molecular science research.
