AI in Drug Discovery and Repurposing: Benefits, Approaches, and Use Cases

According to McKesson, a company with a two-hundred-year history delivering a third of all drugs across North America, you need around six months to start a pharmacy and another seven to nine months to see any revenue. If this seems too long and complex, compare it with the drug development process.

It takes at least ten years and $2.6 billion to get a new medicine to the market. Meanwhile, only one-third of more than 20,000 known diseases have an adequate cure so far. All this prompts the pharma industry to seek help from new technologies, namely artificial intelligence (AI), which promises to speed up and otherwise improve drug discovery.

What is drug discovery?

Drug discovery is the first stage of the drug development flow that pinpoints new substances with strong potential to fight a targeted disease. Researchers sift through thousands or even millions of active compounds to identify the most effective ones. The study involves numerous tests in laboratories and on animals long before human trials.

Drug discovery process

The entire drug discovery process comprises five main steps:

Target identification and validation.
Initial screening and hit identification.
Hit-to-lead selection.
Lead optimization.
Preclinical research.

Key steps of drug discovery.

Target identification and validation. A target here is a molecule in the human body that reacts to a drug compound and produces the expected effect — say, preventing a disease. To understand whether a particular target is sensitive to a medication, scientists perform tests on cells, tissues, and animal models.

Hit identification. A hit is a small molecule able to effectively bind to and impact a validated target. Today, researchers usually apply high throughput screening (HTS) to select promising hits from a large library of chemical and biological compounds. HTS automates testing against targets, employing sensitive detectors, robotics, data processing software, and other technologies.

Hit-to-lead (H2L) selection. Hits selected in the previous step undergo a range of experiments that confirm (or refute) their efficacy. At this stage, researchers reduce the number of candidates to the most suitable ones, called leads.

Lead optimization. The aim of this step is to enhance effectiveness, reduce toxicity, increase absorption, and otherwise improve the lead molecules. After additional research, lead molecules can be repurposed for other targets (diseases).

Preclinical research. At the end of drug discovery, researchers perform in vitro and in vivo assays (tests in test tubes and in living organisms, i.e. animals, respectively) to select drug candidates for clinical trials. On average, only one of 25 leads is pushed for further development.

Now, let’s see how drug discovery fits into the end-to-end drug development process.

Drug development process

The entire drug development lasts 10 to 15 years, covering the above-mentioned drug discovery, clinical trials, FDA approval, and post-market safety monitoring.

What it takes to develop a new drug.

Drug discovery alone takes five to six years (excluding target identification), with only ten of 10,000 drug candidates making it to the clinical trials.

Clinical trials last another five to seven years and include three phases. Phase one tests drug safety and dosage on 20 to 100 volunteers. Phase two studies efficacy and side effects, engaging several hundred people with the disease. Phase three is the longest one, with up to 3,000 people involved. No more than 10 percent of molecules entering the clinical trial stage survive all three phases and get approved by the FDA.

A drug review involves doctors, chemists, microbiologists, and other experts who evaluate whether a new treatment is safe and effective enough to be sold to patients. If the review team decides that the advantages of a medicine outweigh potential risks, the FDA allows it on the market. Yet, this is not the end of the story.

Post-market safety monitoring is conducted to gather additional information on drug effects. If the FDA or other regulatory organization spots problems, it initiates changes to the recommended dosage, adds cautions to the usage information, or takes other measures. In the case of serious issues, a drug can be pulled from the market.
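
The attrition figures quoted in this section can be chained into a rough end-to-end estimate. The snippet below is only back-of-the-envelope arithmetic on the numbers given above (ten of 10,000 candidates reach clinical trials, and at most 10 percent of those are approved); the variable names are illustrative, and real pipelines vary widely by therapeutic area.

```python
# Back-of-the-envelope funnel using only the figures quoted in this article.
candidates_screened = 10_000     # drug candidates entering discovery
reach_clinical_trials = 10       # candidates that make it to clinical trials
approval_rate_in_trials = 0.10   # upper bound: share of trial entrants approved

# Expected approvals per 10,000 screened candidates (an upper bound).
expected_approvals = reach_clinical_trials * approval_rate_in_trials

# Overall chance that a single screened candidate becomes an approved drug.
overall_success_rate = expected_approvals / candidates_screened

print(f"Expected approvals per {candidates_screened:,} candidates: {expected_approvals:.0f}")
print(f"Overall success rate: {overall_success_rate:.2%}")
```

Under these figures, at most about one approved drug emerges per 10,000 candidates, which is why even small gains in early-stage accuracy matter.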

Why pharma research needs AI

Given the low success rate and huge cost of bringing a new medicine to patients, even minor improvements can save companies millions of dollars. Not to mention the lives of people waiting for an innovative cure. So, it comes as no surprise that all large biopharma companies are investing in AI, particularly in deep learning, which has the potential to make the hunt for drugs cheaper, faster, and more precise.

Big biopharma companies investing in AI. Source: Deloitte

It’s worth noting that regulatory bodies treat the use of machine learning in healthcare with caution. The closer AI is to real patients, the more limitations it faces. That’s why, for now, smart algorithms see fewer restrictions and wider adoption in the drug discovery phase, which happens prior to tests on people.

When applied to drug discovery, smart algorithms have already proved their ability

to find new candidates in months rather than in years,
to make more accurate predictions on the efficacy and safety of a drug candidate, and
to enhance the success rate of the entire drug development pipeline.

But how exactly does AI bring these and other benefits at the preclinical stage? Below we’ll review the most popular AI use cases in drug discovery.

AI use cases in drug discovery

The Intelligent Drug Discovery report by Deloitte highlights five major use cases of AI in drug discovery:

target identification (28 percent of all solutions);
screening small molecular libraries to find new candidates (40 percent);
de novo drug design (8 percent);
drug repurposing (17 percent), and
preclinical studies (7 percent).

How AI is used across the drug discovery process.

Target identification: AstraZeneca collaborates with an AI startup to explore disease mechanisms

A key challenge of drug discovery is the limited list of proven targets for medicines. So far, FDA-approved drugs cover nearly 400 human genes, while 80 percent of human genes remain unstudied as potential targets or are considered resistant to treatment.

AI can help scientists better understand the mechanism that drives a particular disease and identify novel genes or proteins to be affected by medicines. This process often employs natural language processing to extract relevant information from scientific documents.

Datasets or inputs for target prediction algorithms are typically organized in knowledge graphs, also known as semantic networks. They represent various biomedical entities (genes, their products such as RNA and proteins, diseases, adverse effects, biological functions, etc.) and the complex relationships between them in a simple-to-navigate, traceable form. One of the leading instruments to construct large-scale knowledge graphs and explore novel targets with deep learning is QIAGEN Biomedical Knowledge Base.

An example of a biomedical knowledge graph. Source: Drug target discovery using knowledge graph embeddings.
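
To make the idea of a knowledge graph concrete, here is a minimal sketch that stores biomedical facts as (subject, relation, object) triples and answers a simple query. The entity names, relation labels, and helper functions are all hypothetical placeholders, not the schema of any real knowledge base mentioned in this article.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; names are illustrative only.
TRIPLES = [
    ("GENE_A", "encodes", "PROTEIN_A"),
    ("PROTEIN_A", "associated_with", "DISEASE_X"),
    ("DRUG_1", "binds", "PROTEIN_A"),
    ("DRUG_1", "causes", "SIDE_EFFECT_Y"),
    ("GENE_B", "encodes", "PROTEIN_B"),
    ("PROTEIN_B", "associated_with", "DISEASE_X"),
]


def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def proteins_linked_to(disease, triples):
    """Return proteins that have an 'associated_with' edge to the disease."""
    return sorted(s for s, r, o in triples if r == "associated_with" and o == disease)


graph = build_graph(TRIPLES)
print(graph["DRUG_1"])                           # outgoing edges of one drug
print(proteins_linked_to("DISEASE_X", TRIPLES))  # candidate targets for a disease
```

Production knowledge graphs hold millions of such edges and are usually queried through a graph database or embedded into vectors for machine learning, but the underlying triple structure is the same.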

Real-life examples. Since 2019, the British-Swedish pharma giant AstraZeneca has been collaborating with BenevolentAI, a UK bioinformatics startup, to explore large amounts of biological data with graph neural networks and identify novel drug targets. In particular, they concentrate on two complex conditions: chronic kidney disease (CKD) and idiopathic pulmonary fibrosis (IPF). AstraZeneca has already added a few AI-generated targets to its portfolio for further validation and development.

Another recent intelligent hunt for targets was initiated by drug discovery company Insilico Medicine. In collaboration with researchers from the Mayo Clinic, the University of Zurich, Harvard Medical School, and other scientific centers, the company found 28 targets for amyotrophic lateral sclerosis (ALS) using its proprietary AI engine. Eighteen of those targets have already been validated in animal models for further development.

Screening molecular libraries: AI found the first anticancer drug candidate

Once the target is identified, the next step is to search for molecules (hits) that can effectively impact it. Figuratively, the task is to find keys that have a high potential for opening the lock (provoking the desired response from the target).

How a hit molecule unlocks a disease.

As we’ve mentioned above, hit identification involves exploring large molecular libraries with high-throughput screening (HTS). But this method relies on brute force and a bit of good luck. No wonder only a tiny fraction of the catch finally gets to clinical trials. AI serves as a sonar that detects objects with the required parameters in the endless sea of possibilities.

The popular deep learning architectures at this stage are convolutional neural networks (CNNs), which excel in image recognition and computer vision in healthcare. In the biomedical domain, they serve to predict which compounds will precisely fit into a target molecule with a known 3D structure. To train models, researchers can use commercial and public datasets from shared molecular libraries such as PubChem, ChEMBL, ChemBridge, DrugBank, and more.
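
Training a CNN is beyond a short example, but the overall shape of computational screening (score every compound in a library, keep the top-ranked ones) can be illustrated with a much simpler stand-in: Tanimoto similarity between binary fingerprints. This is a deliberately swapped-in technique, not the deep learning approach described above. The fingerprints below are hypothetical sets of "on" bit positions, and the compound names are made up.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: each set lists the "on" bits of a compound.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_001": {1, 4, 7, 9, 13},
    "cmpd_002": {2, 3, 5},
    "cmpd_003": {1, 4, 7, 9, 12, 15},
    "cmpd_004": {4, 9, 20, 21},
}


def rank_library(reference, compounds, top_k=2):
    """Score every compound against the reference and keep the top_k best."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in compounds.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


top_hits = rank_library(known_active, library)
for name, score in top_hits:
    print(f"{name}: {score:.2f}")
```

A learned model replaces the fixed similarity function with a score predicted from the compound and target structures, but the rank-and-filter loop around it stays much the same.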

Real-life example. In April 2021, Oxford-based biotech startup Exscientia and German drug discovery company Evotec announced the co-invention of the first AI-detected anti-cancer drug candidate allowed for clinical trials. With the help of Exscientia’s platform, researchers sorted through millions of small molecules seeking a remedy for solid tumors. The use of AI shortened the drug discovery period from five to six years to just eight months.

De novo drug design: a medicine against fibrosis was generated in just 21 days

De novo is Latin for “anew” or “from the beginning.” In drug discovery, it means creating molecules based on information about the target structure instead of finding them via screening. AI in this case predicts elements of a drug candidate (such as atoms, types of bonds, etc.) and enables generating chemical entities for a specific target.

Among deep learning algorithms employed for de novo design are

recurrent neural networks (RNNs) capable of forecasting the next fragment in the compound represented as a sequence of elements;
convolutional neural networks (CNNs), which show good results when working with 3D-structural information; and
generative adversarial networks (GANs), consisting of two competing neural networks: a generator and a discriminator. The former generates new, synthetic data, and the latter checks whether it looks real.

Also, researchers have high hopes for a reinforcement learning approach that aims at defining the best sequence of decisions (in our case — elements of a molecule) to get a long-term reward (an expected reaction from the target).
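
The sequence view of a molecule used by RNNs can be illustrated with a far simpler model. The sketch below is not an RNN: it is a character-level bigram model that counts which symbol tends to follow which in a handful of SMILES strings (a common text notation for molecules) and then samples new strings one symbol at a time. The training set, the start and end markers, and the function names are assumptions made for illustration, and the sampled strings are not guaranteed to be chemically valid.

```python
import random
from collections import defaultdict

# A few simple SMILES strings used purely as toy training data.
TRAINING_SMILES = ["CCO", "CCN", "CCC", "CC(=O)O", "CCOC"]
START, END = "^", "$"  # hypothetical sequence boundary markers


def fit_bigram_counts(smiles_list):
    """Count how often each character follows another across the training set."""
    counts = defaultdict(lambda: defaultdict(int))
    for smiles in smiles_list:
        tokens = [START, *smiles, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_sequence(counts, rng, max_len=20):
    """Generate a string one character at a time from the bigram counts."""
    out, prev = [], START
    for _ in range(max_len):
        choices, weights = zip(*counts[prev].items())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


counts = fit_bigram_counts(TRAINING_SMILES)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample_sequence(counts, rng) for _ in range(5)]
print(samples)
```

An RNN plays the same role as the count table but conditions each prediction on the whole prefix rather than only the previous symbol, which is what lets it learn constraints such as matching parentheses.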

Real-life example. Major biopharma companies are already experimenting with AI-driven designs of new chemical entities for different conditions. For example, drug discovery company Insilico Medicine from Hong Kong managed to create a new, drug-like molecule from scratch in just 21 days. It took another 25 days to validate the compound meant to fight fibrosis. All in all, the entire process was 15 times faster than traditional discovery. For drug design, the company used a combination of GAN and reinforcement learning.

Drug repurposing: a treatment for arthritis found a use for COVID-19

Drug repurposing finds new applications for already available medicines, drugs under development, or candidates that didn’t go through FDA review. Approved or failed, the active molecules can be redesigned for other uses.

Repurposing cuts the development time frame as in most cases the safety of a compound was already tested. Around 30 percent of repurposed drugs finally make it to patients — huge progress compared with the 10 percent success rate of the traditional process.

Until recently, discoveries of new targets for existing medications were either accidental or fueled by hypotheses. The classic example is Viagra, synthesized by Pfizer to treat angina. During its development, researchers noticed an unexpected side effect. And now mankind has an effective (though short-acting) treatment for erectile dysfunction.

AI gives researchers insights into polypharmacology — or the ability to impact different targets, associated with one or several conditions. As with target identification, a common approach here is to build knowledge graphs that reflect relationships between genes, diseases, and drugs. Then, graph neural networks (GNNs) can be applied to predict previously unknown connections.
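
The intuition behind predicting unknown drug-disease connections can be shown without a neural network. The sketch below uses a simple shared-target heuristic as a stand-in for a GNN: a drug becomes a repurposing candidate for a disease if it acts on proteins associated with that disease and the pair is not already a known indication. All entity names and data are hypothetical.

```python
# Hypothetical drug-target and disease-target links; names are illustrative only.
DRUG_TARGETS = {
    "DRUG_1": {"PROTEIN_A", "PROTEIN_B"},
    "DRUG_2": {"PROTEIN_C"},
    "DRUG_3": {"PROTEIN_B", "PROTEIN_D"},
}
DISEASE_TARGETS = {
    "DISEASE_X": {"PROTEIN_A", "PROTEIN_B"},
    "DISEASE_Y": {"PROTEIN_D"},
}
KNOWN_INDICATIONS = {("DRUG_1", "DISEASE_X")}


def repurposing_candidates(drug_targets, disease_targets, known):
    """Score unseen drug-disease pairs by the number of targets they share."""
    scored = []
    for drug, targets in drug_targets.items():
        for disease, disease_proteins in disease_targets.items():
            if (drug, disease) in known:
                continue  # skip pairs that are already established
            shared = targets & disease_proteins
            if shared:
                scored.append((drug, disease, len(shared)))
    # Highest overlap first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda row: (-row[2], row[0], row[1]))


candidates = repurposing_candidates(DRUG_TARGETS, DISEASE_TARGETS, KNOWN_INDICATIONS)
print(candidates)
```

A graph neural network generalizes this idea by learning from multi-step paths and many relation types at once instead of counting direct overlaps.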

Real-life example. The above-mentioned BenevolentAI used its proprietary knowledge graph and AI algorithms to detect existing medicines that could address COVID-19. It took researchers just 48 hours to identify the best candidate: baricitinib, a drug created for rheumatoid arthritis. Global clinical trials confirmed the antiviral effect of the medication. Finally, it was approved by the FDA for the treatment of hospitalized adults with coronavirus.

Preclinical research: deep learning models could potentially replace animal trials

Preclinical studies in animals involve repetitive analysis tasks, with subtle changes to be detected at each iteration. The resource-consuming process often ends up in failure since mice don’t precisely reflect how a drug would behave in a human body. As a result, we have time and money wasted, not to mention unnecessary suffering.

AI is here to improve the situation by predicting cross-species differences and identifying animal models that could produce more accurate results for a certain disease. Moreover, deep learning could potentially replace mice.

Real-life example. In April 2022, the first drug tested with AI rather than on animals entered the human trial stage. This achievement belongs to Quris AI, headquartered in Israel and Boston, and its Bio-AI Clinical Prediction Platform. The drug in question targets the root cause of Fragile X syndrome, the most common genetic condition leading to intellectual disabilities, which has no cure to date.

How to implement AI in drug discovery

AI technologies in drug discovery are still in their nascent stage. Yet, the huge benefits they promise prompt a growing number of biopharma companies to implement or at least consider novel approaches long before they reach maturity. Here are several things to think over if you want to be among early AI adopters.

Ensure access to reliable data

Improving drug discovery with AI requires access to large collections of relevant data. Above, we mentioned some commercial and public databases with information on molecules, their properties, known targets, adverse effects, and relations between them (PubChem, ChEMBL, ChemBridge, etc.). You may also check our article on drug data APIs to get familiar with the main publicly available sources of information on medications.

Besides that, you can establish partnerships with academia and research centers to get access to proprietary data and deep expertise.

Find external AI partners

Partnering with AI biotech startups and companies that deliver AI services is an effective way to accelerate AI adoption. The challenge here is to evaluate their work and the ready-made algorithms they provide. Many of those companies are quite small and new, so, in most cases, you’ll have limited information on the effectiveness of their solutions. Key indicators to watch out for are relevant scientific publications, collaborations with research centers and other biopharma companies, and the number of medicines being discovered or optimized with their technology.

Build AI talent in-house

The lack of relevant skills and talent in-house is one of the major barriers to successful AI implementation in the biopharma industry. You need to engage software engineers and create an internal data science team to make machine learning work for you. You may also augment your existing talent pool with third-party experts to work on particular tasks.

How a data science team is structured.

But AI-driven drug discovery needs more than IT and data specialists. The key people in this process are medical scientists familiar with machine learning and analytical approaches. They will participate in running AI projects and ensure model interpretability (the ability to explain the data and reasoning behind a prediction) to build trust in the outputs.

Pick projects to evaluate AI impact

Clearly define where exactly to use AI in the first place and what you expect to reap — be it saving time, the discovery of several new targets, or progress in addressing previously incurable diseases. Prioritize several projects in which prediction tools could significantly enhance outcomes and test the AI impact on them for 12 to 24 months.

To clearly understand AI effectiveness, your internal team has to design KPIs that include machine learning metrics for evaluating how current algorithms cope with their tasks. This will allow you to address problems in a timely manner and maintain AI performance.
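
As one possible illustration of such metrics, the sketch below computes precision, recall, and an enrichment factor for a hit-prediction model by comparing its predicted hits with compounds later confirmed active in the lab. The function name, the choice of metrics, and the sample data are assumptions; the right KPIs depend on the project.

```python
def screening_metrics(predicted_hits, confirmed_actives, library_size):
    """Compare model-predicted hits with experimentally confirmed actives."""
    predicted, actives = set(predicted_hits), set(confirmed_actives)
    true_positives = len(predicted & actives)

    # Precision: share of predicted hits that were confirmed in the lab.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of all confirmed actives that the model found.
    recall = true_positives / len(actives) if actives else 0.0
    # Enrichment: how much better than picking compounds at random.
    base_rate = len(actives) / library_size
    enrichment = precision / base_rate if base_rate else 0.0

    return {"precision": precision, "recall": recall, "enrichment_factor": enrichment}


# Hypothetical results for a 100-compound library.
metrics = screening_metrics(
    predicted_hits=["c1", "c2", "c3", "c4"],
    confirmed_actives=["c2", "c4", "c9"],
    library_size=100,
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Tracking numbers like these over the 12 to 24 months of a pilot makes it easier to tell whether a model is improving outcomes or merely producing plausible-looking predictions.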
