Patient data is one of the most contentious terms in healthcare. Add the words ‘artificial intelligence (AI)’ and you have a recipe for sighs and groans in industry conversations. Pharma has been inundated with these terms for years now, and for good reason: patient data is transformative – across pharma’s entire value chain. This is why 90% of large pharmaceutical firms had already initiated AI projects as of last year.

The premise sounds simple. By collecting, structuring, and extracting insights from vast quantities of patient data, pharma can reduce the cost of drug discovery and clinical trials and accelerate the time to market for new drugs. The underlying challenge in using patient data can be summed up in two words – complexity and scale. Imagine parsing through a massive pile of scanned documents turned into PDFs, unstructured text from clinical trials and doctors’ notes, tissue imaging data, and genomic information, all coming from siloed facilities with data privacy concerns. If you’re having a bad day, the data may even arrive as printed documents or a CD-ROM (good luck finding a drive for one of those). It’s like introducing four of your friends from different countries to one another, with each only speaking their native language. The unstructured nature of these datasets, coupled with their sheer quantity, makes data ingestion problematic – but it is also what makes them extremely valuable. Healthcare is estimated to have generated roughly 25,000 petabytes of data last year, with a compound annual growth rate of 36% through 2025. If you’re having trouble wrapping your head around that number: 1 petabyte of storage could hold roughly 11,000 4K movies, enough for 2.5 years of nonstop binge watching.
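
To make the ingestion problem concrete, here is a minimal sketch of what “structuring” heterogeneous sources can look like: records arriving in different shapes get normalized into one common schema before anything else can happen. Every name here – the `Record` fields, the source labels, the helper functions – is a hypothetical stand-in, not a real pipeline.

```python
# A minimal sketch of normalizing heterogeneous patient records into one
# common schema. All field names and sources are hypothetical.
from dataclasses import dataclass

@dataclass
class Record:
    patient_id: str
    source: str      # e.g. "ehr", "pdf_scan", "genomics"
    text: str        # free-text payload to be structured downstream

def from_ehr(row: dict) -> Record:
    # Structured EHR export: fields are already keyed.
    return Record(row["id"], "ehr", row["clinical_note"])

def from_pdf_scan(patient_id: str, ocr_text: str) -> Record:
    # Scanned documents arrive as raw OCR text with no schema at all.
    return Record(patient_id, "pdf_scan", ocr_text)

records = [
    from_ehr({"id": "P001", "clinical_note": "Stage II, responded to therapy"}),
    from_pdf_scan("P002", "handwritten intake form, partially legible"),
]
print([r.source for r in records])  # ['ehr', 'pdf_scan']
```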

This is where AI comes in. More specifically, Machine Learning (ML), its most prevalent form. An ML algorithm studies voluminous data in intense detail, pinpointing patterns that would otherwise escape human researchers. It can parse multiple types of patient data from a range of sources and combine them. Using ML tools, researchers can now produce research-grade data that can lead to better predictive models across the drug development chain, from efficacy and risk-benefit analyses to post-marketing surveillance.
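
As a toy illustration of what “combining multiple types of patient data” means in practice, the sketch below feeds free-text notes and structured numeric fields into a single predictive model. The data, feature names, and outcome label are all invented for illustration; real models and datasets are far richer.

```python
# Toy example: unstructured notes + structured fields -> one predictive model.
# Data and the "responded" label are invented.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "note": ["tumor shrinkage observed", "no response, adverse event",
             "partial response", "progressive disease"],
    "age": [54, 67, 49, 72],
    "biomarker_level": [1.8, 0.4, 1.2, 0.3],
    "responded": [1, 0, 1, 0],  # hypothetical outcome label
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "note"),                   # unstructured notes
    ("nums", "passthrough", ["age", "biomarker_level"]),   # structured fields
])
model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(df[["note", "age", "biomarker_level"]], df["responded"])
print(model.predict(df[["note", "age", "biomarker_level"]]))
```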

Clinical trial matching has become a practical test bed for AI in pharma

The first low-hanging fruit is clinical trial design. By drawing on electronic health records (EHRs), patient demographics, and omics datasets, trials can be made more patient-centric. Combining patient genomic data with medical records allows researchers to identify participants who are most likely to benefit from the drug candidates in trials. We’re already seeing this in play, with Janssen partnering with Komodo Health for patient matching using claim codes and other data sources. The ability to generate research-grade data from patient-reported outcomes will also allow more remote trials to be conducted, decreasing the burden on patients while providing researchers with high-quality information. Imagine removing the need for patients to keep a medical journal or to travel long distances just so we can get a snapshot of their health status.
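
At its simplest, trial matching is a filter over harmonized patient records against a trial’s eligibility criteria. The sketch below shows that core idea with a hypothetical age window and a required biomarker; real criteria (and the Janssen/Komodo approach) are far richer and are not reproduced here.

```python
# Rules-based patient-trial matching on toy records.
# The eligibility fields and biomarker are hypothetical.
patients = [
    {"id": "P001", "age": 54, "biomarkers": {"EGFR": True}},
    {"id": "P002", "age": 81, "biomarkers": {"EGFR": True}},
    {"id": "P003", "age": 60, "biomarkers": {"EGFR": False}},
]

trial = {"min_age": 40, "max_age": 75, "required_biomarker": "EGFR"}

def eligible(p: dict, t: dict) -> bool:
    # A patient matches if they fall in the age window and carry the biomarker.
    return (t["min_age"] <= p["age"] <= t["max_age"]
            and p["biomarkers"].get(t["required_biomarker"], False))

print([p["id"] for p in patients if eligible(p, trial)])  # ['P001']
```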

Beyond clinical trials, it’s easy to visualize ML’s expansion into generating real-world evidence (RWE) for pharma. RWE will also increasingly incorporate data from different platforms, such as genomic data, imaging, and clinical records, to build a better understanding of drug response even post-commercialization. As RWE’s status is elevated by regulatory bodies, its use in monitoring post-market safety and adverse events is becoming more important for regulatory decision-making. AI can even improve patient service programs by enhancing virtual assistants and capturing more data points. All of these applications are transforming AI’s status from a ‘nice to have’ to an essential part of the pharma arsenal.
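
One established building block of post-market safety monitoring is disproportionality analysis over adverse-event reports, such as the proportional reporting ratio (PRR). The counts below are invented; real pipelines would draw them from claims, EHR, or spontaneous-report databases.

```python
# Proportional reporting ratio (PRR), a standard pharmacovigilance screen.
# All counts are hypothetical.
def prr(a: int, b: int, c: int, d: int) -> float:
    """a: drug & event, b: drug & other events,
       c: other drugs & event, d: other drugs & other events."""
    return (a / (a + b)) / (c / (c + d))

# E.g. 30 reports of the event on the drug of interest vs. background rate.
signal = prr(a=30, b=970, c=120, d=98880)
print(f"PRR = {signal:.1f}")  # PRR >= 2 is a common screening threshold
```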

‘Federated Learning’ may transcend limitations on data bias, privacy, and security

Faster and cheaper drug development and approval sound like a dream, but patient data and AI are not quite peanut butter and jelly. AI does not correct for bias in its inputs. Simply put, the quality of the result is directly related to the quality of the data – garbage in, garbage out. The models you generate are only as good, and as unbiased, as the datasets used to train them. An emerging solution is Federated Learning (FL), which allows algorithms to be trained collaboratively without exchanging the data itself. This eases the limitations on access and privacy by allowing data to be used for training without ever moving it beyond institutional firewalls. Successful implementation of FL can address governance and privacy concerns while creating models that are superior to existing solutions.
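
The mechanics are simple to sketch: each site trains on its own data, and only model weights cross the firewall, never patient records. The following is a minimal federated-averaging toy, with three invented “hospitals” fitting a linear model; production FL adds secure aggregation, privacy accounting, and much more.

```python
# Minimal federated averaging: sites share weights, not data.
# Sites, data, and the linear model are toy stand-ins.
import numpy as np

def local_update(w, X, y, lr=0.1, steps=50):
    # Plain gradient descent on a local linear-regression objective.
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Three "hospitals", each holding private data that never leaves the site.
sites = []
for _ in range(3):
    X = rng.normal(size=(40, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=40)
    sites.append((X, y))

w_global = np.zeros(2)
for _ in range(10):  # each round: broadcast, train locally, average weights
    local_ws = [local_update(w_global.copy(), X, y) for X, y in sites]
    w_global = np.mean(local_ws, axis=0)
print(w_global)  # approaches true_w without pooling any raw data
```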

Moving forward, we will see more pharma companies invest in AI, primarily through collaborations or by making equity investments through their venture capital arms to gain a seat on the board. A handful of pharma companies, such as GSK, Novartis, and Roche, are also setting up their own in-house AI units. In most cases, it’s clear that current partnerships are driven more by access to data than by algorithmic competencies. We have mapped out a selection of partnerships and investments between pharma companies and AI vendors below.

Fueled by the promise of competitive advantage and the capital efficiencies of AI, pharma will dive deep into patient data and hope for a rising tide.