
Pharma’s next AI edge: A data strategy that’s purpose-built

By Kapil Pant and Nimish Shah

Jan. 14, 2025 | Article | 13-minute read

While pharma and life sciences companies recognize the transformative potential of AI, too many still grapple with the complexities of their data.
 

Our research shows that 77% of pharma executives intend to rethink their data strategies, yet only 35% believe their current strategies are designed to create competitive differentiation.
 

They’re looking to construct robust “moats” with their data, borrowing a term from business strategy. In this view, data is a driver of business growth, and the advantage comes from continuously learning and adapting based on what the data makes possible.
 

Yet to use data as a growth driver in the age of AI, we must address a fundamental question: Where does data create unique value?

In pharma and life sciences, this means looking at each business domain, especially with an eye toward gen AI’s ability to combine previously untapped unstructured data with traditional data sources.

Gen AI is the catalyst, but ask domain questions first



It’s good to remember that enterprise data offices typically guide a company’s data strategy on a three-to-five-year cycle. This approach prioritizes enterprise risk management and compliance for large-scale data, including data access and sharing rules. It’s optimized to reduce the friction of using data, which is largely third-party syndicated data.
 

While these strategies have long empowered business units to create their own data solutions, they also carry inefficiencies and may even place an undue burden on data consumers.
 

Enter powerful language models. These tools are not just processing data; they’re helping shape how companies think about and use data.
 

In general, data strategists’ early interest in gen AI has centered on two basic principles:

  1. Gen AI allows you to create novel solutions from combinations of multimodal data. Market research is one example: a question-and-answer style chatbot can draw on structured market research reports, unstructured call center recordings and clinical notes you’ve structured with useful metadata. The concept is easily transferable to other domains.
  2. Gen AI can be “employed” to completely transform how the data strategy is implemented. Consider, for example, how internal enterprise data marketplaces work today. While they offer access to available data, they often fail to address the underlying issue of findability: people know the data exists, they just can’t find it. Generative AI agents and agentic workflows can revolutionize this by automating metadata creation and enhancing search with natural-language context. An AI-powered answer engine can understand user personas and natural-language queries and return relevant results, reducing the time and effort required to find the right information (see the sketch after this list).
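To make the second principle concrete, here is a minimal sketch of a metadata-driven answer engine. The catalog entries, personas and keyword scoring are illustrative assumptions; a production system would generate the metadata with gen AI and match queries with embeddings or a language model rather than keyword overlap.

```python
# Minimal sketch of a metadata-driven "answer engine" for data findability.
# Catalog entries, tags and personas below are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    description: str                                  # could be drafted by a gen AI agent
    tags: set[str] = field(default_factory=set)       # searchable metadata
    personas: set[str] = field(default_factory=set)   # who typically needs this asset

CATALOG = [
    DataAsset("mkt_research_reports", "Structured market research reports",
              {"market", "share", "brand"}, {"brand_manager"}),
    DataAsset("call_center_transcripts", "Unstructured call center recordings, transcribed",
              {"patient", "complaint", "adherence"}, {"patient_services"}),
    DataAsset("clinical_notes_curated", "Clinical notes enriched with metadata",
              {"clinical", "outcomes", "notes"}, {"medical_affairs"}),
]

def find_assets(query: str, persona: str) -> list[DataAsset]:
    """Rank catalog entries by overlap between query terms and asset tags,
    boosting assets already associated with the asking persona."""
    terms = set(query.lower().split())
    scored = []
    for asset in CATALOG:
        score = len(terms & asset.tags) + (1 if persona in asset.personas else 0)
        if score:
            scored.append((score, asset))
    return [asset for _, asset in sorted(scored, key=lambda pair: -pair[0])]

if __name__ == "__main__":
    for asset in find_assets("patient adherence complaints", "patient_services"):
        print(asset.name, "-", asset.description)
```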

We see a strong desire to use these principles to improve a company’s processes and decision-making through better automation, insights and analytics. And we’ve seen many companies jump on new opportunities with their current data. This isn’t a wrong step; it’s just not purpose-built. These companies are not explicitly considering how data can differentiate their businesses and create unique value.
 

That omission is also one of the core reasons that many organizations haven’t been able to scale their gen AI use cases beyond productivity-boosting copilots. There’s still a fundamental mismatch between the data capabilities companies have now and what they need to do to link the data strategy with ambitious business outcomes.

To determine whether your own data strategy is purpose-built, work through these steps in order:

  • Take a domain lens: Recognize how certain features increase data’s potential to be a value-multiplier in any domain. Playing into one or more of these advantages is critical.
  • Embrace an agile approach: Move your data planning from a monolithic exercise repeated every three to five years toward agile services that allow you to collect, connect and enrich data. This approach focuses on evolving business needs, concrete business outcomes and optimized spending.
  • Strengthen your position: Work to combine first-, second- and third-party data to develop proprietary data products and services. Leverage AI and generative AI to continually optimize data management.

Factors that make data a differentiator in the gen AI era



To begin your assessment, you must understand the distinctive features of the data itself and the factors that increase its value for your organization.

These features tend to fall into four categories: uniqueness, volume and variety, complexity and context, and quality and agile governance.

Figure: The factors that increase data's value (Source: ZS)

Recognizing how these factors increase data’s potential to be a value-multiplier in your domain is the first step toward advantage. Now, let’s look at some examples.

How to differentiate with data in practice: Three domain examples for life sciences



Example #1 - Clinical data: Harnessing clinical data for more than regulatory submissions
 

Clinical trial data, with its unique and complex nature, offers significant competitive advantages beyond its primary function of regulatory submission. But the tight coupling with the submission process has created friction in the ability to leverage this data for other purposes, including optimizing new trials and discovering new therapeutic applications for existing drugs.
 

Today, centering your strategy on one or both of two value-multiplying factors can increase the potential for using clinical data beyond the submission process: quality and agile governance, and contextual relevance.
 

First, consider your mindset on quality and agile governance. A focus on making clinical data FAIR—Findable, Accessible, Interoperable and Reusable—holds significant potential value. The goal is to break down data silos and make information accessible to teams across the company. By design, these principles align with the ideals of a connected data ecosystem, enabling broader data utilization in enterprise AI solutions.
 

FAIR principles guide downstream decisions for data quality and governance. Ultimately, these principles drive changes to the collection processes that happen far upstream from regulatory submission so that the data can be reusable for a variety of analysis needs.
 

Generative AI can be a useful tool for FAIRifying data: it can automatically generate metadata tags and descriptions that help data assets become reusable, whatever the eventual reuse objective turns out to be. These tags can also help users adhere to data governance policies that define access scenarios and guidelines, because with clinical data, access must be managed effectively to maintain data integrity.
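As a hedged illustration of what gen-AI-assisted FAIRification might look like, the sketch below drafts metadata for a clinical data asset. The generate_with_llm function is a placeholder for whatever approved model endpoint your organization uses; the prompt, output fields and steward-review step are assumptions rather than a standard.

```python
# Sketch of gen-AI-assisted metadata drafting for FAIRification.
# generate_with_llm is a placeholder, not a real library call.

import json

def generate_with_llm(prompt: str) -> str:
    # Placeholder: call your approved LLM service here and return its JSON reply.
    raise NotImplementedError("wire this to your organization's model endpoint")

def draft_metadata(dataset_name: str, column_names: list[str], study_phase: str) -> dict:
    """Draft a description, keywords and a suggested access tier for a clinical dataset."""
    prompt = (
        "You are documenting a clinical dataset for reuse under FAIR principles.\n"
        f"Dataset: {dataset_name}\nColumns: {', '.join(column_names)}\n"
        f"Study phase: {study_phase}\n"
        "Return JSON with keys: description, keywords, suggested_access_tier."
    )
    draft = json.loads(generate_with_llm(prompt))
    # A human data steward should review the draft before it enters the catalog,
    # especially the access tier, since clinical data access must stay controlled.
    draft["status"] = "pending_steward_review"
    return draft
```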

The other multiplying factor you can leverage in the clinical domain relates to how well you can master the rising complexity we’re seeing with multimodal combinations of data.

An oncology unit, for example, might combine data from tumor biopsies, CT scans and EHRs to help assess tumor size and treatment effectiveness. Mastering combinations of data modalities such as images, transcripts or videos, with enrichment services that give the data the right context, can be a true value multiplier.
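Here is a minimal sketch of that kind of multimodal linkage, assuming simplified, pre-derived inputs (a pathology read per biopsy, a tumor volume computed from imaging and a treatment line from the EHR) keyed by patient ID; the field names are illustrative.

```python
# Illustrative only: linking three modalities into one patient-level record.

from dataclasses import dataclass

@dataclass
class TumorAssessment:
    patient_id: str
    biopsy_histology: str        # structured pathology read
    ct_tumor_volume_mm3: float   # derived from CT imaging
    ehr_treatment_line: int      # pulled from the EHR

def link_modalities(biopsies: dict, ct_volumes: dict, ehr_lines: dict) -> list[TumorAssessment]:
    """Join the three sources on patient ID; patients missing any modality are skipped."""
    shared = biopsies.keys() & ct_volumes.keys() & ehr_lines.keys()
    return [TumorAssessment(pid, biopsies[pid], ct_volumes[pid], ehr_lines[pid])
            for pid in sorted(shared)]

if __name__ == "__main__":
    records = link_modalities(
        biopsies={"P01": "adenocarcinoma"},
        ct_volumes={"P01": 1840.0},
        ehr_lines={"P01": 2},
    )
    print(records)
```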

What you can do with this data advantage:

  • Optimize future trials by learning from past studies
  • Discover new therapeutic applications for existing drugs
  • Facilitate bidirectional knowledge transfer between research and clinical development
  • Augment understanding of disease pathways and progression
  • Optimize study design and execution through predictive biomarkers
  • Advance precision medicine initiatives

Case example:
 

Roche’s Apollo streamlines multimodal analysis of data from clinical trials, pathology, radiology imaging, electronic health records (EHRs), genomic data and wearables. Built on AWS, it efficiently scales analysis to support 1,300 scientists across 40 sites.
 

Example #2 - Commercial data: Triangulating commercial data for healthcare ecosystem engagement

With commercial data, the advantage will be in how companies:

  • Collect unique data from first-party sources, like patients themselves.
  • Connect or harmonize data from diverse sets of first-, second- and third-party sources to find new insights.
  • Enrich or contextualize data with relevant metadata. This could be, for example, classifications of risk, such as adherence risk for patients or an HCP’s place on a continuum or journey.

What’s new here? Commercial data efforts typically center on collecting easily accessible, structured data like demographics and customer measures. While this information provides a basic understanding of a customer, it falls short of revealing what we really want to know: who they really are and why they do what they do.

To get to that level of knowledge, we need to collect contextual and sometimes novel data that sheds light on a customer’s underlying needs, preferences and challenges.

Gen AI, with its ability to analyze unstructured data sources, makes this idea of creating deep context more attainable. In fact, the concept is so powerful that we believe it will be the foundation of pharma’s next commercial model.
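The sketch below shows the enrichment idea in miniature. Simple keyword rules stand in for a gen AI model that would read unstructured notes (calls, emails, CRM free text) and classify where an HCP sits on an adoption journey; the stages, keywords and field names are illustrative assumptions.

```python
# Minimal sketch of enriching a customer record with derived context.
# Keyword rules stand in for a gen AI classifier; stages are assumptions.

JOURNEY_KEYWORDS = {
    "aware": {"heard", "conference", "new data"},
    "evaluating": {"efficacy", "coverage", "prior auth"},
    "adopting": {"first patient", "started", "titration"},
}

def classify_journey_stage(notes: list[str]) -> str:
    """Pick the journey stage whose keywords appear most often in the notes."""
    text = " ".join(notes).lower()
    scores = {stage: sum(kw in text for kw in kws) for stage, kws in JOURNEY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] else "unknown"

def enrich_customer(record: dict, notes: list[str]) -> dict:
    """Return the record with a derived journey_stage attached."""
    return {**record, "journey_stage": classify_journey_stage(notes)}

if __name__ == "__main__":
    hcp = {"hcp_id": "H123", "specialty": "oncology"}
    print(enrich_customer(hcp, ["Asked about prior auth and coverage for new patients"]))
```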

What you gain from this data advantage:

  • Context-driven customer strategies and plans that are individualized based on all company products and teams that touch that customer.
  • Understanding the success of context-driven customer strategies, through multiple roles and channels, in a real-time, n=1 way.

Case example:
 

A U.S. subsidiary of a global biopharmaceutical company is using data to identify patients coping with rare diseases who are experiencing activation delays. By combining patient data from hubs, pharmacies and claims, the company can pinpoint healthcare providers who have patients at high risk of delayed treatment while protecting the identity of the patient. This allows the commercial team to proactively reach out to these providers to understand the reason for the delay and work together to either cancel the referral or expedite patient care.
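A simplified sketch of the triangulation logic in this case, with assumed fields and an assumed 21-day threshold; only prescriber-level counts leave the function, so patient identity stays protected.

```python
# Sketch: flag prescribers whose referrals have stalled past a threshold,
# reporting only HCP-level counts. Fields and threshold are assumptions.

from collections import Counter
from datetime import date

DELAY_THRESHOLD_DAYS = 21  # assumed definition of an "activation delay"

def stalled_referrals_by_hcp(referrals: list[dict], today: date) -> Counter:
    """Count open referrals per prescriber that exceed the delay threshold."""
    counts: Counter = Counter()
    for r in referrals:  # each record comes from joined hub / pharmacy / claims feeds
        if r["status"] == "open" and (today - r["referral_date"]).days > DELAY_THRESHOLD_DAYS:
            counts[r["prescriber_id"]] += 1  # no patient identifiers leave this function
    return counts

if __name__ == "__main__":
    sample = [
        {"prescriber_id": "NPI-1", "referral_date": date(2025, 1, 2), "status": "open"},
        {"prescriber_id": "NPI-1", "referral_date": date(2025, 1, 20), "status": "open"},
    ]
    print(stalled_referrals_by_hcp(sample, date(2025, 2, 1)))
```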

Example #3 - Manufacturing optimization: From reactive to predictive

We estimate that 75% of manufacturing issues are currently identified after they occur, rather than being predicted beforehand. What’s worse, these issues are typically detected within individual manufacturing sites, rather than being analyzed across the entire manufacturing network.

A more proactive, scaled approach would not only prevent costly breakdowns but also open up significant opportunities to reduce a network’s yield variability, unplanned downtime, process variability and quality nonconformities.

Using manufacturing data on a larger scale requires leaning into the unique characteristics specific to each company. It also requires effort to create the additional contextual information that makes much of the data useful for analysis.

Specifically, manufacturing data is highly unique to each site. It includes data about the company’s raw materials, how equipment is configured and the operational procedures required to ensure consistent quality, efficiency and compliance. These unique configurations are ideal for digital replicas, or digital twins, that can run what-if scenarios to monitor, analyze, control and optimize processes for better yield and compliance. Generative AI agents can boost the accuracy of these replicas because they can analyze high-velocity data from connected devices, RFID tags and sensors in manufacturing facilities and warehouses.
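As a toy illustration of the what-if idea, the sketch below sweeps two process parameters against a made-up yield curve. A real digital twin would be calibrated from the site’s own sensor and batch data; the yield function and parameter ranges here are assumptions.

```python
# Toy what-if sweep against a highly simplified "digital replica" of one process step.
# The yield model is a made-up function of temperature and mixing time.

def simulated_yield(temp_c: float, mix_minutes: float) -> float:
    """Illustrative yield curve, in percent, peaking near 72 C and 45 minutes."""
    return max(0.0, 98.0 - 0.08 * (temp_c - 72.0) ** 2 - 0.02 * (mix_minutes - 45.0) ** 2)

def what_if_sweep(temps: list[float], mix_times: list[float]) -> tuple[float, float, float]:
    """Return the (temperature, mix time, yield) setting with the best simulated yield."""
    return max(
        ((t, m, simulated_yield(t, m)) for t in temps for m in mix_times),
        key=lambda candidate: candidate[2],
    )

if __name__ == "__main__":
    t, m, y = what_if_sweep([68, 70, 72, 74], [30, 45, 60])
    print(f"Best simulated setting: {t} C, {m} min -> {y:.1f}% yield")
```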

Manufacturing data also requires specialized knowledge or expertise to find, interpret and contextualize or enrich it for meaning. Everything has context relative to onsite configurations and equipment, or to systems upstream and downstream. The data can be siloed in laboratory information management systems (LIMS) that focus on lab operations, for example, or manufacturing execution systems (MES) that monitor real-time progress on factory floors. Both contextual understanding and cross-silo data integration are essential prerequisites for predictive models or process simulations.
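Here is a simplified sketch of that cross-silo join, pairing assumed MES batch records with assumed LIMS quality results so that process context and outcomes sit in one row a predictive model could learn from. Record shapes and field names are assumptions for the sketch.

```python
# Illustrative cross-silo join: MES process parameters + LIMS quality outcomes.

MES_BATCHES = [  # from the manufacturing execution system
    {"batch_id": "B001", "line": "L1", "granulation_temp_c": 71.5, "mix_minutes": 44},
    {"batch_id": "B002", "line": "L1", "granulation_temp_c": 76.0, "mix_minutes": 39},
]
LIMS_RESULTS = [  # from the laboratory information management system
    {"batch_id": "B001", "dissolution_pct": 97.2, "out_of_spec": False},
    {"batch_id": "B002", "dissolution_pct": 88.4, "out_of_spec": True},
]

def build_training_rows(mes: list[dict], lims: list[dict]) -> list[dict]:
    """Join on batch_id so process context and quality outcome sit in one row."""
    lims_by_batch = {r["batch_id"]: r for r in lims}
    rows = []
    for batch in mes:
        result = lims_by_batch.get(batch["batch_id"])
        if result:
            rows.append({**batch, **result})
    return rows

if __name__ == "__main__":
    for row in build_training_rows(MES_BATCHES, LIMS_RESULTS):
        print(row)
```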

What you can do with this data advantage:

  • Develop high-fidelity digital twins for process simulation
  • Optimize chemistry, manufacturing and controls practices for enhanced drug safety
  • Leverage historical data for experiment design optimization
  • Reduce production variability
  • Achieve sustainability goals

Case example:

A large drug manufacturer is focused on zero waste to landfills as part of its dual cost reduction and environmental sustainability goals. ZS helped the manufacturer develop a unified, data-driven scrap monitoring system deployed across all sites. This system enables the identification of root causes of waste, facilitating waste reduction and efficient scrap repurposing. Over time, generative AI agents can refine analysis and monitoring, streamlining data management tasks.

Find your differentiators now



The possibilities with AI should motivate every data leader to reconsider what it really means to have a purpose-built data strategy.
 

It starts with where data can deliver unique value. This means looking at each business domain, considering what factors are truly differentiating for the domain’s data and how generative AI can be used to accelerate change.
 

Instead of a rigid, long-term plan, prioritize flexible services designed to collect, connect and enrich domain data. These services should be adaptable to evolving business needs while delivering tangible results.
 

And remember, the journey never ends. Embrace automation and AI within data management itself to position yourself well for whatever comes next.


