The pharmaceutical industry is increasingly leveraging single-cell data in drug discovery and development—and with good reason. Research has shown us that many—if not most—of the complex diseases we struggle to treat are multicellular in nature. This is a crucial point, said Thomas Skot Jensen, principal at ZS, because “We need to understand our cells to develop effective drugs that work. Understanding what role cells play in disease etiology and how different cell types interact with one another can revolutionize human medicine.” But to understand those interactions we must first answer a very important question: What is a cell type?
“We should not think of cells as belonging to neat categories of cell types. That is just a convenient abstraction that allows us to perform our analyses,” explained Christina Bligaard Pedersen, lead bioinformatician at ZS. A quite elegant analogy has been posed by Xia and Yanai, who described species-specific “periodic tables” of cells, where cell developmental trajectories are “periods” and differentiation stages are “groups.” The idea of a periodic table for cells quite accurately captures the complexity involved in identifying cell types and provides researchers with a clear, concrete starting point for unlocking the therapeutic potential of single-cell-based approaches.
Nevertheless, building and leveraging these periodic tables is a challenging task requiring both biological acumen and technical know-how. Fortunately, a collection of brilliant minds is already hard at work.
A Bloomberg Terminal for human cell data
Single-cell sequencing was named Method of the Year 2013 by Nature Methods, and single-cell RNA sequencing (scRNA-seq) in particular has underpinned efforts to study tissues at cell-level resolution. This method, which sequences the transcriptomes of thousands to millions of single cells, has contributed to the development of several “atlases”: biological maps that capture the cellular diversity of organs or organisms and serve as guides for understanding and targeting disease in novel, more holistic ways.
One of the most elegant atlases is the comprehensive map of the adult human brain described in 2023. This Herculean effort resulted in a data set comprising over three million cells (including over two million neurons) clustered iteratively into more than 3,000 cell types, a significant contribution to our understanding of the brain. Other efforts, such as the Human Cell Atlas and Tabula Sapiens, have also helped identify and classify human cell types, revealing not only which cell types are present but also their relative abundance. And just as Dmitri Mendeleev left empty spots for unidentified elements that his periodic table predicted, periodic tables of cell types can be used to predict missing cell types by leveraging our existing knowledge of development and differentiation.
But analyzing scRNA-seq data is not straightforward. Cell types are functional units, but the same cell type can exist in multiple and often transient states. (Think “isotopes” on the periodic table.) Because of this complexity, scRNA-seq data sets typically contain a mixture of cell types and states, and it takes both technical skill and biological acuity to categorize cell types accurately and predict what lies in the gaps.
The current data sets and cell atlases are not directly comparable, having been built with a range of platforms and a variety of methodological approaches, spread across multiple databases. What if we could put all the tools in the hands of biologists to harness the full potential of human cell data? “We essentially need a Bloomberg Terminal for human cell data,” said Pascal Nordgren Timshel, bioinformatics manager at ZS.
With high-dimensional data come high-dimensional challenges
While scRNA-seq is an indispensable method for analyzing single cells, the data is noisy, and discerning cell types both precisely and consistently across experiments presents a significant challenge. Additionally, complete cell atlases cannot be built upon only transcriptional data. The proteome, which reveals the functional components of cells, is another critical aspect. Translating scRNA-seq data to the functional level (for example, proteins) is both crucial and tricky.
“At ZS, we strive for innovation, continuously using state-of-the-art approaches and developing novel methods such as translating scRNA-seq data into functional information by estimating surface protein expression levels through scRNA-seq data denoising and network-based imputation,” Francisco Avila Cobos, senior bioinformatician at ZS, explained.
Using a suite of tools that rely heavily on computational approaches, ZS is working closely with our collaborators to annotate de novo cell types and build cell type references that the industry can use. Our collaborators have the biological discernment, which we marry with our technical expertise to enable breakthroughs such as the identification of a new type of adipocyte.
When annotating de novo cell types, we take a two-pronged approach. We first use machine learning to group cells and then, leveraging both our collaborators’ biological expertise and publicly available knowledge, we make inferences about the identities of each group. Using unbiased statistical approaches, we balance what we observe with what we expect to find. A novel cell type—akin to a new element on the periodic table—becomes a potential target for disease treatment when the cell type has been functionally characterized and its distinct properties are understood.
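The two-pronged approach described above (group cells first, then infer each group's identity from prior knowledge) can be illustrated with a toy sketch. This is not ZS's pipeline: the expression values are invented, k-means stands in for whatever clustering method is actually used, and the one-gene-per-type marker table is a deliberately simplified, hypothetical reference.

```python
from statistics import mean

# Toy expression matrix: one row per cell, one column per gene (invented values).
GENES = ["CD3E", "CD19", "ADIPOQ"]
CELLS = [
    [9.0, 0.5, 0.1],
    [8.5, 0.2, 0.0],
    [0.3, 7.9, 0.2],
    [0.1, 8.4, 0.1],
    [0.2, 0.1, 9.3],
    [0.0, 0.3, 8.8],
]

# Hypothetical reference: a single marker gene per cell type (an assumption;
# real references use many markers and expert curation).
MARKERS = {"T cell": "CD3E", "B cell": "CD19", "adipocyte": "ADIPOQ"}


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(cells, k, n_iter=10):
    """Step 1: group cells. Plain k-means with farthest-point initialization."""
    centroids = [cells[0]]
    while len(centroids) < k:
        # Add the cell that is farthest from every centroid chosen so far.
        centroids.append(
            max(cells, key=lambda c: min(sq_dist(c, m) for m in centroids))
        )
    for _ in range(n_iter):
        # Assign each cell to its nearest centroid, then recompute centroids.
        labels = [
            min(range(k), key=lambda j: sq_dist(c, centroids[j])) for c in cells
        ]
        for j in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


def annotate(centroids, genes, markers):
    """Step 2: name each cluster after the type whose marker is most expressed."""
    names = {}
    for j, centroid in enumerate(centroids):
        names[j] = max(markers, key=lambda t: centroid[genes.index(markers[t])])
    return names


labels, centroids = kmeans(CELLS, k=3)
cluster_names = annotate(centroids, GENES, MARKERS)
cell_types = [cluster_names[lab] for lab in labels]
print(cell_types)
```

In practice, a cluster that matches no known marker profile is the interesting case: it is a candidate novel cell type that still needs functional characterization before it can be treated as one.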
Translating the language of cells into health and disease
Identifying and classifying cell types and states is only the beginning. The ultimate goal of these atlases is a deep exploration of the biology of health and disease; connecting phenotypes to genes and pathways; and translating biological insights into the development of better drugs. To achieve this, additional methods building upon scRNA-seq may be employed.
For the pharma industry, some of the most promising insights can come from studying perturbed single cells. For example, Perturb-seq and sci-Plex make it possible to analyze, in parallel, the gene expression responses of millions of cells to perturbations such as genome editing or exposure to a variety of compounds. These analyses can not only reveal links between genes and cellular functions but can also identify heterogeneity in responses to a compound. Such effects are often so subtle that only high-throughput single-cell analysis can detect them. Identifying these effects is critical to fully understanding a drug’s mechanism of action and why it may work in one subset of patients but not another.
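To make the point about heterogeneous responses concrete, here is a minimal sketch with invented numbers. It compares a pooled log2 fold change with per-cell-type fold changes for one hypothetical gene. The cell type labels, values and function names are assumptions for illustration only and do not reflect how Perturb-seq or sci-Plex data are actually processed.

```python
from collections import defaultdict
from math import log2
from statistics import mean

# (cell type, condition, expression of one hypothetical gene); invented values.
CELLS = [
    ("type_A", "control", 4.0), ("type_A", "control", 4.0),
    ("type_A", "treated", 16.0), ("type_A", "treated", 16.0),
    ("type_B", "control", 4.0), ("type_B", "control", 4.0),
    ("type_B", "treated", 4.0), ("type_B", "treated", 4.0),
]


def log2_fold_change(cells):
    """log2 of mean treated expression over mean control expression."""
    by_condition = defaultdict(list)
    for _, condition, value in cells:
        by_condition[condition].append(value)
    return log2(mean(by_condition["treated"]) / mean(by_condition["control"]))


def per_type_response(cells):
    """Compute the fold change separately within each cell type."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell[0]].append(cell)
    return {cell_type: log2_fold_change(g) for cell_type, g in groups.items()}


pooled = log2_fold_change(CELLS)    # averages over responders and non-responders
by_type = per_type_response(CELLS)  # separates the two populations
print(f"pooled: {pooled:.2f}")
for cell_type, lfc in by_type.items():
    print(f"{cell_type}: {lfc:.2f}")
```

With these toy values the pooled estimate sits between the two populations, which hides the fact that one cell type responds strongly and the other not at all. A bulk measurement would only ever see the pooled number.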
Regardless of the tools or approaches, however, it is important to remember that cells do not live in a vacuum. To fully understand disease and how to treat it, we must characterize how cells interact with one another, how they are organized spatially and how those cell-cell relationships are perturbed in disease. “As we build and leverage cell-level tools to approach disease holistically, we may change the outlook for many diseases with unmet patient needs,” said Thomas Skot Jensen.