Pipelines are the recipes for analyzing data, and they were originally purpose-built for specific tasks. Individual labs and research cores built their own workflows on their institution's computing infrastructure, tying them to the specific software packages and runtime environments in use. In the past, researchers seldom shared their analysis pipelines because of this lack of portability: why share a recipe if nobody else can use it?
Worse yet, the original authors frequently couldn't reproduce their own work because a piece of the pipeline had received an update. In science, reproducibility is king, and a result that can be produced only once isn't truly a discovery.
Computational architectures have evolved as new technologies have become commonplace. Virtualization lets us create reproducible environments that stay consistent even as the host machine's software and hardware change. However, virtualizing an entire pipeline demands extensive storage and compute, because each workflow must be "frozen" together with its operating system and runtime environment.
In this white paper we’ll look at:
- The advantages of containerization
- How Nextflow creates and organizes workflows (see the brief sketch after this list)
- Driving success in bioinformatics
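To make that preview concrete, here is a minimal sketch of a single containerized Nextflow step. The process name, container image tag, and file patterns are illustrative assumptions rather than part of any specific pipeline; the point is that the software environment travels with the workflow definition instead of living on the host machine.

```nextflow
// Minimal illustrative sketch: one Nextflow DSL2 process whose software
// environment is pinned to a container image, so the step runs the same
// way on a laptop, an HPC cluster, or the cloud.
nextflow.enable.dsl = 2

process FASTQC {
    // Hypothetical image tag shown for illustration; the directive ties this
    // step to a fixed image rather than to whatever is installed on the host.
    container 'biocontainers/fastqc:v0.11.9_cv8'

    input:
    path reads

    output:
    path "*_fastqc.zip"

    script:
    """
    fastqc ${reads}
    """
}

workflow {
    // Hypothetical input pattern; adjust to your own data layout.
    reads_ch = Channel.fromPath('data/*.fastq.gz')
    FASTQC(reads_ch)
}
```

Because the container directive pins an exact image, rerunning the workflow months later, or on a collaborator's cluster, reproduces the same environment without reinstalling anything on the host.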
"Implementing robust and reproducible bioinformatics pipelines is a priority for us at Boehringer Ingelheim. Partnering with ZS Discovery to build efficient and scalable workflows using Nextflow has been a game-changer for our target discovery initiatives. This has saved us valuable resources, reducing the hands-on time and streamlining the interpretation of our data." — A Boehringer Ingelheim principal investigator