7 posts tagged with "nextflow"

From Bash to Nextflow: GATK Best Practice With Nextflow (Part 2)

41 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

In Part 1, we built a complete 16-step GATK variant calling pipeline in bash, perfect for academic research and 1-10 samples. But what happens when you need to scale to 100+ samples? This is where Nextflow becomes essential.

📁 Repository: All code from this tutorial is organized in the variant-calling-gatk-pipeline-best-practice-from-scratch repository. The structure follows best practices with separate directories for bash (workflows/bash/) and Nextflow (workflows/nextflow/) implementations.

Setting Up a Local Nextflow Training Environment with Code-Server and HPC

9 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Setting up a robust development environment for Nextflow training across local and HPC systems requires a unified solution. Code-server provides a browser-based VS Code interface accessible from any machine, making it perfect for teams collaborating on Nextflow workflows. This guide walks you through configuring a complete Nextflow training environment with code-server, Singularity containers, and Pixi-managed tools.

For a comprehensive introduction to Pixi and package management, see our post "Pixi new-conda era".

How to Migrate from In-House Pipelines to Enterprise-Level Workflows: A Proven 3-Step Validation Framework

18 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Whether your lab uses bash scripts, Python workflows, Snakemake pipelines, or custom solutions, your in-house pipeline works fine locally. It's been running for years. But as your research scales, you face a hard truth: in-house pipelines don't scale, aren't reproducible across teams, and require constant manual fixes.

Containers in Bioinformatics: Community Tooling and Efficient Docker Building

21 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Docker containers are revolutionizing bioinformatics by automating reproducibility and portability across platforms. But what problems can they actually solve? This post shows real-world applications of containers in bioinformatics workflows, then guides you through the simplest possible ways to use, build and debug them.

Containers in Bioinformatics: Community Tooling and Efficient Docker Building

21 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Docker containers are revolutionizing bioinformatics by automating reproducibility and portability across platforms. But what problems can they actually solve? This post shows real-world applications of containers in bioinformatics workflows, then guides you through the simplest possible ways to use, build and debug them.

Bioinformatics Cost Optimization For Input Using Nextflow (Part 2)

18 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Amazon S3 (Simple Storage Service) is built around the concept of storing files as objects, where each file is identified by a unique key rather than a traditional file system path. While this architecture offers scalability and flexibility for storage, it can present challenges when used as a standard file system, especially in bioinformatics workflows. When running Nextflow with S3 as the input/output backend, there are trade-offs to consider, particularly when dealing with large numbers of small files. In such cases, Nextflow may spend significant time handling downloads and uploads via the AWS CLI v2, which can impact overall workflow performance. In this post, we start with downloading inputs. Let's explore this in more detail.

Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)

13 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Many bioinformatics tools provide options to adjust the number of threads or CPU cores, which can reduce execution time with a modest increase in resource cost. But does doubling computational resources always result in processes running twice as fast? In practice, the speed-up is often less than linear, and each tool behaves differently.
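To make the "less than linear" point concrete, here is a minimal sketch that uses Amdahl's law as a simple model of multi-threaded speed-up. The post itself does not name this model, so it is an assumption, as are the function name `amdahl_speedup` and the 80% parallel fraction, which is chosen purely for illustration. Real tools should be benchmarked rather than modeled.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speed-up under Amdahl's law (an assumed, simplified model).

    parallel_fraction: share of the runtime that can use extra cores (0.0 to 1.0).
    cores: number of CPU cores or threads given to the tool.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical tool where 80% of the runtime parallelizes.
    for cores in (1, 2, 4, 8, 16):
        speedup = amdahl_speedup(0.8, cores)
        efficiency = speedup / cores  # speed-up obtained per core paid for
        print(f"{cores:>2} cores: {speedup:.2f}x speed-up, {efficiency:.0%} efficiency")
```

Under this model, each doubling of cores buys a smaller gain while the per-core cost stays the same, which is why the most cost-effective thread count can differ from tool to tool.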