11 posts tagged with "hpc"

Variant Calling at Production Scale: HPC Deployment and Performance Optimization (Part 3)

· 20 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

In Part 1, we built a solid bash baseline. In Part 2, we migrated to Nextflow with MD5 validation. Now it's time to deploy on HPC clusters with SLURM and optimize for production scale: configure executors for small clusters, tune resources per tool, replace bottleneck steps with faster alternatives (fastp + Spark-GATK), and demonstrate scaling from 1 to 100 samples. This practical guide will help you run your variant calling pipeline efficiently on real HPC infrastructure.
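As a taste of the configuration covered in the full post, a minimal `nextflow.config` sketch for a small SLURM cluster might look like this. The queue name, labels, and resource values here are illustrative placeholders, not the post's measured recommendations:

```groovy
// nextflow.config — hypothetical SLURM setup for a small cluster
process {
    executor = 'slurm'
    queue    = 'batch'            // placeholder partition name

    // Per-tool tuning: labels let each step be sized independently
    withLabel: 'alignment' {
        cpus   = 8
        memory = '16 GB'
        time   = '8h'
    }
    withLabel: 'variant_calling' {
        cpus   = 4
        memory = '12 GB'
    }
}

executor {
    queueSize       = 20          // cap concurrent SLURM jobs
    submitRateLimit = '10/1min'   // be gentle to the scheduler
}
```

On a small cluster, capping `queueSize` and the submit rate keeps the pipeline from flooding the scheduler while still letting independent samples run in parallel.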

Setting Up a Local Nextflow Training Environment with Code-Server and HPC

· 9 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Setting up a robust development environment for Nextflow training across local and HPC systems requires a unified solution. Code-server provides a browser-based VS Code interface accessible from any machine, making it perfect for teams collaborating on Nextflow workflows. This guide walks you through configuring a complete Nextflow training environment with code-server, Singularity containers, and Pixi-managed tools.

For a comprehensive introduction to Pixi and package management, see our post on Pixi and the new conda era.

Containers on HPC: From Docker to Singularity and Apptainer

· 9 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Container technologies have revolutionized software deployment and reproducibility in scientific computing. However, traditional Docker faces significant limitations in High-Performance Computing (HPC) environments. This post explores why Docker struggles on HPC systems and introduces modern alternatives like Docker rootless, Singularity, and Apptainer.

Bioinformatics Cost Optimization For Input Using Nextflow (Part 2)

· 18 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Amazon S3 (Simple Storage Service) stores files as objects, where each file is identified by a unique key rather than a traditional file system path. While this architecture offers scalability and flexibility, it can present challenges when S3 is used as if it were a standard file system, especially in bioinformatics workflows. When running Nextflow with S3 as the input/output backend, there are trade-offs to consider, particularly with large numbers of small files: Nextflow may spend significant time handling downloads and uploads via the AWS CLI v2, which can hurt overall workflow performance. In this blog post, we start with the input download side. Let's explore this in more detail.
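As a sketch of the kind of knobs the full post explores, Nextflow exposes AWS client and staging settings in `nextflow.config`. The values below are illustrative tweaks, not measured recommendations:

```groovy
// nextflow.config — hypothetical S3 staging tweaks
aws {
    client {
        maxConnections = 64      // allow more parallel S3 transfers
        maxErrorRetry  = 5       // retry transient S3 errors
    }
}

process {
    scratch = true               // stage task work in node-local storage
}
```

Raising the connection count mainly helps when a task stages many small objects, which is exactly the pattern this post examines.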

Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)

· 13 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Many bioinformatics tools provide options to adjust the number of threads or CPU cores, which can reduce execution time with a modest increase in resource cost. But does doubling computational resources always result in processes running twice as fast? In practice, the speed-up is often less than linear, and each tool behaves differently.
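The sub-linear speed-up described above is captured by Amdahl's law: if a fraction `p` of a tool's runtime parallelizes across `n` threads, the best possible speed-up is `1 / ((1 - p) + p / n)`. A quick illustration (the 90% parallel fraction is an assumed example, not a measured value for any particular tool):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Best-case speed-up for parallel fraction p on n threads."""
    return 1.0 / ((1.0 - p) + p / n)

# A tool that is 90% parallelizable does NOT run 2x faster on 2 cores:
print(round(amdahl_speedup(0.9, 2), 2))   # 1.82, not 2.0
print(round(amdahl_speedup(0.9, 8), 2))   # 4.71
print(round(amdahl_speedup(0.9, 64), 2))  # 8.77 — diminishing returns
```

The serial fraction dominates as thread counts grow, which is why doubling CPUs rarely halves runtime and why each tool needs its own benchmark.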

Building a Slurm HPC Cluster (Part 3) - Administration and Best Practices

· 13 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

In Part 1 and Part 2, we built a complete Slurm HPC cluster from a single node to a production-ready multi-node system. Now let's learn how to manage, maintain, and secure it effectively.

This final post covers daily administration tasks, troubleshooting, security hardening, and integration with data processing frameworks.

Building a Slurm HPC Cluster (Part 1) - Single Node Setup and Fundamentals

· 8 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Building a High-Performance Computing (HPC) cluster can seem daunting, but with the right approach, you can create a robust system for managing computational workloads. This is Part 1 of a 3-part series where we'll build a complete Slurm cluster from scratch.

In this first post, we'll cover the fundamentals by setting up a single-node Slurm cluster and understanding the core concepts.

RIVER - A Web Application to Run nf-core

· 3 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Simply put, I just want a web application: an all-in-one tool similar to Google Drive, but more advanced. It lets me select files and feed them into a workflow pulled directly from GitHub, for example rnaseq from nf-core. The web application lets me run and monitor Nextflow jobs, then puts the results back into Google Drive, where I can view them. It is useful for sharing results with my team or with external teams. I can develop pipelines independently. My data can be stored in the cloud for backup. It is a simple, standard procedure for bioinformatics analysis. But I have not found any solution that meets these requirements while being EASY, FREE, and OPEN-SOURCE.

If you feel the same, RIVER may be your choice.