11 posts tagged with "hpc"

Variant Calling at Production Scale: HPC Deployment and Performance Optimization (Part 3)

· 20 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

In Part 1, we built a solid bash baseline. In Part 2, we migrated to Nextflow with MD5 validation. Now it's time to deploy on HPC clusters with SLURM and optimize for production scale: configure executors for small clusters, tune resources per tool, replace bottleneck steps with faster alternatives (fastp + Spark-GATK), and demonstrate scaling from 1 to 100 samples. This practical guide will help you run your variant calling pipeline efficiently on real HPC infrastructure.
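As a taste of the configuration covered in the full post, a minimal `nextflow.config` sketch for a small SLURM cluster might look like this. The queue name, labels, and resource values here are illustrative placeholders, not the post's measured recommendations:

```groovy
// nextflow.config — hypothetical SLURM setup for a small cluster
process {
    executor = 'slurm'
    queue    = 'batch'            // placeholder partition name

    // Per-tool tuning: labels let each step be sized independently
    withLabel: 'alignment' {
        cpus   = 8
        memory = '16 GB'
        time   = '8h'
    }
    withLabel: 'variant_calling' {
        cpus   = 4
        memory = '12 GB'
    }
}

executor {
    queueSize       = 20          // cap concurrent SLURM jobs
    submitRateLimit = '10/1min'   // be gentle to the scheduler
}
```

On a small cluster, capping `queueSize` and the submit rate keeps the pipeline from flooding the scheduler while still letting independent samples run in parallel.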

Setting Up a Local Nextflow Training Environment with Code-Server and HPC

· 9 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Setting up a robust development environment for Nextflow training across local and HPC systems requires a unified solution. Code-server provides a browser-based VS Code interface accessible from any machine, making it perfect for teams collaborating on Nextflow workflows. This guide walks you through configuring a complete Nextflow training environment with code-server, Singularity containers, and Pixi-managed tools.

For a comprehensive introduction to Pixi and package management, see our post on Pixi and the new conda era.

Containers on HPC: From Docker to Singularity and Apptainer

· 9 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Container technologies have revolutionized software deployment and reproducibility in scientific computing. However, traditional Docker faces significant limitations in High-Performance Computing (HPC) environments. This post explores why Docker struggles on HPC systems and introduces modern alternatives like Docker rootless, Singularity, and Apptainer.

Bioinformatics Cost Optimization For Input Using Nextflow (Part 2)

· 18 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Amazon S3 (Simple Storage Service) stores files as objects, where each file is identified by a unique key rather than a traditional file system path. While this architecture offers scalability and flexibility, it can present challenges when S3 is used as if it were a standard file system, especially in bioinformatics workflows. When running Nextflow with S3 as the input/output backend, there are trade-offs to consider, particularly with large numbers of small files: Nextflow may spend significant time handling downloads and uploads via the AWS CLI v2, which can hurt overall workflow performance. In this blog post, we start with the input download side. Let's explore this in more detail.
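As a sketch of the kind of knobs the full post explores, Nextflow exposes AWS client and staging settings in `nextflow.config`. The values below are illustrative tweaks, not measured recommendations:

```groovy
// nextflow.config — hypothetical S3 staging tweaks
aws {
    client {
        maxConnections = 64      // allow more parallel S3 transfers
        maxErrorRetry  = 5       // retry transient S3 errors
    }
}

process {
    scratch = true               // stage task work in node-local storage
}
```

Raising the connection count mainly helps when a task stages many small objects, which is exactly the pattern this post examines.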

Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)

· 13 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Many bioinformatics tools provide options to adjust the number of threads or CPU cores, which can reduce execution time with a modest increase in resource cost. But does doubling computational resources always result in processes running twice as fast? In practice, the speed-up is often less than linear, and each tool behaves differently.
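The sub-linear speed-up described above is captured by Amdahl's law: if a fraction `p` of a tool's runtime parallelizes across `n` threads, the best possible speed-up is `1 / ((1 - p) + p / n)`. A quick illustration (the 90% parallel fraction is an assumed example, not a measured value for any particular tool):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Best-case speed-up for parallel fraction p on n threads."""
    return 1.0 / ((1.0 - p) + p / n)

# A tool that is 90% parallelizable does NOT run 2x faster on 2 cores:
print(round(amdahl_speedup(0.9, 2), 2))   # 1.82, not 2.0
print(round(amdahl_speedup(0.9, 8), 2))   # 4.71
print(round(amdahl_speedup(0.9, 64), 2))  # 8.77 — diminishing returns
```

The serial fraction dominates as thread counts grow, which is why doubling CPUs rarely halves runtime and why each tool needs its own benchmark.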

Building a Slurm HPC Cluster (Part 3) - Administration and Best Practices

· 13 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

In Part 1 and Part 2, we built a complete Slurm HPC cluster from a single node to a production-ready multi-node system. Now let's learn how to manage, maintain, and secure it effectively.

This final post covers daily administration tasks, troubleshooting, security hardening, and integration with data processing frameworks.

Building a Slurm HPC Cluster (Part 1) - Single Node Setup and Fundamentals

· 8 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Building a High-Performance Computing (HPC) cluster can seem daunting, but with the right approach, you can create a robust system for managing computational workloads. This is Part 1 of a 3-part series where we'll build a complete Slurm cluster from scratch.

In this first post, we'll cover the fundamentals by setting up a single-node Slurm cluster and understanding the core concepts.

RIVER - A Web Application to Run nf-core

· 3 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Simply put, I just want a web application: an all-in-one tool similar to Google Drive, but more advanced. It lets me select files and feed them into a workflow pulled directly from GitHub, for example rnaseq from nf-core. The web application lets me run and monitor Nextflow jobs, then puts the results back into Google Drive, where I can view them. It is useful for sharing results with my team or with external teams. I can develop pipelines independently. My data can be stored in the cloud for backup. It is a simple, standard procedure for bioinformatics analysis. But I have not found any solution that meets these requirements while being EASY, FREE, and OPEN-SOURCE.

If you feel the same, RIVER may be your choice.