
6 posts tagged with "slurm"


Variant Calling at Production Scale: HPC Deployment and Performance Optimization (Part 3)

· 20 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

In Part 1, we built a solid bash baseline. In Part 2, we migrated to Nextflow with MD5 validation. Now it's time to deploy on HPC clusters with Slurm and optimize for production scale: configuring executors for small clusters, tuning resources per tool, replacing bottleneck steps with faster alternatives (fastp and Spark-GATK), and demonstrating scaling from 1 to 100 samples. This practical guide will help you run your variant calling pipeline efficiently on real HPC infrastructure.

Building a Slurm HPC Cluster (Part 3) - Administration and Best Practices

· 13 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

In Part 1 and Part 2, we built a complete Slurm HPC cluster from a single node to a production-ready multi-node system. Now let's learn how to manage, maintain, and secure it effectively.

This final post covers daily administration tasks, troubleshooting, security hardening, and integration with data processing frameworks.

Building a Slurm HPC Cluster (Part 1) - Single Node Setup and Fundamentals

· 8 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Building a High-Performance Computing (HPC) cluster can seem daunting, but with the right approach, you can create a robust system for managing computational workloads. This is Part 1 of a 3-part series where we'll build a complete Slurm cluster from scratch.

In this first post, we'll cover the fundamentals by setting up a single-node Slurm cluster and understanding the core concepts.

RIVER - A Web Application to Run nf-core

· 3 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Simply put, I wanted a web application: an all-in-one tool similar to Google Drive, but more advanced. It lets me select files and feed them into a workflow pulled directly from GitHub, for example the rnaseq pipeline from nf-core. The application runs and monitors the Nextflow jobs, then puts the results back into Google Drive, where I can view them. This makes it easy to share results with my team or with external teams. I can develop pipelines independently, and my data can be stored in the cloud for backup. It is a simple, standard procedure for bioinformatics analysis. However, I have not found any solution that meets these requirements while being EASY, FREE, and OPEN-SOURCE.

If you feel the same way, RIVER may be the right choice for you.