2 posts tagged with "genomics"

Building a Reproducible GATK Variant Calling Bash Workflow with Pixi (Part 1)

· 30 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Before you can convert a bash workflow to Nextflow, you need a solid, reproducible bash baseline. This hands-on guide walks through building a complete 16-step GATK variant calling workflow using bash scripts and Pixi for environment management, following GATK best practices with GVCF mode and hard filtering. This traditional approach works for academic research and proof-of-concept work, but scaling to thousands of samples in industry calls for the reproducibility and reliability of a workflow manager such as Nextflow, which we'll cover in Part 2.

Working with Remote Files using bcftools and samtools (HTSlib)

· 18 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

HTSlib-based tools like bcftools and samtools provide powerful capabilities for working with genomic data stored on remote servers. Whether your data is in AWS S3, accessible via FTP, or hosted on HTTPS endpoints, these tools allow you to efficiently query and subset remote files without downloading entire datasets. This guide covers authentication, remote file access patterns, and practical workflows.
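
As a rough illustration of the remote-subsetting idea, the sketch below pulls one region from a remote BAM and a remote VCF instead of downloading the whole files. The URLs, the bucket name, the `chr1` region, and the `run` dry-run helper are all hypothetical placeholders, not taken from the post. It assumes each remote file has its index (`.bai` or `.tbi`/`.csi`) stored next to it, and that your HTSlib build includes libcurl and S3 support.

```shell
#!/usr/bin/env bash
# Hypothetical locations: replace with real, indexed files you can access.
BAM_URL="https://example.org/data/sample.bam"
VCF_URL="s3://example-bucket/calls/sample.vcf.gz"
# Contig naming (chr1 vs 1) must match the file's own header.
REGION="chr1:100000-200000"

# Hypothetical helper: print each command by default; set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Fetch only the reads overlapping REGION and write them as BAM.
run samtools view -b -o subset.bam "$BAM_URL" "$REGION"

# Fetch only the variants overlapping REGION and write bgzipped VCF.
run bcftools view -r "$REGION" -Oz -o subset.vcf.gz "$VCF_URL"
```

Run it once as-is to check the printed commands, then rerun with `RUN=1` once the placeholders point at real data. For `s3://` paths, HTSlib typically picks up credentials from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables or from `~/.aws/credentials`.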