Skills

A curated collection of technical skills and tutorials for bioinformatics, organized by topic. All posts are hands-on guides with practical examples.

Version Control & Git

  1. The Evolution of Version Control - Git's Role in Reproducible Bioinformatics (Part 1)
  2. The Evolution of Version Control - CI/CD in bioinformatics (Part 2)
  3. Running GitHub Actions Locally with act: 5x Faster Development

Containers & Docker

  1. Containers in Bioinformatics: Community Tooling and Efficient Docker Building
  2. Containers on HPC: From Docker to Singularity and Apptainer
  3. Docker Out of Docker: Running Interactive Web Applications for Data Analysis

Package Management & Environment Setup

  1. Pixi - New conda era
  2. Upgrade Your Shell: From Bash to Zsh for a Better Terminal Experience
  3. Setting Up a Local Nextflow Training Environment with Code-Server and HPC

Nextflow & Workflow Management

  1. RIVER - A Web Application to Run Nf-Core
  2. How to Migrate from In-House Pipelines to Enterprise-Level Workflows: A Proven 3-Step Validation Framework
  3. Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)
  4. Bioinformatics Cost Optimization For Input Using Nextflow (Part 2)

Variant Calling & NGS Pipelines

  1. Building a Reproducible GATK Variant Calling Bash Workflow with Pixi (Part 1) - Academic proof-of-concept implementation
  2. From Bash to Nextflow: GATK Best Practice With Nextflow (Part 2) - MD5 validation and scientific equivalence testing
  3. Variant Calling at Production Scale: HPC Deployment and Performance Optimization (Part 3) - SLURM, resource optimization, and scaling to 100+ samples

Slurm & HPC Clusters

  1. Building a Slurm HPC Cluster (Part 1) - Single Node Setup and Fundamentals
  2. Building a Slurm HPC Cluster (Part 2) - Scaling to Production with Ansible
  3. Building a Slurm HPC Cluster (Part 3) - Administration and Best Practices

CI/CD & Testing

  1. The Evolution of Version Control - CI/CD in bioinformatics (Part 2)
  2. Running GitHub Actions Locally with act: 5x Faster Development
  3. Bioinformatics Workflow Template: Standardizing Python Pipelines with Modular Design

Machine Learning & Data Analysis

  1. Introduction to AI/ML in Bioinformatics: Classification Models & Evaluation
  2. Machine Learning in Bioinformatics Part 1: Building KNN from Scratch

Data Management & Cloud Storage

  1. Working with Remote Files using bcftools and samtools (HTSlib) - S3, HTTP, and cloud file access

Performance Optimization

  1. Unix Pipes in Bioinformatics: How Streaming Data Reduces Memory and Storage
  2. Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)
  3. Bioinformatics Cost Optimization For Input Using Nextflow (Part 2)