
Containers in Bioinformatics: Community Tooling and Efficient Docker Building

· 21 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

Docker containers have transformed bioinformatics by making analyses reproducible and portable across platforms. But what problems do they actually solve? This post walks through real-world applications of containers in bioinformatics workflows, then shows the simplest ways to use, build, and debug them.

Prerequisites: Get Docker Installed and Running

This post assumes you already have Docker installed and running on your system; the focus here is on how to use containers effectively for bioinformatics work.

Start Here First

If Docker is not yet installed on your machine, visit the official Docker documentation:

📦 https://docs.docker.com/get-started/get-docker/

That guide walks you through:

  • Installing Docker Desktop (Mac, Windows, Linux)
  • Verifying your installation with docker --version
  • Running your first container with docker run hello-world
  • Understanding Docker basics (images, containers, registries)

What You Need to Know

After installing Docker, verify it's working:

# Check Docker version
docker --version
# Output: Docker version 24.0.0, build 12345abc

# Run a test container
docker run hello-world
# Output: Hello from Docker! This message shows that your installation appears to be working correctly.

About This Blog

Once Docker is installed and working, this blog will teach you:

  1. Why manual installation fails and how containers solve reproducibility problems
  2. How to use Docker containers for bioinformatics data analysis
  3. How to build efficient Docker images with best practices
  4. How to build containers with Micromamba and Pixi for bioinformatics workflows
  5. How to find pre-built containers before building from scratch

Prerequisites Checklist

Before continuing, verify you have:

  • ✅ Docker installed (docker --version works)
  • ✅ Docker daemon running (or Docker Desktop active; see the quick check below)
  • ✅ Can pull images (docker pull hello-world works)
  • ✅ Basic familiarity with command line/terminal
  • ✅ (Optional but helpful) Nextflow installed for pipeline examples
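
For the daemon check specifically, docker info talks to the daemon and fails with a clear error when it is not running:

# Prints the server version if the daemon is reachable;
# errors with "Cannot connect to the Docker daemon" otherwise
docker info --format '{{.ServerVersion}}'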

Not ready? Go to https://docs.docker.com/get-started/get-docker/ first, then come back here!


1. Why Manual Installation Fails: The Old Way vs. Modern Containers

Traditionally, bioinformatics labs installed software on shared servers or laptops using commands like:

sudo apt-get install samtools bwa python3

While this seems easy, it creates major reproducibility, portability, and maintenance headaches:

  • Manual Steps: Each researcher (or lab) performs installs slightly differently. Written instructions, scripts, and troubleshooting steps are often lost, mixed, or outdated.
  • Not Reproducible: No record of what was installed, which versions, or the exact order. Over time, software updates, conflicts, and broken dependencies accumulate, leading to the "works for me" problem.
  • Hard to Debug: Version mismatches, library conflicts, and subtle errors are almost impossible to diagnose or fix once the environment drifts.
  • Not Portable: Scripts break on macOS, Windows, or other Linux variants. HPC systems often restrict sudo/root access, making traditional installs impossible.

Result: Collaboration across labs, institutions, or countries becomes fragile. Reproducing results—even on your own machine—can be a challenge!

The Modern Solution: Containers and Community Tooling

Today, bioinformatics has largely replaced manual installs with containers and environment managers:

  • BioConda: Over 8,000 bioinformatics packages can be installed reproducibly via Conda (see the example after this list).
  • BioContainers: Automatically generates Docker containers for each tool, available at quay.io/biocontainers.
    docker pull quay.io/biocontainers/samtools:1.15.1--h1170115_0
  • Seqera/Nextflow/nf-core: Pipelines with environment definitions (Conda, Docker, Singularity) are shared and run identically everywhere.
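
For instance, installing an exact samtools version from BioConda is a single, recordable command (the version shown is illustrative):

conda install -c conda-forge -c bioconda samtools=1.18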

Bottom line: Containers and Conda have solved the nightmare of manual installs, making reproducible science and collaboration routine in bioinformatics.


2. Using Docker Containers for Bioinformatics Data Analysis

One of the biggest advantages of Docker is the ability to run bioinformatics tools directly—without any installation—on your own datasets. Here’s a practical example using FastQC, a widely used quality control tool for sequencing data.

Example: Run FastQC from a Container

  1. Pull the FastQC container (only needed once):

    docker pull quay.io/biocontainers/fastqc:<version>
    # Replace <version> with the specific tag, e.g. 0.11.9--hdfd78af_1
  2. Run FastQC, mounting your data folder:

    docker run --rm -v "$PWD:/data" quay.io/biocontainers/fastqc:<version> fastqc /data/my_reads.fastq
    • --rm: Automatically remove the container after it exits.
    • -v "$PWD:/data": Mount your current directory (with your FASTQ file) into the container at /data.
    • The container runs the FastQC executable, analyzing the file and writing results back to your folder.

Key Benefits:

  • No software installation or dependency headaches.
  • Exact version and environment guaranteed.
  • Can share or rerun analysis anywhere Docker is available (Linux, Mac, Windows, cloud, HPC).

This approach works for thousands of bioinformatics containers—swap FastQC for Samtools, BWA, STAR, or other tools and run your analysis in a fully reproducible and portable way.

Running Containers Interactively

Sometimes you want to run tools or explore interactively inside the container, just as if you were launching a shell. Docker makes this easy with the -it flags:

  • -i keeps STDIN open (interactive input)
  • -t allocates a pseudo-terminal (TTY)

Example: Start an interactive shell inside the FastQC container:

docker run --rm -it -v "$PWD:/data" quay.io/biocontainers/fastqc:<version> /bin/bash

Now you’re inside the container shell:

fastqc /data/my_reads.fastq
ls /data

Benefits:

  • Inspect files, experiment with commands, or troubleshoot in real-time
  • Explore the container environment (dependencies, installed tools)
  • Still fully reproducible every time you launch

Debugging and Inspecting a Running Docker Container

If your container is already running (perhaps as part of a pipeline or a background process), Docker lets you connect and debug interactively using docker exec:

  1. List running containers:

    docker ps

    Note the CONTAINER ID in the output.

  2. Start a bash shell inside the running container:

    docker exec -it <CONTAINER_ID> /bin/bash

    Now you can inspect files, check logs, and run additional commands inside the live container.
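
A few other standard Docker commands are useful at this stage:

docker logs <CONTAINER_ID>      # stdout/stderr of the container's main process
docker top <CONTAINER_ID>       # processes currently running inside the container
docker inspect <CONTAINER_ID>   # full configuration: mounts, env vars, image, entrypoint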


ENTRYPOINT and Overriding Commands

Many containers run a specific tool as their default ENTRYPOINT—for example, FastQC containers launch FastQC automatically. However, you can temporarily override ENTRYPOINT and start a shell using:

docker run --rm -it --entrypoint /bin/bash quay.io/biocontainers/fastqc:<version>

This starts an interactive shell even if the container’s Dockerfile specifies a different default command. Useful for debugging, troubleshooting, or exploring the container environment.
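
To check what a container's default ENTRYPOINT and CMD actually are before overriding them, query the image metadata with docker inspect:

docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' quay.io/biocontainers/fastqc:<version>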

Summary:

  • Use docker exec to debug running containers
  • Override ENTRYPOINT for interactive shells
  • Enables real-time troubleshooting and inspection, making containers even more flexible for science

Running Containers with Nextflow: Inspecting the Actual Docker Commands

When you run a Nextflow pipeline with Docker containers, Nextflow automatically generates and executes Docker commands for each task. Understanding these commands is useful for debugging, profiling, and understanding exactly what your pipeline is doing.

Where Nextflow Stores Docker Commands

Nextflow creates a work directory structure where each task has its own folder containing:

  • .command.sh - The shell script to execute
  • .command.run - The wrapper script that Nextflow uses to run the task
  • .command.trace - Task execution trace
  • .exitcode - Exit code of the task
  • Output files from the task

Directory structure:

work/
├── ab/
│   ├── abc123def456.../
│   │   ├── .command.sh
│   │   ├── .command.run
│   │   ├── .command.trace
│   │   └── ... (output files)
│   └── def789ghi012.../
│       └── ...
└── fe/
    └── 949a78b06c5fd8284aeb2f87d14f7d/
        ├── .command.sh
        ├── .command.run
        └── ... (task outputs)
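
Rather than browsing work/ by hand, nextflow log can list these task directories directly (shown here for the most recent run):

# Show each task's name, status, and work directory for the last run
nextflow log last -f name,status,workdir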

Viewing the Actual Docker Command

To see the exact Docker command that Nextflow executed for a specific task:

# 1. Find your task's work directory
# For example, in the pipeline output you might see:
# vep (ensembl-vep:115.2) [fe/949a78b...]

# 2. Navigate to that directory
cd work/fe/949a78b06c5fd8284aeb2f87d14f7d/

# 3. View the command that was executed
cat .command.run

Example: Nextflow Docker Command for VEP (Variant Effect Predictor)

Here's a complete example of how Nextflow runs a Docker container:

docker rm $NXF_BOXID &>/dev/null || true
docker stop $NXF_BOXID

docker run -i \
    --cpu-shares 8192 \
    --memory 14336m \
    -e "NXF_TASK_WORKDIR" \
    -e "NXF_DEBUG=${NXF_DEBUG:=0}" \
    -v <your project working directory>:<your project working directory> \
    -w "$NXF_TASK_WORKDIR" \
    -u $(id -u):$(id -g) \
    --name $NXF_BOXID \
    community.wave.seqera.io/library/ensembl-vep:115.2--90ec797ecb088e9a \
    /bin/bash <your project working directory>/work/fe/949a78b06c5fd8284aeb2f87d14f7d/.command.run nxf_trace

Breaking Down the Docker Command

Let's understand each component:

| Component | Purpose |
|---|---|
| docker run | Start a new container |
| -i | Keep STDIN open (interactive) |
| --cpu-shares 8192 | Allocate CPU resources (set by Nextflow from the cpus directive; 1024 shares per CPU, so 8192 = 8 CPUs) |
| --memory 14336m | Limit memory to 14 GB (set by Nextflow from the memory directive) |
| -e NXF_TASK_WORKDIR | Pass the Nextflow task work directory env var |
| -e NXF_DEBUG=${NXF_DEBUG:=0} | Pass the debug flag |
| -v /project:/project | Mount the project directory for input/output |
| -w "$NXF_TASK_WORKDIR" | Set the working directory inside the container |
| -u $(id -u):$(id -g) | Run as the current user (file permissions) |
| --name $NXF_BOXID | Name the container (cleanup reference) |
| image:tag | Container image to use |
| /bin/bash .command.run | Execute the task script |
| nxf_trace | Nextflow tracing flag |

Re-Running a Failed Task Manually

If a task fails in Nextflow, you can manually re-run it for debugging:

# 1. Navigate to the task work directory
cd work/fe/949a78b06c5fd8284aeb2f87d14f7d/

# 2. View the task script
cat .command.sh

# 3. View Nextflow's Docker wrapper
cat .command.run

# 4. Manually run the Docker command (copy from .command.run output)
# and inspect what went wrong
docker run -i \
    --cpu-shares 8192 \
    --memory 14336m \
    -e "NXF_TASK_WORKDIR" \
    -v <your project working directory>:<your project working directory> \
    -w "$NXF_TASK_WORKDIR" \
    -u $(id -u):$(id -g) \
    community.wave.seqera.io/library/ensembl-vep:115.2--90ec797ecb088e9a \
    /bin/bash <your project working directory>/work/fe/949a78b06c5fd8284aeb2f87d14f7d/.command.run

# 5. Check the exit code
cat .exitcode

# 6. View any error logs
cat .command.err # (if exists)
cat .command.out # (if exists)

# 7. On an HPC scheduler (e.g., SLURM), submit the wrapper as a job instead
sbatch .command.run

# 8. Or debug manually by running the container interactively
docker run -it -v $PWD:$PWD -w $PWD <your container>

# 9. Inside the container, modify .command.sh as needed, then run it directly
bash .command.sh

Key Nextflow Docker Features

| Feature | How Nextflow Uses It |
|---|---|
| Resource limits | cpus and memory directives → --cpu-shares and --memory flags |
| Volume mounts | Pipeline inputs/outputs → -v flags |
| User permissions | Current user → -u $(id -u):$(id -g) |
| Work directory | Task-specific directory → -w "$NXF_TASK_WORKDIR" |
| Container cleanup | Automatic removal → docker rm commands |
| Environment variables | Pipeline configuration → -e flags |
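
These mappings start from ordinary pipeline settings. A minimal nextflow.config illustrating the inputs (the values match the VEP example above):

// nextflow.config
docker.enabled = true

process {
    cpus   = 8        // becomes --cpu-shares 8192 (1024 per CPU)
    memory = '14 GB'  // becomes --memory 14336m
}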

Profiling and Optimization

By examining the actual Docker commands, you can:

  1. Check resource allocation:

    grep "cpu-shares\|memory" work/*/*/.command.run
  2. Verify mounted volumes:

    grep '\-v ' work/ab/abc123*/.command.run
  3. See which container images are being used:

    grep -h "docker run" work/*/*/.command.run
  4. Profile task execution:

    # Check how long each task took
    for trace in work/*/*/.command.trace; do
        echo "Task: $(dirname "$trace")"
        cat "$trace"
    done

Summary

  • Nextflow automatically generates Docker commands for each task
  • Task commands are stored in work/<task-id>/.command.run
  • You can inspect and manually re-run failed tasks
  • Understanding these commands helps with debugging and optimization
  • Resource limits from Nextflow directives are converted to Docker flags
  • Each task runs in isolation with its own work directory

3. Efficient Docker Container Building for Bioinformatics

Building efficient Docker containers keeps your images fast to download, quick to run, and easy to share.

Best Practices

  • Start with a lightweight, official base image

    • Alpine, Debian-slim, Ubuntu LTS
  • Install only required tools

    • Don't add extra utilities or unnecessary packages
  • Clean cache and remove temp files

    • In the same RUN step: rm -rf /var/lib/apt/lists/*
  • Use multi-stage builds for large or compiled tools

    • Build from source in a builder stage, copy binaries into a slim final image

Minimal Example: Install a Tool

FROM ubuntu:24.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends samtools && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
WORKDIR /data
CMD ["/bin/bash"]

Multi-Stage Build Example: Build from Source

FROM ubuntu:24.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential wget ca-certificates bzip2 \
    libbz2-dev liblzma-dev zlib1g-dev
RUN wget https://github.com/samtools/samtools/releases/download/1.18/samtools-1.18.tar.bz2 && \
    tar -xjf samtools-1.18.tar.bz2 && \
    cd samtools-1.18 && \
    ./configure --prefix=/usr/local --without-curses && make && make install

FROM ubuntu:24.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends libbz2-1.0 liblzma5 zlib1g && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local/bin/samtools /usr/local/bin/
WORKDIR /data
CMD ["/bin/bash"]

Further Optimizations

  • Sort package lists alphabetically
  • Use .dockerignore to reduce build context size (example below)
  • Pin exact tool/package versions for reproducibility
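
A hypothetical .dockerignore for a typical analysis project keeps version-control history and large data files out of the build context:

# .dockerignore
.git
data/
results/
*.fastq.gz
*.bam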

4. Building Docker Images with Conda and Micromamba

Building Docker containers from Conda packages is one of the most efficient ways to create reproducible bioinformatics environments. Micromamba is a lightweight, faster alternative to Conda that's perfect for containers.

Why Micromamba for Docker Images?

Micromamba offers significant advantages over traditional Conda:

  • Fast: 10-100x faster dependency resolution than Conda
  • Lightweight: Minimal footprint, perfect for containerized deployments
  • Reproducible: Environment lock files guarantee exact versions
  • Production-ready: Used extensively in bioinformatics pipelines
  • Small base image: Official mambaorg/micromamba image is only ~150MB

The Modern Approach: Conda.yml + Micromamba

Instead of writing complex RUN commands with inline package specifications, use a conda.yml environment file:

conda.yml:

name: base
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::fastqc=0.12.1
  - bioconda::samtools=1.18
  - bioconda::bwa=0.7.17
  - conda-forge::procps-ng

Dockerfile:

FROM mambaorg/micromamba:1.5.10-noble

# Copy environment file
COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml

# Install packages from the environment file
RUN micromamba install -y -n base -f /tmp/conda.yml \
    && micromamba clean -a -y

# Export an explicit lock file for reproducibility
RUN micromamba env export --name base --explicit > environment.lock \
    && echo ">> CONDA_LOCK_START" \
    && cat environment.lock \
    && echo "<< CONDA_LOCK_END"

# Set PATH
USER root
ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"

WORKDIR /data
CMD ["/bin/bash"]

Understanding the Dockerfile

Let's break down each component:

1. Base Image:

FROM mambaorg/micromamba:1.5.10-noble
  • mambaorg/micromamba includes Micromamba pre-installed
  • 1.5.10 is the version (pin for reproducibility)
  • noble is Ubuntu 24.04 LTS variant
  • Much lighter than installing Conda from scratch

2. Copy Environment File:

COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
  • Copies your conda.yml to the container
  • $MAMBA_USER is the non-root user (set by base image)
  • Preserves proper file ownership

3. Install with Micromamba:

RUN micromamba install -y -n base -f /tmp/conda.yml \
    && micromamba clean -a -y
  • -y: Auto-confirm installation
  • -n base: Install into the base environment
  • -f /tmp/conda.yml: Use environment file
  • micromamba clean -a -y: Remove cached packages (reduces image size by ~50%)

4. Generate Lock File (for reproducibility):

RUN micromamba env export --name base --explicit > environment.lock \
    && echo ">> CONDA_LOCK_START" \
    && cat environment.lock \
    && echo "<< CONDA_LOCK_END"
  • Exports explicit lock file with exact pinned versions and URLs
  • Echo statements make lock file visible in build logs
  • This file can be saved and used for exact reproducibility later
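
For reference, an explicit lock file is just a short header followed by exact package URLs; a truncated, hypothetical excerpt:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
@EXPLICIT
https://conda.anaconda.org/bioconda/linux-64/samtools-1.18-h50ea8bc_1.tar.bz2
...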

5. Set PATH and User:

USER root
ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"
  • Ensures Micromamba binaries are in PATH
  • USER root required for some system monitoring (see procps-ng below)

Why procps-ng is Important for Nextflow

Notice conda-forge::procps-ng in the dependencies. Here's why:

dependencies:
  - conda-forge::procps-ng

Nextflow uses procps-ng tools (like ps, top) to monitor:

  • CPU usage
  • Memory consumption
  • Process status

Without it, Nextflow cannot properly track task resource usage. Always include it in bioinformatics containers; a quick check is shown below.
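
A quick way to verify that ps is actually available in an image (a common cause of cryptic Nextflow task failures); <your-image> is a placeholder for any image built from the Dockerfile above:

docker run --rm <your-image> ps --version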

Complete Multi-Tool Example

Here's a full example with multiple bioinformatics tools:

conda.yml:

name: base
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::fastqc=0.12.1
  - bioconda::samtools=1.18
  - bioconda::bwa=0.7.17
  - bioconda::bcftools=1.18
  - bioconda::picard=3.0.0
  - conda-forge::procps-ng
  - conda-forge::parallel

Dockerfile:

FROM mambaorg/micromamba:1.5.10-noble

COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml

RUN micromamba install -y -n base -f /tmp/conda.yml \
    && micromamba env export --name base --explicit > environment.lock \
    && echo ">> CONDA_LOCK_START" \
    && cat environment.lock \
    && echo "<< CONDA_LOCK_END" \
    && micromamba clean -a -y

USER root
ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"

WORKDIR /data
CMD ["/bin/bash"]

Building and Using Your Container

1. Build the image:

docker build -t myregistry.com/bioinformatics/biotools:1.0 .

2. Test it locally:

docker run --rm -v "$PWD:/data" myregistry.com/bioinformatics/biotools:1.0 \
fastqc /data/sample.fastq

3. Push to registry:

docker push myregistry.com/bioinformatics/biotools:1.0

4. Use in Nextflow (shown here: the nf-core FASTQC module; swap the container string for your own image):

process FASTQC {
    tag "${meta.id}"
    label 'process_medium'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0' :
        'biocontainers/fastqc:0.12.1--hdfd78af_0' }"

    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("*.html"), emit: html
    tuple val(meta), path("*.zip") , emit: zip
    path "versions.yml"            , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    // Make a list of old name/new name pairs for renaming in the bash while loop
    def old_new_pairs = reads instanceof Path || reads.size() == 1 ? [[ reads, "${prefix}.${reads.extension}" ]] : reads.withIndex().collect { entry, index -> [ entry, "${prefix}_${index + 1}.${entry.extension}" ] }
    def rename_to = old_new_pairs*.join(' ').join(' ')
    def renamed_files = old_new_pairs.collect{ _old_name, new_name -> new_name }.join(' ')

    // The total RAM allocated by FastQC equals the number of threads (--threads) times the RAM per thread (--memory)
    // https://github.com/s-andrews/FastQC/blob/1faeea0412093224d7f6a07f777fad60a5650795/fastqc#L211-L222
    // Dividing task.memory by task.cpus keeps the total within the amount requested by the label
    def memory_in_mb = task.memory ? task.memory.toUnit('MB') / task.cpus : null
    // FastQC memory value allowed range (100 - 10000)
    def fastqc_memory = memory_in_mb > 10000 ? 10000 : (memory_in_mb < 100 ? 100 : memory_in_mb)

    """
    printf "%s %s\\n" ${rename_to} | while read old_name new_name; do
        [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
    done

    fastqc \\
        ${args} \\
        --threads ${task.cpus} \\
        --memory ${fastqc_memory} \\
        ${renamed_files}

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        fastqc: \$( fastqc --version | sed '/FastQC v/!d; s/.*v//' )
    END_VERSIONS
    """

    stub:
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch ${prefix}.html
    touch ${prefix}.zip

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        fastqc: \$( fastqc --version | sed '/FastQC v/!d; s/.*v//' )
    END_VERSIONS
    """
}

Advantages of This Approach

| Feature | conda.yml + Micromamba | Traditional Conda Dockerfile |
|---|---|---|
| Build speed | Very fast (10-100x) | Slow |
| Image size | ~200 MB for tools | ~1-2 GB |
| Reproducibility | Explicit lock file | Dependency drift possible |
| Version pinning | Easy in YAML | Inline in Dockerfile |
| Readability | Clear, readable YAML | Complex RUN statements |
| Channel support | Multiple channels | Limited |
| Maintenance | Update conda.yml easily | Rebuild the entire Dockerfile |

Saving and Sharing Explicit Lock Files

The generated environment.lock file is extremely valuable. Save it in your project:

# Extract from the build output and save (BuildKit hides RUN output unless --progress=plain is set)
docker build --progress=plain . 2>&1 | sed -n '/CONDA_LOCK_START/,/CONDA_LOCK_END/p' > environment.lock

Later, reproduce the exact environment:

# Rebuild from locked versions
micromamba install -y -n base --file environment.lock

Pixi: The Next Generation Tool for Conda-Based Containers

Pixi is an emerging tool that combines the best aspects of Conda, Mamba, and lock file management into a single, ultra-fast solution. It's gaining traction in bioinformatics for building reproducible containers.

Why Pixi?

  • Fastest: Even faster than Micromamba for dependency resolution
  • Automatic lock files: Lock files are created automatically, no manual export needed
  • Single-step reproducibility: One tool handles everything from installation to reproducibility
  • Cross-platform: Works seamlessly across different architectures and systems
  • Modern approach: Designed from scratch with containerization in mind

Example Dockerfile with Pixi:

ARG PIXI_VERSION=0.63.2
ARG BASE_IMAGE=debian:bookworm-slim

FROM --platform=linux/amd64 ubuntu:24.04 AS builder
ARG PIXI_VERSION
RUN apt-get update && apt-get install -y curl
RUN curl -Ls \
    "https://github.com/prefix-dev/pixi/releases/download/v${PIXI_VERSION}/pixi-$(uname -m)-unknown-linux-musl" \
    -o /pixi && chmod +x /pixi
RUN /pixi --version

FROM --platform=linux/amd64 $BASE_IMAGE

# Add appuser with UID 1000
RUN useradd -u 1000 -m -s /bin/bash appuser
USER appuser
WORKDIR /home/appuser

# Copy pixi from builder
COPY --from=builder --chown=appuser:appuser --chmod=0555 /pixi /usr/local/bin/pixi

# Set up pixi with conda channels
ENV PATH="/home/appuser/.pixi/bin:${PATH}"
RUN pixi config set default-channels '["conda-forge", "bioconda"]' --global

# Install tools using pixi
RUN pixi global install fastqc=0.12.1 samtools=1.18 bwa=0.7.17

# Clean cache
RUN pixi clean cache -y

WORKDIR /data
CMD ["/bin/bash"]
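
A quick smoke test of the resulting image (the tag biotools-pixi:1.0 is just an example):

docker build -t biotools-pixi:1.0 .
docker run --rm biotools-pixi:1.0 fastqc --version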

Key Pixi Features in the Dockerfile:

  • Multi-stage build: Builder stage compiles Pixi, final stage is lean
  • User management: Non-root appuser for security
  • Global channel setup: Pre-configures conda-forge and bioconda channels
  • Direct package installation: pixi global install installs and locks versions automatically
  • Automatic lock file: Lock file is generated implicitly, ensuring reproducibility
  • Cache cleaning: pixi clean cache -y removes temporary files

Comparing Container Tools:

| Feature | Conda | Micromamba | Pixi |
|---|---|---|---|
| Speed | Slow (baseline) | 10-100x faster | 100x+ faster |
| Lock file | Manual export | Manual export | Automatic |
| Image size | Large | Small | Small |
| Reproducibility | Good | Excellent | Excellent |
| Complexity | High | Medium | Low |
| Learning curve | Steep | Medium | Low |
| Production-ready | Yes | Yes | Emerging |

Summary:

  • Use conda.yml with Micromamba for established, proven workflows
  • Use Pixi for cutting-edge projects requiring maximum speed and simplicity
  • Always include procps-ng for Nextflow compatibility (regardless of tool)
  • Generate/maintain explicit lock files for reproducibility
  • Pin exact versions to prevent dependency drift
  • Multi-channel support (conda-forge + bioconda) handles complex dependencies

5. Finding Pre-Built Containers: Before Building from Scratch

Before you spend time building Docker images from scratch, check if your tool already exists in a public container registry. Most bioinformatics tools have pre-built, maintained containers ready to use. This is the fastest and easiest approach when available.

The Container Search Strategy

Always follow this order:

  1. Check BioContainers (https://biocontainers.pro/registry) - Most comprehensive bioinformatics registry
  2. Check Seqera Containers (https://seqera.io/containers/) - Multi-tool combinations, modern builds
  3. Build yourself - Only if tool doesn't exist

1. BioContainers (Primary Source)

BioContainers automatically builds and maintains Docker containers for every bioconda package. This is the most comprehensive source for bioinformatics tools.

URL: https://biocontainers.pro/registry

How to find and use a tool:

# Search for FastQC at https://biocontainers.pro/registry
# (note: docker search only queries Docker Hub, so it cannot list quay.io images)

# Pull a specific version
docker pull quay.io/biocontainers/fastqc:0.12.1--hdfd78af_1

# Run the tool
docker run --rm -v "$PWD:/data" quay.io/biocontainers/fastqc:0.12.1--hdfd78af_1 \
fastqc /data/sample.fastq


Advantages:

  • ✅ Automatically updated when bioconda packages update
  • ✅ Single-tool containers (clean, minimal)
  • ✅ Stable and widely tested
  • ✅ Available for most bioinformatics tools (~10,000+)
  • ✅ Free and publicly accessible

Limitations:

  • One tool per image (no multi-tool combinations)
  • Must manually orchestrate multiple containers for pipelines

Example with multiple tools in a workflow:

# Align reads with BWA. The > redirection happens on the host shell,
# so write to the current directory (which is mounted in the container as /data).
docker run --rm -v "$PWD:/data" quay.io/biocontainers/bwa:0.7.17--h5bf99c6_8 \
    bwa mem /data/ref.fa /data/reads.fq > aligned.sam

# Sort the alignments with Samtools
docker run --rm -v "$PWD:/data" quay.io/biocontainers/samtools:1.18--hd87286a_0 \
    samtools sort /data/aligned.sam > aligned.bam

2. Seqera Containers (Multi-Tool Support)

Seqera Containers provides a registry of pre-built containers that often combine multiple tools, perfect for complex workflows.

URL: https://seqera.io/containers/

How to find and use containers:

  1. Visit https://seqera.io/containers/
  2. Search for your tool(s)
  3. Add multiple tools if needed (e.g., fastqc + samtools together)
  4. Select your architecture (linux/amd64 or linux/arm64)
  5. Get the container reference

Example:

# Pull a multi-tool container
# (the exact image name and tag come from the web interface; this reference is illustrative)
docker pull community.wave.seqera.io/library/fastqc_samtools:<tag>

# Or use it in Nextflow
process analyzeReads {
    container 'community.wave.seqera.io/library/fastqc_samtools:<tag>'
    script:
    '''
    fastqc input.fastq
    samtools view input.bam
    '''
}

Advantages:

  • ✅ Combines multiple tools in one image
  • ✅ Modern infrastructure
  • ✅ Web interface for easy discovery
  • ✅ Automatic lock file generation
  • ✅ Multi-architecture support

Limitations:

  • Fewer pre-built combinations than BioContainers
  • Relies on Seqera infrastructure

3. When to Build from Scratch

Only build your own container if:

  1. Tool doesn't exist in any public registry
  2. Version not available and you need an older/newer version
  3. Custom modifications required (additional dependencies, scripts)
  4. Performance optimizations needed for specific hardware

Example scenario:

# Check if tool exists
docker pull quay.io/biocontainers/myraretool:latest # ❌ Not found

# Check Seqera Containers
# Visit https://seqera.io/containers/ - (tool not found)

# → NOW you build it yourself using Section 4 approach
docker build -t myregistry.com/myraretool:1.0 .
docker push myregistry.com/myraretool:1.0

Quick Reference: Where to Find Each Tool

| Tool Type | Primary Location | Alternative | Fallback |
|---|---|---|---|
| Common bioinformatics (FastQC, SAMtools, BWA, etc.) | https://biocontainers.pro/registry | https://seqera.io/containers/ | Build yourself |
| R/Bioconductor packages | https://biocontainers.pro/registry | https://seqera.io/containers/ | Build yourself |
| Python bioinformatics | https://biocontainers.pro/registry | https://seqera.io/containers/ | Build yourself |
| Specialized tools | https://biocontainers.pro/registry first | GitHub releases | Build yourself |

Practical Workflow: Search → Pull → Use

#!/bin/bash
# 1. Search for the tool in BioContainers
# (BioContainers tags include a build hash, so look up the full tag on the registry page first)
TOOL="fastqc"
VERSION="0.12.1"
TAG="${VERSION}--hdfd78af_1"
echo "Searching for $TOOL:$TAG in BioContainers..."
if docker pull "quay.io/biocontainers/$TOOL:$TAG" 2>/dev/null; then
    echo "✓ Found in BioContainers!"
    exit 0
fi

# 2. Try Seqera Containers
echo "Trying Seqera Containers..."
# (Check https://seqera.io/containers/ manually or via API)

# 3. Build yourself
echo "✗ Not found in public registries. Building from scratch..."
# Use the Section 4 approach with Micromamba or Pixi
docker build -t "myregistry.com/$TOOL:$VERSION" .

Best Practices for Using Pre-Built Containers

  1. Pin exact versions:

    ✓ Good: docker pull quay.io/biocontainers/samtools:1.18--hd87286a_0
    ✗ Bad: docker pull quay.io/biocontainers/samtools:latest
  2. Pin the image digest in Nextflow:

    process myTask {
        // Include the digest for reproducibility
        container 'quay.io/biocontainers/samtools:1.18--hd87286a_0@sha256:abc123...'
        script: 'samtools view input.bam'
    }
  3. Keep local cache for offline work:

    docker pull quay.io/biocontainers/samtools:1.18--hd87286a_0
    docker tag quay.io/biocontainers/samtools:1.18--hd87286a_0 \
    myregistry.com/samtools:1.18
    docker push myregistry.com/samtools:1.18

Summary: Container Registry Decision Tree

Do you need a bioinformatics tool in a container?

│
├─→ Search https://biocontainers.pro/registry (BioContainers)
│     ├─→ Found? Use it ✓
│     └─→ Not found?
│           │
│           └─→ Search https://seqera.io/containers/ (Seqera Containers)
│                 ├─→ Found? Use it ✓
│                 └─→ Not found?
│                       │
│                       └─→ Build using Section 4 (Micromamba or Pixi) ✓

Key Takeaway:

  • Check existing registries first - saves time and uses community-maintained containers
  • BioContainers (https://biocontainers.pro/registry) is your first stop - most comprehensive bioinformatics registry
  • Seqera (https://seqera.io/containers/) for multi-tool combinations - modern approach for complex workflows
  • Build yourself only as last resort - but you now have the tools (Section 4) to do it well