

How Jobs Execute on HPC and Monitoring

The platform uses the SSH protocol to submit jobs and query job states on your HPC cluster, automating the entire job execution workflow.
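
Conceptually, submission and state queries come down to standard Slurm commands executed over SSH. Below is a minimal sketch of that idea, assuming a Slurm cluster reachable as user@hpc.example.org and a generated script job.sh inside the job directory (hostname, script name, and job directory are placeholders, not the platform's actual internals):

# Submit the generated job script and capture the Slurm job ID (placeholder host and path)
slurm_id=$(ssh user@hpc.example.org "sbatch --parsable ~/.river/jobs/<uuid>/job.sh")

# Query the job state while it is queued or running
ssh user@hpc.example.org "squeue -j $slurm_id -o '%T'"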

Job Execution Workflow

Core Process:

  1. Transfer: Job scripts are securely transferred to the remote HPC server
  2. Environment Setup: Automatically installs Nextflow, Singularity, Python, and other essential tools (skipped if already installed)
  3. Tool Deployment: Clones tools from GitHub repositories and injects user-defined parameters into the execution scripts
  4. Web Applications: Creates a secure, authenticated reverse proxy for accessing web applications that consume extensive HPC resources (see the sketch after this list)
  5. Pipeline Execution: Runs nf-core pipelines directly through Nextflow
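
For web applications, the example job script later on this page picks a free port and records it in job.port, together with the compute-node hostname in job.host. How RIVER then wires up its authenticated reverse proxy is internal to the platform; the sketch below only illustrates the general idea with plain SSH port forwarding through the login node (hostnames, paths, and the local port are placeholders):

# Read the port and compute node recorded by the job script (placeholder host and job ID)
app_host=$(ssh user@hpc.example.org "cat ~/.river/jobs/<uuid>/job.host")
app_port=$(ssh user@hpc.example.org "cat ~/.river/jobs/<uuid>/job.port")

# Forward a local port to the web app on the compute node, hopping through the login node
ssh -N -L 8080:"$app_host":"$app_port" user@hpc.example.org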

Storage Configuration

The platform creates a params.json file and injects its values into the running script.
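
As a rough illustration (the keys below are hypothetical and depend entirely on your tool's schema), params.json is a flat key/value file, and the generated job script exports each entry as an environment variable:

# Hypothetical params.json written by the platform from the tool schema
cat > params.json <<'EOF'
{
  "git": "https://github.com/nttg8100/demo-river-non-ui-tool.git",
  "tag": "main",
  "file": "workspace/test.txt",
  "integer": 1
}
EOF

# Same idiom used by the generated job script: export every key as an environment variable
while IFS='=' read -r key value; do
    export "$key=$value"
done < <(jq -r 'to_entries|map("\(.key)=\(.value|tostring)")|.[]' params.json)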

For nf-core workflows:

  • S3 storage credentials are created automatically and I/O is handled by Nextflow (using the AWS CLI v2)

For non-nf-core jobs:

  • S3 storage credentials are used by Goofys to mount the bucket as a FUSE filesystem for efficient input/output operations (see the credentials sketch below)
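
The generated job script below mounts the bucket with goofys --profile "$bucket_name", which implies an AWS credentials profile named after the bucket. In practice RIVER writes these credentials for you; the sketch below only shows what such a profile could look like, assuming the standard ~/.aws/credentials lookup that Goofys uses (profile name and keys are placeholders):

# Hypothetical credentials profile matching goofys --profile "$bucket_name"
mkdir -p ~/.aws
cat >> ~/.aws/credentials <<'EOF'
[my-river-bucket]
aws_access_key_id = <ACCESS_KEY_PLACEHOLDER>
aws_secret_access_key = <SECRET_KEY_PLACEHOLDER>
EOF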

Example Job Script

#!/bin/bash
#SBATCH --job-name="Thanh-Giang Tan Nguyen 11/04/2025 11:34:02 PM"
#SBATCH --cpus-per-task=1
#SBATCH --output=/home/river/.river/jobs/6c938f33-7c36-4132-b22d-b7f6287d1ef1/job.log
#SBATCH --mem=2G
#SBATCH --time=1:00:00

set -euo pipefail
cd /home/river/.river/jobs/6c938f33-7c36-4132-b22d-b7f6287d1ef1
# === Setup paths ===
RIVER_HOME="/home/river/.river"
BIN_DIR="$RIVER_HOME/bin"
GOOFYS_PATH="$BIN_DIR/goofys"

export PATH=$BIN_DIR:$PATH
export RIVER_HOME=$RIVER_HOME
export RIVER_HOME_TOOLS=$RIVER_HOME/bin
export SINGULARITY_CACHE_DIR=$RIVER_HOME/images/singularities/cache
export NXF_SINGULARITY_CACHEDIR=$RIVER_HOME/images/singularities/images
export NXF_VER=25.04.2
export job_id=6c938f33-7c36-4132-b22d-b7f6287d1ef1

mkdir -p $RIVER_HOME_TOOLS
mkdir -p $SINGULARITY_CACHE_DIR
mkdir -p $NXF_SINGULARITY_CACHEDIR

# === Install pixi ===
which pixi || curl -fsSL https://pixi.sh/install.sh | sh
export PATH=$PATH:$HOME/.pixi/bin
pixi config append default-channels bioconda --global
pixi config append default-channels conda-forge --global
pixi global install nextflow jq git singularity python=3.14

# === Install Goofys if missing ===
if [ ! -f "$GOOFYS_PATH" ]; then
    echo "Installing goofys..."
    curl -L https://github.com/kahing/goofys/releases/download/v0.24.0/goofys -o "$GOOFYS_PATH"
    chmod +x "$GOOFYS_PATH"
else
    echo "Goofys already exists at: $GOOFYS_PATH"
fi

# Setup port for quick tunneling
export PORT=$(python -c "import socket; s=socket.socket(); s.bind(('',0)); print(s.getsockname()[1]); s.close()")
echo $PORT > $RIVER_HOME/jobs/6c938f33-7c36-4132-b22d-b7f6287d1ef1/job.port
echo $(hostname) > $RIVER_HOME/jobs/6c938f33-7c36-4132-b22d-b7f6287d1ef1/job.host

# === Export user-defined Environment Variables ===
while IFS='=' read -r key value; do
    export "$key=$value"
done < <(jq -r 'to_entries|map("\(.key)=\(.value|tostring)")|.[]' params.json)

# === Clone Repository ===
repo_name=$(basename -s .git "$git")
owner=$(basename "$(dirname "$git")")
local_dir="$RIVER_HOME/tools/$owner/$repo_name/$tag"

if [ "$owner" = "nf-core" ]; then
echo "Detected nf-core tool. Will run via Nextflow."
else
if [ ! -d "$local_dir" ]; then
echo "Cloning $git into $local_dir"
git clone --branch "$tag" --single-branch "$git" "$local_dir"
else
echo "Repository already cloned at $local_dir"
fi

ln -sf "$local_dir" "$RIVER_HOME/jobs/6c938f33-7c36-4132-b22d-b7f6287d1ef1/analysis"
fi

# === Setup Cloud Storage Mount ===
mount_point="$RIVER_HOME/jobs/6c938f33-7c36-4132-b22d-b7f6287d1ef1/workspace"
mkdir -p "$mount_point"

trap '{ umount "$mount_point" || echo "Warning: S3 bucket was not mounted." ; }' EXIT

# === Run Main Job Command ===
if [ "$owner" = "nf-core" ]; then
# AWS config
cat <<EOF > aws.config
aws {
client {
endpoint = "$AWS_ENDPOINT_URL"
s3PathStyleAccess = true
}
}
EOF
if [ -n "${profile:-}" ]; then
profiles="singularity,$profile"
else
profiles="singularity"
fi
nextflow run "$owner/$repo_name" \
-r "$tag" \
-c aws.config \
-c river.config \
-profile "$profiles" \
-process.executor slurm \
-process.shell 'bash' \
--outdir "s3://$bucket_name/$outdir/6c938f33-7c36-4132-b22d-b7f6287d1ef1" \
-with-report "s3://$bucket_name/$outdir/6c938f33-7c36-4132-b22d-b7f6287d1ef1/report.html" \
-resume
else
"$GOOFYS_PATH" \
--profile "$bucket_name" \
--file-mode=0700 \
--dir-mode=0700 \
--endpoint=$AWS_ENDPOINT_URL \
"$bucket_name" "$mount_point"
bash $local_dir/river/main.sh
fi
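
The #SBATCH --output line above pins the job log to a known location under the job directory, so a job can also be followed manually over SSH. A minimal sketch (the hostname is a placeholder, and the Slurm job ID is whatever sbatch returned):

# Follow the job log declared in the #SBATCH --output line (placeholder host)
ssh user@hpc.example.org "tail -f /home/river/.river/jobs/6c938f33-7c36-4132-b22d-b7f6287d1ef1/job.log"

# Check the final state and resource usage once the job has finished
ssh user@hpc.example.org "sacct -j <slurm_job_id> --format=JobID,JobName,State,Elapsed,MaxRSS"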

Building Tools for the RIVER Platform

This guide provides an end-to-end process for creating a tool compatible with the RIVER platform. Prerequisites: Git configured on your machine.

Step 1: Clone Your Repository (created from the template)

The repository nttg8100/demo-river-non-ui-tool is generated from the https://github.com/riverxdata/batch-template-analysis template for non-interactive analysis:

git clone git@github.com:nttg8100/demo-river-non-ui-tool.git
cd demo-river-non-ui-tool

Required directory structure:

├── LICENSE
├── README.md
├── river
│   └── main.sh
└── schema.json

Step 2: Configure the Schema

The schema defines the tool parameters and is adapted from the nf-core standard. Use either schema.json or nextflow_schema.json (for nf-core pipelines).

Key features:

  • Groups parameters for easier configuration
  • Optional for web-based tools
  • Follows the Nextflow JSON schema standard

Example schema.json:

{
  "$defs": {
    "input_output_options": {
      "title": "Input output option",
      "type": "object",
      "description": "",
      "default": "",
      "required": ["file"],
      "properties": {
        "file": {
          "type": "string",
          "default": "workspace/test.txt",
          "format": "file-path",
          "fa_icon": "fas fa-dna",
          "description": "File path"
        },
        "dir": {
          "type": "string",
          "default": "workspace",
          "format": "directory-path",
          "fa_icon": "fas fa-dna",
          "description": "Directory path"
        },
        "integer": {
          "type": "integer",
          "default": 1,
          "description": "Integer",
          "fa_icon": "fas fa-greater-than-equal"
        },
        "number": {
          "type": "number",
          "default": 0.3,
          "description": "Float",
          "fa_icon": "fas fa-greater-than-equal"
        },
        "string": {
          "type": "string",
          "default": "string1",
          "description": "String",
          "fa_icon": "fas fa-greater-than-equal",
          "enum": ["string1", "string2", "string3"]
        },
        "boolean": {
          "type": "boolean",
          "default": true,
          "description": "Boolean",
          "fa_icon": "fas fa-greater-than-equal"
        }
      }
    }
  }
}
tip

This schema is compatible with the nf-core schema: you can paste this JSON into the web-based nf-core pipeline schema builder to create and modify it visually, then copy the result back into schema.json.
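
If you have the nf-core tools installed locally, the schema builder can also be launched from the command line; the exact subcommand depends on your nf-core/tools version, so treat the snippet below as an assumption rather than a fixed recipe:

# Assumes pip and a recent nf-core/tools release are available
pip install nf-core
nf-core pipelines schema build   # on older nf-core/tools releases: nf-core schema build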

Step 3: Add your job script

The platform injects the parameter values into river/main.sh as environment variables.

Here, the example simply prints them out.

warning

Do not expose credentials such as S3 keys to the console: anything printed there can be read from the job log file.
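
If your script needs to confirm that a credential is available, print a masked confirmation instead of the value itself. A minimal sketch (the variable name is only an illustration):

# Confirm a secret is present without writing its value to the job log
if [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_SECRET_ACCESS_KEY : set (value hidden)"
else
    echo "AWS_SECRET_ACCESS_KEY : not set"
fi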

#!/bin/bash
echo ""
echo "============================================="
echo " RIVER ANALYSIS TEMPLATE"
echo "============================================="
echo ""

# Default Environment Variables
echo ">> Default Environment Variables:"
echo "---------------------------------------------"
echo "UUID Job ID : $job_id"
echo "CPU : $cpu"
echo "Memory : $memory"
echo "Time : $time"
echo "RIVER_HOME : $RIVER_HOME"
echo "---------------------------------------------"
echo ""

# Parameters
echo ">> Parameters:"
echo "---------------------------------------------"
echo "File Path : $file"
echo "Folder Path : $dir"
echo "Integer : $integer"
echo "Number(float): $number"
echo "String : $string"
echo "Boolean : $boolean"
echo "---------------------------------------------"
echo ""

# Bootstrap (for Custom Reverse Proxy)
echo ">> Bootstrap: Uncomment if you want to use custom reverse proxy"
echo "---------------------------------------------"
# Custom reverse location.
# Example: RStudio Server automatically redirects to `/` and restricts iframe usage.
echo "$job_id/" > "$RIVER_HOME/jobs/$job_id/job.proxy_location"

# Example: VS Code Server or JupyterLab requires reverse proxy.
echo "$job_id" > "$RIVER_HOME/jobs/$job_id/job.url"

# If the job is only exported via port (basic web app), still create this file but leave it empty.
touch "$RIVER_HOME/jobs/$job_id/job.url"
echo "---------------------------------------------"
echo ""

# Placeholder for Main Script Execution
echo "Put your main script here, using arguments from environment variables."

echo ""
echo "✔ Analysis Completed!"
echo "============================================="

Now you can run the tool on the HPC. To ensure the integration goes smoothly, we encourage you to add your own tests and run them in your local environment first. Check Testing.
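
A minimal local smoke test might export the same variables the platform would inject and run the script directly; the values below are placeholders chosen to match the example schema:

# Simulate the environment variables the platform injects, then run the tool script locally
export job_id=local-test cpu=1 memory=2G time=1:00:00 RIVER_HOME="$PWD/.river"
export file=workspace/test.txt dir=workspace integer=1 number=0.3 string=string1 boolean=true
mkdir -p "$RIVER_HOME/jobs/$job_id" workspace
bash river/main.sh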