
Airflow Provider SLURM

Open Source · 2025-Present · Creator & Maintainer
Python · Apache Airflow · SLURM · HPC · ML · PyPI · Open Source

Overview

An Apache Airflow provider that bridges workflow orchestration with HPC batch computing — enabling data scientists and ML engineers to submit, monitor, and manage SLURM jobs directly from Airflow DAGs without leaving their familiar workflow environment.

Released as a PyPI package (apache-airflow-providers-slurm), it targets the growing intersection of traditional HPC infrastructure and modern ML/data pipelines.

The Problem

HPC clusters running SLURM are powerful but isolated from modern workflow tooling. Data scientists working with Airflow had no native way to dispatch work to SLURM — they had to manually submit jobs via SSH or build brittle custom scripts. This created a hard boundary between the ML platform and the compute infrastructure.

What It Does

  • Native SLURM operators for submitting batch jobs, monitoring status, and handling outputs within Airflow DAGs
  • Job recovery — resumes tracking of in-flight jobs across Airflow restarts
  • Comprehensive error handling — maps SLURM exit codes and failure states to Airflow task states
  • Containerized execution — supports Enroot and Singularity containers for reproducible HPC workloads
  • Broad compatibility — supports Apache Airflow 2.5 through 3.x
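The error-handling bullet above amounts to a translation table from SLURM's terminal job states to Airflow task outcomes. The sketch below is illustrative, not the provider's actual implementation: the state names come from SLURM's standard `sacct`/`squeue` vocabulary, but the mapping choices and the `airflow_outcome` helper are assumptions made for this example.

```python
# Illustrative sketch: mapping terminal SLURM job states to Airflow-style
# task outcomes. The provider's real mapping may differ.

# Common terminal states reported by SLURM accounting tools.
SLURM_STATE_TO_OUTCOME = {
    "COMPLETED": "success",
    "FAILED": "failed",
    "CANCELLED": "failed",
    "TIMEOUT": "failed",            # job hit its --time limit
    "OUT_OF_MEMORY": "failed",
    "NODE_FAIL": "up_for_retry",    # infrastructure fault: worth retrying
    "PREEMPTED": "up_for_retry",
}

def airflow_outcome(slurm_state: str) -> str:
    """Translate a terminal SLURM job state into an Airflow task outcome."""
    # SLURM sometimes decorates states, e.g. "CANCELLED by 1001" or "COMPLETED+".
    base = slurm_state.split()[0].rstrip("+")
    try:
        return SLURM_STATE_TO_OUTCOME[base]
    except KeyError:
        raise ValueError(f"Unrecognized terminal SLURM state: {slurm_state!r}")

print(airflow_outcome("COMPLETED"))          # success
print(airflow_outcome("CANCELLED by 1001"))  # failed
```

Splitting failures into "failed" versus "up_for_retry" is what lets Airflow's retry policy re-run jobs that died to a node fault rather than to the job itself.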

Key Features

  • Drop-in Airflow provider following the official provider interface spec
  • Connection type for SLURM REST API authentication
  • Sensor operator for polling long-running jobs without holding a worker slot
  • Full DAG integration including XCom for passing job IDs between tasks
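The sensor feature above works because polling reduces to a cheap, idempotent "poke": check the job state once, return whether it has finished, and let Airflow's reschedule mode free the worker slot between pokes. A minimal sketch of that logic follows; the `get_state` callable and the state names stand in for the provider's real API and are assumptions of this example.

```python
# States from which a SLURM job will not progress further.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}

def poke(job_id: str, get_state) -> bool:
    """One cheap status check. In reschedule mode, Airflow releases the
    worker slot between pokes instead of blocking it for the job's lifetime."""
    state = get_state(job_id)
    if state == "FAILED":
        # Surface the failure immediately rather than waiting out the timeout.
        raise RuntimeError(f"SLURM job {job_id} failed")
    return state in TERMINAL_STATES

# Simulate successive pokes against a fake queue: PENDING -> RUNNING -> COMPLETED.
states = iter(["PENDING", "RUNNING", "COMPLETED"])
fake_get_state = lambda job_id: next(states)
results = [poke("12345", fake_get_state) for _ in range(3)]
print(results)  # [False, False, True]
```

The job ID the sensor polls would arrive via XCom from the upstream submit task, matching the last bullet above.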

Release

  • Version: v0.1.0 alpha
  • Package: Available on PyPI
  • Compatibility: Apache Airflow 2.5, 2.6, 2.7, 2.8, 2.9, 3.x
  • Python: 3.9+

Use Cases

Built primarily for the intersection of HPC and modern ML workloads — particularly relevant for:

  • Quantitative finance — submitting Monte Carlo simulations or risk calculations from Airflow orchestration pipelines
  • Bioinformatics — running genome assembly or molecular dynamics jobs from research workflows
  • CFD / scientific computing — integrating ANSYS, OpenFOAM, or WRF runs into reproducible DAG-based pipelines
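Under the hood, each of these use cases ultimately reduces to an `sbatch` submission with the right resource flags. The sketch below assembles such a command line for the quant-finance case; the flags are standard SLURM options, but the script name, resource values, and `build_sbatch_command` helper are hypothetical, and in practice the provider's operator would handle submission rather than the DAG shelling out itself.

```python
def build_sbatch_command(script: str, *, job_name: str, ntasks: int,
                         time_limit: str, partition: str) -> list[str]:
    """Assemble an sbatch command line from a batch script and resource flags."""
    return [
        "sbatch",
        "--parsable",               # print only the job ID, easy to capture for XCom
        f"--job-name={job_name}",
        f"--ntasks={ntasks}",
        f"--time={time_limit}",
        f"--partition={partition}",
        script,
    ]

# Hypothetical Monte Carlo risk run: 64 tasks, 2-hour limit.
cmd = build_sbatch_command(
    "monte_carlo.sh",               # hypothetical batch script
    job_name="mc-risk-run",
    ntasks=64,
    time_limit="02:00:00",
    partition="compute",
)
print(" ".join(cmd))
```

`--parsable` makes `sbatch` emit just the numeric job ID, which is what a submit task would push to XCom for downstream monitoring.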