
Running on a Cluster

When local laptop time isn't enough, you can take the same project to a SLURM HPC system. There's no separate configuration to learn — the same lc run command works inside an allocation, just with more hardware to spread across.

On NERSC Perlmutter, the filesystem layout (DVS-mounted home, Lustre scratch) and the module load conda workflow add a few site-specific considerations. See NERSC (Perlmutter) for a focused walkthrough.

The big picture

lc run always dispatches through a Dask cluster. Three branches:

  1. On your laptop → a LocalCluster sized to the machine.
  2. Inside a SLURM allocation → an in-process scheduler bound to the driver's hostname, with one dask worker per allocated node launched via srun.
  3. With DASK_SCHEDULER_ADDRESS set → connect to whatever scheduler you've pointed at.

You don't pick — lc run detects which case applies. The only thing you do differently on a cluster is request the nodes.
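
If you're unsure which branch you'll get, the two environment variables that drive the choice (per the list above) are easy to inspect before launching:

echo "SLURM_JOB_ID=${SLURM_JOB_ID:-unset}"                        # set → in-allocation scheduler + srun workers
echo "DASK_SCHEDULER_ADDRESS=${DASK_SCHEDULER_ADDRESS:-unset}"    # set → attach to that scheduler
# neither set → LocalCluster on this machine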

Pre-flight: pick the right container runtime

On most HPC sites, docker isn't available on compute nodes. NERSC-style systems generally provide podman-hpc instead. On a login node:

$EDITOR ~/.lightcone/config.yaml
container:
  runtime: podman-hpc

Then build and migrate the images for your project:

cd my-analysis
lc build

lc build runs podman-hpc build and then podman-hpc migrate, which copies the image into the per-node container cache. Compute nodes can read it without registry access.
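
A quick way to confirm the build and migration succeeded is to list images on the login node; the standard podman subcommands pass through the podman-hpc wrapper (at least on NERSC-style installs):

podman-hpc images              # the image lc build just produced should appear here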

If your site only offers apptainer / singularity, the Lightcone toolchain doesn't ship explicit support for those today. You can run without containers (runtime: none) for now, with the caveat that the manifest's container_image field will record what was declared, not what executed. (See lc run for the provenance warning.)

A typical SLURM workflow

1. Get an allocation

salloc -N 4 -t 02:00:00 -C gpu                       # interactive
# or
sbatch run.sbatch                                    # batch

run.sbatch looks like:

#!/bin/bash
#SBATCH -N 4
#SBATCH -t 02:00:00
#SBATCH -C gpu

cd $HOME/my-analysis
source .venv/bin/activate
lc run -j 16
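
Submitting and watching the batch job is plain SLURM:

sbatch run.sbatch
squeue -u $USER                          # wait for the job to start
tail -f slurm-<jobid>.out                # lc run's output goes to the usual SLURM output file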

2. lc run inside the allocation

Once SLURM_JOB_ID is set in your environment, lc run does the rest:

  • Starts an in-process Dask scheduler bound to the SLURM node hostname.
  • Launches one dask worker per node via srun.
  • Each worker advertises the node's CPU, memory, and GPU resources.
  • Snakemake submits each rule via the Dask executor; rules with per-recipe resources: constraints land on workers that can satisfy them.
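
Under the hood this is the standard Dask-on-SLURM pattern. The per-node worker launch amounts to something like the following sketch (lc run issues the real srun call itself; the exact flags, port, and resource names it uses are internal):

srun --ntasks=$SLURM_JOB_NUM_NODES --ntasks-per-node=1 \
    dask worker tcp://$(hostname):8786 \
        --nworkers 1 \
        --nthreads $SLURM_CPUS_ON_NODE \
        --resources "gpus_per_task=4"    # illustrative resource label; lc run picks the actual names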

3. Per-recipe resource hints

Add resource hints in your astra.yaml recipe blocks:

outputs:
  - id: heavy_fit
    type: metric
    recipe:
      command: python scripts/fit.py --output {output[0]}
      resources:
        cpus_per_task: 32
        mem_mb: 64000
        gpus_per_task: 1

The Snakemake-via-Dask executor maps these to per-task resource requests, so a rule that needs a GPU only schedules on nodes that advertise one.
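
If a GPU rule never seems to schedule, it's worth confirming that SLURM is actually handing out GPUs inside the allocation before suspecting the executor; this is standard SLURM/NVIDIA tooling:

srun --ntasks=1 --gpus-per-task=1 nvidia-smi -L    # should list at least one GPU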

What about login-node-only operations?

Build images, dry-run, look at status — all fine on a login node without an allocation:

lc build                       # build images (uses podman-hpc on login node)
lc status                      # offline; reads only manifests

The actual lc run should happen inside an allocation, since that's where the worker nodes are.

External Dask schedulers

If you have a long-lived Dask cluster (Slurm jobqueue, k8s, etc.) that you'd rather attach to:

export DASK_SCHEDULER_ADDRESS=tcp://my-scheduler:8786
lc run

lc run notices the env var and connects rather than starting its own scheduler. It does not tear the scheduler down on exit.
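
If you don't already have a long-lived cluster, the plain dask CLI is one way to stand one up for lc run to attach to (a minimal sketch; for anything persistent, dask-jobqueue or a Kubernetes deployment is the more usual route):

dask scheduler --port 8786 &                          # on a host reachable from the workers and from lc run
dask worker tcp://my-scheduler:8786 --nthreads 8 &    # repeat wherever you have capacity
export DASK_SCHEDULER_ADDRESS=tcp://my-scheduler:8786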

Troubleshooting

  • The dask CLI is not on PATH inside the SLURM allocation. Install lightcone-cli into the venv that your sbatch script activates; the dask CLI ships with distributed, which is a transitive dependency.
  • Workers never register. This usually means the SLURM node hostnames aren't resolvable from each other; check SLURMD_NODENAME / gethostname() and confirm the workers can reach the scheduler (see the checks below).
  • Image not found on compute nodes. Re-run lc build on the login node; the migrate step is what actually publishes the image to the per-node cache.
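
For the hostname-resolution case, two quick checks from inside the allocation usually pinpoint the problem:

srun --ntasks-per-node=1 hostname        # the names the workers and scheduler will try to reach
getent hosts <node-hostname>             # run from another node in the job; no output means it can't be resolved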

For the wiring details, see engine/dask_cluster in the maintainer docs.