Are you in a maze of confusion about running EPI2ME workflows with Singularity?
Our workflows are composed of various pieces of bioinformatics software that, simply put, read some input, do some work, and write some output. During that middle work part, intermediate files may be created that are used during the work but are no longer needed once the program has completed. Keeping these intermediate files would be wasteful, so Unix-like systems provide a dedicated location where such temporary files can be routinely cleaned up without risking important data.
However, the default temporary directory provided by your system (by convention /tmp) may be too small, frequently full, or both, causing programs to crash with errors such as “No space left on device”.
Often, compute clusters will be configured by a system administrator to instead provide a larger, shared temporary location to accommodate many users and their jobs, as this is easier to manage than individual compute nodes having a temporary directory that could potentially fill up and cause programs to crash.
In such cases, running programs need to be instructed to place intermediate files elsewhere to avoid errors.
This post is written for users of Singularity who are aware that they have a non-default temporary directory and need to configure Nextflow appropriately. If you’re a system administrator who has been sent this page, you can find a TL;DR at the bottom. Otherwise, by the end of this guide you’ll know:
- What the TMPDIR and SINGULARITY_TMPDIR environment variables are.
- How to configure Nextflow and Singularity to use a non-default TMPDIR.
- How to use the NXF_SINGULARITY_CACHEDIR environment variable to re-use Singularity images built by Nextflow, saving disk space and workflow time.
These instructions are written with SingularityCE in mind, but we expect them to be relevant to Apptainer users too.
We package up the software required for our workflows inside Docker images, which are effectively snapshots of a computer that have everything required for a workflow (or part of one) preinstalled. Docker and Singularity use these images to start a virtual computer that our workflows run inside, ensuring that the software we’ve packaged runs on your computer just as it did during our testing.
The difference between how Docker and Singularity work is well beyond the scope of this how-to, but the main point is that Docker containers are automatically configured with their own temporary directory that is not mapped to the system’s temporary directory. This means that a small or full system temporary directory will not cause the programs inside the Docker container to crash. Singularity, however, will make your system’s temporary directory available to the jobs running inside your container by default.
So, if you’re in an environment where you want to use a non-default temporary directory, you will need to provide some additional configuration to avoid potential workflow errors.
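Thankfully, a deep dive into Singularity is also beyond the scope of this how-to. You just need to know that, among other things, Singularity needs a location for:
- Temporary files created by programs running inside containers
- Temporary files created by Singularity itself while building images
- Storing the images it has built
We’ll talk about the configuration required for each in turn.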
Unix-like systems provide ways for people and programs to ask the system for a temporary file or directory.
You can see this for yourself: on the command line, for example, the mktemp command will return the path of a temporary file for you to use.
Neatly, these utilities will honour the TMPDIR environment variable, which is used to indicate where temporary files should be placed instead of whatever the system default is (e.g. /tmp).
Users (or their administrators) can override the location of the temporary directory by setting the TMPDIR environment variable, and most well-behaved programs will then place their temporary files in the TMPDIR location.
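To see this behaviour for yourself, here is a minimal sketch that runs mktemp twice: once with TMPDIR unset and once with it pointing at a throwaway directory. It assumes bash and a GNU-style mktemp; the demo_tmp directory is created purely for this demonstration and is not a location your cluster provides.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory standing in for a non-default temporary location.
demo_tmp="$(mktemp -d)"

# With TMPDIR removed from the environment, mktemp falls back to the
# system default (usually /tmp).
default_file="$(env -u TMPDIR mktemp)"

# With TMPDIR set, mktemp places the new file under that directory instead.
custom_file="$(TMPDIR="$demo_tmp" mktemp)"

echo "default:  $default_file"
echo "override: $custom_file"

# Tidy up the files and directory created by this demonstration.
rm -f "$default_file" "$custom_file"
rmdir "$demo_tmp"
```

The second path printed should sit inside the throwaway directory, which is the same mechanism that well-behaved programs in a workflow rely on.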
Two things must be configured for intermediate files created by programs during workflow execution to be placed in the desired TMPDIR correctly:
- The TMPDIR environment variable itself
- singularity.runOptions, to instruct Singularity to make the TMPDIR location visible to the Nextflow processes running inside Singularity containers
Your system administrator may have set TMPDIR already, but you can set it yourself in the script or shell before calling nextflow run to run our workflow with:
export TMPDIR=/your/path/to/tmp/here
You should consult any documentation you have been provided about your cluster to determine if there is an appropriate TMPDIR to use.
If in doubt, consult your system administrator.
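Before exporting a new TMPDIR it can be worth checking that the location actually exists and that you can write to it. The following is a rough sketch of such a check, not something Nextflow or Singularity requires; the candidate variable is a name we made up, and it is set to a freshly created directory only so the example runs anywhere. In practice you would replace it with the path for your cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for /your/path/to/tmp/here; replace with your cluster's location.
candidate="$(mktemp -d)"

# Refuse to continue if the location is missing or not writable by you.
if [ ! -d "$candidate" ] || [ ! -w "$candidate" ]; then
    echo "ERROR: $candidate is not a writable directory" >&2
    exit 1
fi

# export makes the variable visible to child processes such as nextflow.
export TMPDIR="$candidate"
echo "TMPDIR is now $TMPDIR"
```

Using export matters here: a plain assignment of a variable that is not already in the environment would stay local to the shell and would not be seen by nextflow run.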
Nextflow will forward the value of TMPDIR to all programs inside a workflow. The catch is that Singularity will not automatically make the location of TMPDIR visible to the containers, causing an ominous “failed to create file via template: No such file or directory” error.
You may well have been linked to this post by us to explain and resolve this very issue!
To instruct Singularity to make the TMPDIR visible to Nextflow processes running in Singularity containers, you should edit your global Nextflow configuration (which is typically stored at $HOME/.nextflow/config, but see Nextflow’s documentation for further guidance).
This file may exist already, or you may have to create it.
Either way you will want to add the following line:
singularity.runOptions = "-B \$TMPDIR"
This will adjust Nextflow’s Singularity options to “bind” (i.e. make visible) the location of TMPDIR to all Nextflow processes, resolving the “No such file or directory” error you may have encountered.
If you have existing singularity.runOptions in this file, you should append -B \$TMPDIR inside the double quotes.
Make sure to keep the \$: the backslash stops Nextflow from prematurely expanding TMPDIR as a Nextflow variable.
If you’re on a cluster with a shared temporary directory, you could alternatively ask your system administrator to add a permanent bind in singularity.conf; this avoids the need to provide singularity.runOptions.
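If you prefer to add the line from the command line rather than a text editor, the sketch below shows one way to do it without the shell mangling the \$. For safety it writes to a throwaway file (the config_file variable is our own name); you would point it at $HOME/.nextflow/config only once you are happy with the result. It deliberately does nothing if singularity.runOptions is already present, because in that case the existing line needs editing by hand as described above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway file standing in for "$HOME/.nextflow/config".
config_file="$(mktemp)"

# Only append the setting if runOptions is not already configured.
if ! grep -qF 'singularity.runOptions' "$config_file"; then
    # Quoting 'EOF' stops the shell from expanding $TMPDIR and preserves
    # the backslash, so the file receives the line exactly as written.
    cat >> "$config_file" <<'EOF'
singularity.runOptions = "-B \$TMPDIR"
EOF
fi

# Show the resulting file contents.
cat "$config_file"
```

An unquoted heredoc or a double-quoted echo would expand $TMPDIR or drop the backslash, which is exactly the premature expansion the \$ is there to prevent.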
Naming things happens to be one of the two most difficult problems in computing (along with cache invalidation and off-by-one errors).
With this in mind, you would be forgiven for thinking that the SINGULARITY_TMPDIR environment variable would tell programs running inside Singularity containers what TMPDIR to use.
Instead, SINGULARITY_TMPDIR tells Singularity itself what temporary directory it should use.
When running one of our Nextflow workflows with the Singularity executor, any container images that are required for the workflow to run will be automatically downloaded from the internet and converted into Singularity images by Nextflow; usually rather seamlessly. During this process, Singularity will read the Docker image as input, and write a Singularity compatible image as output. Just like our bioinformatics tools, Singularity needs to do some intermediate work to unpack the Docker image and build a Singularity one. Singularity requires at least as much space as the size of the resulting image, which will be several gigabytes at a minimum. If the system’s default temporary directory is small, you may encounter obscure errors during this Singularity build step.
Our recommendation is to set the SINGULARITY_TMPDIR environment variable to your TMPDIR:
export SINGULARITY_TMPDIR="$TMPDIR"
This should be set after TMPDIR but before calling nextflow run.
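Because image builds need several gigabytes of scratch space, a quick free-space check before a run can save some head scratching. This is only a sketch under our own assumptions: the 10 GiB threshold is an arbitrary illustrative figure rather than a number from Singularity, the variable names are ours, and the fallback to /tmp exists just so the example runs when TMPDIR is unset.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions for this sketch: fall back to /tmp if TMPDIR is unset, and
# treat 10 GiB as "enough" space for building images.
build_tmp="${TMPDIR:-/tmp}"
required_kib=$((10 * 1024 * 1024))

# df -Pk prints a header line followed by one data line per filesystem;
# the 4th field of the data line is the available space in KiB.
available_kib="$(df -Pk "$build_tmp" | awk 'NR == 2 { print $4 }')"

if [ "$available_kib" -lt "$required_kib" ]; then
    echo "WARNING: only ${available_kib} KiB free in $build_tmp" >&2
fi

# Tell Singularity itself to use this location for its build scratch space.
export SINGULARITY_TMPDIR="$build_tmp"
```

The check only warns rather than exits, since how much space is really needed depends on the images your workflow uses.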
Orthogonal to temporary directories but relevant to Singularity and disk space, we would be remiss not to mention the problem of storing the Singularity images built when using Nextflow.
By default, Nextflow will store the Singularity images it builds in a directory inside the work/ directory created by the workflow you are running.
Each time you run a Nextflow workflow, it will have its own work/ directory, which means that each time you run a Nextflow workflow with Singularity, Docker images will be downloaded and converted all over again!
To our rescue comes the NXF_SINGULARITY_CACHEDIR environment variable, which specifies the location that Singularity images should be saved to after they have been built.
Before downloading and converting images, Nextflow checks the NXF_SINGULARITY_CACHEDIR for existing images, only downloading and converting images it has not seen before, saving time and disk space.
This is particularly useful on a cluster as any user with access to the NXF_SINGULARITY_CACHEDIR location can re-use Singularity images that already exist.
Like the other environment variables, you will set this in your shell or in your script before calling nextflow run:
export NXF_SINGULARITY_CACHEDIR=/your/path/to/saved/images/here
You should speak to your system administrator about setting an appropriate location for NXF_SINGULARITY_CACHEDIR that can be shared between users.
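As a small illustration of setting up the cache location, the sketch below creates the directory, exports the variable, and lists whatever images are already there. The cache_dir variable is our own name, and it points at a throwaway directory only so the example is runnable; a real shared location and its permissions are something to agree with your administrator.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for /your/path/to/saved/images/here.
cache_dir="$(mktemp -d)/singularity-images"

# Creating the directory up front is harmless if it already exists.
mkdir -p "$cache_dir"
export NXF_SINGULARITY_CACHEDIR="$cache_dir"

# List any images already cached (this will be empty on first use).
ls -lh "$NXF_SINGULARITY_CACHEDIR"
```

After a first workflow run with this variable set, listing the directory again should show the converted images, and later runs can pick them up from there instead of rebuilding them.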
We’ve covered the reason that Singularity is different to Docker when it comes to temporary files, and what Singularity needs to store on your disk. To recap the configuration required for setting a non-default temporary directory, you’ll want to:
- Set TMPDIR somewhere that nextflow run will see it (e.g. your shell or job script)
- Bind TMPDIR to Singularity containers with singularity.runOptions in your global Nextflow config
- Set SINGULARITY_TMPDIR
- Set NXF_SINGULARITY_CACHEDIR
You may want to put these lines in your shell’s rc file (e.g. ~/.bashrc), but if you’re still not sure what these lines do, you should speak to your administrator first:
export TMPDIR=/your/path/to/tmp/here
export SINGULARITY_TMPDIR="$TMPDIR"
export NXF_SINGULARITY_CACHEDIR=/your/path/to/saved/images/here
Don’t forget to add the Singularity run options to your global Nextflow configuration:
singularity.runOptions = "-B \$TMPDIR"
Now you should be all set to run a Nextflow workflow with the Singularity executor and fill up your non-default temporary directory!