...
In some cases, users have inputs or datasets made up of many files that are each much smaller than 1 MB, so they hit the limit on the number of files they can store before exhausting their allocated storage space. This limitation can be frustrating, but when the files do not need to be modified, using an archive format is ideal and in fact encouraged. For some users a simple tar file is enough; others need to access the files regularly without unpacking the contents of the archive. To create a simple-to-use, read-only archive that can be mounted on Koa as a folder in your home directory during a job's execution, we encourage users to consider using a SquashFS archive.
...
Only counts as a single file
Stores one larger file in place of many smaller files
Reduces overhead on file access, since less network communication is needed to reach each file stored in the archive. As a result, access to the read-only data may perform better
Users can mount/unmount each archive and access it like any other storage location
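
To judge whether a dataset is a good candidate for archiving, it can help to count how many of its files fall under the 1 MB mark mentioned earlier. The following is a minimal sketch, not an official Koa tool: the function name `count_small_files` is hypothetical, and the threshold is assumed to be 1 MiB (1048576 bytes).

```shell
# Hypothetical helper (not part of Koa): report how many regular files
# under a directory are smaller than 1 MiB, next to the total file count.
# Assumes file names do not contain newline characters.
count_small_files() {
    dir="$1"
    # Total number of regular files below the directory.
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # The "c" suffix makes find compare exact byte counts, so this matches
    # files strictly smaller than 1048576 bytes.
    small=$(find "$dir" -type f -size -1048576c | wc -l | tr -d ' ')
    echo "$small of $total files are under 1 MiB"
}
```

For example, `count_small_files ~/koa_scratch/database` would summarize the folder used in the example on this page. A directory where nearly all files are small and never modified is the kind of dataset the benefits above apply to.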
...
Building a squashfs file
On Koa, users can create a squashfs archive using the mksquashfs command on a compute node. For example, let us assume we have a folder located at ~/koa_scratch/database which contains 10K files with an average file size of 512K. We could create a single file, let us call it database.sqfs, and also save it in ~/koa_scratch with the following command:
...
Note |
---|
Note: the squashfs archive can only be mounted on a folder that resides in your home directory or in /tmp on a given node. Also be aware that squashfuse will only mount the archive on the node the command is executed from, and the archive is only accessible on that node. An archive can be mounted on multiple nodes at the same time, at the same path on each node that needs access to the archive. |
Code Block |
---|
mkdir /tmp/database && squashfuse_ll -otimeout=43200 ~/koa_scratch/database.sqfs /tmp/database |
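
Because the mount only exists on the node where the command ran, a job may want to mount the archive, do its work, and then unmount it again. The following is a minimal sketch of that pattern. The function name `with_archive` is hypothetical, and the sketch assumes that `fusermount -u` is available on the compute node for unmounting FUSE mounts; this page does not confirm that, so check it on Koa before relying on it.

```shell
# Hypothetical helper (not part of Koa): mount ARCHIVE on MOUNTPOINT,
# run the given command, then unmount, returning the command's exit status.
# Usage: with_archive ARCHIVE MOUNTPOINT COMMAND [ARGS...]
with_archive() {
    archive="$1"
    mountpoint="$2"
    shift 2
    # Create the mount point if it does not exist yet.
    mkdir -p "$mountpoint" || return 1
    # Same mount command and timeout option as in the example above.
    squashfuse_ll -otimeout=43200 "$archive" "$mountpoint" || return 1
    # Run the caller's command while the archive is mounted.
    "$@"
    status=$?
    # Assumption: fusermount is installed; -u unmounts a FUSE mount.
    fusermount -u "$mountpoint"
    return "$status"
}
```

A job script could then call, for example, `with_archive ~/koa_scratch/database.sqfs /tmp/database ls /tmp/database`. Wrapping the work this way means the unmount runs whether or not the command succeeds, though it does not cover a job that is killed partway through.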
...