Users of Koa are granted access to two storage locations by default and may be granted access to two additional types of storage. The two default locations are home storage (home) and scratch storage (Koa Scratch). The two additional locations that can be requested are Lab storage (free) and KoaStore storage (for a fee).
Home
Each user is provided a home storage location on Koa with 50 GB of space that can be used as the user chooses. Upon initialization, user homes are set up with an examples directory, a symlink to Koa Scratch, and other basic configuration needed to allow users to take full advantage of Koa.
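If you want to confirm where the Koa Scratch symlink points, you can simply resolve it. A minimal Python sketch, assuming the symlink is the ~/koa_scratch link described under the purge parameters later on this page:

```python
import os

# Resolve the koa_scratch symlink in your home directory to see the
# Lustre scratch path it points to (/mnt/lustre/koa/scratch/${USER},
# per the purge parameters below).
print(os.path.realpath(os.path.expanduser("~/koa_scratch")))
```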
Performance
Home is not a high-performance filesystem. NFS is tried and true, but it is not designed to handle the stresses a High Performance Computing cluster can place on it. As a result, home should not be used for writing output from multiple jobs or from jobs that can generate a lot of data. Jobs of this nature should use Scratch instead, a file system designed for the loads an HPC cluster can generate.
Home is suitable for users with modest needs and for files that need to persist on Koa.
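For jobs that write significant output, point the output path at Scratch rather than home. A minimal Python sketch, using the scratch path documented under the purge parameters below; the job_results directory and output file name are hypothetical:

```python
import os

# Direct job output to Scratch rather than home.
# The scratch path follows the documented layout /mnt/lustre/koa/scratch/${USER};
# "job_results" and "output.txt" are placeholder names for illustration.
user = os.environ["USER"]
out_dir = os.path.join("/mnt/lustre/koa/scratch", user, "job_results")
os.makedirs(out_dir, exist_ok=True)

with open(os.path.join(out_dir, "output.txt"), "w") as fh:
    fh.write("large job output goes to Scratch, not home\n")
```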
Details
Home uses ZFS served over NFS with RDMA. Currently, home is served by a single virtual host connected to Koa at 200 Gbit. ZFS is configured with zstd-3 compression, allowing users to take full advantage of the inline compression and space savings zstd provides. As of this writing, users see a 2.16x compression ratio on average.
| File system | Per user storage quota | Per user file limit | Compression | Persistent |
|---|---|---|---|---|
| ZFS + NFS v4 | 50 GB | N/A | zstd-3 | Yes |
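As a rough illustration of what a 2.16x compression ratio means in practice, the arithmetic below shows the on-disk footprint of a hypothetical amount of written data. How quota accounting interacts with compression is not specified here; this is the ratio only.

```python
# Rough arithmetic only: a 2.16x compression ratio means the bytes written
# occupy roughly 1/2.16 of their logical size on disk.
ratio = 2.16
logical_gb = 100                     # hypothetical amount of data written
on_disk_gb = logical_gb / ratio
print(f"{logical_gb} GB written -> ~{on_disk_gb:.1f} GB stored on disk")
# 100 GB written -> ~46.3 GB stored on disk
```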
Scratch
Scratch, also known by the symlink “koa_scratch”, provides each user access to an 800 TB pool of storage on which files may live for 90 days from the last time they were modified. In total, this file system can support up to 400,000,000 files and directories. Users are not given individual quotas, allowing for flexibility based on need. Scratch directly accesses the underlying Koa Storage system, providing the highest performance possible from the storage system.
Performance
Scratch provides direct access to the underlying high-performance file system, Koa Storage, which utilizes Lustre. Lustre is designed for situations where many servers and workloads need to read and write data as quickly as possible. Lustre works best with long sequential reads/writes and exhibits poorer performance with small random reads, but methods exist to work around this limitation. Workarounds include the use of squashfs, as covered in our documentation, or archive formats such as HDF5 that combine multiple small files in a way that aligns with your workflow.
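As a rough illustration of the HDF5 approach, the sketch below packs many small files into a single container so Lustre sees one large file instead of thousands of small ones. The input pattern, output name, and dataset layout are illustrative, not Koa-specific.

```python
import glob
import os
import h5py
import numpy as np

# Pack many small files into one HDF5 container.
# "small_inputs/*.dat" and "packed_inputs.h5" are placeholder names.
with h5py.File("packed_inputs.h5", "w") as h5:
    for path in glob.glob("small_inputs/*.dat"):
        raw = np.fromfile(path, dtype=np.uint8)             # raw bytes of the file
        h5.create_dataset(os.path.basename(path), data=raw)

# A job can then read individual members from the single packed file:
with h5py.File("packed_inputs.h5", "r") as h5:
    for name in h5:
        payload = bytes(h5[name][()])                       # original file contents
```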
For scalability and performance, files written to Scratch use a progressive file layout: at certain size boundaries, a file switches to a different type of storage medium or recruits more storage targets to store its data. The exact layout rules are listed under Object Data below.
Automatic File Purging
Scratch is not a persistent storage location for user data. Scratch provides a 90-day grace period after a file was last written before the file is automatically removed from the file system.
The purge process cannot be paused for individual users, and files that are removed cannot be recovered.
Parameters of the automated file purge:
- Only files under ~/koa_scratch/ or /mnt/lustre/koa/scratch/${USER} will be subject to purge
- Files and folders not modified for 90 days will be deleted from scratch
- The purge process will run daily
If the file system nears 85-90% utilization, ITS-CI will contact users who occupy a large share of scratch and ask them to voluntarily reduce their usage. If we are unable to reclaim enough space to drop below 70% utilization, we will purge files from oldest to newest, regardless of their time on scratch, until usage is below 70%. A short sketch for identifying files at risk of the automated purge follows.
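A minimal Python sketch for listing files under your scratch directory that have not been modified in 90 days and are therefore candidates for the next purge run, using the documented scratch path and modification-time rule:

```python
import os
import time

# List files not modified in the last 90 days under the documented scratch path.
scratch = os.path.join("/mnt/lustre/koa/scratch", os.environ["USER"])
cutoff = time.time() - 90 * 24 * 3600

for root, dirs, files in os.walk(scratch):
    for name in files:
        path = os.path.join(root, name)
        try:
            if os.stat(path).st_mtime < cutoff:
                print(path)
        except OSError:
            pass  # file may have been removed while walking
```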
Details
Scratch, or more specifically Koa Storage, utilizes Lustre with ZFS as its underlying file system. For transport to Koa, it uses RDMA across multiple servers, all connected at 200 Gbit InfiniBand. The ZFS components are set up to provide the same zstd-3 compression seen on the home file system, providing space savings and, in some cases, faster access to files, since less data needs to be read from the slower storage media. The underlying storage system uses a mixture of spinning enterprise hard drives (HDD) and solid state storage (SAS SSD and NVMe).
Metadata
Metadata is stored on targets separate from the file data and, for performance reasons, resides entirely on solid state drives. The metadata is split among multiple targets, which are served out by different servers. In case of a server failure, the storage can be “failed over” to another server, allowing for minimal downtime. Each folder is assigned to one of the metadata storage targets, and all files under it are assigned to that same target. In some cases, load balancing is performed to keep the different metadata targets from growing too far out of sync in size.
Object Data
Object data is stored on a mixture of spinning enterprise hard drives, SAS solid state drives, and NVMe. Of the current storage (7 PB), about 1/7th of Koa’s object data storage is either SAS SSD or NVMe.
Storage targets are split into different configurations, each providing at least two-disk parity. Koa currently has 58 Object Storage Targets, of which 10 are SAS SSD or NVMe.
Object data is written to Scratch using a progressive file layout (PFL). The PFL currently used by Scratch follows these rules (a sketch illustrating the boundaries follows the list):
- The first 512K of every file is written to a single solid state (SSD or NVMe) storage target
- Next, file data up to 64 MB is written to a single HDD target
- Next, file data up to 512 MB is written to two HDD targets in 4 MB stripes
- Next, file data up to 1 GB is written to four HDD targets in 4 MB stripes
- Finally, any remaining data for a file is written to eight HDD targets in 4 MB stripes
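A minimal Python sketch of the boundaries above, showing which layout segment each byte range of a file falls into. This is illustrative only and is not how Lustre itself configures the layout.

```python
# Segment boundaries and target descriptions follow the PFL rules listed above.
SEGMENTS = [
    (512 * 1024,      "1 SSD/NVMe target"),
    (64 * 1024**2,    "1 HDD target"),
    (512 * 1024**2,   "2 HDD targets, 4 MB stripes"),
    (1024**3,         "4 HDD targets, 4 MB stripes"),
    (float("inf"),    "8 HDD targets, 4 MB stripes"),
]

def pfl_segments(file_size):
    """Yield (start, end, description) for each PFL segment a file of
    file_size bytes would occupy."""
    start = 0
    for end, desc in SEGMENTS:
        if start >= file_size:
            break
        yield start, min(end, file_size), desc
        start = end

for start, end, desc in pfl_segments(5 * 1024**3):   # e.g. a 5 GiB file
    print(f"bytes {start}..{end}: {desc}")
```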
| File system | Total storage | Total files & directories (inodes) | Compression | Persistent |
|---|---|---|---|---|
| Lustre + ZFS | 800 TB | 400,000,000 | zstd-3 | No (90-day purge) |