Scripts that integrate RCAC Fortress, MS Teams, etc. to provide a reliable and flexible backup system.
A vault is a storage pool and the necessary protocol to move objects in / out of the pool. (RCAC's Fortress, for instance, can use HPSS, sftp, and Globus as protocols for file transfer.)
Data assets can have concurrent presence in three different namespaces:
- the host file system
- the logical zone (kind of like a "location" when configuring a web server)
- the vault
Backup operations are semantically performed in the logical and vault namespaces, not in the host file system. This allows, for example, data assets to be moved from one host to another, or replicated onto several hosts.
Data assets in the host file system are described as fully qualified (i.e. from the root of the file system) POSIX paths.
Data assets also belong to a logical "resource" zone. In a zone, assets are fully described via an Asset Resource Key (ARK), which is a triple of (site.name, zone.name, asset.name).
Data assets can be fully copied or differentially updated in a vault.
Objects stored in the vault are named using a BLONDE (BLOb Name and Description Encoded), a compact, unique name for a backup object that encodes a reference to the source ARK, its lineage, and its time of commit.
In the vault, a BLONDE can either be an anchor (full backup) or a differential.
Assets are referenced by "badge", which is a 40-bit chunk taken from a SHAKE128 hash of the asset's ARK (in CURIE format).
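As a rough illustration of the badge idea, the sketch below derives 40 bits (5 bytes) from a SHAKE128 hash of an ARK in CURIE form. The function name `badge`, the choice of the first 5 bytes of the digest, the UTF-8 encoding of the input, and the hex rendering are all assumptions made for this example; bastion's actual chunk selection and text encoding may differ.

```python
import hashlib


def badge(ark_curie: str, nbytes: int = 5) -> str:
    """Return a 40-bit badge for an ARK, rendered as 10 hex characters.

    Assumptions (not confirmed by the source): the badge is the FIRST
    5 bytes of the SHAKE128 output, the CURIE string is hashed as UTF-8,
    and the result is shown in hexadecimal.
    """
    # SHAKE128 is an extendable-output function, so we ask for exactly
    # the number of bytes we need (5 bytes == 40 bits).
    digest = hashlib.shake_128(ark_curie.encode("utf-8")).digest(nbytes)
    return digest.hex()


if __name__ == "__main__":
    print(badge("[@idifhub:LiDAR/QLX_3DEP_LiDAR_US_SOUTH]"))
```

Because the badge is derived only from the ARK, the same asset yields the same badge regardless of which host currently holds it.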
Time is also highly compressed, using "Quantim" timestamps.
Asset Resource Keys - a triple of (site, zone, path) where path is relative to the root of the zone. ARKs are typically represented in CURIE syntax that has a form like "[@{site}:{zone}/{path}]". For example, "[@idifhub:LiDAR/QLX_3DEP_LiDAR_US_SOUTH]"
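The CURIE form above can be parsed and rebuilt with a few lines of Python. This is a minimal sketch, not bastion's own code: the `ARK` class, `parse_ark`, and the regular expression are hypothetical names, and the pattern assumes that the site contains no ":" and the zone contains no "/".

```python
import re
from typing import NamedTuple


class ARK(NamedTuple):
    """Hypothetical container for the (site, zone, path) triple."""
    site: str
    zone: str
    path: str

    def to_curie(self) -> str:
        # Render as "[@{site}:{zone}/{path}]".
        return f"[@{self.site}:{self.zone}/{self.path}]"


# Assumed grammar: no ':' in site, no '/' in zone, no brackets anywhere.
_CURIE = re.compile(
    r"^\[@(?P<site>[^:\[\]]+):(?P<zone>[^/\[\]]+)/(?P<path>[^\[\]]+)\]$"
)


def parse_ark(curie: str) -> ARK:
    """Split a CURIE string into its site, zone, and path parts."""
    m = _CURIE.match(curie)
    if m is None:
        raise ValueError(f"not a valid ARK CURIE: {curie!r}")
    return ARK(m["site"], m["zone"], m["path"])


if __name__ == "__main__":
    ark = parse_ark("[@idifhub:LiDAR/QLX_3DEP_LiDAR_US_SOUTH]")
    print(ark)             # ARK(site='idifhub', zone='LiDAR', path='QLX_3DEP_LiDAR_US_SOUTH')
    print(ark.to_curie())  # [@idifhub:LiDAR/QLX_3DEP_LiDAR_US_SOUTH]
```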
"Quantim" is a textual representation of time that uses 8 ASCII characters to represent time to near minute resolution for the next 900+ years. Quantim timestamps follow the pattern "YYYDDDTT" where YYY is the three digit century and year, DDD is the current day of the year, and TT is the time of day encoded using base 36.
Since there are 1,296 two-digit base-36 numbers (36 x 36) and 1,440 minutes per day, each TT value spans 1440 / 1296 = 1.11 minutes (about 66.7 seconds).
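The arithmetic above can be made concrete with a small encoder. This is a sketch under stated assumptions rather than bastion's implementation: it assumes YYY counts years from 2000, DDD is the 1-based day of the year, the base-36 alphabet is 0-9 followed by uppercase A-Z, and the minute of the day is scaled down to one of the 1,296 slots by integer division. The name `quantim` and the `base_year` parameter are hypothetical.

```python
from datetime import datetime

# Assumed base-36 alphabet: digits first, then uppercase letters.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def quantim(dt: datetime, base_year: int = 2000) -> str:
    """Encode a datetime as an 8-character YYYDDDTT string."""
    yyy = dt.year - base_year          # assumed epoch: year 2000 -> "000"
    if not 0 <= yyy <= 999:
        raise ValueError("year out of range for a three-digit YYY field")
    ddd = dt.timetuple().tm_yday       # day of year, 1..366
    minute_of_day = dt.hour * 60 + dt.minute           # 0..1439
    slot = minute_of_day * 1296 // 1440                # 0..1295
    tt = DIGITS[slot // 36] + DIGITS[slot % 36]        # two base-36 digits
    return f"{yyy:03d}{ddd:03d}{tt}"


if __name__ == "__main__":
    print(quantim(datetime(2025, 1, 1, 0, 0)))      # 02500100
    print(quantim(datetime(2025, 12, 31, 23, 59)))  # 025365ZZ
```

Under these assumptions, timestamps sort lexicographically in time order, since each field is fixed-width and the alphabet is in ascending ASCII order.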
bastion is run via conda+python and can be run from an interactive shell or (using "conda run ...") from a cron job. bastion has the following command set:
- bastion backup site {site}
- bastion backup zone {ark}
- bastion backup asset {ark}
- bastion update site {site}
- bastion update zone {ark}
- bastion update asset {ark}
- bastion refresh keytab {vault}
- bastion restore zone {ark}
- bastion restore asset {ark}
- bastion restore site {site}
- bastion [list|export] zone assets {ark}
- bastion [list|export] site assets {ark}
- bastion [list|export] zones {site}
- bastion [list|export] sites
- bastion [list|export] anchors {ark}
- bastion [list|export] snaps {ark}