Data Storage at UChicago

Table of contents
- Storage Limits
- Filesystems
- LOCALGROUPDISK

Storage Limits

Filesystem  Quota   Path         Backed up?  Notes
$home       100 GB  /home/$user  Yes         Solid-state filesystem, shared to all worker nodes
$data       5 TB    /data/$user  No          CephFS filesystem, shared to all worker nodes
$scratch    n/a     /scratch     No          Ephemeral storage for workloads, local to worker nodes
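
To see how close a directory is to the limits listed above, a small script can add up file sizes and compare the total with the quota. The following is only a minimal sketch: the helper names `directory_size` and `quota_fraction` are hypothetical, and it assumes the quotas are counted in decimal units (1 GB = 10**9 bytes), which may not match how the facility actually accounts for usage.

```python
import os

# Quotas from the table above, in bytes.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
QUOTAS = {"home": 100 * 10**9, "data": 5 * 10**12}


def directory_size(path):
    """Return the total size in bytes of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not count symbolic links
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def quota_fraction(used_bytes, filesystem):
    """Return used_bytes as a fraction of the quota for 'home' or 'data'."""
    return used_bytes / QUOTAS[filesystem]
```

For example, `quota_fraction(directory_size("/home/" + os.environ["USER"]), "home")` gives the fraction of the $home quota in use. Walking a large directory tree can be slow on a shared filesystem.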

Filesystems

The UChicago Analysis Facility has three filesystems, each with a clearly defined role. Please keep these roles in mind when running workloads.

Filesystem Function
$home

Your home area is intended for small files such as analysis code, scripts, and small samples.
Please store your large data files on the $data filesystem.

$data

This directory is the dedicated shared filesystem for storing data, that is, large files
such as your data samples.

$scratch

This filesystem is ephemeral storage for workloads and is local to each worker node.
All jobs start in this directory on the worker nodes by default.
Consequently, output data must be staged to a shared filesystem ($home or $data) or it will be lost!
In the next sections you can find examples and more details about this directory and its use.
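
As an illustration of the staging step, a job could copy its output files from the scratch directory to shared storage before it exits. This is a minimal sketch, not the facility's prescribed method: the function name `stage_outputs`, the default `*.root` pattern, and the destination directory are all assumptions chosen for the example.

```python
import shutil
from pathlib import Path


def stage_outputs(scratch_dir, dest_dir, pattern="*.root"):
    """Copy files matching pattern from scratch_dir to dest_dir.

    Returns the list of destination paths that were written.
    The default pattern is an assumption; adjust it to your outputs.
    """
    scratch = Path(scratch_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target if needed
    copied = []
    for src in sorted(scratch.glob(pattern)):
        if src.is_file():
            target = dest / src.name
            shutil.copy2(src, target)  # copies contents and metadata
            copied.append(target)
    return copied
```

At the end of a job this could be called as `stage_outputs(Path.cwd(), "/data/<your username>/my_analysis")`, where `my_analysis` is a hypothetical directory name. Files with the same name already present in the destination are overwritten.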

LOCALGROUPDISK

If you need more space to store data, or need to share it with teammates or colleagues who do not necessarily use the UChicago Analysis Facility, you can use LOCALGROUPDISK, a disk resource for all US-ATLAS members. To check your quota, go to the Rucio page RSE Rucio manage quota, type MWT2_UC_LOCALGROUPDISK in the text box, and click the select button.

If you search for your lxplus username, you will see that you have a default quota of 15 TB. If you need space beyond 15 TB, use the Request form. Remember that your grid certificate needs US-ATLAS VO membership.

If you are an ATLAS member but cannot find your name, go to the VOMS page and select /atlas/usatlas in the "groups roles" box.

Transfer datasets to LOCALGROUPDISK

To transfer datasets to LOCALGROUPDISK, choose one of the following three options:

  • Using R2D2: make your request on the Rucio R2D2 request page.
  • Adding “--destSE” to your PanDA job.
  • Via Rucio on the command line.
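
For the command-line option, a replication rule can be requested with the Rucio client's `rucio add-rule` command, which takes a dataset identifier (DID), a number of copies, and an RSE expression. The sketch below only builds and optionally runs that command; it assumes a working Rucio client and a valid grid proxy in the environment. The function names and the dataset name in the usage note are hypothetical, and the command spelling can differ between Rucio client versions, so check `rucio --help` first.

```python
import subprocess

# RSE name taken from the LOCALGROUPDISK section above.
RSE = "MWT2_UC_LOCALGROUPDISK"


def build_add_rule_command(did, copies=1, rse=RSE):
    """Build the argument list for: rucio add-rule <did> <copies> <rse>."""
    if ":" not in did:
        raise ValueError("expected a DID of the form scope:name")
    return ["rucio", "add-rule", did, str(copies), rse]


def request_transfer(did, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = build_add_rule_command(did)
    if not dry_run:
        # Assumes the rucio client is on PATH and a grid proxy exists.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `request_transfer("user.jdoe:user.jdoe.example.dataset")` with the default `dry_run=True` only returns the command so that it can be inspected before anything is submitted.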

How to access datasets

To access datasets, choose one of the following three options:

  • In grid-based analyses.
  • Through XRootD from shared T3s; see the Data Sharing section.
  • Download locally. Remember to use the proper filesystem; for example, store large data samples on $data.