Data Storage at BNL
Table of Contents
- Storage Limits
- LOCALGROUPDISK
- Dataset Replication to LOCALGROUPDISK
- Use the BNL dCache Space
- Use the BNLBox
Storage Limits
Home area | 20GB per user under $HOME |
Data area | 500 GB per user under /atlasgpfs01/usatlas/data/$USERNAME |
dCache area | 5TB per user under /pnfs/usatlas.bnl.gov/users/$USERNAME, should be access via xrootd as explained below |
BNLBox | 50GB space under https://bnlbox.sdcc.bnl.gov, accessiblle from both mobile devices and computers |
LOCALGROUPDISK | 50TB (default) on the grid at BNL. Please check below for more details |
Note:
- In case the subdir /pnfs/usatlas.bnl.gov/users/$USERNAME does
not exist, you can emaill to
"RT-RACF-StorageManagement@bnl.gov"
to help make the subdir.
As a reminder, your home area ($HOME) is intended to store analysis code, and not data.
And there is also a 9TB scratch disk /usatlas/scratch/ shared among all users, where the files can be kept for 30 days. Please make your own subdir /usatlas/scratch/$USER there.
LOCALGROUPDISK
If you need to store data outside of the resources dedicated to the BNL Tier 3 (either due to needing more space, or to share data with colleagues who are not using the BNL Tier 3), consider using LOCALGROUPDISK, which is a resource that all US ATLAS collaborators have access to. You can check at RSE account usage with the RSE BNL-OSG2_LOCALGROUPDISK selected. Every user should have a default quota of 50TB, if you could not find your name there, please check if you have selected /atlas/usatlas in the VO groups/roles. For additional space if need beyond 50TB here is the Request form
Dataset Replication to LOCALGROUPDISK
You can replicate datasets to the RSE BNL-OSG2_LOCALGROUPDISK in the 2 following ways:
- Make the request through r2d2 request
- Make the request using rucio command rucio add-rule. Please check the rucio add-rule wiki page for usage help.
Use the BNL dCache space
dCache is a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods.
In order to use it efficiently, please do NOT write output directly to /pnfs path (the dCache space), and also avoid small files. Instead you should use the tool described below or Linux standard cp command. In the following sub-sections, it describes the way to access and replicate datasets to this system.
Access to the datasets on BNL dCache
In addition to Rucio (and DQ2Client), a convenient python script /afs/usatlas/scripts/pnfs_ls.py is provided to generate clist file (list of physicsl file path) for files in given datasets on BNL dCache, including datasets both on BNL rses (such as BNL-OSG2_LOCALGROUPDISK) mentioned above) and under BNL users dCache area.
Please click the following arrow to see the full usage.
run pnfs_ls.py -h to get the full usage
% pnfs_ls.py -h Usage: pnfs_ls.py [options] dsetListFile or pnfs_ls.py [options] dsetNamePattern[,dsetNamePattern2[,more namePatterns]] or pnfs_ls.py -o clistFilename /pnfs/FilePathPattern [morePaths] or pnfs_ls.py -p -o clistFilename [pnfsFilePath | pnfsDirPath] [morePaths] This script generates pfn (physical file name), pnfs-path, or xrootd-path of files on BNL dcache for given datasets or files on PNFS, where wildcard and symlink are supported in pnfsFilePath and pnfsDirPath Options: -h, --help show this help message and exit -v Verbose -V, --version print my version -p, --privateFiles List private non-dataset files on dCache -i, --incomplete Use incomplete sites if complete not available -u, --usersDCache Use datasets under users private dCache -l, --listOnly list only matched datasets under users dCache, no pfn output -o OUTPFNFILE, --outPfnFile=OUTPFNFILE write pfn list into a file instead of printing to the screen -d OUTPFNDIR, --dirForPfn=OUTPFNDIR write pfn list into a directory with a file per dataset -N, --usePNFS using pNFS access, default is xrootd within BNL --useXRootdOutside using xroot from outside BNL: access, default is xrootd within BNL -L LOCALBNLSITE, --localBNLSite=LOCALBNLSITE specify a BNL site, overriding the one choosen by the script
Download datasets to your pNFS area
This section describes how to download datasets from the grid to your pNFS area (/pnfs/usatlas.bnl.gov/users/$USER/). If for some reason you do not have a pNFS area you can fill out a ticket to this email group: RT-RACF-StorageManagement@bnl.gov. If you have an pNFS area your name should appear on this website: [https://www.sdcc.bnl.gov/experiments/usatlas/list-users-institutes] (https://www.sdcc.bnl.gov/experiments/usatlas/list-users-institutes).
Note: You should not use rucio to download datasets!!!
The command to download datasets is:
Or create an alias of rucio-bnl-get to this script for ease of use:
To use this command, do the following on one of the Tier3 interactive nodes.
1. Set up the regular rucio environment:
setupATLAS
lsetup rucio
2. Make your proxy to be usatlas:
voms-proxy-init --old -valid 96:0 -voms atlas:/atlas/usatlas
Please be aware of the option --old. And it is important to use usatlas role.
3. Use the command.
rucio-bnl-get --help Usage: rucio-get-bnl.rb [options] DATASETNAME -h, --help Display help message -d, --db dbfile Specify sqlite3 file -a, --rucio_account ruser Specify rucio account name -b, --base_dir baseDir Specify base directory of your data -t, --target target Specify the destination host -s, --retry Retry failed files -r, --remove Remove the dataset -c, --check Check local file
When you use the script to download a dataset it will download it to /pnfs/usatlas.bnl.gov/users/$USER/rucio (the script will create the rucio directory if it doesn't exist). The subdirectory structure will include the rucio scope first.
Example usage
Let's try to download this dataset, which only 5 files:
rucio-bnl-get data11_7TeV:data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00
..
FTSID=0f1fc3cc-da75-11e5-89dd-5cf3fc0c7c5c
Notice: If you got error like
Could not find table 'rdatasets' (ActiveRecord::StatementInvalid)
, you
can delete the directory $HOME/.rucio-get-bnl and try again.
If you did not get any problem, then just wait a bit. You can repeat the same command above to get the current FTS job status.
rucio-bnl-get data11_7TeV:data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00
Delegated Proxy Updated
Last FTS status is FINISHED
So, in this case, it tells you it has already finished. But, it is also possible to get "fail" or "finished dirty" if some transfers fail or are active (ongoing) etc....
If one wants to check physically by looking at the file system, use --check option.
rucio-bnl-get data11_7TeV:data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00
-c
Delegated Proxy Updated
Last FTS status is FINISHED
x
/pnfs/usatlas.bnl.gov/users/hiroito/rucio/data11_7TeV/data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00/NTUP_TOP.366712._000001.root.1
x
/pnfs/usatlas.bnl.gov/users/hiroito/rucio/data11_7TeV/data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00/NTUP_TOP.366712._000002.root.1
x
/pnfs/usatlas.bnl.gov/users/hiroito/rucio/data11_7TeV/data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00/NTUP_TOP.366712._000003.root.1
x
/pnfs/usatlas.bnl.gov/users/hiroito/rucio/data11_7TeV/data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00/NTUP_TOP.366712._000004.root.1
x
/pnfs/usatlas.bnl.gov/users/hiroito/rucio/data11_7TeV/data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00/NTUP_TOP.366712._000005.root.1
total # of files local / rucio : 5 / 5
total size local / rucio : 5829856628 / 5829856628
So, it really has 5 files locally in your T3 area. In this case, it is my area ( /pnfs/usatlas.bnl.gov/users/$USER/rucio ) "x" indicates the existence while "0" indicates missing.
In fact, I can do "ls -l" in T3 area (shown for $USER=hiroito).
ls -l
/pnfs/usatlas.bnl.gov/users/hiroito/rucio/data11_7TeV/data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00/
total 5693221
-rw-rw-rw- 1 hiroito usatlas 1093230164 Feb 23 16:33
NTUP_TOP.366712._000001.root.1
-rw-rw-rw- 1 hiroito usatlas 1079699651 Feb 23 16:33
NTUP_TOP.366712._000002.root.1
-rw-rw-rw- 1 hiroito usatlas 1337274518 Feb 23 16:33
NTUP_TOP.366712._000003.root.1
-rw-rw-rw- 1 hiroito usatlas 1306566081 Feb 23 16:33
NTUP_TOP.366712._000004.root.1
-rw-rw-rw- 1 hiroito usatlas 1013086214 Feb 23 16:33
NTUP_TOP.366712._000005.root.1
If you want to remove this file, use "--remove" option. That will remove the local files as well as the entries in your own catalog.
eg,
rucio-bnl-get data11_7TeV:data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00 --remove
After this,
ls -l
/pnfs/usatlas.bnl.gov/users/hiroito/rucio/data11_7TeV/data11_7TeV.00182346.physics_Muons.merge.NTUP_TOP.f380_m855_p568_p570_tid366712_00/
total 0
Now, it is possible that the first attempt to transfer files fails with some (or all) of files due to various reason. But, you can just retry it with --retry option. This will only retry the failed transfers and will use different source sites if available. This client has intentionally does not do auto-retry to avoid complexity. But, it does not prevent you to run it on the cron job for an example if you don't want to check it manually. Also, you don't have to wait the completion of the first dataset to submit the second one. You can submit as many as your space is allowed.
4. Getting statistics of your data.
/afs/usatlas.bnl.gov/lsm/x8664_sl6/rucio/rucio-bnl-usage.rb -h
Usage rucio-bnl-usage.rb [options]
-h, --help Display help message
-d, --db dbfile Specify sqlite3 file
The above script
/afs/usatlas.bnl.gov/lsm/x8664_sl6/rucio/rucio-bnl-usage.rb
has been
aliased as rucio-bnl-usage.
eg.
rucio-bnl-usage data12_8TeV:data12_8TeV.00201289.physics_Egamma.merge.NTUP_COMMON.r4644_p1517_p1562_tid01319778_00 size:694 (GB local)/ 694 (GB rucio) number of files: 763 (local) / 763 (rucio) data11_7TeV:data11_7TeV.00183054.physics_Muons.merge.NTUP_TOP.f383_m872_p568_p570_tid414466_00 size:22 (GB local)/ 22 (GB rucio) number of files: 20 (local) / 20 (rucio) mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.merge.DAOD_TOPQ4.e3601_s2576_s2132_r6630_r6264_p2413_tid06436984_00 size:159 (GB local)/ 159 (GB rucio) number of files: 34 (local) / 34 (rucio) data10_7TeV:data10_7TeV.00167680.physics_MinBias.merge.AOD.r1774_p327_p333_tid206966_00 size:43 (GB local)/ 43 (GB rucio) number of files: 10 (local) / 10 (rucio) data11_7TeV:data11_7TeV.00182455.physics_Muons.merge.NTUP_TOP.f381_m861_p568_p570_tid373751_00 size:1 (GB local)/ 1 (GB rucio) number of files: 2 (local) / 2 (rucio) data11_7TeV:data11_7TeV.00182787.debugrec_hltacc.merge.NTUP_2LHSG2.g1_f382_m866_p527_tid377027_00 size:0 (GB local)/ 0 (GB rucio) number of files: 0 (local) / 1 (rucio) data12_8TeV:data12_8TeV.00200863.physics_Egamma.merge.NTUP_PHOTON.r4065_p1278_p1341_p1343_p1345_tid01142890_00 size:0 (GB local)/ 0 (GB rucio) number of files: 0 (local) / 1 (rucio) Total Local Usage: 923 (GB) with 829 files
If you have any questions, just open a new ticket at BNL RT (RT-RACF-USAtlasSharedT3@bnl.gov).
Access your data in your pNFS space
This section shows you how to access data in your pNFS space (/pnfs/usatlas.bnl.gov/users/$USER/....)
BNL supports various interfaces to your area:
1. xrootd (from interactive or any worker nodes)
This might be the most optimum way within your interactive or
panda/condor jobs. All files are accessible via xrootd by prepending the
following
root://dcgftp.usatlas.bnl.gov:1096/
For example:
xrdcp -f root://dcgftp.usatlas.bnl.gov:1096//pnfs/usatlas.bnl.gov/users/hiroito/rucio/data10_7TeV/data10_7TeV.00167680.physics_MinBias.merge.AOD.r1774_p327_p333_tid206966_00/AOD.206966._000004.pool.root.1 /home/hiroito/abc.1
[4.063GB/4.063GB][100%][==================================================][106.7MB/s]
Please notice that xrootd access would require a valid grid proxy, and refer to the batch system on how to copy the grid proxy to batch machines.
2. NFS.4.1 (only from T3 machines - they are not available from BNL
production worker nodes.)
Within T3 machines, files are accessible like normal NFS file. Just use
path.
3. Web (from outside BNL)
Using your browser, you can access your files via browser with your
valid certificate. just point to:
[https://dcgftp.usatlas.bnl.gov:443/pnfs/usatlas.bnl.gov/users/youraccount/xyz/]
(https://dcgftp.usatlas.bnl.gov:443/pnfs/usatlas.bnl.gov/users/youraccount/xyz/).
Use the BNLBox
BNL provides a cloud storage BNLBox, similar to the CERNBox, but based on NextCloud. You can use it to share between computers and mobile devices, among groups. Everyone has a default quota of 50GB.
You can find more details on the SDCC page
It can be accessed from web browsers, [or desktop clients, or mobile apps] (https://nextcloud.com/install/#install-clients).
Use the BNLBox on Web Browsers
The web browser URL is https://bnlbox.sdcc.bnl.gov. You can log in with your BNL account. Please find out the webDAV access URL by clicking on the bottom left Settings on the sidebar, as shown below:
The webDAV URL is something like
https://bnlbox.sdcc.bnl.gov/remote.php/dav/files/BNL-User-8efba3ed-bfc8-4324-9cef-e9f4878c3c8d/
,
where the last part in the path is your unique UUID.
Use the BNLBox on Linux machines
The software cadaver has been installed on spar/acas machines at BNL, and lxplus machines at CERN. It is a command line webDAV client, with ftp-like commands. To save you from typing the username/password everytime, you can prepare a file .netrc under $HOME directory with the following content:
machine bnlbox.sdcc.bnl.gov
login yourLoginEmail
password yourPassword
where you should put your own login and password. And run "chmod 600 $HOME/.netrc" to make this file visible only to yourself.
In addition, you can prepare another file ~/.cadaverrc with the following line:
open https://bnlbox.sdcc.bnl.gov/remote.php/dav/files/Your-LONG-UUID-for-BNLBox/
Please put your own long UUID here.
Then just simply run "cadaver", it will connect to your BNLBox.
spar0101% cadaver WARNING: Untrusted server certificate presented for `*.sdcc.bnl.gov': Issued to: SDCC, Brookhaven National Laboratory, 53 Bell Avenue, Upton, New York, 11973-5000, US Issued by: InCommon, Internet2, Ann Arbor, MI, US Certificate is valid from Tue, 25 Sep 2018 00:00:00 GMT to Thu, 24 Sep 2020 23:59:59 GMT Do you wish to accept the certificate? (y/n) y dav:/remote.php/dav/files/BNL-User-8efba3ed-bfc8-4324-9cef-e9f4878c3c8d/> help Available commands: ls cd pwd put get mget mput edit less mkcol cat delete rmcol copy move lock unlock discover steal showlocks version checkin checkout uncheckout history label propnames chexec propget propdel propset search set open close echo quit unset lcd lls lpwd logout help describe about Aliases: rm=delete, mkdir=mkcol, mv=move, cp=copy, more=less, quit=exit=bye dav:/remote.php/dav/files/BNL-User-8efba3ed-bfc8-4324-9cef-e9f4878c3c8d/>
You can use davix commands (davix-ls, davix-put and davix-get) as well to access to your BNLBox. These commands are available by default on lxplus. At BNL, you need run "setupATLAS -q; lsetup davix" to set up the env. Then specify the full webDAV to the davix commands plus an option -k (Disable SSL credential checks).
spar0101% setupATLAS -q spar0101% lsetup davix spar0101% davix-ls -k https://bnlbox.sdcc.bnl.gov/remote.php/dav/files/BNL-User-8efba3ed-bfc8-4324-9cef-e9f4878c3c8d/ davix: using ~/.netrc to load additional configuration. (match: bnlbox.sdcc.bnl.gov) copy_bnl_box.rb dCache Documents ._.DS_Store .DS_Store Nextcloud%20Manual.pdf Nextcloud.mp4 Nextcloud.png Photos testDir Archive
Use the BNLBox on Mobile Devices
For iOS or Android devices, just install the Nextcloud app, and connect to bnlbox.sdcc.bnl.gov.