Data sharing
US-ATLAS Analysis Facilities are experimenting with the following data sharing methods. Once they mature, they will be deployed at all US-ATLAS AFs.
Use the Xcache servers
Both BNL and SLAC have set up Xcache servers to cache files from the grid or CERN EOS locally. Currently there are 60 TB on the BNL Xcache server and 20 TB on the SLAC Xcache server.
The Xcache servers
- provide the rucioN2N feature, which lets users access any file on the grid without knowing its exact site location or file path.
- cache locally the content of remote files that is actually read on first access, which improves the read performance for sequential access. If only part of a file is read, only that part is cached.
You can run the predefined command Xcache_ls.py to generate a clist file (containing a list of physical file paths) for given datasets, then use the clist in your jobs.
Run Xcache_ls.py -h to see the full usage:
% Xcache_ls.py -h
Usage: Xcache_ls.py [options] dsetNamePattern[,dsetNamePattern2[,more patterns]]
or Xcache_ls.py [optiones] --eos eosPath/
or Xcache_ls.py [optiones] --eos eosPath/filenamePattern
or Xcache_ls.py [options] dsetListFile
This script generates a list (clist) of Xcache gLFN (global logical filename)
access path for given datasets on Atlas grid sites.
Wildcard is supported in the dataset name pattern.
Options:
-h, --help show this help message and exit
-v Verbose
-V, --version print my version
-X XCACHESITE, --XcacheSite=XCACHESITE
    Specify a Xcache server site of BNL or SLAC (default=BNL)
-o OUTCLISTFILE, --outClistFile=OUTCLISTFILE
    write the list into a file instead of the screen
--eos=EOS_PATH, --cerneos=EOS_PATH
    List files (*.root and *.root.[0-9] on default) on CERN EOS
-d OUTCLISTDIR, --dirForClist=OUTCLISTDIR
    write the list into a directory with a file per dataset
However, for large file inputs on the grid, we recommend that you plan ahead and pre-stage them to BNL using an R2D2 request or the rucio command.
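The clist workflow described in this section could be scripted roughly as follows. This is a minimal sketch, not an official tool: make_clist and count_clist_entries are hypothetical helper names, the dataset pattern in the example is a placeholder, and the sketch assumes the clist holds one access path per line. Only the -X and -o options from the help text are used.

```shell
# Hypothetical wrapper: write the clist for one dataset pattern to a file,
# using the -X (Xcache site) and -o (output file) options of Xcache_ls.py.
make_clist() {
    local site="$1" pattern="$2" out="$3"
    Xcache_ls.py -X "$site" -o "$out" "$pattern"
}

# Hypothetical helper: count the non-empty lines in a clist file
# (assumed to be one access path per line).
count_clist_entries() {
    grep -c -v '^[[:space:]]*$' "$1"
}

# Example (placeholder dataset pattern, not a real dataset):
#   make_clist BNL 'SomeDatasetNamePattern*' my.clist
#   count_clist_entries my.clist
```

Checking the entry count before submitting jobs is a cheap way to catch a pattern that matched nothing.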
Work between BNL and CERN
Access to CERN EOS from BNL
The ways to list, write and read files on CERN EOS, documented here, still work at BNL, but you need to specify the full EOS server name eosatlas.cern.ch and obtain a CERN Kerberos ticket.
You can obtain and cache a CERN Kerberos ticket (also required for the ssh-tunnel method below) by:
kinit YourNameAtCERN@CERN.CH
Please be aware that in the above command the realm CERN.CH must be in UPPERCASE.
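To avoid mistyping the realm, the principal can be built by a small shell function. This is only an illustrative sketch: cern_principal is a hypothetical helper name, and it simply forces the realm to uppercase before the value is handed to kinit.

```shell
# Hypothetical helper: build a Kerberos principal, forcing the realm
# (default CERN.CH) to uppercase as kinit requires.
cern_principal() {
    local user="$1" realm="${2:-CERN.CH}"
    printf '%s@%s\n' "$user" "$(printf '%s' "$realm" | tr '[:lower:]' '[:upper:]')"
}

# Example:
#   kinit "$(cern_principal YourNameAtCERN)"
#   klist    # confirm that the ticket was obtained
```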
As a convenience for US ATLAS users, we have installed the eos-client and eos-fusex packages on the interactive nodes.
After obtaining your CERN Kerberos ticket, you can access both the ATLAS EOS and USER EOS instances.
To list your files:
ls /eos/atlas/...
ls /eos/user/y/yesw/...
Please replace "y/yesw" with your own username at CERN.
To copy files from EOS:
cp /eos/atlas/YourDir/YourFilename.root .
To copy files to your EOS area at CERN:
cp MyNewFile.xxx /eos/atlas/YourDir/MyNewFile.xxx
You can create new directories in your EOS area at CERN:
mkdir /eos/atlas/YourDir/NewDirectory
Alternatively, you can use an ssh tunnel to eosatlas.cern.ch:
ssh -NfL 1094:eosatlas:1094 lxplus.cern.ch
Then you can list files on EOS:
xrdfs eosatlas.cern.ch ls /eos/atlas/..
xrdfs localhost ls /eos/atlas/.. # if using ssh-tunnel
To copy files from EOS:
xrdcp root://eosatlas.cern.ch//eos/atlas/YourDir/YourFilename.root .
xrdcp root://localhost//eos/atlas/YourDir/YourFilename.root . # if using ssh-tunnel
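The direct and tunneled forms of the URL differ only in the host name, so a small function can build either one. The sketch below is illustrative: eos_root_url is a hypothetical helper, and it assumes an absolute EOS path, which is what produces the double slash after the host.

```shell
# Hypothetical helper: build a root:// URL for an absolute EOS path.
# The host defaults to eosatlas.cern.ch; pass "localhost" when using the
# ssh tunnel. The separator "/" plus the leading "/" of the path give the
# double slash seen in the examples above.
eos_root_url() {
    local path="$1" host="${2:-eosatlas.cern.ch}"
    case "$path" in
        /*) printf 'root://%s/%s\n' "$host" "$path" ;;
        *)  echo "eos_root_url: path must be absolute: $path" >&2; return 1 ;;
    esac
}

# Example:
#   xrdcp "$(eos_root_url /eos/atlas/YourDir/YourFilename.root)" .
#   xrdcp "$(eos_root_url /eos/atlas/YourDir/YourFilename.root localhost)" .
```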
Or you can use the existing script eos-copy.py, an alias that should have been defined for you at login:
% which eos-copy.py
/afs/usatlas.bnl.gov/scripts/eos-copy.py
% eos-copy.py -h
Usage: eos-copy.py [options] eos_source... local_dir
eos-copy.py [options] eos_source... pnfs_dir
widlcard such as "*.root" is allowed in the eos_source.
This script uses xrdcp to copy files/dirs from CERN EOS to a local dir
or a BNL pricate dCache dir. A valid CERN AFS token is required.
Options:
-h, --help show this help message and exit
--verbose Print verbose info
--version Print the script version then exit
Or to read EOS files in ROOT:
TFile *file = TFile::Open("root://eosatlas.cern.ch//eos/atlas/YourDir/YourFilename.root");
TFile *file = TFile::Open("root://localhost//eos/atlas/YourDir/YourFilename.root"); // if using ssh-tunnel
Access to CERN EOS in BNL batch jobs
The ssh-tunnel method does not work in batch jobs; you need to access the files directly with root://eosatlas.cern.ch. For protected EOS files, you also need to pass your CERN Kerberos ticket to the batch machines as follows:
- First, define the environment variable KRB5CCNAME before running kinit YourNameAtCERN@CERN.CH:
export KRB5CCNAME=$HOME/krb5cc_`id -u`
- Then add the environment variable KRB5CCNAME to your condor batch jobs.
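One possible way to add the variable to a job is through the environment command of the HTCondor submit description file. The sketch below is an assumption about a typical setup, not the facility's prescribed recipe: write_submit_file, run_analysis.sh and the job.* file names are hypothetical, and it assumes the ticket cache under $HOME is readable on the worker nodes.

```shell
# Hypothetical helper: print a minimal HTCondor submit description that
# forwards KRB5CCNAME from the submitting shell into the job environment.
# $ENV(KRB5CCNAME) is expanded by condor_submit at submission time.
write_submit_file() {
    local executable="$1"
    cat <<EOF
executable  = $executable
environment = "KRB5CCNAME=\$ENV(KRB5CCNAME)"
output      = job.out
error       = job.err
log         = job.log
queue
EOF
}

# Example (run_analysis.sh is a placeholder job script):
#   export KRB5CCNAME=$HOME/krb5cc_$(id -u)
#   kinit YourNameAtCERN@CERN.CH
#   write_submit_file run_analysis.sh > job.sub
#   condor_submit job.sub
```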
Access to CERN EOS through BNL Xcache server
If you need to access the same EOS files repeatedly, you can use the BNL Xcache server to speed up reading for sequential access.
Just use the option --eos=EOS_PATH in the script Xcache_ls.py to generate the clist files for your EOS files at CERN. Please run Xcache_ls.py -h for more details.
Access to BNL files from CERN and outside BNL
You or your collaborators may need remote access to files at BNL.
Access to BNL dCache files from CERN
You can use the following script (~yesw/public/bnl/bnl_pnfs-ls.py) to generate a clist or to list files under a given BNL /pnfs directory.
lxplus% ~yesw/public/bnl/bnl_pnfs-ls.py -h
Usage: bnl_pnfs-ls.py [-o clistFilename] [options] [pnfsFilePath | pnfsDirPath] [morePaths]
This script generates pfn (physical file name), pnfs-path, or xrootd-path of files on BNL dcache
for given datasets or files on PNFS, where wildcard is supported in pnfsFilePath and pnfsDirPath
Options:
-h, --help show this help message and exit
-v Verbose
-V, --version print my version
-l, --listOnly list only matched datasets under users dCache, no pfn output
-o OUTPFNFILE, --outPfnFile=OUTPFNFILE
    write pfn list into a file instead of printing to the screen
For example, you can run the above script as follows:
lxplus% ~yesw/public/bnl/bnl_pnfs-ls.py -l /pnfs/usatlas.bnl.gov/users/yesw2000/testDir2
lxplus% ~yesw/public/bnl/bnl_pnfs-ls.py -o my.clist /pnfs/usatlas.bnl.gov/users/yesw2000/testDir2
1 files listed into clist file= my.clist
You can use the generated clist file in your job in the following way:
TChain* chain = new TChain(treeName);
TFileCollection fc("fc","list of input root files","my.clist");
chain->AddFileInfoList(fc.GetList());
Access to BNL other file systems from CERN
You can use sshfs to mount remote BNL directories locally on lxplus machines. For example,
lxplus% mkdir /tmp/yesw/data
lxplus% sshfs spar0102:/atlasgpfs01/usatlas/data/yesw2000 /tmp/yesw/data
assuming that you have already set up the ssh configuration as shown in the section of interactive connection to BNL.
To unmount the mount point, run fusermount -u /tmp/yesw/data.
To list all sshfs mount points, run pgrep -a -f sshfs.
Access to BNL other file systems from other remote computers
For other computers outside of BNL, such as your laptop, you can use the same method as for CERN. Instructions for installing sshfs on different operating systems are available at https://linuxize.com/post/how-to-use-sshfs-to-mount-remote-directories-over-ssh/.
Data Sharing Store at SLAC
US ATLAS is experimenting with a data sharing store service at the SLAC AF. The goal is to enable easy data sharing with your ATLAS colleagues.
Features
This is an object store with the following features:
- Allows ATLAS users to upload/delete files using the root or https protocols. Anyone can download a file (see Privacy section). This service is not limited to US ATLAS.
- Files uploaded there have a lifetime of N days. After that period, they will be purged without notice. Currently N is set to 60 at SLAC AF.
- The top level directory in the object store is not browsable.
The object store is available at the following URLs.
https://sdf-dtn10.slac.stanford.edu:2094/share
(or)
root://sdf-dtn10.slac.stanford.edu:2094//share (double slash after :2094)
Privacy
Browsing/listing of /share is disabled in order to provide a level of privacy that is suitable for sharing low-sensitivity data. For example, if one copies a data file to
https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
others would not know of the existence of myfile.dat unless they were told the random-string. This is because /share is not searchable. As the owner, you should write down the random string and keep it secure. Anyone who knows the random string can list the content under it.
If you lose the random string, you lose access to your data. Administrators won't be able to help you since there are no records of ownership in the object store. The data will eventually be purged after expiration.
Upload and Download
You will need an X509 proxy with the ATLAS VOMS attribute to upload and delete. There is no such requirement for downloading, though some tools will insist on having an X509 proxy before proceeding. So first run the command voms-proxy-init -voms atlas to obtain an X509 proxy with the ATLAS VOMS attribute.
Then, if you will upload, think of a hard-to-guess random string to be used after /share. One secure way to generate a random string is to use the Unix command uuidgen. Write it down!
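Generating the random string and assembling the destination URL could look like the sketch below. It is illustrative only: share_url is a hypothetical helper name, and the file used to record the string in the example is a placeholder that you should keep private.

```shell
# Hypothetical helper: build the https URL of a file in the share store
# from a random string and a file name.
share_url() {
    local token="$1" filename="$2"
    printf 'https://sdf-dtn10.slac.stanford.edu:2094/share/%s/%s\n' "$token" "$filename"
}

# Example:
#   token=$(uuidgen)
#   echo "$token" >> ~/share-tokens.txt   # placeholder record; keep it private
#   share_url "$token" myfile.dat
```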
There are three sets of tools that can be used to upload/download/delete a file. In addition, you can use your web browser to download.
Using curl to upload/download/delete
curl is available everywhere. To use curl, follow these steps:
- Create an alias to type less:
alias mycurl="curl -E /tmp/x509up_u$(id -u) --cacert /tmp/x509up_u$(id -u) --capath /etc/grid-security/certificates"
You may need to adjust the proxy location and CA directory location (/etc/grid-security/certificates) in your environment.
- Upload:
mycurl -L -X PUT --upload-file /tmp/mydata.file https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
- Download:
mycurl -L -X GET https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
- Delete:
mycurl -L -X DELETE https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
Use gfal2 tools to upload/download/delete
You may need to set up the ATLAS environment (run localSetupRucioClients) to have the gfal2 tools in your PATH.
- Upload:
gfal-copy -f /tmp/myfile.dat https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
- Download:
gfal-copy -f https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat /tmp/myfile.dat
- Delete:
gfal-rm https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
- You can even do
gfal-copy -f https://cern.ch//SCRATCHDISK/myfile.dat https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
Gfal2 tools work with both https and root protocols. In the last example, the source and destination can use different protocols.
Use xrootd tools to upload/download/delete
You may need to set up the ATLAS environment (run localSetupRucioClients) to have the xrootd tools in your PATH. These tools will mostly work with the root protocol. Note that in a root URL, there is usually a double slash after the port number.
- Upload:
xrdcp -f /tmp/myfile.dat root://sdf-dtn10.slac.stanford.edu:2094//share/random-string/myfile.dat
- Download:
xrdcp -f root://sdf-dtn10.slac.stanford.edu:2094//share/random-string/myfile.dat /tmp/myfile.dat
- Delete:
xrdfs root://sdf-dtn10.slac.stanford.edu:2094 rm /share/random-string/myfile.dat
- You can also do
xrdcp -f root://cern.ch//SCRATCHDISK/myfile.dat root://sdf-dtn10.slac.stanford.edu:2094//share/random-string/myfile.dat
With additional settings, xrdcp also works with the https protocol.
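Because the same file is reachable through both URL forms, it can help to convert an https URL into its root equivalent, including the double slash after the port. The function below is a minimal sketch under that assumption; https_to_root is a hypothetical helper name, not part of the xrootd tools.

```shell
# Hypothetical helper: convert an https URL of the share store into the
# equivalent root URL, inserting the double slash after host:port.
https_to_root() {
    local url="$1" rest hostport path
    case "$url" in
        https://*/*) ;;
        *) echo "https_to_root: expected https://host:port/path" >&2; return 1 ;;
    esac
    rest="${url#https://}"      # host:port/path
    hostport="${rest%%/*}"      # host:port
    path="/${rest#*/}"          # /path
    printf 'root://%s/%s\n' "$hostport" "$path"
}

# Example:
#   xrdcp -f "$(https_to_root https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat)" /tmp/myfile.dat
```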
Use a web browser
You can use a web browser to download a file and to list a directory (except the top level, which is not browsable). To do that, just paste the https URL into your browser.
It is not possible to use a web browser for upload or deletion.