Data sharing

The US-ATLAS Analysis Facilities (AFs) are experimenting with the following data sharing methods. Once these methods mature, they will be deployed at all US-ATLAS AFs.

Use the Xcache servers

Both BNL and SLAC have set up Xcache servers to cache locally files that reside on the grid or on CERN EOS. Currently there are 60 TB on the BNL Xcache server and 20 TB on the SLAC Xcache server.

The Xcache servers

  • provide the rucioN2N feature, which lets users access any file on the grid without knowing its exact site location or file path.
  • cache locally the content of remote files that is actually read on first access, which improves read performance for subsequent access. If only part of a file is read, only that part is cached.

You can run the predefined command Xcache_ls.py to generate a clist file (containing a list of physical file paths) for given datasets, then use the clist in your jobs.
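
For jobs driven from Python, a clist can be read with a few lines of code. The sketch below assumes that the clist is a plain text file with one file path or URL per line, which matches its description as a list of physical file paths. The helper name read_clist and the skipping of blank and '#' lines are illustrative assumptions and are not part of Xcache_ls.py.

```python
from pathlib import Path


def read_clist(clist_path):
    """Return the entries of a clist file as a list of strings.

    Assumption: the clist is plain text with one path or URL per line.
    Blank lines and lines starting with '#' are skipped defensively.
    """
    entries = []
    for line in Path(clist_path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries
```

Each returned entry can then be handed to whatever opens the files in your job, for example one chain.Add(entry) call per entry in PyROOT.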

Run Xcache_ls.py -h to get the full usage:

% Xcache_ls.py -h
Usage: 
     Xcache_ls.py [options] dsetNamePattern[,dsetNamePattern2[,more patterns]]
  or
     Xcache_ls.py [optiones] --eos eosPath/
  or
     Xcache_ls.py [optiones] --eos eosPath/filenamePattern
  or
     Xcache_ls.py [options] dsetListFile

  This script generates a list (clist) of 
  Xcache gLFN (global logical filename) access path 
  for given datasets on Atlas grid sites.
  Wildcard is supported in the dataset name pattern.

Options:
  -h, --help            show this help message and exit
  -v                    Verbose
  -V, --version         print my version
  -X XCACHESITE, --XcacheSite=XCACHESITE
                        Specify a Xcache server site of BNL or SLAC
                        (default=BNL)
  -o OUTCLISTFILE, --outClistFile=OUTCLISTFILE
                        write the list into a file instead of the screen
  --eos=EOS_PATH, --cerneos=EOS_PATH
                        List files (*.root and *.root.[0-9] on default) on
                        CERN EOS
  -d OUTCLISTDIR, --dirForClist=OUTCLISTDIR
                        write the list into a directory with a file per
                        dataset

However, for large inputs on the grid, we recommend planning ahead and pre-staging them to BNL with an R2D2 request or a rucio command.

Work between BNL and CERN

Access to CERN EOS from BNL

The ways to list, write and read files on CERN EOS, documented here, still work at BNL, but you need to specify the full EOS server name eosatlas.cern.ch and obtain a CERN Kerberos ticket.

You can obtain and cache a CERN Kerberos ticket (also required for the ssh-tunnel method below) with:

kinit YourNameAtCERN@CERN.CH

Please be aware that in the above command the realm CERN.CH must be in UPPERCASE.

As a convenience for US ATLAS users, we have installed the eos-client and eos-fusex packages on the interactive nodes.

After obtaining your CERN Kerberos ticket, you can access both the ATLAS EOS and USER EOS instances.

To list your files:

ls /eos/atlas/...
ls /eos/user/y/yesw/...

Please replace "y/yesw" with your own username at CERN.

To copy files from EOS:

cp /eos/atlas/YourDir/YourFilename.root .

To copy files to your EOS area at CERN:

cp MyNewFile.xxx /eos/atlas/YourDir/MyNewFile.xxx

You can create new directories in your EOS area at CERN:

mkdir /eos/atlas/YourDir/NewDirectory

Alternatively, you can set up an ssh tunnel to eosatlas.cern.ch:

ssh -NfL 1094:eosatlas:1094 lxplus.cern.ch

Then you can list files on EOS:

xrdfs eosatlas.cern.ch ls /eos/atlas/...
xrdfs localhost ls /eos/atlas/...   # if using ssh-tunnel

To copy files from EOS:

xrdcp root://eosatlas.cern.ch//eos/atlas/YourDir/YourFilename.root .
xrdcp root://localhost//eos/atlas/YourDir/YourFilename.root .  # if using ssh-tunnel

Or you can use the existing script eos-copy.py, which is an alias that should have been defined for you at login:

% which eos-copy.py
/afs/usatlas.bnl.gov/scripts/eos-copy.py

% eos-copy.py -h
Usage: eos-copy.py [options] eos_source... local_dir
   eos-copy.py [options] eos_source... pnfs_dir

     widlcard such as "*.root" is allowed in the eos_source.

   This script uses xrdcp to copy files/dirs from CERN EOS to a local dir
or a BNL pricate dCache dir. A valid CERN AFS token is required.


Options:
  -h, --help  show this help message and exit
  --verbose   Print verbose info
  --version   Print the script version then exit

Or to read EOS files in ROOT:

TFile *file = TFile::Open("root://eosatlas.cern.ch//eos/atlas/YourDir/YourFilename.root");
TFile *file = TFile::Open("root://localhost//eos/atlas/YourDir/YourFilename.root");  // if using ssh-tunnel
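
The xrdcp and TFile::Open examples above share one URL pattern: root://, the server name, then the absolute EOS path, which produces the double slash. As a minimal sketch, the hypothetical Python helper below builds such URLs for the ATLAS EOS instance, either directly or through the local end of the ssh tunnel. The function name and the restriction to /eos/atlas/ paths are assumptions made for illustration.

```python
def eos_root_url(eos_path, via_tunnel=False):
    """Build a root:// URL for a file on the CERN ATLAS EOS instance.

    eos_path must be an absolute path such as
    /eos/atlas/YourDir/YourFilename.root. With via_tunnel=True the URL
    points at localhost, the local end of the ssh tunnel.
    """
    if not eos_path.startswith("/eos/atlas/"):
        raise ValueError(f"expected an absolute /eos/atlas/... path, got {eos_path!r}")
    host = "localhost" if via_tunnel else "eosatlas.cern.ch"
    # Server name + '/' + absolute path gives the double slash in the URL.
    return f"root://{host}/{eos_path}"
```

The returned string can be passed unchanged to xrdcp or to TFile::Open.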

Access to CERN EOS in BNL batch jobs

The ssh-tunnel method does not work in batch jobs; you need to access the files directly with root://eosatlas.cern.ch. For protected EOS files, you also need to pass your CERN Kerberos ticket to the batch machines in the following way:

  1. First define the environment variable KRB5CCNAME before running kinit YourNameAtCERN@CERN.CH:

    export KRB5CCNAME=$HOME/krb5cc_`id -u`
    
  2. Then add the environment variable KRB5CCNAME to your condor batch jobs.
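
One possible way to carry out step 2 is the environment command of an HTCondor submit description. The Python sketch below (helper names are hypothetical) rebuilds the credential cache path from step 1 and formats the matching submit-file line. It assumes that your home directory, and therefore the ticket cache, is readable from the batch nodes; check the facility's batch documentation for the recommended setup.

```python
import os


def krb5_cache_path(home=None, uid=None):
    """Return $HOME/krb5cc_<uid>, the cache location chosen in step 1."""
    home = os.path.expanduser("~") if home is None else home
    uid = os.getuid() if uid is None else uid
    return os.path.join(home, f"krb5cc_{uid}")


def condor_environment_line(cache_path):
    """Format a submit-file line that exports KRB5CCNAME to the job.

    Assumption: the path has no whitespace or quote characters, so it
    needs no extra escaping inside the quoted environment string.
    """
    if any(ch.isspace() or ch in "'\"" for ch in cache_path):
        raise ValueError("cache path must not contain whitespace or quotes")
    return f'environment = "KRB5CCNAME={cache_path}"'
```

Printing condor_environment_line(krb5_cache_path()) gives a line that can be pasted into the submit file.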

Access to CERN EOS through BNL Xcache server

If you need to access the same EOS files repeatedly, you can use the BNL Xcache server to speed up subsequent reads.

Just use the option --eos=EOS_PATH in the script Xcache_ls.py to generate the clist files for your EOS files at CERN. Please run Xcache_ls.py -h for more details.

Access to BNL files from CERN and outside BNL

You or your collaborators may need remote access to files at BNL.

Access to BNL dCache files from CERN

You can use the following script (~yesw/public/bnl/bnl_pnfs-ls.py) to generate a clist for, or to list, the files under a given BNL /pnfs directory.

lxplus% ~yesw/public/bnl/bnl_pnfs-ls.py -h
Usage: 
     bnl_pnfs-ls.py [-o clistFilename] [options] [pnfsFilePath | pnfsDirPath] [morePaths]

  This script generates pfn (physical file name), pnfs-path,  
or xrootd-path of files on BNL dcache for given datasets or files on PNFS,
where wildcard is supported in pnfsFilePath and pnfsDirPath

Options:
  -h, --help            show this help message and exit
  -v                    Verbose
  -V, --version         print my version
  -l, --listOnly        list only matched datasets under users dCache, no pfn
                        output
  -o OUTPFNFILE, --outPfnFile=OUTPFNFILE
                        write pfn list into a file instead of printing to the
                        screen

For example, you can run the above script in the following ways:

lxplus% ~yesw/public/bnl/bnl_pnfs-ls.py -l /pnfs/usatlas.bnl.gov/users/yesw2000/testDir2
lxplus% ~yesw/public/bnl/bnl_pnfs-ls.py -o my.clist /pnfs/usatlas.bnl.gov/users/yesw2000/testDir2
1  files listed into clist file= my.clist

You can use the generated clist file in your job in the following way:

    TChain* chain = new TChain(treeName);
    TFileCollection fc("fc","list of input root files","my.clist");
    chain->AddFileInfoList(fc.GetList());

Access to BNL other file systems from CERN

You can use sshfs to mount remote BNL directories locally on lxplus machines. For example:

lxplus% mkdir /tmp/yesw/data
lxplus% sshfs spar0102:/atlasgpfs01/usatlas/data/yesw2000 /tmp/yesw/data

This assumes that you have already set up the ssh configuration as described in the section on interactive connection to BNL.

To unmount the mount point, run fusermount -u /tmp/yesw/data.

To list all sshfs mount points, run pgrep -a -f sshfs.

Access to BNL other file systems from other remote computers

For other computers outside BNL, such as your laptop, you can use the same method as for CERN. Instructions for installing sshfs on different operating systems are available at https://linuxize.com/post/how-to-use-sshfs-to-mount-remote-directories-over-ssh/.

Data Sharing Store at SLAC

US ATLAS is experimenting with a data sharing store service at the SLAC AF. The goal is to enable easy data sharing with your ATLAS colleagues.

Features

This is an object store with the following features:

  1. Allows ATLAS users to upload/delete files using root or https protocols. Anyone can download a file (see Privacy section). This service is not limited to US ATLAS.
  2. Files uploaded there have a lifetime of N days. After that period, they will be purged without notice. Currently N is set to 60 at the SLAC AF.
  3. The top level directory in the object store is not browsable.
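
Because files are purged without notice, it can help to note the expiry date at upload time. The short Python sketch below does this arithmetic for the current value N = 60. It assumes that the lifetime is counted from the upload date, which the description above suggests but does not state precisely, and the function name is hypothetical.

```python
from datetime import date, timedelta

LIFETIME_DAYS = 60  # current value of N at the SLAC AF


def earliest_purge_date(upload_date, lifetime_days=LIFETIME_DAYS):
    """Return the first date on which an uploaded file may be purged.

    Assumption: the lifetime starts on the day of the upload.
    """
    return upload_date + timedelta(days=lifetime_days)
```

For a file uploaded today, earliest_purge_date(date.today()) gives the date by which any copy you still need should be stored elsewhere.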

The object store is available at the following URLs.

https://sdf-dtn10.slac.stanford.edu:2094/share (or)
root://sdf-dtn10.slac.stanford.edu:2094//share (double slash after :2094)

Privacy

Browsing/listing of '/share' is disabled in order to provide a level of privacy that is suitable for sharing low-sensitivity data. For example, if one copies a data file to

https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat

Others would not know of the existence of myfile.dat unless they were told the random string. This is because /share is not searchable. As the owner, you should write down the random string and keep it secure. Anyone who knows the random string can browse the content under it.

If you lose the random string, you lose access to your data. Administrators won't be able to help you, since there are no records of ownership in the object store. The data will eventually be purged after expiration.

Upload and Download

You will need an X509 proxy with the ATLAS VOMS attribute to upload and delete. There is no such requirement for downloading, though some tools will insist on having an X509 proxy before proceeding. So first run the command voms-proxy-init -voms atlas to obtain an X509 proxy with the ATLAS VOMS attribute.

Then, if you are going to upload, think of a hard-to-guess random string to be used after /share. One secure way to generate a random string is to use the Unix command uuidgen. Remember to write it down!
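
As a minimal sketch of this step in Python, uuid.uuid4() can stand in for uuidgen, and the two URL forms of the object store can be built from the result. The helper names are hypothetical; the host, port, and /share path are the ones given in this section.

```python
import uuid

SHARE_HOST = "sdf-dtn10.slac.stanford.edu:2094"


def new_share_prefix():
    """Return a random, hard-to-guess directory name (a version 4 UUID)."""
    return str(uuid.uuid4())


def share_urls(prefix, filename):
    """Return the (https, root) URLs of a file under /share/<prefix>/."""
    https_url = f"https://{SHARE_HOST}/share/{prefix}/{filename}"
    # root URLs take a double slash after the port number.
    root_url = f"root://{SHARE_HOST}//share/{prefix}/{filename}"
    return https_url, root_url
```

Store the value returned by new_share_prefix() somewhere safe before uploading, since it cannot be recovered later.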

There are three sets of tools that can be used to upload/download/delete a file. In addition, you can use your web browser to download.

Use curl to upload/download/delete

curl is available everywhere. To use curl, follow these steps:

  1. Create an alias to type less:
    alias mycurl="curl -E /tmp/x509up_u$(id -u) --cacert /tmp/x509up_u$(id -u) --capath /etc/grid-security/certificates"
    You may need to adjust the proxy location and the CA directory location (/etc/grid-security/certificates) for your environment.
  2. Upload:
    mycurl -L -X PUT --upload-file /tmp/mydata.file https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
  3. Download:
    mycurl -L -X GET https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
  4. Delete:
    mycurl -L -X DELETE https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
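
If you prefer to drive these steps from a script, the same curl invocations can be assembled programmatically. The Python sketch below mirrors the mycurl alias and the three actions above. The function name, the action keywords, and the optional -o output file for downloads are assumptions added for illustration; adjust the proxy and CA paths to your environment.

```python
import os


def curl_command(action, url, local_file=None, proxy=None,
                 capath="/etc/grid-security/certificates"):
    """Return the curl argument list for an upload, download, or delete.

    Like the mycurl alias, the X509 proxy is used both as the client
    certificate (-E) and as a CA bundle (--cacert).
    """
    if proxy is None:
        proxy = f"/tmp/x509up_u{os.getuid()}"
    cmd = ["curl", "-E", proxy, "--cacert", proxy, "--capath", capath, "-L"]
    if action == "upload":
        if local_file is None:
            raise ValueError("upload needs a local_file")
        cmd += ["-X", "PUT", "--upload-file", local_file]
    elif action == "download":
        cmd += ["-X", "GET"]
        if local_file is not None:
            cmd += ["-o", local_file]  # save to a file instead of stdout
    elif action == "delete":
        cmd += ["-X", "DELETE"]
    else:
        raise ValueError(f"unknown action: {action!r}")
    return cmd + [url]
```

Pass the returned list to subprocess.run(cmd, check=True) to execute it.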

Use gfal2 tools to upload/download/delete

You may need to set up the ATLAS environment (run localSetupRucioClients) to have the gfal2 tools in your PATH.

  1. Upload:
    gfal-copy -f /tmp/myfile.dat https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
  2. Download:
    gfal-copy -f https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat /tmp/myfile.dat
  3. Delete:
    gfal-rm https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat
  4. You can even copy from another remote location:
    gfal-copy -f https://cern.ch//SCRATCHDISK/myfile.dat https://sdf-dtn10.slac.stanford.edu:2094/share/random-string/myfile.dat

Gfal2 tools work with both https and root protocols. In the last example, the source and destination can use different protocols.

Use xrootd tools to upload/download/delete

You may need to set up the ATLAS environment (run localSetupRucioClients) to have the xrootd tools in your PATH. These tools mostly work with the root protocol. Note that in a root URL there is usually a double slash after the port number.

  1. Upload:
    xrdcp -f /tmp/myfile.dat root://sdf-dtn10.slac.stanford.edu:2094//share/random-string/myfile.dat
  2. Download:
    xrdcp -f root://sdf-dtn10.slac.stanford.edu:2094//share/random-string/myfile.dat /tmp/myfile.dat
  3. Delete:
    xrdfs root://sdf-dtn10.slac.stanford.edu:2094 rm /share/random-string/myfile.dat
  4. You can also copy from another remote location:
    xrdcp -f root://cern.ch//SCRATCHDISK/myfile.dat root://sdf-dtn10.slac.stanford.edu:2094//share/random-string/myfile.dat

With additional settings, xrdcp also works with the https protocol.

Use a web browser

You can use a web browser to download a file or list a directory (except the top level, which is not browsable). To do that, just paste the https URL into your browser.

It is not possible to use a web browser for upload or deletion.