Scripts to provide access to S3 and interface with this SIDIS analysis repository
For more details, see S3 file storage documentation
One top-level script automates all the work: `s3tools/make-fullsim-config.sh` (best to run from the top-level directory)
- running with no arguments will print the usage guide
- output:
  - config file, with file names, Q2 minima, and cross sections; used as input to the analysis macros
  - the downloaded full simulation files, if you chose to download from S3
- this script contains some settings such as directory paths to data on S3, for the convenience of SIDIS full simulation analysis
- you may have to run `add-host.sh` first, if you have not yet; set env vars `$S3_ACCESS_KEY` and `$S3_SECRET_KEY` with the login and password beforehand:
  ```
  export S3_ACCESS_KEY=*****
  export S3_SECRET_KEY=*****
  ```
- this script calls several other scripts in this directory; read on for their documentation
- if you are not running in the Singularity/Docker container, download and install the MinIO client first
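For example, a minimal first-run sequence could look like this; the credential values are placeholders, and the scripts are assumed to be invoked from the repository top-level directory:

```bash
# set S3 credentials (placeholder values) and register the S3 host (one-time setup)
export S3_ACCESS_KEY=*****
export S3_SECRET_KEY=*****
s3tools/add-host.sh

# run from the top-level directory; with no arguments it prints the usage guide
s3tools/make-fullsim-config.sh
```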
Two options, sketched in the example after this list:
- download `hepmc` files from S3 and run Delphes locally
  - this is automated by `s3tools/make-fastsim-S3-config.sh`; run with no arguments to print the usage guide, and note the arguments are a bit different than those for the full simulation scripts
  - this script can download `hepmc` files from S3, run Delphes on them, and generate the config file
    - Delphes will run using one thread per Q2 minimum; edit the script to change this behavior
  - similar to full simulations, you need to have the S3 access and secret environment variables set in order to download `hepmc` files
- alternatively, if you already have a directory of Delphes output ROOT files, use `make-fastsim-local-config.sh` to create a config file
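For orientation, a minimal sketch of invoking the two entry points described above; `make-fastsim-S3-config.sh` with no arguments prints its usage guide, and the usage behavior of `make-fastsim-local-config.sh` is assumed to be analogous:

```bash
# option 1: download hepmc files from S3, run Delphes, and generate the config file;
# no arguments prints the usage guide (arguments differ from the full simulation scripts)
s3tools/make-fastsim-S3-config.sh

# option 2: build a config file from an existing local directory of Delphes output ROOT files
s3tools/make-fastsim-local-config.sh
```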
- first, download the MinIO client
  - if you are using the Singularity or Docker container, it is already installed
  - the main command is `mc`
- next, set up your client to connect to our S3 host (ask someone for credentials):
  - first set env vars `$S3_ACCESS_KEY` and `$S3_SECRET_KEY` with the login and password:
    ```
    export S3_ACCESS_KEY=*****
    export S3_SECRET_KEY=*****
    ```
  - then add our S3 host to the MinIO client: `add-host.sh`
    - this only needs to be done once on your machine or container
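  - for reference, `add-host.sh` amounts to registering our S3 host under the alias `S3` used in the paths below; a hypothetical manual equivalent with the MinIO client (the endpoint URL here is a placeholder, use whatever `add-host.sh` configures):
    ```bash
    # hypothetical manual equivalent of add-host.sh; the endpoint URL is a placeholder
    mc alias set S3 https://our.s3.endpoint "$S3_ACCESS_KEY" "$S3_SECRET_KEY"
    ```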
- find a directory on S3 with full simulation files; example S3 navigation commands:
  - top-level ATHENA directory list: `mc ls S3/eictest/ATHENA`
  - show directory tree: `mc tree S3/eictest/ATHENA/RECO/acadia-v2.1/DIS/NC/`
  - list files: `mc ls S3/eictest/ATHENA/RECO/acadia-v2.1/DIS/NC/10x275/minQ2=1`
- from here, you have two options, both of which are described in the next sections:
  - generate a list of S3 file URLs, which will be "streamed" when running the analysis code, and will not be stored locally
  - if streaming files from S3 is unsuitable, you can download them instead using the MinIO client: `mc cp S3/.../.../source.root ./your/data/directory/`; see below for a download script for automation and filtering
Next we need to make a "config file", which lists each file name along with additional columns such as the cross section and minimum Q2; an example appears after the column list below. Follow the next sections, whether you plan to stream from S3 or download.
The config files require the following columns, in this order:
- file name (relative to the top-level directory, unless you use an absolute path)
- the number of events, for the weighting and cross section
  - set to `0` for all
  - this is not related to `Analysis::maxEvents`, which limits how many events to process
- cross section (can be obtained from Pythia output logs, for example)
- minimum Q2
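For illustration, a config file in this format might look like the following, assuming whitespace-separated columns; the file names are hypothetical placeholders, and the cross section (3e-4) and Q2 minimum (1) echo the example values used later in this README:

```
datarec/example/file1.root 0 3e-4 1
datarec/example/file2.root 0 3e-4 1
```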
Patch: the above format is the original format; however, the current minimum Q2 weighting implementation requires a new format. In case we revert to using the above old format, we temporarily use the script `reformat-config.sh` to transform the old format into the new format. See comments in `reformat-config.sh` for details. Execute:
```
s3tools/reformat-config.sh files.config files.new.config
```
- cross sections are stored in `datarec/xsec/xsec.dat`; use `read-xsec-table.sh` to get the cross section for a particular beam energy setting and Q2 minimum
- in case you want to update `xsec.dat` (a sketch of this workflow follows this list):
  - the script `get-cross-section.sh` will read the cross section from `GenCrossSection` in a `hepmc` file; use `get-cross-section-ALL.sh` to automate running `get-cross-section.sh` over all `hepmc` files in a specific directory tree; this will populate `datarec/xsec/*.xsec` files, one for each `hepmc` file
  - next use `tabulate-cross-section.py` to read the tree of `.xsec` files into a table of cross sections, output to `datarec/xsec/xsec.dat`
  - given the time it takes to run `get-cross-section.sh`, we try to store the most up-to-date version of `xsec.dat` in this repository
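A sketch of the update workflow described above; the argument forms are assumptions, so check each script's usage before running:

```bash
# extract GenCrossSection from every hepmc file under a directory tree,
# populating datarec/xsec/*.xsec (directory argument is an assumption)
s3tools/get-cross-section-ALL.sh path/to/hepmc/files

# tabulate the .xsec files into datarec/xsec/xsec.dat
s3tools/tabulate-cross-section.py

# look up the cross section for a beam energy setting and Q2 minimum
# (argument form is an assumption)
s3tools/read-xsec-table.sh 10x275 1
```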
To stream, we need to make a list of URLs.
- run `generate-s3-list.sh` to generate a list of files
  - running it with no arguments will print the usage and required arguments
- the file list should appear in `stdout`; pipe the output somewhere, for example:
  - directly to a text file: `generate-s3-list.sh S3/.../... > files.txt`
  - add columns for `numEvents` (0, for all events), cross section (3e-4), and Q2min (1), to build a "config" file for an analysis: `generate-s3-list.sh S3/.../... 0 3e-4 1 > files.config`
  - use `grep` to remove files that have "OLD" in their filename: `generate-s3-list.sh S3/.../... 0 3e-4 1 | grep -v OLD > files.config`
Instead of URLs, we make a list of local files, together with the columns needed to make a config file.
- first, to download the files from S3 to a local directory `/my/local/directory`, pipe the output of `generate-s3-list.sh` (see above) to `download.sh`, for example: `generate-s3-list.sh S3/.../... | grep -v OLD | download.sh /my/local/directory`
  - this will generate a downloader script `/my/local/directory/get-files.sh` and execute it; this `get-files.sh` script is useful because it tells you where the files were downloaded from, and it can easily be executed again or edited in case the download experiences any problems
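    - for a sense of what to expect, a hypothetical excerpt of an auto-generated `get-files.sh` (the real contents are produced by `download.sh` and may differ):
      ```bash
      #!/bin/bash
      # hypothetical excerpt; the real script is generated by download.sh
      mc cp S3/.../.../file0001.root /my/local/directory/
      mc cp S3/.../.../file0002.root /my/local/directory/
      ```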
- next run the script `generate-local-list.sh`, which is very similar to `generate-s3-list.sh`, but uses files from the specified local directory; see above for details and example usage
  - Important: execute this script from the main repository directory (`../`), so that the file names are relative to the same directory that the macros are designed to be executed from: `s3tools/generate-local-list.sh path/to/data`
    - or just specify an absolute path, which is more robust
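Putting the local workflow together, a hypothetical end-to-end sequence; the S3 path is the example from the navigation commands above, the local directory `datarec/mydata` and column values are placeholders, and the column arguments of `generate-local-list.sh` are assumed to mirror those of `generate-s3-list.sh`:

```bash
# download the files, skipping any with "OLD" in the name
s3tools/generate-s3-list.sh S3/eictest/ATHENA/RECO/acadia-v2.1/DIS/NC/10x275/minQ2=1 | grep -v OLD | s3tools/download.sh datarec/mydata

# build the config file from the local directory (run from the main repository directory)
s3tools/generate-local-list.sh datarec/mydata 0 3e-4 1 > files.config
```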