Job Submission

 

EUTelescope, being based on Marlin, requires xml steering files to control the details of the data processing. These files are required for every raw data file (i.e. every run).

Using a job submission tool, the task of adjusting Marlin steering files for individual run settings becomes much easier to configure and control.

Previous EUTelescope versions shipped with two different job submission programs: pysub and simplesub. From version 0.8 onwards, we recommend to replace these tools in your analysis with jobsub. Jobsub offers a similar workflow but gives much more flexiblity and control than the previous programs.

 

The jobsub submission tool

 

1 Overview

jobsub is a tool for the convenient run-specific modification of
Marlin steering files and their execution through the Marlin
processor.

2 Usage

usage: jobsub.py [-h] [--option NAME=VALUE] [-c FILE] [-csv FILE]
                 [--log-file FILE] [-l LEVEL] [-s] [--dry-run]
                 jobtask [runs [runs ...]]

A tool for the convenient run-specific modification of Marlin steering files
and their execution through the Marlin processor

positional arguments:
  jobtask               Which task to submit (e.g. convert, hitmaker, align);
                        task names are arbitrary and can be set up by the
                        user; they determine e.g. the config section and
                        default steering file names.
  runs                  The runs to be analyzed; can be a list of single runs
                        and/or a range, e.g. 1056-1060.

optional arguments:
  -h, --help            show this help message and exit
  --option NAME=VALUE, -o NAME=VALUE
                        Specify further options such as 'beamenergy=5.3'. This
                        switch be specified several times for multiple options
                        or can parse a comma-separated list of options. This
                        switch overrides any config file options.
  -c FILE, --conf-file FILE, --config FILE
                        Load specified config file with global and task
                        specific variables
  --concatenate         Modifies run range treatment: concatenate all runs
                        into first run (e.g. to combine runs for alignment) by
                        combining every options that includes the string
                        '@RunRange multiple times, once for each run of the
                        range specified.
  -csv FILE, --csv-file FILE
                        Load additional run-specific variables from table
                        (text file in csv format)
  --log-file FILE       Save submission log to specified file
  -l LEVEL, --log LEVEL
                        Sets the verbosity of log messages during job
                        submission where LEVEL is either debug, info, warning
                        or error
  -s, --silent          Suppress non-error (stdout) Marlin output to console
  --dry-run             Write steering files but skip actual Marlin execution

3 Preparation of Steering File Templates

Steering file templates are valid Marlin steering files (in xml
format) where single values are replaced by variables in the form
"@SomeVariable@".

When jobsub is run, these placeholders are filled with a
user-defined value that can be specified through any of these
sources (in order of precedence): command-line arguments, a config
file, or a table with a row for each run number processed.

There is only one predefined placeholder, @RunNumber@, which will be
substituted with the current run number (padded with leading zeros
to six digits, e.g. 001234).

4 Configuration

There are only very few predefined options: TemplateFile,
TemplatePath, and LogPath. The former are used to find the correct
steering file template for the current task while the latter sets
the path where the final steering file and job output of each run is
stored (as zip file). The default for the file name is
"TASK-tmp.xml", where TASK corresponds to the taskname given on the
command line. The default path to the template is ".", i.e. the
current directory.

You can modify these options is the same way as placeholders in the
template file, as described below.

4.1 Command Line

Variable substitutions can be specified using the –option or -o
command line switches, e.g.

jobsub.py --option beamenergy=5.3 align 1234

This switch be specified several times for multiple options or can
parse a comma-separated list of options. This switch overrides any
config file options.

4.2 Config File

Config files are text file consisting of sections (indicated by '[]'):

  • a global section called [DEFAULT]
  • task-specific sections

as well as "name: value" or "name=value" entries, where 'name' are
arbitrary steering file variables (case-insensitive).

Some noteworthy features include:

  • comment prefix characters are # and ;
  • interpolation of format strings is supported, for example:
    [My Section]
    foodir: %(dir)s/whatever
    dir=frob
    long: this value continues
      in the next line
    

    would resolve the %(dir)s to the value of dir (frob in this case).

  • some default interpolations are %(home)s and %(eutelescopePath)s
    which are set up with the environment variables $HOME and
    $EUTELESCOPE, respectively.
  • The string "@RunNumber@" will be replaced in the template after
    all other variable strings were filled-in; therefore, you can use
    the @RunNumber@ placeholder inside options, e.g. the file name.
    It will be replaced by the run number padded with leading zeros
    to 6 digits.
  • for more details, see the documentation to the Python module used
    for parsing: http://docs.python.org/2/library/configparser.html

4.2.1 Example

The following is an excerpt from jobsub/examples/datura-noDUT/config.cfg:

[DEFAULT]
# Global section. Settings can be overwritten through task-specific sections
# The python config parser interprets '%(NAME)s' as the corresponding variable NAME.
# The template file name can be set with TemplateFile = file.xml. The default is '[task]-tmp.xml'

BasePath                   = %(eutelescopepath)s/jobsub/examples/datura-noDUT
TemplatePath               = %(BasePath)s/steering-templates

# Set the folder which contains the raw/native data files
# You can find a data sample at /afs/desy.de/group/telescopes/EutelTestData/TestExampleDaturaNoDUT
NativePath         = /data/user/eicht/testbeam/telescope_150mm/native

# Geometry file
GearFile                   = gear_desy2012_150mm.xml

# Path to the geometry file
GeometryPath               = %(BasePath)s

# Output format; @RunNumber@ is the current run number padded with leading zeros to 6 digits
FilePrefix                 = run@RunNumber@        

# Limit processing of a run to a certain number of events
MaxRecordNumber            = 99999999

# Output subfolder structure
DatabasePath               = ./output/database
LcioPath                   = ./output/lcio
HistogramPath              = ./output/histograms
LogPath                    = ./output/logs

# The verbosity used by the eutelescope producers 
# (i.e. MESSAGE, DEBUG, ERROR with appended level from 0..9, e.g. MESSAGE5).
# If you set this to DEBUG0 but you do not see any DEBUG messages, 
# make sure that you set CMAKE_BUILD_TYPE to Debug in the 
# $EUTELESCOPE/CMakeList.txt file.
Verbosity          = MESSAGE
# Verbosity = DEBUG


# Section for the converter step
[converter]
# Hot Pixel detection
FiringFrequency            = 0.1


# Section for the clustering step
[clustering]
# Clustering alogrithm
ClusteringAlgorithm        = SparseCluster2


# Section for the hitmaker step
[hitmaker]
# Choose center of gravity algorithm (FULL, NPixel or NxMPixel), see documentation for information.
CenterOfGravity            = FULL

# Parameters for center of gravity algorithms:
NPixel                     = 9
NxMPixel           = 3 3

4.3 Table (comma-separated text file)

  • format: e.g.
  • commented lines (starting with #) are ignored
  • first row (after comments) has to provide column headers which identify the variables in the steering template to replace (case-insensitive)
  • requires one column labeled "RunNumber"
  • only considers placeholders left in the steering template after processing command-line arguments and config file options

4.3.1 Example

An org-mode emacs table would have the following form:

| RunNumber | BeamEnergy |
|      4115 |          1 |
|      4116 |          2 |
|      4117 |          3 |
|      4118 |          4 |
|      4119 |          5 |

Using this table, the variable @BeamEnergy@ in the templates would
be replaced by the value corresponding to the current run number.

4.4 Concatenation

If you have an option e.g. the LCIO input files that you want to
fill with several runs in one steering file, you can use a command
line switch to activate concatenation. This replaces any steering
file placeholder whose corresponding option contains the string
"@RunRange" multiple times, once for every run specified.

jobsub.py --concatenate --option LCIOInputFiles=/my/path/to/data/@RunRange.lcio align 1234 1235-1237

This will create one steering file (for run 1234) in which the placeholder
"@LCIOInputFiles@" is replaced four times by its value with
@RunRange@ replaced by values from 1234 to 1237, e.g.

<parameter name="FileName" type="string" value= @LCIOInputFiles@/>

becomes:

<parameter name="FileName" type="string" value= /my/path/to/data/1234.lcio 
        /my/path/to/data/1235.lcio /my/path/to/data/1236.lcio /my/path/to/data/1237.lcio/>

This can be useful if you want to combine several runs e.g. for alignment.

5 Example

The following commands show how you would execute the telescope-only
analysis that is provided as an example:

First, create the directory structure and set up some environment variables for convenience:

export ANALYSIS=/data/testbeam/analysis
mkdir -p $ANALYSIS/output/histograms  && mkdir -p $ANALYSIS/output/database \
           && mkdir -p $ANALYSIS/output/logs \
           && mkdir -p $ANALYSIS/output/lcio

The analysis we want to perform is controlled by a config file
(config.cfg), a csv-table (runlist.csv) and steering file
templates (*.xml), all located in the examples subdirectory of
jobsub:

export ANALYSIS_CONF=$EUTELESCOPE/jobsub/examples/datura-noDUT

In principle, neither the table nor the config are required as long
as the template files do not contain any variables except for the
run number ('@RunNumber@').

After making sure that the path to the raw data files in config.cfg
is correct, we can execute the analysis step-by-step:

First, converter step:

cd $ANALYSIS
$EUTELESCOPE/jobsub/jobsub.py --config=${ANALYSIS_CONF}/config.cfg \
                 -csv ${ANALYSIS_CONF}/runlist.csv converter 97

Here, jobsub will generate a steering file using the template file
specified in the config file (default would be 'converter-tmp.xml'),
thereby replacing any variables given in the config and table files.
The final steering file will be processed by executing Marlin.

The next steps follow the same pattern:

$EUTELESCOPE/jobsub/jobsub.py --config=${ANALYSIS_CONF}/config.cfg \
                  -csv ${ANALYSIS_CONF}/runlist.csv clustering 97
$EUTELESCOPE/jobsub/jobsub.py --config=${ANALYSIS_CONF}/config.cfg \
                  -csv ${ANALYSIS_CONF}/runlist.csv filter 97
$EUTELESCOPE/jobsub/jobsub.py --config=${ANALYSIS_CONF}/config.cfg \
                  -csv ${ANALYSIS_CONF}/runlist.csv hitmaker 97
$EUTELESCOPE/jobsub/jobsub.py --config=${ANALYSIS_CONF}/config.cfg \
                  -csv ${ANALYSIS_CONF}/runlist.csv align 97
$EUTELESCOPE/jobsub/jobsub.py --config=${ANALYSIS_CONF}/config.cfg 
                  -csv ${ANALYSIS_CONF}/runlist.csv fitter 97

For this example, the output of each step is stored in
sub directories of the $ANALYSIS path. This is configurable through
the paths in the config.cfg file.

If all steps succeed, we end up with an LCIO-Collection of tracks on
which detailed studies can be performed. In fact, the last step
already determines unbiased residuals for each telescope plane using
the EUTelDUTHistograms processor. The corresponding histograms can
be found in the root file
$ANALYSIS/output/histograms/run000097-fitter.root .

6 Appendix 1: pysub migration

Currently, jobsub does not directly support submission on the grid;
if you need this functionality, please stay with pysub for now.

Please consider setting up a template from scratch: this will ensure
that all existing settings are present and show the correct defaults
and that the documenting comments are up-to-date. Furthermore, the
resulting steering file and jobsub configuration will be much simpler
than their pysub counterparts.

If you still want to keep the steering files used previously with
pysub, you can manually convert them. You mainly need to modify the
configuration files: (replace all bold names with your settings)

  • Set up the global settings section "[Default]"
  • Set up sections for each of your analysis tasks, e.g.
    [align]
    [converter]
    [hitmaker]
    [fitter]
    

    or rename existing sections such as [AlignOptions] to the
    corresponding task name.

  • Move task-specific options to the corresponding section
  • if your steering files do not follow the naming convention
    NAME-tmp.xml you need to have the setting
    TemplateFile = YOUR_FILE_NAME.xml
    

    in each task section.
    Add the setting

    src_conf{TemplatePath = (RELATIVE)_PATH_TO_YOUR_TEMPLATES}
    

    to the global section (or task sections if it varies
    for each task)

  • delete obsolete sections: "[SteeringTemplate]"
  • Move any non-task specific sections into the global [default]
    section: this applies e.g. to [General], [Logger] and [LOCAL] options
  • Rename the "LocalFolder…" options:
    LocalFolderNative -> NativeFolder
    LocalFolderGear -> GearPath
    LocalFolderLcioRaw -> LcioRawPath
    LocalFolderHistoInfo -> HistoInfoPath
    LocalFolderTASKNAMEJoboutput -> LogPath
    LocalFolderTASKNAMEHisto -> HistoPath (either globally or in local sections)
    LocalFolderDBTASKNAME -> DBPath (again, either globally or locally)
    LocalFolderTASKNAMEResults -> ResultsPath (globally or locally)
    
  • Set the following options:
    Globally:
    Output = @RunNumber@
    

    In each task section, modify accordingly:
    e.g. in [align]:

    InputFile = %(ResultsPath)s/hit-@RunNumber@.slcio
    
  • Run jobsub; if it complains about "Missing configuration
    parameters", check your template and config for any of the listed
    parameters and set their value in the config or remove them from
    the steering file.