RailProject

RailProject basics

rail.projects.project.RailProject is the main user-facing class.

It collects all the elements needed to run a collection of studies using RAIL.

class rail.projects.RailProject(**kwargs)

Main analysis driver class, this collects all the elements needed to run a collection of studies using RAIL.

The key concepts are:

1. analysis ‘Flavors’, which are versions of similar analyses with slightly different parameter settings and/or input files.

2. ceci ‘Pipelines’, which run blocks of analysis code.

A RailProject basically specifies which Pipelines to run under which flavors, and keeps track of the outputs.

See RailProject.functionality_help() for more on class functionality.

See RailProject.configuration_help() for more on class configuration.

Parameters:

kwargs (Any)

Rail Project Functionality

classmethod RailProject.functionality_help()

Return type:

None

The main functions that users will call are:

load_config:

Read a yaml file and create a RailProject

reduce_data:

Make a reduced catalog from an input catalog by applying a selection and trimming unwanted columns. This is run before the analysis pipelines.

subsample_data:

Subsample data from a catalog to make a testing or training file. This is run after the catalog-level pipelines, but before the pipelines that run on individual training/testing samples.

build_pipelines:

Build ceci pipeline yaml files

run_pipeline_single:

Run a pipeline on a single file

run_pipeline_catalog:

Run a pipeline on a catalog of files
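These functions are typically chained. First the pipeline files for a flavor are built, then the pipelines are run and their integer status codes are checked. The following is a minimal sketch, not part of rail.projects. `run_flavor` and `ProjectLike` are hypothetical names. Passing `flavor` through `**kwargs` as an interpolant to `run_pipeline_catalog` is an assumption. The sketch relies only on two documented conventions: `build_pipelines` returns 0 if ok, and `run_pipeline_catalog` returns 0 for success.

```python
from typing import Any, Protocol


class ProjectLike(Protocol):
    """The subset of the RailProject interface this sketch relies on."""

    def build_pipelines(self, flavor: str = "baseline", *, force: bool = False) -> int: ...

    def run_pipeline_catalog(self, pipeline_name: str, **kwargs: Any) -> int: ...


def run_flavor(
    project: ProjectLike,
    flavor: str,
    pipeline_names: list[str],
    **interpolants: Any,
) -> int:
    """Hypothetical helper: build the pipeline files for one flavor, then run
    each named pipeline on its catalog, stopping at the first non-zero status."""
    status = project.build_pipelines(flavor, force=False)
    if status != 0:
        return status
    for name in pipeline_names:
        # Assumption: 'flavor' is accepted as an interpolant via **kwargs,
        # alongside others such as 'selection'.
        status = project.run_pipeline_catalog(name, flavor=flavor, **interpolants)
        if status != 0:
            return status
    return 0
```

Stopping at the first failure follows the documented "0 for success, error code otherwise" convention. A caller that wants every pipeline attempted could collect the codes instead.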

static RailProject.load_config(config_file)

Create and return a RailProject from a yaml config file

Return type:

RailProject

Parameters:

config_file (str)

RailProject.reduce_data(catalog_template, output_catalog_template, reducer_class_name, input_selection, selection, dry_run=False, **kwargs)

Reduce an input catalog by applying a selection and trimming unwanted columns.

Parameters:
  • catalog_template (str) – Tag for the input catalog

  • output_catalog_template (str) – Label to apply to the output dataset

  • reducer_class_name (str) – Name of the class to use for reducing the data

  • input_selection (str) – Selection to use for the input

  • selection (str) – Selection to apply

  • dry_run (bool) – If true, do not actually run

  • **kwargs – Used to provide values for additional interpolants.

Returns:

Paths to output files

Return type:

list[str]

RailProject.subsample_data(catalog_template, file_template, subsampler_class_name, subsample_name, dry_run=False, **kwargs)

Subsample data from a catalog to make a testing or training file.

Parameters:
  • catalog_template (str) – Tag for the input catalog

  • file_template (str) – Label to apply to the output dataset

  • subsampler_class_name (str) – Name of the class to use for subsampling

  • subsample_name (str) – Name of the subsample to create

  • dry_run (bool) – If true, do not actually run

  • **kwargs – Used to provide values for additional interpolants, e.g., flavor, basename, etc…

Returns:

Path to output file

Return type:

str

RailProject.build_pipelines(flavor='baseline', *, force=False)

Build ceci pipeline configuration files for this project.

Parameters:
  • flavor (str) – Which analysis flavor to draw from

  • force (bool) – Force overwriting of existing pipeline files

Returns:

0 if ok, error code otherwise

Return type:

int
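The `force` flag decides whether existing pipeline files are overwritten. The snippet below illustrates that "skip unless forced" behaviour using only the standard library. `write_pipeline_file` is a hypothetical helper, not the RailProject implementation.

```python
from pathlib import Path


def write_pipeline_file(path: Path, content: str, *, force: bool = False) -> bool:
    """Hypothetical helper: write 'content' to 'path' unless the file already
    exists and force is False. Returns True if the file was written."""
    if path.exists() and not force:
        # Leave the existing pipeline file untouched.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True
```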

RailProject.run_pipeline_single(pipeline_name, run_mode=RunMode.bash, **kwargs)

Run pipeline on a single file

Parameters:
  • pipeline_name (str) – Pipeline in question

  • run_mode (execution.RunMode) – How to run the pipeline (e.g., in bash, or in slurm)

  • **kwargs (Any) – Other interpolants, such as selection

Returns:

0 for success, error code otherwise

Return type:

int

RailProject.run_pipeline_catalog(pipeline_name, run_mode=RunMode.bash, **kwargs)

Run pipeline on a catalog

Parameters:
  • pipeline_name (str) – Pipeline in question

  • run_mode (execution.RunMode) – How to run the pipeline (e.g., in bash, or in slurm)

  • **kwargs (Any) – Other interpolants, such as selection

Returns:

0 for success, error code otherwise

Return type:

int

Rail Project Configuration

classmethod RailProject.configuration_help()

Configuring a RailProject

Most of these elements come from the shared library of elements, which is accessible from rail.projects.library.

Shared configuration files

Includes: list[str]

List of shared configuration files to load

Project analysis flavors

See rail.projects.project.RailFlavor for the parameters needed to define an analysis ‘Flavor’.

Baseline: dict[str, Any]

Baseline configuration for this project. It is included in all the other analysis flavors.

Flavors: list[dict[str, Any]]

List of all the analysis flavors that have been defined in this project
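The baseline is included in every flavor. One way to picture a flavor's effective configuration is the baseline dictionary updated with that flavor's own entries. This sketch assumes a shallow merge in which flavor keys win. The actual RailFlavor merge rules may differ, and `effective_flavor` and the keys `name`, `catalog_tag` and `pipelines` are hypothetical.

```python
from typing import Any


def effective_flavor(baseline: dict[str, Any], flavor: dict[str, Any]) -> dict[str, Any]:
    """Return the baseline settings overridden by one flavor's settings.

    Assumption: a shallow merge, so a nested value in the flavor replaces the
    baseline's value wholesale rather than being merged key by key.
    """
    return {**baseline, **flavor}


# Hypothetical baseline and flavors, shaped like list[dict[str, Any]].
baseline = {"catalog_tag": "truth", "pipelines": ["all"]}
flavors = [
    {"name": "baseline"},
    {"name": "alt_catalog", "catalog_tag": "truth_alt"},
]
configs = {f["name"]: effective_flavor(baseline, f) for f in flavors}
```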

Bookkeeping elements

These are used to define the file paths for the project.

PathTemplates: dict[str, str]

Overrides for templates used to construct file paths

The defaults are given in rail.projects.name_utils

PathTemplates = dict(
    pipeline_path="{pipelines_dir}/{pipeline}_{flavor}.yaml",
    ceci_output_dir="{project_dir}/data/{selection}_{flavor}",
    ceci_file_path="{tag}_{stage}.{suffix}",
)
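The templates use `{name}` placeholders. Assume they are filled in with ordinary Python `str.format` and that the output directory and file name are joined with "/". Both are assumptions about the mechanism, and all values below are made up. Under those assumptions, a pipeline file path and a ceci output file path expand as follows.

```python
# Default templates, copied from the listing above.
PathTemplates = dict(
    pipeline_path="{pipelines_dir}/{pipeline}_{flavor}.yaml",
    ceci_output_dir="{project_dir}/data/{selection}_{flavor}",
    ceci_file_path="{tag}_{stage}.{suffix}",
)

# Hypothetical values for the placeholders (str.format ignores unused ones).
values = dict(
    pipelines_dir="/work/projects/demo/pipelines",
    project_dir="/work/projects/demo",
    pipeline="inform",
    flavor="baseline",
    selection="gold",
    tag="output",
    stage="train",
    suffix="hdf5",
)

pipeline_path = PathTemplates["pipeline_path"].format(**values)
output_file = "/".join(
    [
        PathTemplates["ceci_output_dir"].format(**values),
        PathTemplates["ceci_file_path"].format(**values),
    ]
)
print(pipeline_path)  # /work/projects/demo/pipelines/inform_baseline.yaml
print(output_file)    # /work/projects/demo/data/gold_baseline/output_train.hdf5
```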

CommonPaths: dict[str, str]

Definitions of common paths used to construct file paths

The defaults are given in rail.projects.name_utils

CommonPaths = dict(
    root=".",          # needs to be overridden
    scratch_root=".",  # needs to be overridden
    project="",        # needs to be overridden
    project_dir="{root}/projects/{project}",
    project_scratch_dir="{scratch_root}/projects/{project}",
    catalogs_dir="{root}/catalogs",
    pipelines_dir="{project_dir}/pipelines",
)
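Several CommonPaths entries refer to other entries. For example, `pipelines_dir` depends on `project_dir`, which in turn depends on `root` and `project`. The sketch below resolves such nested definitions by applying `str.format` repeatedly until a pass changes nothing. This assumed approach may not match how rail.projects.name_utils resolves them. `resolve_paths` is hypothetical, and the overridden values are made up.

```python
def resolve_paths(paths: dict[str, str], max_passes: int = 10) -> dict[str, str]:
    """Expand {name} references between entries of 'paths'.

    Each pass formats every entry with the current values; a substituted value
    is not re-parsed within the same pass, so deeper references need more
    passes. Returns once a pass changes nothing.
    """
    resolved = dict(paths)
    for _ in range(max_passes):
        updated = {key: value.format(**resolved) for key, value in resolved.items()}
        if updated == resolved:
            return resolved
        resolved = updated
    raise ValueError(f"paths did not stabilise after {max_passes} passes")


CommonPaths = dict(
    root="/data/rail",             # overridden (hypothetical value)
    scratch_root="/scratch/rail",  # overridden (hypothetical value)
    project="demo",                # overridden (hypothetical value)
    project_dir="{root}/projects/{project}",
    project_scratch_dir="{scratch_root}/projects/{project}",
    catalogs_dir="{root}/catalogs",
    pipelines_dir="{project_dir}/pipelines",
)
paths = resolve_paths(CommonPaths)
print(paths["pipelines_dir"])  # /data/rail/projects/demo/pipelines
```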

IterationVars: dict[str, list[str]]

Iteration variables to construct the catalogs
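IterationVars maps each variable name to the list of values it takes. Assume a catalog template is expanded once for every combination of those values, that is, a Cartesian product. That assumption can be sketched with `itertools.product`. The helper `expand_template`, the template string and the variables `chunk` and `version` are hypothetical.

```python
import itertools


def expand_template(template: str, iteration_vars: dict[str, list[str]], **fixed: str) -> list[str]:
    """Format 'template' once per combination of iteration-variable values.

    'fixed' supplies placeholders that do not vary (e.g. a directory).
    """
    names = list(iteration_vars)
    paths = []
    for combo in itertools.product(*(iteration_vars[name] for name in names)):
        values = dict(fixed)
        values.update(zip(names, combo))
        paths.append(template.format(**values))
    return paths


IterationVars = {"chunk": ["0", "1"], "version": ["v1", "v2"]}  # hypothetical
catalogs = expand_template(
    "{catalogs_dir}/sim_{chunk}_{version}.hdf5",
    IterationVars,
    catalogs_dir="/data/rail/catalogs",
)
```

With two values for each of two variables this yields four catalog paths, with the first variable varying slowest.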

Shared elements

These are pulled from the library. Each of these is just a list of the names of things defined in the library that can be used in this project. The default is to use all the items defined in the library.

Catalogs: list[str]

These are actually CatalogTemplates.

Files: list[str]

These are actually FileTemplates.

Pipelines: list[str]

These are actually PipelineTemplates.

Reducers: list[str]

These reduce the input data catalog.

Subsamplers: list[str]

These subsample catalogs to get individual files.

Selections: list[str]

These are the selection parameters.

Subsamples: list[str]

These are the subsample parameters.

PZAlgorithms: list[str]

SpecSelections: list[str]

Classifiers: list[str]

Summarizers: list[str]

ErrorModels: list[str]

Return type:

None
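Each shared-element list names items defined in the library, and the default is to use all of them. The selection rule can be sketched with a plain dict standing in for the library. The real library is in rail.projects.library, and its API is not shown here. `select_items` and the item names are hypothetical, and treating an omitted list as `None` is an assumption.

```python
from typing import Any, Optional


def select_items(library: dict[str, Any], names: Optional[list[str]] = None) -> dict[str, Any]:
    """Pick the named items from 'library'; with no names, use all of them."""
    if names is None:
        return dict(library)
    missing = [name for name in names if name not in library]
    if missing:
        raise KeyError(f"not defined in the library: {missing}")
    return {name: library[name] for name in names}


# Hypothetical stand-in for, e.g., the PZAlgorithms part of the library.
pz_library = {"algo_a": {"module": "a"}, "algo_b": {"module": "b"}}
```

Raising on unknown names, rather than silently skipping them, is one design choice. It surfaces typos in a project configuration early.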