Pipelines
A key concept in rail_projects are ceci Pipelines, which run blocks of analysis code using ceci.
Pipelines basics
A RailProject will have access to several PipelineTemplates and use these to define the Pipelines that it runs.
To do this, it will need some additional information.
What Flavor to run the Pipeline with. This is specified by the user, and will set up the Pipeline to expect the correct column names.
What options to use to construct the Pipeline, such as which algorihms to use, or additional paramters. The is done by merging default infomation in the PipelineTemplate with Flavor specific information.
Any specific overrides for any of the Pipeline stages given in the Flavor definitinos.
How to find the input files. How this is done depends on the type of Pipeline and if it is being run on a single file or an entire catalog.
Where to write the output data. How this is done depends on the type of Pipeline and if it is being run on a single file or an entire catalog.
Running a Pipline of a Catalog
When a Pipeline is run on a catalog of files, the input_catalog_template, user supplied interpolants and possibly the input_catalog_basename parameters are used to construct list of input files
The input_catalog_template should refer to a CatalogTemplate that will give the template for the catalog, e.g., {catalogs_dir}/{project}_{sim_version}/{healpix}/part-0.parquet.
All of the interpolants must be given in one of there places:
The IterationVars block of the RailProject
The CommonPaths block of the RailProject
Explicitly in the keyword arguements provided to the call to run the catalog
The output_catalog_template is used to define the output directory in much the same way.
Note that some catalogs have {basename} as an interpolant. Since ceci write all of its output files to the same directory by default, if we want to, say, create a different version of a degraded catalog by selecting the output of a different degrader, we can simply do so by picked a different file from the same directory. We can specifiy this by setting the input_catalog_basename parameter.
Running a Pipline single set of inputs
When a Pipeline is run on a single set of inputs, the input_file_templates parameter and user supplied interpolants are used to construct list of input files.
For example, the input_file_templates might look like this:
input_file_templates:
input_train:
flavor: baseline
tag: train
input_test:
flavor: baseline
tag: test
This would specify that the are two inputs input_train and input_test and that they should be resolved by getting the test and train tags from the FileAliases block the current Flavor, and resolving them with using flavor=baseline as an interpolant.
This sounds a bit complicated. The idea here is that this is a mechanism that allow use to input files created in one Flavor by another Flavor. E.g., we can make testing / training files in the baseline Flavor and then use them in many other Flavors.
Pipeline definitions
Here is an example of a Pipeline that we typically only run on catalogs.
- PipelineTemplate:
name: truth_to_observed
pipeline_class: rail.pipelines.degradation.truth_to_observed.TruthToObservedPipeline
input_catalog_template: reduced
output_catalog_template: degraded
kwargs:
error_models: ['all']
selectors: ['all']
blending: true
Here is an example of a Pipeline that we typically run on individual input files.
- PipelineTemplate:
name: pz
pipeline_class: rail.pipelines.estimation.pz_all.PzPipeline
input_file_templates:
input_train:
flavor: baseline
tag: train
input_test:
flavor: baseline
tag: test
kwargs:
algorithms: ['all']
Building pipelines with rail.projects
- RailProject.build_pipelines(flavor='baseline', *, force=False)[source]
Build ceci pipeline configuraiton files for this project
- Return type:
int- Parameters:
flavor (str) – Which analysis flavor to draw from
force (bool) – Force overwriting of existing pipeline files
- Returns:
0 if ok, error code otherwise
- Return type:
int
Running pipelines with rail.projects
- RailProject.run_pipeline_single(pipeline_name, run_mode=RunMode.bash, **kwargs)[source]
Run pipeline on a single file
- Return type:
int- Parameters:
pipeline_name (str) – Pipeline in question
run_mode (execution.RunMode) – How to run the pipeline (e.g., in bash, or in slurm)
**kwargs (Any) – Other interpolants, such as selection
- Returns:
0 for success, error code otherwise
- Return type:
int
- RailProject.run_pipeline_catalog(pipeline_name, run_mode=RunMode.bash, **kwargs)[source]
Run pipeline on a catalog
- Return type:
int- Parameters:
pipeline_name (str) – Pipeline in question
run_mode (execution.RunMode) – How to run the pipeline (e.g., in bash, or in slurm)
**kwargs (Any) – Other interpolants, such as selection
- Returns:
0 for success, error code otherwise
- Return type:
int