*********
Pipelines
*********

A key concept in `rail_projects` are ceci `Pipelines`, which run blocks of analysis code using `ceci`.


================
Pipelines basics
================

A `RailProject` will have access to several `PipelineTemplates` and
use these to define the `Pipelines` that it runs.

To do this, it will need some additional information.

1. What `Flavor` to run the `Pipeline` with.  This is specified by the
   user, and will set up the `Pipeline` to expect the correct column
   names.
2. What options to use to construct the `Pipeline`, such as which
   algorihms to use, or additional paramters.  The is done by merging
   default infomation in the `PipelineTemplate` with `Flavor` specific information.
3. Any specific overrides for any of the `Pipeline` stages given in
   the `Flavor` definitinos.
4. How to find the input files.  How this is done depends on the type
   of `Pipeline` and if it is being run on a single file or an entire catalog.
5. Where to write the output data.   How this is done depends on the type
   of `Pipeline` and if it is being run on a single file or an entire catalog.


Running a Pipline of a Catalog
------------------------------
   
When a `Pipeline` is run on a catalog of files, the
`input_catalog_template`, user supplied interpolants and possibly the `input_catalog_basename` parameters are
used to construct list of input files

The `input_catalog_template` should refer to a `CatalogTemplate` that
will give the template for the catalog, e.g.,
`{catalogs_dir}/{project}_{sim_version}/{healpix}/part-0.parquet`.

All of the interpolants must be given in one of there places:

1. The IterationVars block of the `RailProject`
2. The CommonPaths block of the `RailProject`
3. Explicitly in the keyword arguements provided to the call to run the catalog


The `output_catalog_template` is used to define the output directory
in much the same way.


Note that some catalogs have `{basename}` as an interpolant.  Since
`ceci` write all of its output files to the same directory by
default, if we want to, say, create a different version of a
`degraded` catalog by selecting the output of a different degrader, we
can simply do so by picked a different file from the same directory.
We can specifiy this by setting the `input_catalog_basename`
parameter.


Running a Pipline single set of inputs
--------------------------------------

When a `Pipeline` is run on a single set of inputs, the
`input_file_templates` parameter and user supplied interpolants are
used to construct list of input files.

For example, the input_file_templates might look like this:


.. code-block:: yaml

    input_file_templates:
      input_train:
        flavor: baseline
        tag: train
      input_test:
        flavor: baseline
        tag: test


This would specify that the are two inputs `input_train` and
`input_test` and that they should be resolved by getting the `test`
and `train` tags from the `FileAliases` block the current `Flavor`,
and resolving them with using `flavor=baseline` as an interpolant.

This sounds a bit complicated.  The idea here is that this is a
mechanism that allow use to input files created in one `Flavor` by
another `Flavor`.  E.g., we can make testing / training files in the
baseline `Flavor` and then use them in many other `Flavors`.  


====================
Pipeline definitions
====================

Here is an example of a `Pipeline` that we typically only run on catalogs.		

.. code-block:: yaml
		
  - PipelineTemplate:
      name: truth_to_observed
      pipeline_class: rail.pipelines.degradation.truth_to_observed.TruthToObservedPipeline
      input_catalog_template: reduced
      output_catalog_template: degraded
      kwargs:
        error_models: ['all']
        selectors: ['all']
        blending: true


Here is an example of a `Pipeline` that we typically run on individual
input files.

.. code-block:: yaml

  - PipelineTemplate:
      name: pz
      pipeline_class: rail.pipelines.estimation.pz_all.PzPipeline
      input_file_templates:
        input_train:
          flavor: baseline
          tag: train
        input_test:
          flavor: baseline
          tag: test
      kwargs:
        algorithms: ['all']
	

=====================================
Building pipelines with rail.projects
=====================================

.. automethod:: rail.projects.RailProject.build_pipelines		
    :noindex:


====================================
Running pipelines with rail.projects
====================================

.. automethod:: rail.projects.RailProject.run_pipeline_single		
    :noindex:


.. automethod:: rail.projects.RailProject.run_pipeline_catalog
    :noindex: