rail.projects.project module

class rail.projects.project.RailFlavor(**kwargs)

Bases: Configurable

Description of a single analysis variation

This includes

a name for the variant, used to construct filenames

a ‘catalog_tag’, which identifies the format of the data being used and sets the expected column names accordingly

a list of ‘pipelines’ that can be run in this variant

a dict of ‘file_aliases’ that can be used to specify the input files used in this variant

a dict of ‘pipeline_overrides’ that modify the behavior of the various pipelines

Parameters:

kwargs (Any)

config_options: dict[str, StageParameter] = {
    'catalog_tag': Parameter(tag for catalog being used, type: <class 'str'>, default: None [optional]),
    'file_aliases': Parameter(file aliases used, type: <class 'dict'>, default: {} [optional]),
    'name': Parameter(Flavor name, type: <class 'str'>, default: None [required]),
    'pipeline_overrides': Parameter(file aliases used, type: <class 'dict'>, default: {} [optional]),
    'pipelines': Parameter(pipelines being used, type: <class 'list'>, default: ['all'] [optional])
}
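
The split between required and optional options in config_options can be illustrated with a plain-Python sketch. This is not the Configurable machinery that RailFlavor actually inherits; `FLAVOR_DEFAULTS` and `fill_flavor_defaults` are hypothetical names, and the default values are copied from the config_options listing above.

```python
import copy
from typing import Any

# Defaults copied from RailFlavor.config_options; 'name' is required and has no default.
FLAVOR_DEFAULTS: dict[str, Any] = {
    "catalog_tag": None,
    "file_aliases": {},
    "pipeline_overrides": {},
    "pipelines": ["all"],
}


def fill_flavor_defaults(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: require 'name', then fill in any missing optional keys."""
    if config.get("name") is None:
        raise ValueError("a flavor needs a 'name'")
    # deepcopy so the mutable defaults ({} and ['all']) are never shared between flavors
    filled = copy.deepcopy(FLAVOR_DEFAULTS)
    filled.update(config)
    return filled


flavor = fill_flavor_defaults({"name": "blend", "catalog_tag": "my_tag"})
```

The deep copy matters because two flavors that both rely on the default `pipelines` list would otherwise share, and could mutate, the same list object.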

class rail.projects.project.RailProject(**kwargs)

Bases: Configurable

Main analysis driver class. It collects all the elements needed to run a collection of studies using RAIL.

The key concepts are:

1. analysis ‘Flavors’, which are versions of similar analyses with slightly different parameter settings and/or input files.

2. ceci ‘Pipelines’, which run blocks of analysis code

A RailProject basically specifies which Pipelines to run under which flavors, and keeps track of the outputs.

See RailProject.functionality_help() for more on class functionality.

See RailProject.configuration_help() for more on class configuration.

Parameters:

kwargs (Any)

add_flavor(name, **kwargs)

Add a new flavor to the Project

Return type:

RailFlavor

Parameters:
  • name (str)

  • kwargs (Any)

build_pipelines(flavor='baseline', *, force=False)

Build ceci pipeline configuration files for this project

Return type:

int

Parameters:
  • flavor (str) – Which analysis flavor to draw from

  • force (bool) – Force overwriting of existing pipeline files

Returns:

0 if ok, error code otherwise

clear_cache()

Reset all the cached configurable items

Return type:

None

config_options: dict[str, StageParameter] = {
    'Baseline': Parameter(Baseline analysis configuration, type: <class 'dict'>, default: None [required]),
    'Catalogs': Parameter(Catalog templates to use, type: <class 'list'>, default: ['all'] [optional]),
    'Classifiers': Parameter(Tomographic classifiers to use, type: <class 'list'>, default: ['all'] [optional]),
    'CommonPaths': Parameter(Paths to shared directories, type: <class 'dict'>, default: {} [required]),
    'ErrorModels': Parameter(Photometric ErrorModels to use, type: <class 'list'>, default: ['all'] [optional]),
    'Files': Parameter(Catalog templates to use, type: <class 'list'>, default: ['all'] [optional]),
    'Flavors': Parameter(Analysis variants, type: <class 'list'>, default: [] [optional]),
    'Includes': Parameter(Files to include, type: <class 'list'>, default: [] [optional]),
    'IterationVars': Parameter(Iteration variables to use, type: <class 'dict'>, default: {} [optional]),
    'Name': Parameter(Project name, type: <class 'str'>, default: None [required]),
    'PZAlgorithms': Parameter(p(z) algorithms to use, type: <class 'list'>, default: ['all'] [optional]),
    'PathTemplates': Parameter(File path templates, type: <class 'dict'>, default: {} [optional]),
    'Pipelines': Parameter(Catalog templates to use, type: <class 'list'>, default: ['all'] [optional]),
    'Reducers': Parameter(Data reducers to use, type: <class 'list'>, default: ['all'] [optional]),
    'Selections': Parameter(Data selections to use, type: <class 'list'>, default: ['all'] [optional]),
    'SpecSelections': Parameter(Spectroscopic selections to use, type: <class 'list'>, default: ['all'] [optional]),
    'Subsamplers': Parameter(Data subsamplers to use, type: <class 'list'>, default: ['all'] [optional]),
    'Subsamples': Parameter(Subsample defintions to use, type: <class 'list'>, default: ['all'] [optional]),
    'Summarizers': Parameter(n(z) summarizers to use, type: <class 'list'>, default: ['all'] [optional])
}

classmethod configuration_help()

Configuring a RailProject

Most of these elements come from the shared library of elements, which is accessible from rail.projects.library

Shared configuration files

Includes: list[str]

List of shared configuration files to load

Project analysis flavors

See rail.projects.project.RailFlavor for the parameters needed to define an analysis ‘Flavor’.

Baseline: dict[str, Any]

Baseline configuration for this project. This is included in all the other analysis flavors

Flavors: list[dict[str, Any]]

List of all the analysis flavors that have been defined in this project
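
One way to read "included in all the other analysis flavors" is that each entry in Flavors is layered on top of Baseline. The sketch below assumes a shallow merge in which flavor keys win, and that the baseline flavor itself is registered as "baseline" (the default flavor of build_pipelines). Both are assumptions; the real merge rules may differ, for example for nested dicts such as pipeline_overrides. `build_flavor_configs` is a hypothetical name.

```python
from typing import Any


def build_flavor_configs(
    baseline: dict[str, Any], flavors: list[dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Layer each flavor over Baseline (ASSUMPTION: shallow merge, flavor keys win)."""
    # ASSUMPTION: the baseline flavor is registered under the name "baseline"
    configs = {"baseline": {**baseline, "name": "baseline"}}
    for flavor in flavors:
        merged = {**baseline, **flavor}  # later keys override earlier ones
        configs[merged["name"]] = merged
    return configs


baseline = {"catalog_tag": "my_tag", "pipelines": ["all"]}
flavors = [{"name": "zcut", "pipelines": ["inform", "estimate"]}]
configs = build_flavor_configs(baseline, flavors)
```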

Bookkeeping elements

These are used to define the file paths for the project.

PathTemplates: dict[str, str]

Overrides for templates used to construct file paths

The defaults are given in rail.projects.name_utils

PathTemplates = dict(
    pipeline_path="{pipelines_dir}/{pipeline}_{flavor}.yaml",
    ceci_output_dir="{project_dir}/data/{selection}_{flavor}",
    ceci_file_path="{tag}_{stage}.{suffix}",
)

CommonPaths: dict[str, str]

Definitions of common paths used to construct file paths

The defaults are given in rail.projects.name_utils

CommonPaths = dict(
    root=".",          # needs to be overridden
    scratch_root=".",  # needs to be overridden
    project="",        # needs to be overridden
    project_dir="{root}/projects/{project}",
    project_scratch_dir="{scratch_root}/projects/{project}",
    catalogs_dir="{root}/catalogs",
    pipelines_dir="{project_dir}/pipelines",
)
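
The defaults above are layered: pipeline_path refers to pipelines_dir, which refers to project_dir, which in turn refers to root and project. A minimal sketch of resolving such a chain is to substitute {placeholders} repeatedly until the string stops changing. This assumes plain str.format-style placeholders, as in the defaults shown; the real resolution code in rail.projects.name_utils may work differently, `resolve_template` is a hypothetical helper, and the root, project, pipeline, flavor, and selection values are made up.

```python
class _KeepMissing(dict):
    """Mapping that leaves unknown {placeholders} in place instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def resolve_template(template: str, values: dict[str, str], max_passes: int = 10) -> str:
    """Substitute {placeholders} until the string stops changing (or max_passes is hit)."""
    result = template
    for _ in range(max_passes):
        expanded = result.format_map(_KeepMissing(values))
        if expanded == result:
            break
        result = expanded
    return result


common_paths = dict(
    root="/data",  # made-up value; a real project must override root
    scratch_root="/scratch",
    project="demo",
    project_dir="{root}/projects/{project}",
    project_scratch_dir="{scratch_root}/projects/{project}",
    catalogs_dir="{root}/catalogs",
    pipelines_dir="{project_dir}/pipelines",
)
path_templates = dict(
    pipeline_path="{pipelines_dir}/{pipeline}_{flavor}.yaml",
    ceci_output_dir="{project_dir}/data/{selection}_{flavor}",
)

values = {**common_paths, "pipeline": "inform", "flavor": "baseline", "selection": "gold"}
pipeline_path = resolve_template(path_templates["pipeline_path"], values)
output_dir = resolve_template(path_templates["ceci_output_dir"], values)
```

Unknown placeholders are left in place rather than raising, so a partially resolved path stays visible instead of failing; max_passes guards against templates that refer to each other in a loop.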

IterationVars: dict[str, list[str]]

Iteration variables to construct the catalogs

Shared elements

These are pulled from the library. Each is just a list of the names of items defined in the library that can be used in this project. The default is to use all the items defined in the library.

Catalogs: list[str]

These are actually CatalogTemplates

Files: list[str]

These are actually FileTemplates

Pipelines: list[str]

These are actually PipelineTemplates

Reducers: list[str]

These reduce the input data catalog

Subsamplers: list[str]

These subsample catalogs to get individual files

Selections: list[str]

These are the selection parameters

Subsamples: list[str]

These are the subsample parameters

PZAlgorithms: list[str]

SpecSelections: list[str]

Classifiers: list[str]

Summarizers: list[str]

ErrorModels: list[str]

Return type:

None

classmethod functionality_help()

The main functions that the user will use are:

load_config:

Read a yaml file and create a RailProject

reduce_data:

Make a reduced catalog from an input catalog by applying a selection and trimming unwanted columns. This is run before the analysis pipelines.

subsample_data:

Subsample data from a catalog to make a testing or training file. This is run after the catalog-level pipelines, but before the pipelines that run on individual training/testing samples.

build_pipelines:

Build ceci pipeline yaml files

run_pipeline_single:

Run a pipeline on a single file

run_pipeline_catalog:

Run a pipeline on a catalog of files

Return type:

None

static generate_ceci_command(pipeline_path, config, inputs, output_dir='.', log_dir='.', **kwargs)

Generate a ceci command to run a pipeline

Return type:

list[str]

Parameters:
  • pipeline_path (str) – Path to the pipeline yaml file

  • config (str | None) – Path to the pipeline config yaml file

  • inputs (dict) – Inputs to the pipeline

  • output_dir (str, default=".") – Pipeline output directory

  • log_dir (str, default=".") – Pipeline log directory

  • **kwargs – These are appended to the command in key=value pairs
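
As the parameter list says, extra kwargs end up on the command line as key=value tokens. The sketch below shows one plausible layout of the whole token list, assuming the other arguments are forwarded the same way; the ordering, the inputs.<tag>= prefix, and the handling of config=None are assumptions, not the documented output of generate_ceci_command. `sketch_ceci_command` and all paths are hypothetical.

```python
import shlex
from typing import Any, Optional


def sketch_ceci_command(
    pipeline_path: str,
    config: Optional[str],
    inputs: dict[str, str],
    output_dir: str = ".",
    log_dir: str = ".",
    **kwargs: Any,
) -> list[str]:
    """Hypothetical re-creation of the token list; the layout is an ASSUMPTION."""
    command = ["ceci", pipeline_path]
    if config is not None:  # ASSUMPTION: omit the override when no config file is given
        command.append(f"config={config}")
    # ASSUMPTION: each input overrides an entry of the pipeline's 'inputs' block
    command += [f"inputs.{tag}={path}" for tag, path in inputs.items()]
    command += [f"output_dir={output_dir}", f"log_dir={log_dir}"]
    # As documented: remaining kwargs are appended as key=value pairs
    command += [f"{key}={value}" for key, value in kwargs.items()]
    return command


cmd = sketch_ceci_command(
    "pipelines/inform_baseline.yaml",
    "pipelines/inform_baseline_config.yml",
    {"input": "data/train.hdf5"},
    output_dir="out",
    log_dir="logs",
    resume=True,
)
print(shlex.join(cmd))
```

Keeping the command as a list of tokens rather than a single string lets it be passed straight to subprocess.run(cmd) without shell quoting; shlex.join is used here only for display.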

static generate_kwargs_iterable(**iteration_dict)

Generate a list of kwargs dicts from a dict of lists

Return type:

list[dict]

Parameters:

iteration_dict (Any)
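
A natural reading of "a list of kwargs dicts from a dict of lists" is the Cartesian product of the lists, for example expanding a project's IterationVars into one kwargs dict per combination. The sketch below assumes exactly that; `kwargs_product` is a hypothetical stand-in, not the library function, and the selection and flavor names are made up.

```python
import itertools
from typing import Any


def kwargs_product(**iteration_dict: list[Any]) -> list[dict[str, Any]]:
    """One kwargs dict per combination of values (ASSUMPTION: Cartesian product)."""
    keys = list(iteration_dict)
    # itertools.product varies the last key fastest, like nested for-loops
    return [dict(zip(keys, values)) for values in itertools.product(*iteration_dict.values())]


combos = kwargs_product(selection=["gold", "blend"], flavor=["baseline", "zcut"])
```

With no arguments, itertools.product yields a single empty tuple, so the sketch returns [{}] (one call with no extra interpolants) rather than an empty list.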

get_algorithm(algorithm_type, algo_name)

Get an algorithm of a particular type with a specific name

Return type:

dict[str, str]

Parameters:
  • algorithm_type (str)

  • algo_name (str)

get_algorithms(algorithm_type)

Get all the algorithms of a particular type

Return type:

dict[str, dict[str, str]]

Parameters:

algorithm_type (str)

get_catalog(name, **kwargs)

Resolve the path for a particular catalog file

Return type:

str

Parameters:
  • name (str)

  • kwargs (Any)

get_catalog_files(name, **kwargs)

Resolve the paths for the files of a particular catalog

Return type:

list[str]

Parameters:
  • name (str)

  • kwargs (Any)

get_catalogs()

Get the dictionary describing all the types of data catalogs

Return type:

dict

get_classifier(name)

Get the information about a particular tomographic bin classification

Return type:

dict

Parameters:

name (str)

get_classifiers()

Get the dictionary describing all the tomographic bin classifications

Return type:

dict

get_common_path(path_key, **kwargs)

Resolve and return a common path using the kwargs as interpolants

Return type:

str

Parameters:
  • path_key (str)

  • kwargs (Any)

get_common_paths()

Return the dictionary of common paths

Return type:

dict

get_error_model(name)

Get the information about a particular photometric error model algorithm

Return type:

dict

Parameters:

name (str)

get_error_models()

Get the dictionary describing all the photometric error model algorithms

Return type:

dict

get_file(name, **kwargs)

Resolve and return a file using the kwargs as interpolants

Return type:

str

Parameters:
  • name (str)

  • kwargs (Any)

get_file_for_flavor(flavor, label, **kwargs)

Resolve the file associated with a particular flavor and label

E.g., flavor=baseline and label=train would give the baseline training file

Return type:

str

Parameters:
  • flavor (str)

  • label (str)

  • kwargs (Any)

get_file_metadata_for_flavor(flavor, label)

Resolve the metadata associated with a particular flavor and label

E.g., flavor=baseline and label=train would give the baseline training metadata

Return type:

RailProjectFileTemplate

Parameters:
  • flavor (str)

  • label (str)

get_files()

Return the dictionary of specific file templates

Return type:

dict[str, RailProjectFileTemplate]

get_flavor(name)

Resolve the configuration for a particular analysis flavor variant

Return type:

RailFlavor

Parameters:

name (str)

get_flavor_args(flavors)

Get the ‘flavors’ to iterate a particular command over

Return type:

list[str]

Parameters:

flavors (list[str])

Notes

If the flavor ‘all’ is included in the list of flavors, this will replace the list with all the flavors defined in this project
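
The ‘all’ rule in this note (and the matching note for get_selection_args) can be sketched as a small helper. `expand_all` is hypothetical, and returning the defined names in definition order is an assumption.

```python
def expand_all(requested: list[str], defined: dict[str, object]) -> list[str]:
    """If 'all' appears anywhere in the request, return every defined name instead."""
    if "all" in requested:
        return list(defined)  # dicts preserve insertion (definition) order
    return list(requested)  # copy, so callers cannot mutate the input list


flavors = {"baseline": {}, "zcut": {}, "blend": {}}
```

Because ‘all’ replaces the whole list, a request such as ["zcut", "all"] yields every flavor exactly once rather than listing zcut twice.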

get_flavors()

Return the dictionary of analysis flavor variants

Return type:

dict[str, RailFlavor]

get_path(path_key, **kwargs)

Resolve and return a path using the kwargs as interpolants

Return type:

str

Parameters:
  • path_key (str)

  • kwargs (Any)

get_path_templates()

Return the dictionary of templates used to construct paths

Return type:

dict

get_pipeline(name)

Get the information about a particular ceci pipeline

Return type:

RailPipelineTemplate

Parameters:

name (str)

get_pipelines()

Get the dictionary describing all the types of ceci pipelines

Return type:

dict[str, RailPipelineTemplate]

get_pzalgorithm(name)

Get the information about a particular PZ estimation algorithm

Return type:

dict

Parameters:

name (str)

get_pzalgorithms()

Get the dictionary describing all the PZ estimation algorithms

Return type:

dict

get_selection(name)

Get a particular selection by name

Return type:

RailSelection

Parameters:

name (str)

get_selection_args(selections)

Get the ‘selections’ to iterate a particular command over

Return type:

list[str]

Parameters:

selections (list[str])

Notes

If the selection ‘all’ is included in the list of selections, this will replace the list with all the selections defined in this project

get_selections()

Get the dictionary describing all the selections

Return type:

dict[str, RailSelection]

get_spec_selection(name)

Get the information about a particular spectroscopic selection algorithm

Return type:

dict

Parameters:

name (str)

get_spec_selections()

Get the dictionary describing all the spectroscopic selection algorithms

Return type:

dict

get_subsample(name)

Get a particular subsample by name

Return type:

RailSubsample

Parameters:

name (str)

get_subsamples()

Get the dictionary describing all the subsamples

Return type:

dict[str, RailSubsample]

get_summarizer(name)

Get the information about a particular NZ summarization algorithm

Return type:

dict

Parameters:

name (str)

get_summarizers()

Get the dictionary describing all the NZ summarization algorithms

Return type:

dict

static load_config(config_file)

Create and return a RailProject from a yaml config file

Return type:

RailProject

Parameters:

config_file (str)

make_pipeline_catalog_commands(pipeline_name, flavor, **kwargs)

Build the commands to run a pipeline on a catalog

Return type:

list[tuple[list[list[str]], str]]

Parameters:
  • pipeline_name (str) – Pipeline in question

  • flavor (str) – Flavor to apply

  • **kwargs (Any) – Other interpolants, such as selection

Returns:

List of (commands, batch_file) pairs, where each pair holds a series of commands and the potential location of a slurm batch file
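
Each element of the returned list pairs a series of tokenised commands with a path where a batch file could be written. A minimal sketch of consuming that structure follows; it assumes the commands should run in order and stop at the first failure, and it writes plain bash scripts with no slurm directives. `render_batch_script` and `write_batch_files` are hypothetical; the real handling presumably sits behind the run_mode option of run_pipeline_catalog.

```python
import shlex
from pathlib import Path

CommandSeries = list[list[str]]


def render_batch_script(commands: CommandSeries) -> str:
    """Turn one series of token lists into a bash script body."""
    lines = ["#!/usr/bin/env bash", "set -e"]  # set -e: abort on the first failing command
    lines += [shlex.join(tokens) for tokens in commands]  # quote each token for the shell
    return "\n".join(lines) + "\n"


def write_batch_files(pairs: list[tuple[CommandSeries, str]]) -> list[str]:
    """Write one script per (commands, batch_file) pair and return the paths written."""
    written = []
    for commands, batch_file in pairs:
        path = Path(batch_file)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(render_batch_script(commands))
        written.append(str(path))
    return written
```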

make_pipeline_single_input_command(pipeline_name, flavor, **kwargs)

Build the command to run a pipeline on a single file

Return type:

list[str]

Parameters:
  • pipeline_name (str) – Pipeline in question

  • flavor (str) – Flavor to apply

  • **kwargs (Any) – Other interpolants, such as selection

Returns:

Tokens in the command line, usable by subprocess.run()

property name: str

projects: dict[str, RailProject] = {}

reduce_data(catalog_template, output_catalog_template, reducer_class_name, input_selection, selection, dry_run=False, **kwargs)

Make a reduced catalog from an input catalog by applying a selection and trimming unwanted columns

Return type:

list[str]

Parameters:
  • catalog_template (str) – Tag for the input catalog

  • output_catalog_template (str) – Which label to apply to the output dataset

  • reducer_class_name (str) – Name of the class to use for reducing the data

  • input_selection (str) – Selection to use for the input

  • selection (str) – Selection to apply

  • dry_run (bool) – If True, do not actually run

  • **kwargs – Used to provide values for additional interpolants.

Returns:

Paths to output files

run_pipeline_catalog(pipeline_name, run_mode=RunMode.bash, **kwargs)

Run a pipeline on a catalog

Return type:

int

Parameters:
  • pipeline_name (str) – Pipeline in question

  • run_mode (execution.RunMode) – How to run the pipeline (e.g., in bash, or in slurm)

  • **kwargs (Any) – Other interpolants, such as selection

Returns:

0 for success, error code otherwise

run_pipeline_single(pipeline_name, run_mode=RunMode.bash, **kwargs)

Run a pipeline on a single file

Return type:

int

Parameters:
  • pipeline_name (str) – Pipeline in question

  • run_mode (execution.RunMode) – How to run the pipeline (e.g., in bash, or in slurm)

  • **kwargs (Any) – Other interpolants, such as selection

Returns:

0 for success, error code otherwise

subsample_data(catalog_template, file_template, subsampler_class_name, subsample_name, dry_run=False, **kwargs)

Subsample data from a catalog to make a testing or training file

Return type:

str

Parameters:
  • catalog_template (str) – Tag for the input catalog

  • file_template (str) – Which label to apply to the output dataset

  • subsampler_class_name (str) – Name of the class to use for subsampling

  • subsample_name (str) – Name of the subsample to create

  • dry_run (bool) – If True, do not actually run

  • **kwargs – Used to provide values for additional interpolants, e.g., flavor, basename, etc.

Returns:

Path to output file

wrap_pz_model(path, outdir, **kwargs)

Wrap a pz model file for use by Rubin DM software

Return type:

int

Parameters:
  • path (str) – Path to the model file

  • outdir (str) – Directory to write the wrapped model to

  • kwargs (Any)

Returns:

Status code: 0 for success, error code otherwise

write_yaml(yaml_file)

Write this project to a yaml file

Return type:

None

Parameters:

yaml_file (str)

yaml_tag: str = 'Project'