Analysis Components

Analysis component basics

An analysis component wraps a small piece of the analysis configuration so that it can be shared and reused.

The basic interface to analysis components is the rail.projects.configurable.Configurable class, which defines a few things:

  1. parameters associated to that component, via the config_options class member, and the expected types of those parameters,

  2. type validation, to ensure that objects are created with the correct types of parameters,

  3. access to the current values of the parameters via the config data member,

  4. mechanisms to read/write the component to yaml, including the yaml_tag class member, which defines the yaml tag that marks a block of yaml as defining an object of a particular type of component.
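The four points above can be illustrated with a minimal, self-contained sketch. This is not the rail.projects implementation: the class name SketchConfigurable, the (type, default) layout of config_options, and the to_yaml_dict method are all assumptions made for illustration.

```python
from typing import Any


class SketchConfigurable:
    """Hypothetical stand-in, NOT the real rail.projects Configurable."""

    # Tag that would mark a block of yaml as describing this component
    yaml_tag = "SketchComponent"

    # Parameter name -> (expected type, default value); this layout is assumed
    config_options: dict[str, tuple[type, Any]] = {
        "name": (str, None),
        "n_objects": (int, 100),
    }

    def __init__(self, **kwargs: Any) -> None:
        unknown = set(kwargs) - set(self.config_options)
        if unknown:
            raise KeyError(f"Unknown parameters: {sorted(unknown)}")
        # Current parameter values live in the config data member
        self.config: dict[str, Any] = {}
        for key, (expected_type, default) in self.config_options.items():
            value = kwargs.get(key, default)
            # Type validation: reject values of the wrong type
            if value is not None and not isinstance(value, expected_type):
                raise TypeError(
                    f"{key} must be {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
            self.config[key] = value

    def to_yaml_dict(self) -> dict[str, dict[str, Any]]:
        """Return a plain dict, keyed by the yaml tag, ready to dump to yaml."""
        return {self.yaml_tag: dict(self.config)}


component = SketchConfigurable(name="test")
print(component.config)
print(component.to_yaml_dict())
```

Passing a value of the wrong type, for example n_objects="many", raises a TypeError when the object is created rather than later, when the parameter is first used.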

File and Catalog definitions

Objects that define files and sets of files.

FileInstance

class rail.projects.file_template.RailProjectFileInstance(**kwargs)[source]

Simple class for holding information about a single file

Parameters:

kwargs (Any)

FileTemplate

class rail.projects.file_template.RailProjectFileTemplate(**kwargs)[source]

Simple class for holding a template that can be resolved to a single file

For example, the path_template might be ‘a_file/{flavor}_data.hdf5’ and the interpolants would be [‘flavor’].

When called with a dict such as {flavor: ‘baseline’}, the path_template would get expanded out to a_file/baseline_data.hdf5.

Parameters:

kwargs (Any)
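The expansion described above behaves like ordinary Python string formatting. The snippet below is only a sketch using the standard library; it does not call RailProjectFileTemplate, whose call signature is not shown here.

```python
import string

path_template = "a_file/{flavor}_data.hdf5"

# Recover the interpolant names from the template itself
interpolants = [
    field for _, field, _, _ in string.Formatter().parse(path_template) if field
]

# Fill in the interpolants to resolve the template to a single file path
resolved = path_template.format(flavor="baseline")

print(interpolants)
print(resolved)
```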

CatalogInstance

class rail.projects.catalog_template.RailProjectCatalogInstance(**kwargs)[source]

Simple class for holding the information needed to make a coherent catalog of files, using a templated file name and iteration_vars to fill in the interpolation in the file name.

For example, the path_template might be ‘a_file/{healpix}/data.parquet’ and the iteration_vars would be [‘healpix’].

When called with a dict such as healpix : [3433, 3344], the path_template would get expanded out to two files:

a_file/3433/data.parquet
a_file/3344/data.parquet

Parameters:

kwargs (Any)
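A rough sketch of this kind of expansion, written with the standard library only. The function expand_catalog is hypothetical, and taking every combination of values when there are several iteration variables is an assumption rather than documented behavior.

```python
import itertools
from typing import Any


def expand_catalog(
    path_template: str,
    iteration_vars: list[str],
    values: dict[str, list[Any]],
) -> list[str]:
    """Expand a templated path over every combination of the iteration variables."""
    value_lists = [values[var] for var in iteration_vars]
    return [
        path_template.format(**dict(zip(iteration_vars, combo)))
        for combo in itertools.product(*value_lists)
    ]


files = expand_catalog(
    "a_file/{healpix}/data.parquet",
    ["healpix"],
    {"healpix": [3433, 3344]},
)
print(files)
```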

CatalogTemplate

class rail.projects.catalog_template.RailProjectCatalogTemplate(**kwargs)[source]

Simple class for holding the information needed to make a coherent catalog of files, using a templated file name and iteration_vars to fill in the interpolation in the file name.

For example, the path_template might be ‘a_file/{healpix}/data.parquet’ and the iteration_vars would be [‘healpix’].

When called with a dict such as healpix : [3433, 3344], the path_template would get expanded out to two files:

a_file/3433/data.parquet
a_file/3344/data.parquet

Parameters:

kwargs (Any)

Algorithm definitions

class rail.projects.algorithm_holder.RailAlgorithmHolder(**kwargs)[source]

Simple class for holding an algorithm by name.

This has the information needed to create the associated classes, namely the name of the python module in which they live, and the names of the classes themselves.

Parameters:

kwargs (Any)
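The idea of holding an algorithm by name and importing it only when needed can be sketched as follows. SketchAlgorithmHolder and its resolve method are hypothetical names, and the example resolves a standard-library class rather than a RailStage.

```python
import importlib
from dataclasses import dataclass


@dataclass
class SketchAlgorithmHolder:
    """Hypothetical holder: stores names only and imports lazily."""

    name: str
    module: str
    class_name: str

    def resolve(self) -> type:
        """Import the module and return the named class."""
        mod = importlib.import_module(self.module)
        return getattr(mod, self.class_name)


holder = SketchAlgorithmHolder(
    name="ordered", module="collections", class_name="OrderedDict"
)
cls = holder.resolve()
print(cls)
```

Because only strings are stored, a holder like this can be written to and read from yaml without importing the algorithm's module.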

There are several sub-classes of RailAlgorithmHolder for different types of algorithms.

PZAlgorithm

class rail.projects.algorithm_holder.RailPZAlgorithmHolder(**kwargs)[source]

Wrapper for algorithms that estimate per-object p(z).

This wraps both the Inform and Estimate classes.

The Inform class will typically be a CatInformer type RailStage, used to train the model for p(z) estimation.

The Estimate class will typically be a CatEstimator type RailStage, which uses the trained model for p(z) estimation.

A set of PZAlgorithm objects is used as input to several of the pipelines, specifying the set of algorithms to run the pipeline with.

Parameters:

kwargs (Any)

Summarizer

class rail.projects.algorithm_holder.RailSummarizerAlgorithmHolder(**kwargs)[source]

Wrapper for algorithms that make ensemble n(z) from a set of p(z).

This wraps the Summarize class, which is typically a PZToNZSummarizer type RailStage.

A set of Summarizer objects is used as input to the tomography-related pipelines, specifying the set of algorithms used to obtain n(z) information.

Parameters:

kwargs (Any)

Classifier

class rail.projects.algorithm_holder.RailClassificationAlgorithmHolder(**kwargs)[source]

Wrapper for algorithms that assign objects to tomographic bins.

This wraps the Classify class, which is typically a Classifier type RailStage.

A set of Classifier objects is used as input to the tomography-related pipelines, specifying the set of algorithms used to assign objects to tomographic bins.

Parameters:

kwargs (Any)

SpecSelection

class rail.projects.algorithm_holder.RailSpecSelectionAlgorithmHolder(**kwargs)[source]

Wrapper for algorithms that emulate spectroscopic selections.

This wraps the SpecSelection class, which is typically a SpecSelector type RailStage.

A set of SpecSelection objects is used as input to the observation emulation pipelines, specifying the set of algorithms used to emulate spectroscopic selections.

Parameters:

kwargs (Any)

ErrorModel

class rail.projects.algorithm_holder.RailErrorModelAlgorithmHolder(**kwargs)[source]

Wrapper for algorithms that emulate photometric errors.

This wraps the ErrorModel class, which is typically a PhotoErrorModel type RailStage.

A set of ErrorModel objects is used as input to the observation emulation pipelines, specifying the set of algorithms used to emulate photometric errors.

Parameters:

kwargs (Any)

Reducer

class rail.projects.algorithm_holder.RailReducerAlgorithmHolder(**kwargs)[source]

Wrapper for algorithms that reduce data sets by applying selections and removing unneeded columns.

This wraps the Reduce class, which is typically a RailReducer object.

Typically a single Reducer is used to prepare data for a particular project, possibly applying a few different selections along the way.

Parameters:

kwargs (Any)

Subsampler

class rail.projects.algorithm_holder.RailSubsamplerAlgorithmHolder(**kwargs)[source]

Wrapper for algorithms that subsample catalogs to provide testing and training data sets.

This wraps the Subsample class, which is typically a RailSubsampler object.

Typically a single Subsampler is used to create a number of different test and training data sets for a particular project.

Parameters:

kwargs (Any)

Algorithm configurations

Selection

class rail.projects.selection_factory.RailSelection(**kwargs)[source]

Parameters for a simple data selection

This is just defined as a dict of cuts

Parameters:

kwargs (Any)
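To illustrate what a selection defined as a dict of cuts might look like, here is a small sketch. The cut format used (column name mapped to a (low, high) pair, with None for an open bound), the column names, and the apply_cuts helper are all assumptions; the actual format accepted by RailSelection is not specified here.

```python
from typing import Optional

# Assumed cut format: (low, high), where None means no bound on that side
Cut = tuple[Optional[float], Optional[float]]


def apply_cuts(
    rows: list[dict[str, float]], cuts: dict[str, Cut]
) -> list[dict[str, float]]:
    """Keep only the rows that pass every cut."""

    def passes(row: dict[str, float]) -> bool:
        for column, (low, high) in cuts.items():
            value = row[column]
            if low is not None and value < low:
                return False
            if high is not None and value > high:
                return False
        return True

    return [row for row in rows if passes(row)]


rows = [
    {"mag_i": 22.1, "redshift": 0.4},
    {"mag_i": 25.9, "redshift": 1.2},
    {"mag_i": 24.0, "redshift": 3.5},
]
selected = apply_cuts(rows, {"mag_i": (None, 25.3), "redshift": (0.0, 3.0)})
print(selected)
```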

Subsample

class rail.projects.subsample_factory.RailSubsample(**kwargs)[source]

Parameters for a simple data subsample

This is just defined as a random number seed and a number of objects

Parameters:

kwargs (Any)
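A seed together with a number of objects is enough to define a reproducible subsample. The sketch below uses the standard library random module; the subsample function and its argument names are hypothetical and do not reflect how RailSubsample is applied internally.

```python
import random
from typing import Iterable


def subsample(indices: Iterable[int], seed: int, num_objects: int) -> list[int]:
    """Draw num_objects distinct indices, reproducibly for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(indices), num_objects))


train = subsample(range(1000), seed=1234, num_objects=100)
again = subsample(range(1000), seed=1234, num_objects=100)

# The same seed and size always give the same subsample
print(train == again)
```

Using different seeds with the same catalog is one simple way to obtain distinct test and training samples.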

Plot definitions

Plotter

class rail.plotting.plotter.RailPlotter(**kwargs)[source]

Base class for making matplotlib plots

The main function in this class is:

run(prefix: str, **kwargs: Any) -> dict[str, RailPlotHolder]

This function will make a set of plots and return them in a dict. prefix is a string that gets prepended to the plot names.

The data to be plotted is passed in via the kwargs.

Sub-classes should implement

config_options: dict[str, ceci.StageParameter]

that will be used to configure things like the axes binning, selection functions, and other plot-specific options

input_type: RailPZPointEstimateDataset

that specifies the inputs that the sub-classes expect; this is used to check the kwargs that are passed to the run function.

A function:

_make_plots(self, prefix: str, **kwargs: Any) -> dict[str, RailPlotHolder]:

That actually makes the plots. It does not need to do the checking that the correct kwargs have been given.

Parameters:

kwargs (Any)
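The split between run, which checks the inputs, and _make_plots, which does the work, can be sketched without any plotting dependency. The classes below are hypothetical and are not RailPlotter: here input_type is assumed to be a plain dict of expected keyword types, and the returned values are lists of numbers standing in for RailPlotHolder objects.

```python
from typing import Any


class SketchPlotter:
    """Hypothetical base class showing the run / _make_plots split."""

    # Name -> expected type of each keyword passed to run() (assumed layout)
    input_type: dict[str, type] = {}

    def run(self, prefix: str, **kwargs: Any) -> dict[str, Any]:
        # Check the inputs once here, so that sub-classes do not have to
        for key, expected in self.input_type.items():
            if key not in kwargs:
                raise KeyError(f"Missing input: {key}")
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        return self._make_plots(prefix, **kwargs)

    def _make_plots(self, prefix: str, **kwargs: Any) -> dict[str, Any]:
        raise NotImplementedError


class SketchResidualPlotter(SketchPlotter):
    input_type = {"truth": list, "pointEstimates": list}

    def _make_plots(self, prefix: str, **kwargs: Any) -> dict[str, Any]:
        # A real plotter would build a matplotlib figure here; the residuals
        # stand in for the plot so that the sketch has no dependencies
        residuals = [
            est - true
            for est, true in zip(kwargs["pointEstimates"], kwargs["truth"])
        ]
        return {f"{prefix}residuals": residuals}


plots = SketchResidualPlotter().run(
    "demo_", truth=[0.5, 1.0], pointEstimates=[0.75, 1.5]
)
print(plots)
```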

PlotterList

class rail.plotting.plotter.RailPlotterList(**kwargs)[source]

This class collects a set of plotters that can all run on the same data.

E.g., plotters that can all run on a dict that looks like {truth: np.ndarray, pointEstimates: np.ndarray} could be put into a PlotterList. This makes it easier to collect similar types of plots.

Parameters:

kwargs (Any)

Plotting dataset definitions

Dataset

class rail.plotting.dataset_holder.RailDatasetHolder(**kwargs)[source]

Base class for extracting data from a RailProject

The resolve method will return the wrapped dataset

Sub-classes should implement

a class member: extractor_inputs: a dict [str, type]

that specifies the inputs that the sub-classes expect; this is used to check the kwargs that are passed to the _get_data() function

a class member: output_type: type[RailDataset]

that specifies the output dataset type

A function: get_extractor_inputs(self) -> dict[str, Any]

That resolves anything needed for the call to _get_data from the configuration parameters, for example loading the underlying project if needed.

A function: _get_data(self,**kwargs: Any) -> dict[str, Any]:

That actually gets the data. It does not need to do the checking that the correct kwargs have been given.

A class method: generate_dataset_dict()

that will find all the datasets that the extractor can extract

Parameters:

kwargs (Any)
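One way these pieces could fit together is sketched below. The classes are hypothetical, and the order of calls inside resolve (gather the inputs, check them against extractor_inputs, then call _get_data) is an assumption about the design, not a description of the RailDatasetHolder code.

```python
from typing import Any


class SketchDatasetHolder:
    """Hypothetical base class, NOT the real RailDatasetHolder."""

    # Name -> expected type of each input passed to _get_data()
    extractor_inputs: dict[str, type] = {}

    def __init__(self, **kwargs: Any) -> None:
        self.config = dict(kwargs)

    def get_extractor_inputs(self) -> dict[str, Any]:
        raise NotImplementedError

    def _get_data(self, **kwargs: Any) -> dict[str, Any]:
        raise NotImplementedError

    def resolve(self) -> dict[str, Any]:
        """Gather the inputs, check them, and return the wrapped dataset."""
        inputs = self.get_extractor_inputs()
        for key, expected in self.extractor_inputs.items():
            if not isinstance(inputs.get(key), expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        return self._get_data(**inputs)


class SketchConstantDataset(SketchDatasetHolder):
    extractor_inputs = {"n_objects": int, "value": float}

    def get_extractor_inputs(self) -> dict[str, Any]:
        # Turn configuration parameters into the inputs _get_data expects
        return {
            "n_objects": self.config["n_objects"],
            "value": self.config["value"],
        }

    def _get_data(self, **kwargs: Any) -> dict[str, Any]:
        # No checking needed here; resolve() has already validated the inputs
        return {"truth": [kwargs["value"]] * kwargs["n_objects"]}


data = SketchConstantDataset(n_objects=3, value=0.5).resolve()
print(data)
```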

The class_name parameter in the yaml file specifies which sub-class to use, and the other parameters specify the keys needed to specify a unique dataset.

DatasetList

class rail.plotting.dataset_holder.RailDatasetListHolder(**kwargs)[source]

Class to wrap a list of consistent RailDatasetHolders

i.e., all of the RailDatasetHolders should return the same type of datasets, meaning that they should all contain the same columns.

The resolve method will return the list of RailDatasetHolders

Parameters:

kwargs (Any)

Project

class rail.plotting.dataset_holder.RailProjectHolder(**kwargs)[source]

Class to wrap a RailProject

This is just the path to the yaml file that defines the project

The resolve method will create a RailProject object by reading that file

Parameters:

kwargs (Any)

Plot Group definitions

PlotGroup

class rail.plotting.plot_group.RailPlotGroup(**kwargs)[source]

Class defining a group of plots to make with a particular list of coherent datasets

Parameters:

kwargs (Any)