Adding a new RailDatasetHolder
Because of the variety of file formats in RAIL, and the variety of analysis flavors
in a RailProject, it is useful to have reusable tools that wrap particular types
of datasets from a RailProject. These are implemented as subclasses of the rail.plotting.dataset_holder.RailDatasetHolder class.
A RailDatasetHolder is intended to take a particular set of inputs and
extract a particular set of data from the RailProject. The inputs and outputs
are all defined in particular ways to allow RailDatasetHolder
objects to be integrated into larger data analysis pipelines.
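As a rough illustration of this contract, the sketch below uses stand-in classes rather than the RAIL ones. ToyProject, ToyHolder, and every method body here are hypothetical; only the general shape (declare typed inputs, validate them, then hand them to an extraction step) is taken from the description above.

```python
from __future__ import annotations

from typing import Any


class ToyProject:
    """Hypothetical stand-in for a project; this is not RAIL's RailProject."""

    def __init__(self, name: str, tables: dict[str, list[float]]):
        self.name = name
        self.tables = tables  # selection name -> list of redshift estimates


class ToyHolder:
    """Hypothetical holder: declares typed inputs, validates them, extracts data."""

    # Name and expected type of every input passed to the extraction step
    extractor_inputs: dict[str, type] = {"project": ToyProject, "selection": str}

    def __init__(self, project: ToyProject, selection: str):
        self._project = project
        self._selection = selection

    def get_extractor_inputs(self) -> dict[str, Any]:
        """Collect the inputs and check them against the declaration."""
        inputs = {"project": self._project, "selection": self._selection}
        for key, expected in self.extractor_inputs.items():
            if key not in inputs:
                raise KeyError(f"missing extractor input: {key}")
            if not isinstance(inputs[key], expected):
                raise TypeError(f"{key} should be {expected.__name__}")
        return inputs

    def _get_data(self, project: ToyProject, selection: str) -> dict[str, Any] | None:
        """Extract one table from the project, or None if it does not exist."""
        values = project.tables.get(selection)
        return None if values is None else {"zs": values}

    def get_data(self) -> dict[str, Any] | None:
        """Run the full chain: gather inputs, validate, extract."""
        return self._get_data(**self.get_extractor_inputs())


project = ToyProject("demo", {"gold": [0.1, 0.5]})
holder = ToyHolder(project, "gold")
print(holder.get_data())  # {'zs': [0.1, 0.5]}
```

Because the inputs are declared as data on the class, a pipeline can inspect what a holder needs without instantiating it, which is the kind of integration the paragraph above refers to.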
New RailDatasetHolder Example
The following example has all of the required pieces of a RailDatasetHolder and almost nothing else.
class RailPZPointEstimateDataHolder(RailDatasetHolder):
    """Simple class for holding a dataset for plotting data that comes from a RailProject"""

    config_options: dict[str, StageParameter] = dict(
        name=StageParameter(str, None, fmt="%s", required=True, msg="Dataset name"),
        project=StageParameter(
            str, None, fmt="%s", required=True, msg="RailProject name"
        ),
        selection=StageParameter(
            str, None, fmt="%s", required=True, msg="RailProject data selection"
        ),
        flavor=StageParameter(
            str, None, fmt="%s", required=True, msg="RailProject analysis flavor"
        ),
        tag=StageParameter(
            str, None, fmt="%s", required=True, msg="RailProject file tag"
        ),
        algo=StageParameter(
            str, None, fmt="%s", required=True, msg="RailProject algorithm"
        ),
    )

    extractor_inputs: dict = {
        "project": RailProject,
        "selection": str,
        "flavor": str,
        "tag": str,
        "algo": str,
    }

    output_type: type[RailDataset] = RailPZPointEstimateDataset

    def __init__(self, **kwargs: Any):
        RailDatasetHolder.__init__(self, **kwargs)
        self._project: RailProject | None = None

    def __repr__(self) -> str:
        ret_str = (
            f"{self.config.extractor} "
            "( "
            f"{self.config.project}, "
            f"{self.config.selection}_{self.config.flavor}_{self.config.tag}_{self.config.algo}"
            ")"
        )
        return ret_str

    def get_extractor_inputs(self) -> dict[str, Any]:
        if self._project is None:
            self._project = RailDatasetFactory.get_project(self.config.project).resolve()
        the_extractor_inputs = dict(
            project=self._project,
            selection=self.config.selection,
            flavor=self.config.flavor,
            tag=self.config.tag,
            algo=self.config.algo,
        )
        self._validate_extractor_inputs(**the_extractor_inputs)
        return the_extractor_inputs

    def _get_data(self, **kwargs: Any) -> dict[str, Any] | None:
        return get_pz_point_estimate_data(**kwargs)

    @classmethod
    def generate_dataset_dict(
        cls,
        **kwargs: Any,
    ) -> list[dict[str, Any]]:
The required pieces, in the order that they appear, are:
1. The class RailPZPointEstimateDataHolder(RailDatasetHolder): line defines a class called RailPZPointEstimateDataHolder and specifies that it inherits from RailDatasetHolder.
2. The config_options lines define the configuration parameters for this class, as well as their default values. Note that we are specifying a helper class to actually extract the data.
3. The extractor_inputs dictionary defines the names and expected types of the inputs that will be passed to the data extraction step.
4. The output_type: type[RailDataset] = RailPZPointEstimateDataset line specifies that this class will return a RailPZPointEstimateDataset dataset.
5. The __init__ method does any class-specific initialization; in this case it declares that the class will store a RailProject, which is initially None.
6. The __repr__ method is optional; here it gives a useful representation of the class.
7. The get_extractor_inputs() method does the first part of the actual work. Note that it takes no arguments: it uses the factory to look up the RailProject, builds the inputs from its own configuration, and validates them before returning them.
8. The _get_data() method does the rest of the actual work (in this case it passes it off to a utility function, get_pz_point_estimate_data, which knows how to extract data from the RailProject).
9. The generate_dataset_dict() class method can scan a RailProject and generate a list of dictionaries describing all the available datasets.
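The body of generate_dataset_dict() is not shown in the example, so the following is not a reproduction of it. It is only a sketch of the general idea of scanning for available datasets, assuming a hypothetical catalog (a plain dict listing selections, flavors, and algorithms) in place of a real RailProject. The function name toy_generate_dataset_dict, the catalog layout, and the dataset naming scheme are all assumptions; the keys of each emitted dictionary mirror the config_options of the example class.

```python
from itertools import product
from typing import Any


def toy_generate_dataset_dict(
    project_name: str,
    catalog: dict[str, list[str]],
    tag: str,
) -> list[dict[str, Any]]:
    """Return one configuration dictionary per (selection, flavor, algo) combination."""
    datasets = []
    for selection, flavor, algo in product(
        catalog["selections"], catalog["flavors"], catalog["algos"]
    ):
        datasets.append(
            dict(
                # Hypothetical naming scheme, modeled on the __repr__ format above
                name=f"{selection}_{flavor}_{tag}_{algo}",
                project=project_name,
                selection=selection,
                flavor=flavor,
                tag=tag,
                algo=algo,
            )
        )
    return datasets


# Hypothetical catalog: two selections, one flavor, two algorithms
catalog = {
    "selections": ["gold", "blend"],
    "flavors": ["baseline"],
    "algos": ["knn", "bpz"],
}
for entry in toy_generate_dataset_dict("demo", catalog, tag="test"):
    print(entry["name"])
```

Each dictionary produced this way contains exactly the fields that the holder's configuration requires, so in principle each one could be used to configure a separate holder instance.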