Adding new flavors to a RailProject and using them

This notebook shows the basics of adding a new flavor to a RailProject, building its pipelines, and running one of them.

Import the usual suspects

[1]:
import os
from rail.projects import RailProject, RailFlavor, library
/home/docs/checkouts/readthedocs.org/user_builds/rail-projects/envs/latest/lib/python3.11/site-packages/ceci/__init__.py:12: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import DistributionNotFound

Set up the environment and move to the correct directory. This also downloads the data files needed to run this example.

[2]:
import os
from rail.projects import library

check_dir = os.path.basename(os.path.abspath(os.curdir))
if check_dir == 'examples':
    os.chdir('..')

print(os.path.abspath(os.curdir))

setup = library.setup_mininal_example_files()
assert setup == 0
/home/docs/checkouts/readthedocs.org/user_builds/rail-projects/checkouts/latest

Build a very minimal project, with only the baseline flavor

[3]:
project = RailProject.load_config('tests/minimal.yaml')

Add a new flavor to the project

Note that we are specifying:

  1. a name for the new flavor

  2. the pipelines that we can run under the new flavor

  3. overrides for the pipelines with respect to the baseline

[4]:
new_flavor = project.add_flavor(
    name='new_flavor',
    pipelines=['pz', 'tomography'],
    pipeline_overrides=dict(
        default=dict(
            kwargs=dict(
              algorithms=['gpz'],
            )
        ),
        pz=dict(
            inform_gpz=dict(
              gpz_method='VC',
            )
        )
    )
)

Make sure the flavor got added, and inspect its configuration.

Note that it picked up a few parameters from the baseline.

[5]:
project.get_flavors()
[5]:
{'baseline': baseline, 'new_flavor': new_flavor}
[6]:
new_flavor.config
[6]:
StageConfig{name:new_flavor,catalog_tag:com_cam,pipelines:['pz', 'tomography'],file_aliases:{'test': 'test_file', 'train': 'train_file'},pipeline_overrides:{'default': {'kwargs': {'algorithms': ['gpz']}}, 'pz': {'inform_gpz': {'gpz_method': 'VC'}}},}
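The values that were not passed to `add_flavor` (here `catalog_tag` and `file_aliases`) presumably come from the baseline. One way to picture this is as a layered lookup: settings given for the new flavor win, and anything missing falls back to the baseline. The following is a conceptual sketch in plain Python, not the RailProject implementation. The `resolve_flavor` helper and both dictionaries are hypothetical, with values copied from the output above.

```python
import copy


def resolve_flavor(baseline, overrides):
    """Return baseline settings with flavor-specific settings layered on top.

    Hypothetical helper for illustration only; it is not part of rail.projects.
    """
    resolved = copy.deepcopy(baseline)  # leave the baseline untouched
    resolved.update(copy.deepcopy(overrides))  # flavor settings take precedence
    return resolved


# Assumed baseline values, taken from the config printed above.
baseline = {
    "catalog_tag": "com_cam",
    "file_aliases": {"test": "test_file", "train": "train_file"},
}

# The settings passed to add_flavor in this notebook.
new_flavor_settings = {
    "name": "new_flavor",
    "pipelines": ["pz", "tomography"],
    "pipeline_overrides": {
        "default": {"kwargs": {"algorithms": ["gpz"]}},
        "pz": {"inform_gpz": {"gpz_method": "VC"}},
    },
}

resolved = resolve_flavor(baseline, new_flavor_settings)
print(resolved["catalog_tag"])  # inherited from the baseline
print(resolved["pipelines"])    # set by the new flavor
```

The deep copies keep the baseline dictionary from being modified when the resolved settings are edited later.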

Let’s see where the pipeline files will go

[7]:
print(f"The template for pipeline files is: {project.get_path('pipeline_path')}")
The template for pipeline files is: tests/temp_data/projects/minimal/pipelines/{pipeline}_{flavor}.yaml
[8]:
print(f"The path for the 'pz' pipeline of flavor 'baseline' is {project.get_path('pipeline_path', pipeline='pz', flavor='baseline')}")
The path for the 'pz' pipeline of flavor 'baseline' is tests/temp_data/projects/minimal/pipelines/pz_baseline.yaml
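The `{pipeline}` and `{flavor}` placeholders in the template use the same syntax as Python's `str.format`. As a rough illustration that uses only the template string printed above (and makes no claim about how `get_path` works internally), filling in the new flavor gives the file names that appear in the build step further down:

```python
# Template string copied from the output of project.get_path('pipeline_path').
template = "tests/temp_data/projects/minimal/pipelines/{pipeline}_{flavor}.yaml"

# Fill in the placeholders for each pipeline of the new flavor.
paths = [
    template.format(pipeline=pipeline, flavor="new_flavor")
    for pipeline in ("pz", "tomography")
]

for path in paths:
    print(path)
```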

Build the pipelines for the new flavor

[9]:
project.build_pipelines('new_flavor', force=True)
Skipping pipeline inform from flavor new_flavor
Skipping pipeline estimate from flavor new_flavor
Skipping pipeline evaluate from flavor new_flavor
Writing tests/temp_data/projects/minimal/pipelines/pz_new_flavor.yaml
Inserting handle into data store.  model_inform_gpz: inprogress_model_inform_gpz.pkl, inform_gpz
Inserting handle into data store.  model: None, estimate_gpz
Inserting handle into data store.  output_estimate_gpz: inprogress_output_estimate_gpz.hdf5, estimate_gpz
Inserting handle into data store.  input: None, evaluate_gpz
Writing tests/temp_data/projects/minimal/pipelines/tomography_new_flavor.yaml
Inserting handle into data store.  output_classify_gpz_equal_count: inprogress_output_classify_gpz_equal_count.hdf5, classify_gpz_equal_count
Inserting handle into data store.  tomography_bins: None, true_nz_gpz_equal_count_bin0
Inserting handle into data store.  output_classify_gpz_uniform_binning: inprogress_output_classify_gpz_uniform_binning.hdf5, classify_gpz_uniform_binning
[9]:
0

Let’s see what the inputs for the new pz pipeline are

[10]:
pipeline_info = project.get_pipeline('pz')
pipeline_instance = pipeline_info.make_instance(project, 'new_flavor', {})
pipeline_instance.get_input_files(project, selection='gold')
[10]:
{'input_train': 'tests/temp_data/data/train/minimal_gold_train.hdf5',
 'input_test': 'tests/temp_data/data/test/minimal_gold_test.hdf5',
 'sink_dir': 'tests/temp_data/projects/minimal/data/gold_new_flavor'}
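Before launching a run, it can be worth confirming that the input files actually exist. Here is a minimal sketch, assuming the dictionary has the shape shown above. `missing_inputs` is a hypothetical helper, not part of rail.projects, and it treats `sink_dir` as an output location that need not exist yet (an assumption based on its name).

```python
import os


def missing_inputs(input_files, skip_keys=("sink_dir",)):
    """Return the sorted keys whose paths do not exist, ignoring skip_keys.

    Hypothetical helper for illustration only.
    """
    return sorted(
        key
        for key, path in input_files.items()
        if key not in skip_keys and not os.path.exists(path)
    )


# Dictionary copied from the get_input_files output above.
inputs = {
    "input_train": "tests/temp_data/data/train/minimal_gold_train.hdf5",
    "input_test": "tests/temp_data/data/test/minimal_gold_test.hdf5",
    "sink_dir": "tests/temp_data/projects/minimal/data/gold_new_flavor",
}

# An empty list means every input file was found relative to the current directory.
print(missing_inputs(inputs))
```

In the notebook, the same check could be applied directly to the dictionary returned by `get_input_files`.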

All set, run the pipeline

[11]:
project.run_pipeline_single('pz', flavor='new_flavor', selection='gold')
subprocess: ceci tests/temp_data/projects/minimal/pipelines/pz_new_flavor.yaml config=tests/temp_data/projects/minimal/pipelines/pz_new_flavor_config.yml output_dir=tests/temp_data/projects/minimal/data/gold_new_flavor log_dir=tests/temp_data/projects/minimal/data/gold_new_flavor/logs inputs.input_train=tests/temp_data/data/train/minimal_gold_train.hdf5 inputs.input_test=tests/temp_data/data/test/minimal_gold_test.hdf5
>>>>>>>>
/home/docs/checkouts/readthedocs.org/user_builds/rail-projects/envs/latest/lib/python3.11/site-packages/ceci/__init__.py:12: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import DistributionNotFound
Over-riding config parameters from command line:
    config: tests/temp_data/projects/minimal/pipelines/pz_new_flavor_config.yml
    output_dir: tests/temp_data/projects/minimal/data/gold_new_flavor
    log_dir: tests/temp_data/projects/minimal/data/gold_new_flavor/logs
    inputs.input_train: tests/temp_data/data/train/minimal_gold_train.hdf5
    inputs.input_test: tests/temp_data/data/test/minimal_gold_test.hdf5

Executing inform_gpz
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.gpz.GPzInformer   --input=tests/temp_data/data/train/minimal_gold_train.hdf5   --name=inform_gpz   --config=tests/temp_data/projects/minimal/pipelines/pz_new_flavor_config.yml   --model=tests/temp_data/projects/minimal/data/gold_new_flavor/model_inform_gpz.pkl
Output writing to tests/temp_data/projects/minimal/data/gold_new_flavor/logs/inform_gpz.out

Job inform_gpz has completed successfully in 6.0 seconds seconds !

Executing estimate_gpz
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.estimation.algos.gpz.GPzEstimator   --model=tests/temp_data/projects/minimal/data/gold_new_flavor/model_inform_gpz.pkl   --input=tests/temp_data/data/test/minimal_gold_test.hdf5   --name=estimate_gpz   --config=tests/temp_data/projects/minimal/pipelines/pz_new_flavor_config.yml   --output=tests/temp_data/projects/minimal/data/gold_new_flavor/output_estimate_gpz.hdf5
Output writing to tests/temp_data/projects/minimal/data/gold_new_flavor/logs/estimate_gpz.out

Job estimate_gpz has completed successfully in 3.0 seconds seconds !

Executing evaluate_gpz
Command is:
OMP_NUM_THREADS=1   python3 -m ceci rail.evaluation.single_evaluator.SingleEvaluator   --input=tests/temp_data/projects/minimal/data/gold_new_flavor/output_estimate_gpz.hdf5   --truth=tests/temp_data/data/test/minimal_gold_test.hdf5   --name=evaluate_gpz   --config=tests/temp_data/projects/minimal/pipelines/pz_new_flavor_config.yml   --output=tests/temp_data/projects/minimal/data/gold_new_flavor/output_evaluate_gpz.hdf5   --summary=tests/temp_data/projects/minimal/data/gold_new_flavor/summary_evaluate_gpz.hdf5   --single_distribution_summary=tests/temp_data/projects/minimal/data/gold_new_flavor/single_distribution_summary_evaluate_gpz.hdf5
Output writing to tests/temp_data/projects/minimal/data/gold_new_flavor/logs/evaluate_gpz.out

Job evaluate_gpz has completed successfully in 3.0 seconds seconds !
Pipeline successful.  Joy is sparked.
<<<<<<<<
subprocess completed with status 0 in 16.361343383789062 seconds

[11]:
0