# questions
j
Hi all, I am having an issue defining a pipeline that uses namespaces in multiple modular pipelines. I am following the structure of the spaceflights tutorial and I am getting this error:
ValueError: Duplicate keys found in <project repo>/conf/base/parameters/prepare.yml and:
- <project repo>/conf/base/parameters/ingest.yml: train_pipeline
I have the train_pipeline namespace in both the ingest and prepare modular sub-pipelines; here are the respective YAML files:
# The following is a list of parameters for ingest pipeline for each namespace (train, inference)


# Parameters for train namespace
train_pipeline:
  ingestion_options:
    #Portfolio to use
    portfolio_name: has_meds_portfolio.HasMedsPortfolio
    # Feature store sub-pipes, only one for now.
    feature_store_subpipe_name: BasicFeaturePipeline
    # Expected output columns
    expected_columns:
      datetime: datetime64[ns]
      patient_id: int64
      age_days: int64
      Male: int64
      binary_smoking_status: object
      overall_censorship_time: datetime64[ns]
      months_until_overall_censorship: int64
      death_date: datetime64[ns]

# Parameters for inference namespace
# currently same as train but this will change
# first updated to Nightly Portfolio then to
# an api call to the valuation queue.
inference_pipeline:
  ingestion_options:
    #Portfolio to use
    portfolio_name: has_meds_portfolio.HasMedsPortfolio
    # Feature store sub-pipes, only one for now.
    feature_store_subpipe_name: BasicFeaturePipeline
    # Expected output columns
    expected_columns:
      datetime: datetime64[ns]
      patient_id: int64
      age_days: int64
      Male: int64
      binary_smoking_status: object
      overall_censorship_time: datetime64[ns]
      months_until_overall_censorship: int64
      death_date: datetime64[ns]
# all parameters for prepare pipeline are in train_pipeline namespace
train_pipeline:
  preparation_options:
    # target params
    target_death_buffer_months: 2
    
    # split params 
    splitter: TimeSeriesSplit
    holdout_size: 0.3
Am I not allowed to use the same namespace in multiple modular pipelines?
d
No, by definition namespaces need to be isolated and distinct.
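For context, the error in the original post is triggered because the same top-level key (train_pipeline) appears in two parameter files. Below is a minimal Python sketch of that kind of check. It is a simplified stand-in and not Kedro's actual implementation: the function name find_duplicate_keys is hypothetical, and the dictionaries stand in for the parsed ingest.yml and prepare.yml, with the option values left empty.

```python
def find_duplicate_keys(files):
    """Return {key: [file names]} for top-level keys defined in more than one file."""
    seen = {}
    for name, content in files.items():
        for key in content:
            seen.setdefault(key, []).append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}


# Stand-ins for the parsed parameter files from the post (values elided).
ingest = {
    "train_pipeline": {"ingestion_options": {}},
    "inference_pipeline": {"ingestion_options": {}},
}
prepare = {
    "train_pipeline": {"preparation_options": {}},
}

duplicates = find_duplicate_keys({"ingest.yml": ingest, "prepare.yml": prepare})
print(duplicates)  # {'train_pipeline': ['ingest.yml', 'prepare.yml']}
```

Only train_pipeline is reported, because inference_pipeline is defined in a single file.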
j
Thanks. Is there a proper way to isolate distinct paths through distinct namespaces of multiple modular pipelines? I suppose tags would do the trick, but they wouldn't guarantee isolation; extra care would have to be taken to ensure it.
d
You can nest namespaces with multiple . characters; the train_evaluation pipeline on https://demo.kedro.org/ is an example of this.
j
awesome thanks