# questions
s
Hi everyone, I'm a novice to Kedro, experimenting with my first implementation. Trying to parametrize every function to take the maximum advantage of the platform. While attempting to access parameters defined in the 'parameters_xxx.yml' file, say for example 'data_processing' pipeline, I have two questions. But first, a glimpse into my
parameters_data_processing.yml
file:
column_rename_params: # Suffix to be added to overlapping columns
    skip_cols: ['Date'] # Columns to skip while renaming
    co2: '_co2'
    motion: '_motion'
    presence: '_presence'

data_clean_params:
  V2_motion: {
        condition: '<0',
        new_val: 0
        }
  V2_presence: {
        condition: '<0',
        new_val: 0
        }

  infinite_values:
    infinite_val_remove: true
    infinite_val_conditions:
      - column_name: V2_motion
        lower_bound: -1e10
        upper_bound: 1e10
      - column_name: V2_presence
        lower_bound: -1e10
        upper_bound: 1e10
I am experimenting with different parameter styles: dictionaries of dictionaries, dictionaries of lists, etc. So the two questions are as follows:
1. How do I pass second- or third-level dictionary parameters to a node? For example, how do I pass the value of `column_rename_params['co2']` to one node and the value of `column_rename_params['motion']` to another? My attempt at passing inputs to a node as `inputs=['co2_processed', 'params:column_rename_params:co2', 'params:column_rename_params:skip_cols']` returned a "not found in the DataCatalog" error. Do I need to define these parameters in `catalog.yml`? Since the parameters are not defined in catalog.yml, yet I can still access the `params:column_rename_params` dictionary, I guess there must be a way to access the next level as well. As a workaround, I have simplified the dictionary, keeping everything at the base level (no nested dictionaries).
2. Curiosity: Why do we write `params:<key>` instead of `parameters:<key>`? Just curious, because I do not remember defining any object as 'params'; I was just following the tutorial.
Thanks ahead, and also thanks for Kedro and this Slack workspace.
d
So this is very possible, but I would argue it's pretty dangerous to do in Kedro's current state with no native parameter validation. To answer your question: you use the dot syntax to access these nested attributes. Also check out the OmegaConf resolver section of the docs for extra power here. I have this open issue which proposes native Pydantic support; if you were to comment your thoughts there, it would be helpful.
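A minimal sketch of what the dot syntax looks like against the `parameters_data_processing.yml` shown above (the `rename_columns` function, the `co2_renamed` output, and the node name are made up for illustration):

```python
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import rename_columns  # hypothetical node function


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(
            func=rename_columns,
            inputs=[
                "co2_processed",
                # dot syntax reaches into the nested parameter dictionary
                "params:column_rename_params.co2",
                "params:column_rename_params.skip_cols",
            ],
            outputs="co2_renamed",
            name="rename_co2_columns_node",
        ),
    ])
```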
πŸ‘ 1
m
> Curiosity: Why do we write `params:<key>` instead of `parameters:<key>`? Just curious because I do not remember defining any object as 'params'. I was just following the tutorial.
The unofficial answer is that it's just always been this way 😄 When you want to use all parameters you just reference `parameters`, and otherwise `params:<key>`. My guess is this might be just for convenience, since `params` is shorter to write than `parameters`.
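A tiny sketch to illustrate the difference (the two reporting functions here are made up for the example):

```python
from kedro.pipeline import node


def report_all_params(params: dict) -> None:
    print(sorted(params))  # every top-level parameter key


def report_one_param(rename_params: dict) -> None:
    print(rename_params["co2"])  # just the column_rename_params dict


# "parameters" injects the full parameters dictionary into the node...
all_params_node = node(report_all_params, inputs="parameters", outputs=None)

# ...while "params:<key>" injects only the value stored under that key.
one_param_node = node(report_one_param, inputs="params:column_rename_params", outputs=None)
```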
💡 1
s
@datajoely Thanks, checking the page. Seems like your proposal is more exhaustive. @Merel That actually makes sense, because initially I had written `parameters` and it was still working. Then, following the tutorial, I wrote `params:xxx` and it worked as well.
πŸ‘ 1
The parameters issue is now becoming critical, as I am not able to perform the split. I tried simplifying the parameters, but it seems I am missing something. I even tried passing the whole dictionary by using `parameters`. My parameters.yml now has two options:
split_params:
  test_size: 0.2
  random_state: 42
  features:
    - V2_presence
    - V2_motion
  target: 
    - V17_co2 
split_param_features:
  - V2_presence
  - V2_motion
split_param_target:
  - V17_co2
split_param_test_size: 0.2
split_param_random_state: 42
pipeline.py contains:
from kedro.pipeline import Pipeline, node, pipeline
from .nodes import split_data

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(
            func=split_data,
            inputs=["presence_motion_co2_combined", "params:split_param_features", "params:split_param_target", "params:split_param_test_size", "params:split_param_random_state"],
            outputs=["X_train", "X_test", "y_train", "y_test"],
            name="split_data_node",
        ),
    ])
and the nodes.py contains:
import typing as t
import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(df: pd.DataFrame, features, target, test_size, random_state) -> t.Tuple:
    X = df[features]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)
    return X_train, X_test, y_train, y_test
Yet I am getting an error while running the pipeline:
KeyError: "None of [Index(['V2_presence', 'V2_motion'], dtype='object')] are in the [columns]"
The parquet file generated in the data_processing pipeline has (or should have) these columns. Is there a way to run the pipeline in debug mode so that I can check the exact DataFrame being passed?
Updates: ✅ The above issue is resolved. It was a bug in the code: redundant concatenation of a suffix string caused unexpected column names. The logger.info() strings helped debug it. I would still like to know if there is a better debug method for Kedro pipelines. ✅ The parameters issue is also resolved, by passing the higher-level dictionary in the node's inputs and unfolding it inside the function. Thank you both!
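For reference, a minimal sketch of that workaround, reusing the `split_params` block and dataset names from the snippets above (the exact shape of the final code is an assumption):

```python
import typing as t
import pandas as pd
from sklearn.model_selection import train_test_split
from kedro.pipeline import node


def split_data(df: pd.DataFrame, split_params: dict) -> t.Tuple:
    # The whole "params:split_params" dictionary arrives as a single input
    # and is unfolded inside the function.
    X = df[split_params["features"]]
    y = df[split_params["target"]]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        test_size=split_params["test_size"],
        random_state=split_params["random_state"],
    )
    return X_train, X_test, y_train, y_test


split_data_node = node(
    func=split_data,
    inputs=["presence_motion_co2_combined", "params:split_params"],
    outputs=["X_train", "X_test", "y_train", "y_test"],
    name="split_data_node",
)
```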
🥳 1
d
We're going to discuss this as a Kedro team next week, so if you have any capacity to add your thoughts to the GH issue it would be really useful.
πŸ‘ 1
s
Yes, I will. I'm finishing my first trial project and will share all the feedback there, so whatever is useful/relevant/applicable can be considered for the next Kedro update.