Francis Duval
12/20/2024, 3:21 PM
# training/nodes.py
def preprocess(data):
    # Preprocessing logic here
    return data
# training/pipeline.py
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import preprocess


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=preprocess,
                inputs='data_training',
                outputs='preprocessed_data_training',
                name='preprocess_training',
            )
        ]
    )
# inference/pipeline.py
from kedro.pipeline import Pipeline, node, pipeline

from ..training.nodes import preprocess  # import from training/nodes.py


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=preprocess,
                inputs='data_inference',
                outputs='preprocessed_data_inference',
                name='preprocess_inference',
            )
        ]
    )
However, I feel like this is not elegant and probably not optimal. Is there a better way of doing this? Maybe a "meta" nodes.py that can be used by all pipelines? Maybe rearranging the whole pipeline?
Thanks!
Hall
12/20/2024, 3:21 PM
Juan Luis
12/20/2024, 3:33 PM
from ...utils.preprocessing import preprocess
Even if having a utils package/module is not very elegant, the point is still not to tie it to any pipeline. It could be src/utils, src/preprocessing, or anything else that makes sense. I'd reserve src/pipelines for modular pipelines.
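To make the two relative imports in this thread concrete, the sketch below resolves them with the standard library's importlib.util.resolve_name. It assumes the usual Kedro layout src/<package_name>/pipelines/<pipeline_name>/pipeline.py and a hypothetical package name my_project; it only computes module names and imports nothing.

```python
from importlib.util import resolve_name

# Hypothetical package containing inference/pipeline.py in a standard
# Kedro layout: src/my_project/pipelines/inference/pipeline.py
INFERENCE_PKG = "my_project.pipelines.inference"

# One leading dot means the current package; each extra dot climbs one level.
imports = {
    # Original approach: reach into the sibling training pipeline.
    "..training.nodes": resolve_name("..training.nodes", INFERENCE_PKG),
    # Suggested approach: a module living outside src/my_project/pipelines.
    "...utils.preprocessing": resolve_name("...utils.preprocessing", INFERENCE_PKG),
}

for relative, absolute in imports.items():
    print(f"{relative:<24} -> {absolute}")
```

The first name resolves inside the training pipeline's package, so inference depends on training; the second resolves to my_project.utils.preprocessing, which belongs to neither pipeline.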
Does that make sense?
Francis Duval
12/20/2024, 3:50 PM
kedro pipeline create utils
in which I would define a "preprocessing" modular pipeline:
# utils/pipeline.py
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import preprocess_func_1, preprocess_func_2


def preprocess_template() -> Pipeline:
    return pipeline(
        pipe=[
            node(
                func=preprocess_func_1,
                inputs='raw_data',
                outputs='preprocessed_data_1',
                name='preprocess_1',
            ),
            node(
                func=preprocess_func_2,
                inputs='preprocessed_data_1',
                outputs='preprocessed_data_2',
                name='preprocess_2',
            ),
        ]
    )
that I would then import and use in both training/pipeline.py and inference/pipeline.py?
from ..utils.pipeline import preprocess_template
Juan Luis
12/20/2024, 3:52 PM
utils, but something more meaningful. In addition, you can parametrize the create_pipeline function to your needs, so that you can instantiate such a pipeline with, say, different inputs.
Francis Duval
12/20/2024, 3:56 PM
Yolan Honoré-Rougé
12/20/2024, 7:48 PM