Michał Stachowicz
10/27/2022, 9:56 AM
Lorenzo Castellino
10/27/2022, 9:59 AM
marrrcin
10/27/2022, 10:07 AM
Michał Stachowicz
10/27/2022, 10:26 AM
Paweł Lis
10/27/2022, 10:33 AM
Deepyaman Datta
10/27/2022, 11:10 AM
Paweł Lis
10/27/2022, 4:08 PM
Deepyaman Datta
10/27/2022, 4:22 PM
Ben Horsburgh
11/02/2022, 9:39 AM
Pipeline:
Remove outliers
Impute
Normalize
This could be the implementation of a single Kedro pipeline node. Defining these steps as an SKLearn pipeline gives me access to the SKLearn ecosystem, so I can do things like hyperparameter optimization, which is by definition a non-DAG process that cycles over the pipeline many times.
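As a rough sketch of what such a single node could wrap, assuming a few things the thread does not specify: `QuantileClipper` is a hypothetical stand-in for "remove outliers" (it clips values rather than dropping rows, since a plain SKLearn pipeline step does not change the number of samples), `Ridge` is an arbitrary final estimator added only so the search has something to score, and the grid values are made up.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class QuantileClipper(TransformerMixin, BaseEstimator):
    """Hypothetical 'remove outliers' step: clip each column to a quantile range."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # nanquantile ignores missing values; they are imputed in the next step
        self.low_ = np.nanquantile(X, self.lower, axis=0)
        self.high_ = np.nanquantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


def preprocess_and_fit(X, y):
    """Body of a single Kedro node: tune all steps of one SKLearn pipeline together."""
    pipeline = Pipeline(
        [
            ("clip", QuantileClipper()),
            ("impute", SimpleImputer()),
            ("normalize", StandardScaler()),
            ("model", Ridge()),  # assumed estimator, not named in the thread
        ]
    )
    # One search space spanning every step (illustrative values only)
    param_grid = {
        "clip__upper": [0.95, 0.99],
        "impute__strategy": ["mean", "median"],
        "model__alpha": [0.1, 1.0],
    }
    search = GridSearchCV(pipeline, param_grid, cv=3)
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```

The `step__param` keys are what makes the search space span the whole pipeline, and also what makes the parameterization grow quickly as steps are added.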
If I instead defined a Kedro pipeline with the above steps as separate nodes, that would also be OK. In that case, though, I cannot co-tune the logical SKLearn pipeline steps. From an SKLearn perspective it would look like:
Pipeline:
Remove outliers
Pipeline:
Impute
Pipeline:
Normalize
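A minimal sketch of that split, assuming hypothetical node functions `remove_outliers`, `impute` and `normalize` (plain Python here; in Kedro each one would be wrapped as its own node). Each function fits its own transformer in isolation, so any tuning has to happen per step.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def remove_outliers(X, upper=0.99):
    """Node 1 (hypothetical): cap each column at its upper quantile."""
    X = np.asarray(X, dtype=float)
    # NaNs pass through untouched; they are handled by the impute node
    return np.minimum(X, np.nanquantile(X, upper, axis=0))


def impute(X, strategy="mean"):
    """Node 2: fit and apply an imputer on its own."""
    return SimpleImputer(strategy=strategy).fit_transform(X)


def normalize(X):
    """Node 3: fit and apply a scaler on its own."""
    return StandardScaler().fit_transform(X)


def run_nodes(X):
    """Stand-in for the Kedro DAG: run the three nodes in order."""
    return normalize(impute(remove_outliers(X)))
```

Each function only sees the output of the previous one, so `upper` and `strategy` can be set per node but not searched jointly.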
What are the pros and cons of each?
• Single SKLearn pipeline
◦ + hyperparameter tuning over entire pipeline
◦ + export best pipeline model
◦ - complex parameterization
◦ - complex search space
• Multiple SKLearn pipelines
◦ + simple to parameterize
◦ + simple search space
◦ + export best transformer
◦ - no holistic tuning
Which to choose depends very much on the problem you are trying to solve.
Paweł Lis
11/02/2022, 8:18 PM
Yolan Honoré-Rougé
11/23/2022, 9:09 PM
Sebastian Cardona Lozano
01/20/2023, 12:47 AM