Hi everyone, I was wondering if there is a repo with some template pipelines for standard data science tasks? E.g. a basic pipeline to make a train-test split or to generate a model evaluation report. Thanks!
12/15/2022, 12:31 PM
Hi, you can check out our starters. The pandas-iris starter is a basic example with a train-test split on the iris dataset.
12/15/2022, 7:14 PM
If I understand this correctly, I'm torn on whether this would be an opportunity for the Kedro devs, the community, or the package devs themselves. For most "standard data science tasks" you need to select a "standard" package and a "standard" approach with its own assumptions. I can see the magic of "give me a pipeline that merges three tables from SQL, does some data prep in between, and produces a descriptive report at the end" or "give me a gold-standard binary classification ML pipeline", where you would just have to fill in some blanks. But the tech stack would have to be chosen, and all the code would have to be written and maintained by someone. From what I see, this is not in the scope of Kedro itself, but it's interesting nonetheless.
12/20/2022, 8:49 AM
Indeed @Sebastian Pehle, you got the idea. I often find myself reusing some basic pipelines and thought it would be cool to share them and potentially build strong pipelines, for example one with a train-test split, feature selection, and hyperparameter tuning for a regression on tabular data.
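For illustration, here is a minimal sketch of what two such shareable building blocks could look like: a reproducible train-test split and a small regression evaluation report. The function names (`split_data`, `regression_report`) and the parameters (`test_ratio`, `seed`) are made up for this example and don't come from Kedro or any starter. It uses only the standard library, so it sidesteps the "which package is standard" question rather than answering it.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def split_data(
    rows: Sequence, test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List, List]:
    """Shuffle rows reproducibly and return (train, test).

    Hypothetical helper: the name and parameters are assumptions for this sketch.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row in the test set.
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def regression_report(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Return mean absolute error and root mean squared error as a dict."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


if __name__ == "__main__":
    train, test = split_data(range(10), test_ratio=0.2, seed=0)
    print(len(train), len(test))
    print(regression_report([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Since both are plain functions, each could presumably be wrapped in a `node(...)` from `kedro.pipeline`, with `test_ratio` and `seed` supplied through `params:` entries, so the same template can be reused across projects by changing only the parameters.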