# questions
r
Hi all, what is best practice when performing inference in kedro, when inference input data requires the same pre-processing pipeline steps as a training pipeline? I want to reuse the same pre-processing steps for both training and inference, however I cannot find any documentation on how to do this. Ultimately I’d like to package the entire model, including pre-processing and inference. Any guidance would be helpful
d
Something like a scikit-learn pipeline is probably more appropriate for this. You can then call `train` and `predict` from within nodes.
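A minimal sketch of that suggestion, assuming scikit-learn and NumPy are installed. The function names `train_model` and `predict`, the choice of `StandardScaler` and `LogisticRegression`, and the toy data are all illustrative assumptions, not something from the thread. The point is only that the fitted `Pipeline` object carries the pre-processing with it, so inference cannot drift from training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def train_model(X_train: np.ndarray, y_train: np.ndarray) -> Pipeline:
    """Training node: fit pre-processing and estimator as a single object."""
    model = Pipeline(
        steps=[
            ("scale", StandardScaler()),      # pre-processing step
            ("clf", LogisticRegression()),    # estimator
        ]
    )
    model.fit(X_train, y_train)
    return model


def predict(model: Pipeline, X_new: np.ndarray) -> np.ndarray:
    """Inference node: the fitted pipeline re-applies the same scaling."""
    return model.predict(X_new)


# Toy data, for illustration only.
X = np.array([[0.0], [1.0], [9.0], [10.0]])
y = np.array([0, 0, 1, 1])

fitted = train_model(X, y)
predictions = predict(fitted, np.array([[0.5], [9.5]]))
```

In a Kedro project, these two functions would each be wrapped in a node, with the fitted pipeline saved as a catalog dataset between the training and inference runs.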
Unfortunately, while it's been explored a number of times, I don't know if there's a great way to create an equivalency between Kedro pipelines + nodes and scikit-learn pipelines + transformers, the way things currently stand.
Removing my answer in favor of @Yolan Honoré-Rougé’s :)
y
This tries to answer this very question: https://github.com/Galileo-Galilei/kedro-mlflow-tutorial There is a focus on mlflow, but you can ignore it and adapt to another tool if you want.
d
This `pipeline_ml_factory` concept is very interesting! I never knew of it, thanks. I will try it out soon.
y
I don't know how to make it more discoverable. The goal is exactly to create a scikit-learn-like pipeline by "binding" two kedro pipelines to have extra flexibility. Many people who discovered this tutorial enjoy it a lot, but they often discover it through me. I'd love to find a way to make it more "googlable".
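To make the "binding" idea concrete, here is a dependency-free sketch. It is not kedro-mlflow's implementation or API: every name in it (`fit_scaler`, `scale`, `fit_threshold`, `apply_threshold`, `BoundModel`, `train`) is hypothetical, and plain functions stand in for Kedro nodes. It only illustrates the principle that training produces artifacts, and that those artifacts are bound to the shared inference steps in one object exposing `predict`.

```python
from dataclasses import dataclass
from typing import Dict, List


def fit_scaler(values: List[float]) -> Dict[str, float]:
    """Training-only step: learn the parameters the pre-processing needs."""
    return {"mean": sum(values) / len(values)}


def scale(values: List[float], scaler: Dict[str, float]) -> List[float]:
    """Shared step: used unchanged by both training and inference."""
    return [v - scaler["mean"] for v in values]


def fit_threshold(scaled: List[float], labels: List[int]) -> float:
    """Training-only step: midpoint between the two class means."""
    pos = [s for s, lab in zip(scaled, labels) if lab == 1]
    neg = [s for s, lab in zip(scaled, labels) if lab == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def apply_threshold(scaled: List[float], threshold: float) -> List[int]:
    """Shared step: turn scaled values into class labels."""
    return [1 if s > threshold else 0 for s in scaled]


@dataclass
class BoundModel:
    """Inference steps bundled with the artifacts produced by training."""

    scaler: Dict[str, float]
    threshold: float

    def predict(self, values: List[float]) -> List[int]:
        # Same scale() function as in training, fed with the fitted artifacts.
        return apply_threshold(scale(values, self.scaler), self.threshold)


def train(values: List[float], labels: List[int]) -> BoundModel:
    """Run the training steps and return the bound inference object."""
    scaler = fit_scaler(values)
    threshold = fit_threshold(scale(values, scaler), labels)
    return BoundModel(scaler=scaler, threshold=threshold)


model = train([0.0, 1.0, 9.0, 10.0], [0, 0, 1, 1])
result = model.predict([0.5, 9.5])
```

The design choice worth noting is that `scale` is defined once and reused in both paths, so the only things that differ between training and inference are the fitting steps and where the artifacts come from.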
d
I've been giving this canned answer for a while now: https://kedro-org.slack.com/archives/C03QF15L1K9/p1720307969353049?thread_ts=1720246714.100739&cid=C03QF15L1K9 I'll make sure to update. A blog article, or even inclusion in Kedro FAQ, could be great; I really think this comes up often.
y
The tutorial is already mentioned in the above thread ^^' Maybe I should rebrand it as "scikit-learn-like pipelines" to make the concept easier to grasp.
d
Oh, it was three weeks after. Even though I link to it, I never reread the thread! 😂
r
Thanks for the answers @Yolan Honoré-Rougé and @Deepyaman Datta. I actually found the kedro-mlflow tutorial after I posted my original question and have been working through it. I’m really surprised this question doesn’t come up more often though. I would have thought that training and inference pipelines would be a really really common use case with Kedro. Am I missing something?