Nok Lam Chan
02/23/2024, 12:57 AM
Is `mlflow serve` any good? I guess this is a question for @Yolan Honoré-Rougé and @Takieddine Kadiri: how much of MLflow were you using? My guess is that `mlflow serve` is not good enough, hence kedro-boot has its own FastAPI serving.
By the way, I was reading https://github.com/Galileo-Galilei/kedro-mlflow-tutorial/ and I think it's a very nice read; I especially enjoy how it puts MLOps into perspective.

Nok Lam Chan
02/23/2024, 11:20 AM

marrrcin
02/23/2024, 11:21 AM

marrrcin
02/23/2024, 11:21 AM

Nok Lam Chan
02/23/2024, 11:51 AM

Yolan Honoré-Rougé
02/23/2024, 11:56 AM

Nok Lam Chan
02/23/2024, 12:17 PM
`mlflow serve` for production. Follow-up questions:
• kedro-mlflow takes an interesting approach by bundling training + inference into a single object; I don't see this in kedro-boot anymore. Did you also evolve toward different patterns for training and inference?
• How are you using kedro-mlflow with kedro-boot? Do you simply fetch the model files from MLflow, more or less like an object store, or do you keep the MLflow concept of "Model", which includes the pre/post-processing too?
P.S. We are trying to create some docs about MLflow, more from the MLOps perspective (what's the role of Kedro/MLflow), not necessarily about a specific plugin (kedro-mlflow/kedro-boot). I think the tracking part is pretty straightforward for reproducibility/collaboration; the serving part is less clear to me.

Takieddine Kadiri
02/23/2024, 2:22 PM
PipelineML. At the end of the pipeline_ml run, kedro-mlflow logs a trained model, which is a pyfunc MLflow model containing a pickled inference pipeline and all the trained artifacts that were saved during the training process (e.g. classifier, encoder, …). This is useful when multiple objects have to be fitted in a training process, and it limits the training-serving skew problem.
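The bundling idea can be sketched in plain Python. This is a hypothetical analogue, not kedro-mlflow's actual classes: every fitted artifact travels inside one pickled object together with the prediction logic, so serving cannot silently diverge from training.

```python
import pickle

class InferencePipeline:
    """One object bundling every fitted artifact with the prediction logic
    (toy stand-in for the pyfunc MLflow model kedro-mlflow logs)."""

    def __init__(self, encoder, threshold):
        self.encoder = encoder      # fitted mapping produced by training
        self.threshold = threshold  # fitted parameter produced by training

    def predict(self, raw_rows):
        scores = [self.encoder[row] for row in raw_rows]
        return ["churn" if s > self.threshold else "stay" for s in scores]

# end of "training": fit the artifacts, then pickle them *together*
model = InferencePipeline(encoder={"low": 0.2, "high": 0.9}, threshold=0.5)
blob = pickle.dumps(model)

# serving side: a single load restores encoder + parameters + logic at once
restored = pickle.loads(blob)
print(restored.predict(["high", "low"]))  # ['churn', 'stay']
```

Because the encoder and the classifier threshold are serialized as one unit, the serving side can never pair a new encoder with an old model, which is exactly the training-serving skew the message describes.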
pipeline_ml takes features and labels datasets as inputs. Those features and labels are processed using a features pipeline.
The prediction pipeline has a node that loads the pyfunc MLflow model (the pickled inference pipeline) and makes the MLflow model predictions; it also has some nodes for prediction post-processing.
The main pipeline (which we also call the inference pipeline, from an orchestration point of view) is composed of the features pipeline and the prediction pipeline.
The main pipeline can differ depending on the use case.
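That composition can be sketched in plain Python; Kedro's real API is `kedro.pipeline.Pipeline`, which supports `+` in the same way, but the node names below are made up for illustration.

```python
# features pipeline: raw data -> model-ready features
def scale(rows):
    return [r / 10 for r in rows]

def to_features(rows):
    return [{"x": r} for r in rows]

# prediction pipeline: features -> predictions
def predict(feats):
    return ["churn" if f["x"] > 0.5 else "stay" for f in feats]

features_pipeline = [scale, to_features]
prediction_pipeline = [predict]

# the "main" (inference) pipeline is just the concatenation; a different
# use case can swap in a different features pipeline in front
main_pipeline = features_pipeline + prediction_pipeline

def run(pipeline, data):
    for node in pipeline:
        data = node(data)
    return data

print(run(main_pipeline, [9, 2]))  # ['churn', 'stay']
```

Keeping the features pipeline and the prediction pipeline as separate, reusable pieces is what lets the main pipeline vary per use case without touching the model itself.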
Kedro-boot offers a way to serve the main pipeline through a fully fledged REST API that can be developed using nearly all the capabilities of FastAPI. Having full control over the API is mandatory in production.
There is more to say on this topic, like model versioning (aligning it with the code versioning), making the CI/CD aware of the model version, the evaluation pipeline, training dataset selection (from the features and labels store), training dataset management, monitoring, …

Yolan Honoré-Rougé
02/23/2024, 9:32 PM
kedro-mlflow to package their pipeline as a custom MLflow model, because it is a very convenient way to have a "consistent" model with code + artifacts well versioned (the pain point is really about artifacts, because you all know the good deployments with my_encoder.pkl transferred in a zip folder alongside the well-versioned code 🫠).
In the end we add an extra layer with kedro-boot, as Taki explains in detail, but sometimes it is not the data scientist who originally built the model who will develop the last step for serving the model.

Jorit Studer
03/19/2024, 9:20 PM

Nok Lam Chan
04/05/2024, 9:52 AM

Jorit Studer
04/06/2024, 7:25 PM