# plugins-integrations
h
What is your experience with combining Kedro and BentoML? I've used Kedro-MLflow quite a bit, but find its built-in model serving quite limiting, especially when it comes to the performance and stability of batch inference for transformer models on the GPU. However, I do really like the concept of packaging basically the evaluation part of my pipeline as a KedroPipeline and serving that. To establish the same in BentoML, do you package/pickle every node and store it (in a bento) and then create a runnable out of that, or have you found a more sophisticated approach, like serving the KedroPipeline and maybe extracting only the framework-specific parts to leverage Bento's integrations? For the integration with BentoML, I'm currently using the following:
import bentoml
from pathlib import Path


def _find_kedro_project(current_dir):  # pragma: no cover
    """Walk up from current_dir until a directory that is a Kedro project root is found."""
    from kedro.framework.startup import _is_project

    while current_dir != current_dir.parent:
        if _is_project(current_dir):
            return current_dir
        current_dir = current_dir.parent

    return None


def retrieve_kedro_context(env="local"):
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = _find_kedro_project(Path.cwd())
    metadata = bootstrap_project(project_path)

    with KedroSession.create(
        package_name=metadata.package_name,
        project_path=metadata.project_path,
        env=env,
    ) as kedro_session:
        return kedro_session.load_context()


def download_model(name: str) -> bentoml.Model:
    # Look in the local Bento model store first; if the model isn't there,
    # fall back to the Kedro catalog and import it into the store.
    try:
        return bentoml.transformers.get(name)

    except bentoml.exceptions.NotFound:
        catalog = retrieve_kedro_context().catalog
        pipeline = catalog.load(name)
        return bentoml.transformers.save_model(name, pipeline)


def get_runner(name: str, init_local: bool = False):
    runner = download_model(name).to_runner()
    if init_local:
        runner.init_local(quiet=True)
    return runner
This allows one to use any arbitrary catalog entry (so you can store your model as a pickle on S3, or in MLflow). However, integration with KedroPipeline seems very complicated, as Bento needs to be aware of the model framework. In addition, CI/CD now needs access to the Kedro context, while I'd prefer to simply link the MLflow storage to BentoML and maybe use some adapters for pre- and post-processing (roughly as sketched below).

Additionally, Bento uses its own storage location, which as far as I know can't be moved to the cloud (which is quite catastrophic when using Llama-70B, since it will instantly fill up your local storage). Have you found any ways around this? Do you make storing bentos part of your Kedro pipelines? In the code above I look in the Bento store first and fall back to the Kedro catalog if the model isn't found, but this obviously only works when you actively manage it (otherwise you'd be pulling old models). Any thoughts or preferences?

Also, do you use pip, conda or Poetry? I'm looking to use different dependency groups to separate model deps from dev/training deps, also because there have been quite a few breaking changes lately when upgrading packages. Are there any special tricks you employ with respect to staggered updating of deps combined with tests? Do you link the Poetry deps with MLflow, or do you use Bento's inver_packages?

Also, what are your opinions on deploying the packaged models to k8s? Do you simply deploy Docker containers directly, or use something like Seldon or KServe? Or even Bento's Yatai? I'm curious!
👀 2
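To make the adapter idea concrete, here is a minimal sketch of a BentoML 1.0 service that reuses get_runner from the snippet above and wraps plain functions (the same logic the Kedro inference nodes would call) around the model. The "summarizer" model name and the preprocess/postprocess helpers are placeholders, and it assumes the transformers runner exposes the pipeline's __call__ as its default method:

import bentoml
from bentoml.io import JSON, Text


def preprocess(text: str) -> str:
    # Placeholder for the same pre-processing logic a Kedro node would run.
    return text.strip()


def postprocess(raw) -> dict:
    # Placeholder for the post-processing / formatting step.
    return {"result": raw}


# "summarizer" is a placeholder catalog entry / model name.
summarizer_runner = get_runner("summarizer")

svc = bentoml.Service("kedro_inference", runners=[summarizer_runner])


@svc.api(input=Text(), output=JSON())
def predict(text: str) -> dict:
    model_input = preprocess(text)
    raw_output = summarizer_runner.run(model_input)
    return postprocess(raw_output)

The nice part of splitting it this way is that the pre/post-processing stays plain Python that can be unit-tested alongside the Kedro code, while only the framework-specific piece lives in the runner.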
f
Very curious on this one too. I came across BentoML recently when I was thinking about how people split pre/post-processing and inference in deployments. The classic approach, I guess, is to use FastAPI to do everything (download the model plus any related data needed for processing, then use Kedro pipelines, or just the node functions, for pre/post-processing). But that also means potentially "largish" deployments. Another option, e.g. when using AzureML, would be to host the model on AzureML and have FastAPI only do the pre/post-processing, calling the hosted model in between, as in the sketch below. Of course, alternatives are Seldon or BentoML; I think Seldon has options for pre/post-processing as well. Has anyone used Seldon to host Kedro models plus pre/post-processing? Would be very curious to know how others split these tasks, or if at all 🙂
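A rough sketch of that second option, assuming the model is hosted behind a plain HTTP endpoint; the URL, payload shape and the preprocess/postprocess helpers are made up for illustration:

import httpx
from fastapi import FastAPI

app = FastAPI()

# Placeholder URL for the externally hosted model (e.g. an AzureML endpoint).
MODEL_ENDPOINT = "https://example.com/score"


def preprocess(text: str) -> dict:
    # Same feature/prompt building a Kedro node would do.
    return {"inputs": text.strip()}


def postprocess(raw: dict) -> dict:
    return {"prediction": raw}


@app.post("/predict")
async def predict(payload: dict) -> dict:
    model_input = preprocess(payload["text"])
    async with httpx.AsyncClient() as client:
        response = await client.post(MODEL_ENDPOINT, json=model_input)
    return postprocess(response.json())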
d
@marrrcin wanted to think through how best to do serving in Kedro too
m
Yeah, one time I've actually used BentoML with Kedro - I just packaged a custom BentoML model which had some pre-processing steps done using the same functions as the Kedro nodes. Worked OK.
h
did you convert the nodes to bentoml runnables?
m
Back then I implemented a custom BentoService class, but I'm not sure how BentoML looks right now - I think there have been some breaking changes since the last time I used it for this case.
h
because that's what I'm figuring out now: whether it's better to package the Kedro pipeline/nodes with the Bento container, or to take all the nodes in the inference pipeline, convert them to runnables, and somehow store them with the model at the end of training. I don't really want to use the Bento store for that, but I'm not sure whether I can store it in MLflow
otherwise I'd need to package the code alongside the model, and store that somewhere
or build a Docker container for each model I train, but that would cause enormous storage bloat
but yeah, BentoML 1.0 is quite different (and better) than the 0.13 version, so I think it's certainly worth revisiting, especially when serving transformers & multipart models
👍 1
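For what it's worth, a rough sketch of the "nodes as runnables" idea with the BentoML 1.0 API - the "summarizer" model name and the inference_node function are placeholders standing in for an actual Kedro node, not something taken from a working setup:

import bentoml


def inference_node(model, batch):
    # Placeholder for the Kedro node function that wraps model inference.
    return model(list(batch))


class KedroNodeRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # Load the transformers pipeline from the Bento model store
        # ("summarizer" is a placeholder model name).
        self.pipeline = bentoml.transformers.load_model("summarizer")

    @bentoml.Runnable.method(batchable=True, batch_dim=0)
    def run_node(self, batch):
        # Delegate to the same function a Kedro inference node would call,
        # passing the loaded model explicitly.
        return inference_node(self.pipeline, batch)


kedro_node_runner = bentoml.Runner(KedroNodeRunnable, name="kedro_node_runner")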
j
if I understand correctly, these "model serving" systems (BentoML, Seldon Core) would be a fancy "REST/gRPC API over a model", am I right? what else do these systems bring to the table?