Georgi Iliev
06/16/2023, 7:56 AM
ONNX files and uploading them to S3 automatically using "only" the catalog definition.
Broadly speaking, the main flow of what we're trying to build is the following:
1. There is a process that trains and creates some files (PCA, scaler, some K-Means models, etc.) and saves them as Pickle files so they can be passed between different nodes.
2. Once the main pipeline is done, we're ready to distribute the model to our services.
3. We're using ONNX because our services are not built in Python and the ONNX libraries we use are a bit faster.
4. So, taking this into account, we now have a publish pipeline that takes these Pickle files, converts them to ONNX using convert_sklearn, and then uploads them to S3.
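A minimal sketch of the manual publish step from item 4, for reference. It assumes convert_sklearn comes from the skl2onnx package and that boto3 is used for the upload. The function names, the "models" key prefix, the bucket argument, and the n_features argument are hypothetical and only illustrate the shape of the step, not the actual pipeline code.

```python
import pickle
from pathlib import Path


def pickle_to_onnx_bytes(pickle_path, n_features):
    """Load a pickled scikit-learn estimator and return it serialized as ONNX."""
    # Imported lazily so the helpers below work without skl2onnx installed.
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    with open(pickle_path, "rb") as f:
        model = pickle.load(f)

    # Assumption: the estimator takes one float matrix with n_features columns.
    initial_types = [("input", FloatTensorType([None, n_features]))]
    onnx_model = convert_sklearn(model, initial_types=initial_types)
    return onnx_model.SerializeToString()


def s3_key_for(pickle_path, prefix="models"):
    """Derive the S3 object key from the pickle file name (hypothetical layout)."""
    return f"{prefix}/{Path(pickle_path).stem}.onnx"


def publish(pickle_path, bucket, n_features):
    """Convert one pickle file to ONNX and upload the result to S3."""
    import boto3  # assumption: AWS credentials are configured in the environment

    body = pickle_to_onnx_bytes(pickle_path, n_features)
    key = s3_key_for(pickle_path)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

Each Pickle file needs its own call, and the input shape has to be supplied by hand. That per-file bookkeeping is what a catalog-driven approach would take over.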
So, my main question here is: Is there a way to implement this so that the transformation and the S3 upload are done automatically?
• I know that we can specify an S3 path in the catalog, but I didn't see how to set the .onnx file type.
Juan Luis
06/16/2023, 8:03 AM
# conf/base/catalog.yml
regressor:
  type: kedro_onnx.io.OnnxDataSet
  filepath: s3://data/06_models/reg.onnx
  backend: sklearn
(adapted from https://kedro-onnx.readthedocs.io/en/latest/usage.html)
let me know if that works for you!
Georgi Iliev
06/16/2023, 8:11 AM