Hi team! I need advice on using `ONNX` files and u...
# questions
g
Hi team! I need advice on using `ONNX` files and uploading them to S3 automatically using "only" the catalog definition. Broadly speaking, the main flow of what we're trying to build is the following:
1. A process trains and creates some files (PCA, scaler, some K-Means models, etc.) and saves them as `Pickle` so they can be used between different nodes.
2. Once the main `pipeline` is done, we're ready to distribute the model to our services.
3. We're using `ONNX` because our services are not built in Python and the ONNX libraries we use are a bit faster.
4. Taking this into account, we now have a `publish` pipeline that takes these `Pickle` files, converts them to `ONNX` using `convert_sklearn`, and then uploads them to S3.
So, my main question here is: is there a way to implement this so that the transformation and the S3 upload are done automatically?
• I know that we can specify an S3 path in the catalog, but I didn't see how to set the `.onnx` file type.
j
Hi @Georgi Iliev! There's a kedro-onnx community plugin (https://github.com/nickolasrm/kedro-onnx) created by @Nickolas da Rocha Machado. @Melle van der Linde tried it a couple of months ago and reported that it still works nicely: https://kedro-org.slack.com/archives/C03RKPCLYGY/p1682073587640849?thread_ts=1682070059.869009&cid=C03RKPCLYGY
The catalog entry would supposedly look like this:
# conf/base/catalog.yml
regressor:
  type: kedro_onnx.io.OnnxDataSet
  filepath: s3://data/06_models/reg.onnx
  backend: sklearn
(adapted from https://kedro-onnx.readthedocs.io/en/latest/usage.html) let me know if that works for you!
g
TYVM! I'll test it and let you know!
šŸ™ŒšŸ¼ 1