# questions
a
Hello Team, I've encountered some difficulties while attempting to use Kedro with the `kedro-airflow-k8s` plugin version 0.8.2. I've followed the documentation provided, but I'm unable to get it to function as expected. Here's the output I'm getting when I try to use the commands:
```
(kedro-17.0) $ pip show kedro
Name: kedro
Version: 0.16.6
Summary: Kedro helps you build production-ready data and analytics pipelines
Home-page: https://github.com/quantumblacklabs/kedro
Author: QuantumBlack Labs
Author-email:
License: Apache Software License (Apache 2.0)
Requires: anyconfig, cachetools, click, cookiecutter, fsspec, jmespath, jupyter-client, pip-tools, pluggy, python-json-logger, PyYAML, setuptools, toposort
Required-by: kedro-airflow-k8s, kedro-docker, kedro-telemetry

(kedro-17.0) $ kedro install
Usage: kedro [OPTIONS] COMMAND [ARGS]...
Try 'kedro -h' for help.

Error: No such command 'install'.

(kedro-17.0) $ kedro airflow-k8s
Usage: kedro [OPTIONS] COMMAND [ARGS]...
Try 'kedro -h' for help.

Error: No such command 'airflow-k8s'.

(kedro-17.0) $ pip show kedro-airflow-k8s
Name: kedro-airflow-k8s
Version: 0.8.2
Summary: Kedro plugin with Airflow on Kubernetes support
Home-page: https://github.com/getindata/kedro-airflow-k8s/
Author: Michal Zelechowski, Mariusz Strzelecki, Mateusz Pytel
Author-email: mateusz@getindata.com
License: Apache Software License (Apache 2.0)
Requires: click, kedro, pip-tools, python-slugify, semver, tabulate
Required-by:
```
Another issue: when I include the configuration for `X_train`, `X_test`, `y_train`, and `y_test` in the `catalog.yml` file as follows:
```yaml
X_train:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/X_train.pickle
  layer: model_input

y_train:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/y_train.pickle
  layer: model_input

X_test:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/X_test.pickle
  layer: model_input

y_test:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/y_test.pickle
  layer: model_input
```
and then execute `kedro run` locally, I receive the following exception:
```
Class 'pickle.PickleDataset' not found, is this a typo?
```
However, I have confirmed that `kedro-datasets` is installed in my environment:
```
(kedro-17.0) $ pip show kedro-datasets
Name: kedro-datasets
Version: 5.0.0
Summary: Kedro-Datasets is where you can find all of Kedro's data connectors.
Home-page:
Author: Kedro
Author-email:
License: Apache Software License (Apache 2.0)
Requires: kedro, lazy-loader
Required-by:
```
Could someone please assist me in resolving this issue? I would greatly appreciate any guidance you can provide.
m
Use the official plugin: `kedro-airflow`. Also a good point of reference for more advanced stuff: https://getindata.com/blog/deploying-kedro-pipelines-gcp-composer-airflow-node-grouping-mlflow/
👍 1
y
Another thing: you need to use `pickle.PickleDataset` instead of `pickle.PickleDataSet` (notice the correct version has a lowercase "s" at the beginning of "set").
👍 2
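For reference, the corrected `catalog.yml` entries would look like this: the filepaths stay the same and only the class name changes (this assumes a Kedro version recent enough to resolve dataset classes from `kedro-datasets`):

```yaml
X_train:
  type: pickle.PickleDataset
  filepath: data/05_model_input/X_train.pickle
  layer: model_input

y_train:
  type: pickle.PickleDataset
  filepath: data/05_model_input/y_train.pickle
  layer: model_input

X_test:
  type: pickle.PickleDataset
  filepath: data/05_model_input/X_test.pickle
  layer: model_input

y_test:
  type: pickle.PickleDataset
  filepath: data/05_model_input/y_test.pickle
  layer: model_input
```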
m
@Alex Shawn It seems like you're using Kedro `0.16.6`, is that correct? That predates `kedro-datasets`, so it won't be using the datasets from there. Any chance you can update to a more recent Kedro version?
👍 4
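As a quick way to confirm whether an environment predates `kedro-datasets`, a small standard-library version check can help. This is a hypothetical helper, not part of Kedro; the `(0, 17, 0)` threshold reflects the thread's observation that 0.16.x predates `kedro-datasets`:

```python
# Hypothetical helper: check whether an installed package is older than a
# minimum version, using only the standard library.
from importlib import metadata


def needs_upgrade(pkg: str, minimum: tuple) -> bool:
    """Return True if `pkg` is missing or older than `minimum` (a version tuple)."""
    try:
        installed = tuple(int(p) for p in metadata.version(pkg).split(".")[:3])
    except metadata.PackageNotFoundError:
        return True  # not installed at all, so an install/upgrade is needed
    return installed < minimum


# Kedro 0.16.6 predates kedro-datasets, so this would report True there.
print(needs_upgrade("kedro", (0, 17, 0)))
```

Running this inside the `kedro-17.0` environment from the thread would print `True`, confirming the upgrade suggestion.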
a
Thanks! Really appreciate it. I'll give it a try later.
The mistake regarding 'PickleDataSet' originated from the following doc: https://kedro-airflow-k8s.readthedocs.io/en/0.8.2/source/03_getting_started/01_quickstart.html
@Merel that's correct. kedro-airflow-k8s version 0.8.2 requires 'kedro<0.17'.
I also tried to follow the steps from https://pypi.org/project/kedro-airflow/, but step 3 redirects to another Kedro doc on Apache Airflow deployment, where only the k8s section relates to my use case. The current doc is somewhat hard to follow; would it be possible to revise it? It becomes much clearer when the command-line steps are simply listed out in order.
m
@Dmitry Sorokin / @Ankita Katiyar can you advise on what would be the best approach here?
👀 1
d
Hi @Alex Shawn, thanks for the feedback! We'll work on updating our Airflow docs. The current K8s plugin is outdated, but I believe you can achieve a similar setup with the latest Kedro version, though it may require some manual work:
1. Create a Dockerfile using the `kedro-docker` plugin: https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker
2. Generate an Airflow DAG using the `kedro-airflow` plugin: https://github.com/kedro-org/kedro-plugins/tree/main/kedro-airflow
3. Take the generated DAG and replace the `KedroOperator` with `KubernetesPodOperator()`. Make sure to link the operator to the Docker container from step 1, instead of using the packaged project used by the `kedro-airflow` plugin.
4. Upload the DAG and container to K8s.
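Step 3 might look roughly like the sketch below. This is a hypothetical example for Airflow 2.x, not output from the plugin: the DAG id, image name, namespace, and node name are all placeholders, and the exact `KubernetesPodOperator` import path depends on your `apache-airflow-providers-cncf-kubernetes` version.

```python
# Hypothetical sketch: one Kubernetes pod per Kedro node, replacing the
# KedroOperator that kedro-airflow generates. All names below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="my_kedro_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # The container image is the one built in step 1 with kedro-docker,
    # so no packaged project is needed inside the DAG.
    train_model = KubernetesPodOperator(
        task_id="train-model",
        name="train-model",
        namespace="airflow",
        image="my-registry/my-kedro-project:latest",  # image from step 1
        cmds=["kedro"],
        arguments=["run", "--nodes", "train_model"],
    )
```

Task dependencies between the pods would then be wired up the same way the generated DAG chains its `KedroOperator` tasks.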