Sergei Benkovich (02/05/2023, 8:05 PM)
ModuleNotFoundError: No module named 'pipelines'
any suggestions on how to handle it?

datajoely (02/06/2023, 9:39 AM)

Sergei Benkovich (02/06/2023, 9:42 AM)

datajoely (02/06/2023, 10:53 AM)

Sergei Benkovich (02/06/2023, 10:57 AM)

datajoely (02/06/2023, 10:57 AM)

Sergei Benkovich (02/06/2023, 2:36 PM)

datajoely (02/06/2023, 3:26 PM)

Filip Panovski (02/06/2023, 3:47 PM)
We ran into something similar when running our nodes on Dask (a No module named 'my_pipeline' error). One temporary workaround was to define the functions being called as inner functions, so instead of:
    def A(df):
        # ... A

    def B(df):
        # ... B
        A(df)
we'd have:
    def B(df):
        def A(df):
            # ... A
        # ... B
        A(df)
The assumption was that the code de/serialization wasn't quite working for some reason, though we used cloudpickle (the default?). Packaging stuff into wheels and importing those instead would probably work too.
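For intuition, here is a minimal sketch of why this workaround can plausibly help. It is based on general pickle/cloudpickle semantics, not on anything verified in this thread: module-level functions are pickled by reference (essentially "import this name from that module"), so the deserialising worker must be able to import the module, while cloudpickle serialises locally defined functions by value, embedding the code object itself.

    import pickle
    import cloudpickle

    def top_level(df):      # module-level: pickled by reference (module + name)
        return df

    def make_inner():
        def inner(df):      # local function: cloudpickle embeds it by value
            return df
        return inner

    # The payload for top_level is essentially "import top_level from this
    # module", so unpickling it elsewhere fails with ModuleNotFoundError if
    # the module is not importable there.
    print(pickle.dumps(top_level))

    # The local function round-trips without any import on the loading side.
    restored = cloudpickle.loads(cloudpickle.dumps(make_inner()))
    print(restored("some df"))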
Do note that using these inner functions is a bit slower than calling separate functions, though we haven't benchmarked it accurately.
I'm not quite sure if this helps, but it may be worth a shot if this happens when running the nodes themselves...

datajoely (02/06/2023, 4:01 PM)

Filip Panovski (02/06/2023, 4:11 PM)

Sergei Benkovich (02/06/2023, 4:29 PM)

Ivan Danov (02/06/2023, 4:30 PM)
"then use a different script to load this dill and run inference on the model but get this error."
@Sergei Benkovich is that different script outside of the Kedro project? Sometimes when serialising/deserialising Python objects, you need to make sure the classes used are importable in both the serialising and the deserialising code. Here you can find a similar SO issue, not Kedro related: https://stackoverflow.com/questions/63101601/import-error-no-module-named-utils-when-using-pickle-load
As for @Filip Panovski's issue with Dask jobs, the underlying reason is the same as Sergei's, but it probably has more relevance to Kedro.
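To make that failure mode concrete, here is a hypothetical pair of scripts; the file, package, and class names are invented for illustration:

    # save_model.py: run inside the Kedro project, where my_project.models
    # is importable.
    import dill
    from my_project.models import MyModel  # hypothetical project module

    with open("model.pkl", "wb") as f:
        dill.dump(MyModel(), f)

    # load_model.py: run from a different directory or environment.
    import dill

    with open("model.pkl", "rb") as f:
        # Raises ModuleNotFoundError: No module named 'my_project' unless
        # my_project is installed or otherwise on sys.path.
        model = dill.load(f)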
By default, most Kedro starters follow a src/-based layout (https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#src-layout), as contrasted with the flat layout (https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#flat-layout). In order to run everything normally, Kedro adds the src/ folder to your PYTHONPATH under the hood with bootstrap_project. So if you are running your code through Kedro, all your modules are importable normally. However, if you submit to other execution engines, they might have different entrypoints, processing modules, etc., and they might have different assumptions about how the packages can be imported. I am not sure about Dask, but it is entirely possible that if you run in distributed mode, rather than just parallel, Dask somehow skips the part of Kedro which adds src/ to the PYTHONPATH and thus makes none of your code importable.
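One practical consequence (our extrapolation, not something stated in the thread): an external script that needs to unpickle project objects can make src/ importable the same way Kedro does, by calling bootstrap_project before loading. A sketch, assuming the script runs from the project root and with a made-up artifact path:

    from pathlib import Path

    import dill
    from kedro.framework.startup import bootstrap_project

    # Reads the project's pyproject.toml and, for the default src layout,
    # prepends src/ to sys.path, the same bootstrapping `kedro run` performs.
    bootstrap_project(Path.cwd())

    with open("model.pkl", "rb") as f:
        model = dill.load(f)  # project modules are now importable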
A quick way to fix this is to move all of your code out of src/ and add this to `pyproject.toml`:
    [tool.kedro]
    source_dir = "."
This way you will force Kedro to use the flat-layout package structure, which will likely be easier for Dask to pick up.
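For illustration (project and package names invented), the move from the default src layout to the flat layout looks roughly like this:

    src layout (Kedro starter default):
        my-project/
            pyproject.toml
            src/
                my_project/
                    __init__.py
                    pipelines/

    flat layout (with source_dir = "."):
        my-project/
            pyproject.toml
            my_project/
                __init__.py
                pipelines/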
As a side note, executing Python scripts has implications for what ends up on the PYTHONPATH, so you should always make sure you have one and only one entrypoint, rather than calling python src/package/script1.py and then python src/package/script2.py. Python is a nice scripting language, but the moment you start using packages, your entrypoints start to matter (I am not an expert on the topic, but I suppose it should be documented somewhere what gets added to the import path and what does not, depending on how you execute your code).
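On that side note, what actually varies is sys.path[0], which depends on how the interpreter is invoked. A quick way to observe it (file name hypothetical):

    # where_am_i.py
    import sys

    # `python src/package/where_am_i.py` -> sys.path[0] is .../src/package
    # `python -m package.where_am_i` (run from the project root with the flat
    # layout) -> sys.path[0] is the current working directory, so the package
    # itself stays importable.
    print(sys.path[0])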
Filip Panovski (02/06/2023, 4:37 PM)