Sergei Benkovich02/05/2023, 8:05 PM
any suggestions on how to handle it?
ModuleNotFoundError: No module named 'pipelines'
datajoely02/06/2023, 9:39 AM
Sergei Benkovich02/06/2023, 9:42 AM
datajoely02/06/2023, 10:53 AM
Sergei Benkovich02/06/2023, 10:57 AM
datajoely02/06/2023, 10:57 AM
Sergei Benkovich02/06/2023, 2:36 PM
datajoely02/06/2023, 3:26 PM
Filip Panovski02/06/2023, 3:47 PM
We've hit something similar when submitting Dask jobs (a No module named 'my_pipeline' error). One temporary workaround was to define the functions being called as inner functions, so instead of:

def A(df):
    # ... A

def B(df):
    # ... B
    A(df)

you'd write:

def B(df):
    def A(df):
        # ... A
    # ... B
    A(df)

The assumption was that the code de/serialization wasn't quite working for some reason, though we used cloudpickle (the default?). Packaging things into wheels and importing those instead would probably work too. Do note that using these inner functions is a bit slower than calling separate functions, though we haven't benchmarked it accurately. I'm not quite sure if this helps, but it may be worth a shot if this happens when running the nodes themselves...
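For context on why nesting A inside B can help: the standard pickle module serialises a top-level function by reference only (defining module plus qualified name), so the loading side must be able to import that module, whereas cloudpickle can serialise closures by value. A minimal sketch of the by-reference behaviour (the function name here is illustrative, not from the project):

```python
import pickle

def transform(df):
    # stand-in for a pipeline node function
    return df

# pickle stores a *reference* to the function (its defining module and
# qualified name), not its code.
payload = pickle.dumps(transform)
print(b"transform" in payload)  # the name travels in the payload

# Unpickling re-imports the function from its module. That works here
# because the module is importable in this process, but on a worker
# where 'pipelines' is not on the path it raises ModuleNotFoundError.
restored = pickle.loads(payload)
print(restored is transform)
```

Both prints show True: the payload carries only the lookup path, and loading resolves back to the importable original.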
datajoely02/06/2023, 4:01 PM
Filip Panovski02/06/2023, 4:11 PM
Sergei Benkovich02/06/2023, 4:29 PM
Ivan Danov02/06/2023, 4:30 PM
> then use a different script to load this dill and run inference on the model but get this error.

@Sergei Benkovich is that different script outside of the Kedro project? When serialising/deserialising Python objects, you need to make sure the classes used are importable in both the serialising and the deserialising code. Here is a similar SO issue, not Kedro related: https://stackoverflow.com/questions/63101601/import-error-no-module-named-utils-when-using-pickle-load

As for @Filip Panovski's issue with Dask jobs, the underlying reason is the same as Sergei's, but it probably has more relevance to Kedro. By default, most Kedro starters follow a src-based layout (https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#src-layout) as contrasted to the flat layout (https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#flat-layout). In order to run everything normally, Kedro adds the src folder to your PYTHONPATH under the hood. So if you are running your code through Kedro, all your modules are importable as usual. However, if you submit to other execution engines, they might have different entrypoints, processing modules, etc., and they might make different assumptions about how packages can be imported. I am not sure about Dask, but it is entirely possible that if you run in distributed mode, rather than just parallel, Dask skips the part of Kedro which adds src to the PYTHONPATH, and thus none of your code is importable. A quick way to fix this is to move all of your code out of src and add this to `pyproject.toml`:

[tool.kedro]
source_dir = "."

This way you force Kedro to use the flat layout, which will likely be easier for Dask to pick up. As a side note, how you execute a Python script affects what ends up on the import path, so you should make sure you have one and only one entrypoint, rather than calling several scripts directly. Python is a nice scripting language, but the moment you start using packages, your entrypoints start to matter (I am not an expert on the topic, but I suppose it should be documented somewhere what gets added to the import path and what does not depending on how you execute your code).
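The effect of having (or not having) the src folder on the path can be seen with a small self-contained sketch; the package name my_pipeline and the temp-directory layout are made up for illustration, and the sys.path.insert line only approximates what Kedro does when it bootstraps a project:

```python
import importlib
import pathlib
import sys
import tempfile

# Fake a src-layout project: <tmp>/src/my_pipeline/__init__.py
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "src" / "my_pipeline"
pkg.mkdir(parents=True)
(pkg / "__init__.py").write_text("ANSWER = 42\n")

try:
    import my_pipeline  # fails: 'src' is not on sys.path yet
except ModuleNotFoundError:
    print("not importable without the src dir on the path")

# Roughly what Kedro does for you under the hood:
sys.path.insert(0, str(root / "src"))
my_pipeline = importlib.import_module("my_pipeline")
print(my_pipeline.ANSWER)  # 42
```

An execution engine that spawns its own workers never runs that insert, which is exactly when the flat layout (source_dir = ".") becomes the easier option.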
Filip Panovski02/06/2023, 4:37 PM