# questions
g
Hi all! I'm having an issue importing a custom dataset. The pipeline is running on a Microsoft-hosted agent in Azure DevOps. If I use a macOS image it works perfectly, but if I switch to an Ubuntu image I get the error below. It seems related to the environment. Weirdly enough, we're developing on a Linux VM and it works fine locally. Any ideas?
```
DatasetError: An exception occurred when parsing config for dataset
'raw_default_dataset':
Class '<my-project>.datasets.spark_lowercase_dataset.SparkLowerDataset' not found, is
this a typo?
Hint: If you are trying to use a dataset from `kedro-datasets`, make sure that
the package is installed in your current environment. You can do so by running
`pip install kedro-datasets` or `pip install kedro-datasets[<dataset-group>]` to
install `kedro-datasets` along with related dependencies for the specific
dataset group.
```
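For context, an error like this means the dotted class path in the catalog could not be imported. The lookup can be reproduced outside Kedro with a small helper (illustrative, not Kedro's actual API), which pinpoints whether the module or the attribute is the part that fails on the Ubuntu agent:

```python
import importlib


def load_class(dotted_path: str):
    """Import `pkg.module.ClassName` and return the class, mimicking how a
    catalog entry's class path is resolved (illustrative helper only)."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)   # raises ModuleNotFoundError if the module is missing
    return getattr(module, class_name)              # raises AttributeError if the class name is wrong

# Run this in the failing job with the path from the catalog entry; here a
# stdlib class stands in for the project-specific one:
print(load_class("collections.OrderedDict"))
```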
m
Hi Gianni, is the dataset installed correctly inside the environment?
g
Hi Merel, yes, it's installed correctly. As mentioned, if I use a macOS image it works fine (same pipeline); I only get this error when I switch to an Ubuntu image. Everything else is the same, including the requirements.
m
This is outside of my own experience, so I'm asking ChatGPT for some help: It sounds like your custom dataset class (`SparkLowerDataset`) isn't being found when you switch to an Ubuntu image, but it works fine on macOS. Let's troubleshoot this step by step! Here's what might be going wrong:

🛠️ 1. Package Installation Issue
The error suggests the class isn't found; it could be that your package isn't installed in the Ubuntu environment. 👉 Check your install step. Make sure you're installing your project as a package:

```yaml
- script: pip install -e .
  displayName: "Install project package"
```

If you have a `requirements.txt` or `pyproject.toml`, make sure it's installed too:

```yaml
- script: pip install -r src/requirements.txt
  displayName: "Install requirements"
```
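To rule out a silent install failure on the agent, a small check like this (hypothetical helper name) can confirm the distribution is actually registered in the interpreter's environment:

```python
from importlib import metadata


def is_installed(dist_name: str) -> bool:
    """Return True if `pip install` actually registered this distribution
    in the current environment (illustrative helper)."""
    try:
        metadata.version(dist_name)
        return True
    except metadata.PackageNotFoundError:
        return False

# Substitute the project's own distribution name here; `pip` is just a
# stand-in that is present in any pip-managed environment.
print(is_installed("pip"))
```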
📂 2. PYTHONPATH Issue
Ubuntu might not pick up the `src` directory by default, while macOS sometimes handles that more gracefully. 👉 Set `PYTHONPATH` for subsequent steps. Note that `##vso[task.prependpath]` modifies `PATH`, not `PYTHONPATH`; to set `PYTHONPATH` across steps, use `task.setvariable`:

```yaml
- script: echo "##vso[task.setvariable variable=PYTHONPATH]$(Build.SourcesDirectory)/src"
  displayName: "Set PYTHONPATH for subsequent steps"
```

Or export it within the step that needs it (a plain `export` does not persist to later steps):

```yaml
- script: |
    export PYTHONPATH=$(pwd)/src
    kedro run
  displayName: "Run with PYTHONPATH set"
```
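Whether the `src` directory actually made it onto the interpreter's search path can be verified from inside the job with a snippet like this (illustrative):

```python
import pathlib
import sys


def on_sys_path(directory: str) -> bool:
    """Check whether `directory` is on sys.path, resolving both sides so
    relative and absolute spellings compare equal (illustrative helper)."""
    target = pathlib.Path(directory).resolve()
    # An empty sys.path entry means the current working directory.
    return any(pathlib.Path(entry or ".").resolve() == target for entry in sys.path)

# In the pipeline this would be `on_sys_path("src")`; expect True once
# PYTHONPATH is set correctly.
print(on_sys_path("src"))
```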
🧩 3. Case Sensitivity in File/Folder Names
Ubuntu's filesystem is case-sensitive, while macOS's default filesystem is not. 👉 Double-check your file names:
• Is the file named exactly as expected? (`spark_lowercase_dataset.py`)
• Does your import match the case exactly? (`from <my_project>.datasets.spark_lowercase_dataset import SparkLowerDataset`)

🏷️ 4. Module Discovery & Project Name
If your project has a custom structure, `kedro` might not discover the dataset properly. 👉 Check your `pyproject.toml` or `setup.py`:
Make sure the package is defined correctly:

```toml
[tool.kedro]
package_name = "<my_project>"
```

And that your `setup.py` includes the right packages:

```python
packages=find_packages(where="src"),
package_dir={"": "src"},
```
🏃 5. Docker or Image Differences
If you're using Spark, Ubuntu might not have the right libraries installed, or the classpath might differ. 👉 Try adding any OS-level dependencies in your pipeline:

```yaml
- script: sudo apt-get update && sudo apt-get install -y openjdk-11-jdk
  displayName: "Install Java (for Spark)"
```
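The case-sensitivity point (step 3) is easy to test directly, since a file name can "exist" under a case-insensitive macOS filesystem while failing on Ubuntu. A sketch with a hypothetical helper:

```python
import pathlib


def exact_case_exists(path: str) -> bool:
    """True only if the final path component matches a directory entry
    byte-for-byte -- unlike Path.exists(), which is also satisfied by a
    case-insensitive match on macOS's default filesystem."""
    p = pathlib.Path(path)
    parent = p.parent
    if not parent.is_dir():
        return False
    return p.name in {child.name for child in parent.iterdir()}

# e.g. exact_case_exists("src/<my_project>/datasets/spark_lowercase_dataset.py")
# returning False on Ubuntu but Path.exists() returning True on macOS would
# confirm a casing mismatch.
```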
g
Hi Merel, thanks for the help! Unfortunately, none of the suggestions above worked, but I ended up fixing it. In case anyone else needs the information: it was related to the Python environment. I had to delete it and recreate it from scratch.
m
Ah, glad to hear you managed to fix it in the end! Do you know what went wrong in the environment?
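For anyone landing here later: "recreating the environment from scratch" on a hosted agent generally means not reusing any cached virtual environment. A fresh one per run might look like this (an illustrative sketch, assuming a venv-based setup and the paths used earlier in the thread):

```yaml
# Illustrative: build a throwaway virtual environment on every run
# instead of restoring a possibly stale cached one.
- script: |
    rm -rf .venv
    python -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install -e .
    pip install -r src/requirements.txt
  displayName: "Recreate Python environment from scratch"
```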