Gianni Giordano
03/11/2025, 9:59 AM
DatasetError: An exception occurred when parsing config for dataset
'raw_default_dataset':
Class '<my-project>.datasets.spark_lowercase_dataset.SparkLowerDataset' not found, is
this a typo?
Hint: If you are trying to use a dataset from `kedro-datasets`, make sure that
the package is installed in your current environment. You can do so by running
`pip install kedro-datasets` or `pip install kedro-datasets[<dataset-group>]` to
install `kedro-datasets` along with related dependencies for the specific
dataset group.
Hall
03/11/2025, 9:59 AM
Merel
03/11/2025, 12:02 PM
Gianni Giordano
03/11/2025, 1:17 PM
Merel
03/11/2025, 1:32 PM
The custom dataset class (SparkLowerDataset) isn’t being found when you switch to an Ubuntu image, but it works fine on macOS. Let’s troubleshoot this step by step!
Here’s what might be going wrong:
🛠️ 1. Package Installation Issue
The error says the class isn’t found, which could mean your project package isn’t installed in the Ubuntu environment.
👉 Check your install step: Make sure you’re installing your project as a package:
- script: pip install -e .
  displayName: "Install project package"
If you have a requirements.txt, make sure its dependencies are installed too (dependencies declared in pyproject.toml are already installed by the pip install -e . step):
- script: pip install -r src/requirements.txt
  displayName: "Install requirements"
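The Kedro message doesn’t say whether the module itself is missing or whether an import inside it failed, so it can help to try the import directly in the Ubuntu job. A minimal sketch, assuming a hypothetical helper named diagnose_class and a placeholder package name my_project (substitute your real one):

```python
import importlib


def diagnose_class(dotted_path):
    """Try to import `module.ClassName` and report what actually goes wrong."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        return f"'{dotted_path}' is not a dotted module.Class path"
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers ModuleNotFoundError too: either the package itself is missing,
        # or one of the imports inside the module failed.
        return f"cannot import module '{module_name}': {exc}"
    if not hasattr(module, class_name):
        return f"module '{module_name}' has no attribute '{class_name}'"
    return "ok"


if __name__ == "__main__":
    # Hypothetical path: replace my_project with your real package name.
    print(diagnose_class("my_project.datasets.spark_lowercase_dataset.SparkLowerDataset"))
```

Running this as a pipeline step right after the install steps should show whether the package is importable at all in that environment.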
📂 2. PYTHONPATH Issue
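Python doesn’t add the src directory to its import path on its own, on either OS. If the package isn’t installed, src has to be on PYTHONPATH, and your macOS setup may already have had that in place.
👉 Add this to your pipeline so later steps see the variable (assuming src sits at the repository root):
- script: echo "##vso[task.setvariable variable=PYTHONPATH]$(Build.SourcesDirectory)/src"
  displayName: "Set PYTHONPATH"
Or export it in the same step that runs Kedro, because an export doesn’t carry over to later steps:
- script: |
    export PYTHONPATH="$PWD/src"
    kedro run
  displayName: "Run Kedro with PYTHONPATH set"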
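To confirm that the setting actually reached the Python process, you could print what the interpreter sees. This is only a sketch: is_on_sys_path is a hypothetical helper, and it assumes the step runs from the repository root with the code under src.

```python
import os
import sys


def is_on_sys_path(directory, search_path=None):
    """Return True if `directory` is one of the entries Python searches for imports."""
    entries = sys.path if search_path is None else search_path
    target = os.path.realpath(directory)
    # An empty entry in sys.path stands for the current working directory.
    return any(os.path.realpath(entry or os.getcwd()) == target for entry in entries)


if __name__ == "__main__":
    src_dir = os.path.join(os.getcwd(), "src")  # assumption: run from the repo root
    print("PYTHONPATH =", os.environ.get("PYTHONPATH", "<not set>"))
    print("src on sys.path:", is_on_sys_path(src_dir))
```

If PYTHONPATH prints as not set in the step that runs Kedro, the variable was set in a different step and didn’t persist.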
🧩 3. Case Sensitivity in File/Folder Names
Ubuntu’s default filesystem is case-sensitive, while the default macOS filesystem is not, so a name that differs only in letter case can import on macOS and fail on Ubuntu.
👉 Double-check your file names:
• Is the file named exactly as expected? (spark_lowercase_dataset.py)
• Does your import match the case exactly? (from <my_project>.datasets.spark_lowercase_dataset import SparkLowerDataset)
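Checking names by eye is easy to get wrong, so here is a rough sketch of a check you could run on the Mac, where the mismatch would otherwise go unnoticed. find_case_mismatches is a hypothetical helper, and the paths in the usage comment are placeholders.

```python
import os


def find_case_mismatches(root, relative_path):
    """Walk `relative_path` under `root` and report components whose on-disk
    name differs from the requested name only by letter case."""
    mismatches = []
    current = root
    for part in relative_path.replace("\\", "/").split("/"):
        if not part:
            continue
        try:
            names = os.listdir(current)
        except OSError:
            break  # parent is missing or not a directory; nothing left to compare
        if part in names:
            current = os.path.join(current, part)
            continue
        lookalikes = [name for name in names if name.lower() == part.lower()]
        if not lookalikes:
            break  # the component is genuinely absent, not a case problem
        mismatches.append((part, lookalikes[0]))
        current = os.path.join(current, lookalikes[0])
    return mismatches


# Example call (placeholder paths):
# find_case_mismatches("src", "my_project/datasets/spark_lowercase_dataset.py")
```

Each tuple in the result is (name you asked for, name on disk). An empty list means no component differs only by case.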
🏷️ 4. Module Discovery & Project Name
If your project has a custom structure, Kedro might not discover the dataset properly.
👉 Check your pyproject.toml or setup.py: Make sure the package is defined correctly:
[tool.kedro]
package_name = "<my_project>"
And that your setup.py includes the right packages:
packages=find_packages(where="src"),
package_dir={"": "src"},
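One related thing worth checking: find_packages() only picks up directories that contain an __init__.py, so a datasets folder without one can be left out of a regular (non-editable) install. A small sketch of that check, where missing_init_files is a hypothetical helper and the module path is a placeholder:

```python
import os


def missing_init_files(src_dir, dotted_module):
    """List package directories on the way to `dotted_module` that lack __init__.py."""
    missing = []
    current = src_dir
    # The last component is the module file itself, so only its parents are checked.
    for package in dotted_module.split(".")[:-1]:
        current = os.path.join(current, package)
        if not os.path.isfile(os.path.join(current, "__init__.py")):
            missing.append(current)
    return missing


# Example call (placeholder names):
# missing_init_files("src", "my_project.datasets.spark_lowercase_dataset")
```

An empty list means every package directory on the path has its __init__.py.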
🏃 5. Docker or Image Differences
If you’re using Spark, the Ubuntu image might not have the right libraries installed, or the classpath might differ.
👉 Try adding any OS-level dependencies in your pipeline:
- script: sudo apt-get update && sudo apt-get install -y openjdk-11-jdk
  displayName: "Install Java (for Spark)"
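Before installing anything, you could check whether the image already has a Java runtime that PySpark can find. PySpark looks for Java through JAVA_HOME or the PATH. The sketch below just reports both; java_status is a hypothetical helper, not part of Kedro or PySpark.

```python
import os
import shutil


def java_status(environ=None, which=shutil.which):
    """Summarise whether a Java runtime looks discoverable from this process."""
    env = os.environ if environ is None else environ
    return {
        "java_on_path": which("java"),      # full path to the executable, or None
        "JAVA_HOME": env.get("JAVA_HOME"),  # None if the variable is unset
    }


if __name__ == "__main__":
    print(java_status())
```

If both values come back as None, the Java install step is probably needed. A missing Java runtime would more typically surface when the Spark session starts than while the catalog is being parsed, so treat this as a secondary check.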
Gianni Giordano
03/12/2025, 11:02 AM
Merel
03/12/2025, 12:33 PM