# questions
l
Hi, I have a requirement to run a pipeline both locally and in Databricks. Databricks doesn't seem to like relative paths, so I have to specify the absolute DBFS path in my catalog. But that of course breaks my local runs. Is there a way to manipulate the paths (with a hook, perhaps)? I can programmatically check whether I'm in Databricks or running locally.
y
I think configuration environments are the way to go. Create a `conf/databricks` folder, copy your `catalog.yml` inside, and replace the relative paths with absolute filepaths. You can then run `kedro run --env=databricks` on Databricks, or `kedro run` locally, without any modifications to your code.
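For concreteness, a minimal sketch of what that environment layout could look like. The dataset name, paths, and dataset type below are illustrative assumptions (the exact class name, e.g. `pandas.CSVDataset`, depends on your Kedro version):
```yaml
# conf/base/catalog.yml -- used for local runs (relative path)
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

# conf/databricks/catalog.yml -- same entry, overridden with an absolute DBFS path
companies:
  type: pandas.CSVDataset
  filepath: /dbfs/mnt/my_project/data/01_raw/companies.csv
```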
l
Amazing, thank you!
d
Just another option, to avoid maintaining a second catalog if only the prefix differs: you can use globals to specify a base path. For example:
```yaml
# conf/base/globals.yaml
base_path: /mnt/dbfs

# conf/local/globals.yaml
base_path: data

# conf/base/catalog.yaml
companies:
  type: pandas.CSVDataset  # type shown for completeness; exact class depends on your Kedro version
  filepath: ${globals:base_path}/01_raw/companies.csv
```
Or you can see an example of how to override the folder at runtime from the CLI in https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-override-configuration-with-[…]rameters-with-the-omegaconfigloader
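As a rough sketch of that runtime-override idea, assuming the `runtime_params` resolver and the `kedro run --params` flag described on that docs page (double-check the exact syntax for your Kedro version):
```yaml
# conf/base/catalog.yaml -- falls back to the local "data" folder if no override is given
companies:
  filepath: ${runtime_params:base_path, data}/01_raw/companies.csv

# On Databricks, override the base path from the CLI, e.g.:
#   kedro run --params "base_path=/mnt/dbfs"
```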
l
wow! that is really cool. I’m learning something new every day here 🤩