# questions
l
Hi, I have a requirement to run a pipeline both locally and in Databricks. Databricks doesn't seem to like relative paths, so I have to specify the absolute DBFS path in my catalog. But that of course breaks my local runs. Is there a way to manipulate the paths (with a hook, perhaps)? I can programmatically check whether I'm in Databricks or running locally.
y
I think configuration environments are the way to go. Create a `conf/databricks` folder, copy your `catalog.yml` inside, and replace the relative paths with absolute filepaths. You can then run `kedro run --env=databricks` on Databricks, or `kedro run` locally, without any modifications to your code.
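For concreteness, a minimal sketch of what that environment layout could look like. The dataset name, paths, and dataset type below are illustrative assumptions (the exact class name, e.g. `pandas.CSVDataset`, depends on your Kedro version):
```yaml
# conf/base/catalog.yml -- used for local runs (relative path)
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

# conf/databricks/catalog.yml -- same entry, overridden with an absolute DBFS path
companies:
  type: pandas.CSVDataset
  filepath: /dbfs/mnt/my_project/data/01_raw/companies.csv
```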
l
Amazing, thank you!
d
Just another option, to avoid maintaining a second catalog if only the prefix differs: you can use globals to specify a base path. For example:
```yaml
# conf/base/globals.yaml
base_path: /mnt/dbfs

# conf/local/globals.yaml
base_path: data

# conf/base/catalog.yaml
companies:
  type: pandas.CSVDataset  # type shown for completeness; exact class depends on your Kedro version
  filepath: ${globals:base_path}/01_raw/companies.csv
```
Or you can see an example of how to override the folder at runtime from the CLI in https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-override-configuration-with-[…]rameters-with-the-omegaconfigloader
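As a rough sketch of that runtime-override idea, assuming the `runtime_params` resolver and the `kedro run --params` flag described on that docs page (double-check the exact syntax for your Kedro version):
```yaml
# conf/base/catalog.yaml -- falls back to the local "data" folder if no override is given
companies:
  filepath: ${runtime_params:base_path, data}/01_raw/companies.csv

# On Databricks, override the base path from the CLI, e.g.:
#   kedro run --params "base_path=/mnt/dbfs"
```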
l
wow! that is really cool. I’m learning something new every day here 🤩