# questions
l
Hi kedro community!! I have encountered an issue when working with kedro within a marimo notebook (I think the issue would be just the same in a jupyter notebook). Basically, I initially was working on my notebook by calling it from the command line from the kedro project root folder, something like:
marimo edit notebooks/nb.py
where my folder structure is something like:
├── README.md
├── conf
│   ├── base
│   ├── local
├── data ...
├── notebooks
│   ├── nb.py
├── pyproject.toml
├── requirements.txt
├── src ... 
└── tests ...
Within
nb.py
I have a cell that runs:
from kedro.io import DataCatalog
from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings
from pathlib import Path
conf_loader = OmegaConfigLoader(
    conf_source=Path(__file__).parent / settings.CONF_SOURCE,
    default_run_env="base",
)

catalog = DataCatalog.from_config(conf_loader["catalog"], credentials=conf_loader["credentials"])
and later...
import polars as pl

weekly_sales = pl.from_pandas(
    catalog.load("mytable")
)
The issue is that all the filepaths within the
catalog
are relative, so they assume the catalog is being used from the Kedro project root. Only the
conf_source
argument in the
OmegaConfigLoader
instance is an absolute path. The catalog entries point to paths such as
conf/base/sql/somequery.sql
or
data/mydataset.csv
so if I run my notebook from the root of my Kedro project, all is fine, but if I were to run:
cd notebooks; marimo edit nb.py
then
catalog.load
will attempt to load the query or dataset from
notebooks/conf/base/sql/somequery.sql
Is it clear? PS: please don't ask me why there is SQL code within the conf folder 😅, it's moving soon
h
Someone will reply to you shortly. In the meantime, this might help:
j
Hi @Luis Chaves Rodriguez! I think your message is incomplete? Otherwise, could you clarify what the issue is?
l
Yes, sorry, I pressed Enter by mistake as I was writing it. It's complete now, let me know if it's unclear @Juan Luis. I believe the main issue is how the catalog defines the paths to the files that the catalog entries are based on.
👍🏼 1
I see that the problem is solved in jupyter notebooks by using magic, but I wonder if there's a magic-free solution?
r
Hi, this is a known issue, and it looks like the solution for now was to improve our error messaging: https://github.com/kedro-org/kedro/issues/3248. Maybe you can raise this issue on GitHub, and we can revisit.
l
but isn't this a solved issue in Jupyter? It should be possible to reproduce in other environments no? Couldn't we get the project root/session/context programmatically just like it happens with the magic?
j
the story of relative filepaths in the catalog is a bit tricky unfortunately. indeed, using the
%load_ext kedro
works, but there's not a good magic-free solution. @Luis Chaves Rodriguez one thing you can try is to use runtime parameters. in your dataset:
ds:
  filepath: ${runtime_params:project_root}/data/01_raw/thing.csv
and then you can specify it as follows:
config_loader = OmegaConfigLoader(..., runtime_params={"project_root": Path(...).as_posix()})
the missing bit then is how to find the
Path(...)
to the project root. The docs on runtime parameters are here: https://docs.kedro.org/en/latest/configuration/advanced_configuration.html#how-to-override-configuration-with-[…]rameters-with-the-omegaconfigloader. Does this make sense?
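One possible way to fill in that missing bit without any magic is to walk up from the notebook's location until a Kedro-looking `pyproject.toml` is found. This is only a sketch: `find_project_root` is a hypothetical helper, not part of Kedro, and it assumes the project root is the first ancestor directory whose `pyproject.toml` contains a `[tool.kedro]` section.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Return the closest ancestor of `start` that looks like a Kedro project root.

    Hypothetical helper: "looks like" means the directory has a pyproject.toml
    with a [tool.kedro] section. Returns None if no such directory exists.
    """
    start = Path(start).resolve()
    # Accept either a file (e.g. the notebook itself) or a directory.
    if start.is_file():
        start = start.parent
    for candidate in (start, *start.parents):
        marker_file = candidate / marker
        if marker_file.is_file() and "[tool.kedro]" in marker_file.read_text(encoding="utf-8"):
            return candidate
    return None
```

In the notebook, the result could then be passed along as `runtime_params={"project_root": find_project_root(Path(__file__)).as_posix()}`, after checking that it is not `None`.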
l
that makes sense, so every file, based on its location in the project would need to have a different
Path(...)
correct? Would the
catalog.load
respect that? In my example, would it be the following?
conf_loader = OmegaConfigLoader(
    ...,
    default_run_env = "base",
    runtime_params = {"project_root": Path(__file__).parent }
)
If you had to start from scratch how would you fix this? How do other similar projects approach this?
j
catalog.load will respect it because it will know nothing about it. you’ll instantiate the catalog from the config loader. the translation happens at that step. so it’s a matter of properly prefixing your file paths in the catalog and then instantiating the config loader with the right runtime_params. you can probably wrap that in a function if you’re using it more than once
👌🏼 1
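A minimal sketch of such a wrapper function, under a few assumptions: the catalog entries are prefixed with `${runtime_params:project_root}` as described above, the configuration lives in a `conf` folder with the usual `base` and `local` environments, and `runtime_params_for` and `make_catalog` are made-up names for illustration. A missing credentials file is treated as "no credentials".

```python
from pathlib import Path
from typing import Any, Dict


def runtime_params_for(project_root: Path) -> Dict[str, Any]:
    """Build the mapping consumed by ${runtime_params:project_root} in the catalog."""
    # as_posix() keeps forward slashes, so the interpolated filepath also works on Windows.
    return {"project_root": Path(project_root).resolve().as_posix()}


def make_catalog(project_root: Path, conf_dir: str = "conf"):
    """Create a DataCatalog whose filepaths are anchored at `project_root`."""
    # Imported lazily so the helper above stays usable without Kedro installed.
    from kedro.config import MissingConfigException, OmegaConfigLoader
    from kedro.io import DataCatalog

    conf_loader = OmegaConfigLoader(
        conf_source=str(Path(project_root) / conf_dir),
        base_env="base",          # assumed environment names
        default_run_env="local",
        runtime_params=runtime_params_for(project_root),
    )
    try:
        credentials = conf_loader["credentials"]
    except MissingConfigException:
        # No credentials file found: build the catalog without credentials.
        credentials = None
    return DataCatalog.from_config(conf_loader["catalog"], credentials=credentials)
```

With this in place, the notebook cell reduces to something like `catalog = make_catalog(Path(__file__).parent.parent)`, and the result no longer depends on the directory the notebook was launched from.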
l
What about this?
If you had to start from scratch how would you fix this? How do other similar projects approach this?
Hey @Juan Luis why not use the
_find_kedro_project
function for this? https://github.com/kedro-org/kedro/blob/46259b9f5b89a226d47e2119afb40ad7b4fa5e63/kedro/utils.py#L66
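A hedged sketch of what that could look like. `_find_kedro_project` is a private function at the commit linked above, so its name and location may change between Kedro versions. The wrapper name `kedro_project_root` is made up for illustration, and it returns `None` when the import is not available.

```python
from pathlib import Path
from typing import Optional


def kedro_project_root(start: Path) -> Optional[Path]:
    """Locate the Kedro project root above `start` using Kedro's own helper.

    Returns None if Kedro (or this private helper) cannot be imported,
    or if no project is found among `start` and its parents.
    """
    try:
        # Private API: present in kedro/utils.py at the linked commit only as far as we know.
        from kedro.utils import _find_kedro_project
    except ImportError:
        return None
    return _find_kedro_project(Path(start).resolve())
```

The returned path could then feed the `project_root` runtime parameter discussed earlier in the thread.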
j
maybe! @Luis Chaves Rodriguez have you tried it? btw, I just read https://github.com/kedro-org/kedro/issues/4440, thanks for opening it 💯
👍🏼 1
l
I tried it briefly on Friday, but the project I'm working on is not properly set up as a Python package, so I got some errors at import. I need to clean up some of how the repo was initially set up by the people who came before me. I'll report back on this next week.
👍🏼 1