# questions
f
Hey everyone, quick question regarding using the Data Catalog with the Python API. Following this documentation, I have the following questions:
• Should catalog.py be in the conf/ folder (the same place as catalog.yaml)?
• Does it work the same way with nodes when I do `kedro run`, or do I have to explicitly use this Python object and load the data on my own?
• Is it possible to define some sections in the YAML file and other parts in Python? I know I can do something in the hooks, but I wanted to check whether there is a way for this catalog variable to be accessible by the user.
Thanks in advance! 🙂
m
Hi Fazil, can you explain what you would like to achieve?
f
Hi Merel, instead of defining catalog.yaml, I wanted to know how defining catalog.py should be done. There weren't many examples of using a Python file in the documentation. Basically, if I use Python, I have to pass my new `catalog` variable to Kedro. That's what I was trying to find out.
m
Why do you want to use python instead of yaml? I think there’s a bit of a misunderstanding on how the catalog should be used, so it would be helpful for me to understand what you’re trying to do 🙂
In our docs, when it says `catalog.py`, what's meant is that you can load the catalog itself directly in Python. It doesn't mean that your catalog can be Python. The example in the docs:
from kedro.io import DataCatalog
from kedro_datasets.pandas import (
    CSVDataSet,
    SQLTableDataSet,
    SQLQueryDataSet,
    ParquetDataSet,
)

io = DataCatalog(
    {
        "bikes": CSVDataSet(filepath="../data/01_raw/bikes.csv"),
        "cars": CSVDataSet(filepath="../data/01_raw/cars.csv", load_args=dict(sep=",")),
        "cars_table": SQLTableDataSet(
            table_name="cars", credentials=dict(con="sqlite:///kedro.db")
        ),
        "scooters_query": SQLQueryDataSet(
            sql="select * from cars where gear=4",
            credentials=dict(con="sqlite:///kedro.db"),
        ),
        "ranked": ParquetDataSet(filepath="ranked.parquet"),
    }
)
shows how to use the catalog within Python code.
f
Sure. First, I'm trying to grasp the possibilities of what I can do in Kedro. Second, I wanted to see if I can make the catalog more dynamic in Python. One of my use cases is that I need the filepath to be a bit more generic, meaning I can't put a variable in globals.yml, so I was thinking the Pythonic way might be the way to do it.
The example you posted is fine. What I'm wondering is: when I create this, if I define `node(example_func, inputs='scooters_query')`, will it work directly, or what do I need to do to make it work? Use a context hook to inject my catalog into KedroContext?
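For the "more generic filepath" use case, one possible sketch (not an official Kedro pattern) is to build the same structure that catalog.yaml holds as a plain Python dict. The helper name `build_catalog_config`, the dataset names and the base directory below are hypothetical. The assumption is that a dict shaped like this could then be handed to `DataCatalog.from_config`; on its own it is not wired into `kedro run`.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def build_catalog_config(base_dir: str, dataset_names: list[str]) -> dict:
    """Return a catalog-style config dict with one CSV entry per dataset name.

    Hypothetical helper: it mirrors what the entries would look like in
    catalog.yaml, but the filepaths are computed in Python.
    """
    config = {}
    for name in dataset_names:
        config[name] = {
            # Same "type" string you would write in the YAML catalog.
            "type": "pandas.CSVDataSet",
            # The filepath is derived from variables instead of being hard-coded.
            "filepath": str(PurePosixPath(base_dir) / f"{name}.csv"),
        }
    return config


catalog_config = build_catalog_config("data/01_raw", ["bikes", "cars"])
print(catalog_config["bikes"]["filepath"])

# Assumption: with Kedro installed, this dict could be turned into a catalog
# object for interactive use, e.g. DataCatalog.from_config(catalog_config).
```

The dict keeps the same keys as the YAML entries, so the two formats stay interchangeable and only the way the values are produced changes.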
m
Ahh right, I think I know what you mean. The example provided is purely to let you load and save data directly with the catalog. It's meant more for experimentation/debugging and is not connected to your "regular" `kedro run`. You could create a custom `DataCatalog` class and then register it in `settings.py`: https://docs.kedro.org/en/latest/kedro_project_setup/settings.html#application-settings. But that would require overwriting the `DataCatalog`, which isn't exactly what you want, I think?
Do I understand correctly that you would prefer your catalog config to be in a Python format instead of YAML?
f
I see. It's not exactly my preference, but I know some people don't like YAML templating too much and prefer the Python way of doing it. I'll check out the internals of `DataCatalog`, but I simply wanted to see, if I have the `io` variable from your code snippet above, what would be the way to hook it up to `kedro run ...`
Btw, I've been going through the docs a lot to learn more about Kedro, which I like very much so far. It would be nice to have a Kedro flow diagram showing at a high level how everything works together, so that people who want to change some components know which part they need to adapt. I drew this on my own 😅
m
We used to have this! But I’m not quite sure where the architecture diagram went. Do you know @Jo Stichbury?
j
I think it's on the wiki... 🏃‍♀️ to check.
Take a look at this older version of the docs and I'll fix the wiki https://docs.kedro.org/en/0.17.2/12_faq/02_architecture_overview.html
f
Ah, thanks for this! It would be nice to put this on the website, or link to it, so the curious mind can find it. I wasn't able to find this page either so far 😄
n
It’s definitely in the docs, I use it often. Let me find it.
f
It's not present in the latest doc 😄 but thanks for the links 👍
n
Yes, we moved this to the GitHub wiki recently, but I'd argue we should keep it in the docs, or at least keep a pointer to the wiki.