# questions
p
Hey team, I'm migrating a legacy data engineering pipeline to Kedro. In the past we created classes to handle the connections to our data sources, e.g. `SnowflakeInstance`, `SharePointSite`, etc., functioning as clients to these sources with custom read/transform/write functionality. *The problem:* we're still initializing the clients inside the node functions from configuration arguments (screenshot 1). *My question:* is there an easy way to take the client itself as the node input (screenshot 2)? How can I define it in the catalog?
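For context, here's a minimal sketch of the two patterns I mean; all names are illustrative stand-ins for our in-house code, not the exact contents of the screenshots:
```python
# Stand-in for our in-house client class (the real one lives elsewhere).
class SnowflakeInstance:
    def __init__(self, account: str, database: str):
        self.account = account
        self.database = database

    def read(self, table: str):
        ...  # custom read logic


# Current pattern (screenshot 1): the node rebuilds the client
# from configuration arguments on every call.
def extract_table(account: str, database: str, table: str):
    client = SnowflakeInstance(account=account, database=database)
    return client.read(table)


# Desired pattern (screenshot 2): the node receives a ready-made
# client as an input, defined somewhere in the catalog.
def extract_table_from_client(client: SnowflakeInstance, table: str):
    return client.read(table)
```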
Sajid Alam
You can use Kedro's config for managing env variables by storing them in the `conf` directory, with subdirectories for different environments. See the docs for it here.
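For reference, the standard layout looks roughly like this (the `local` environment typically holds credentials and is gitignored):
```
conf/
├── base/           # defaults shared across all environments
│   ├── catalog.yml
│   └── parameters.yml
└── local/          # per-environment overrides, e.g. credentials
    └── credentials.yml
```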
p
Thanks @Sajid Alam, but my question is not about how to load env variables from config, rather about how to load a client (a custom Python object) from the catalog, such as the `SnowflakeInstance` in my example.
Sajid Alam
Right, I see. It sounds like you already have some classes that behave like datasets, `SnowflakeInstance` and `SharePointSite`. I think these need to be turned into custom datasets in Kedro; then you can define them in `catalog.yml` and load them into nodes directly. You can follow this guide to make these into Kedro custom datasets.
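Roughly, a minimal sketch of what that could look like, assuming a recent Kedro where the base class is `kedro.io.AbstractDataset` (older releases spell it `AbstractDataSet`); the `SnowflakeInstanceDataset` name and constructor arguments are illustrative:
```python
from typing import Any, Dict, Optional

from kedro.io import AbstractDataset  # AbstractDataSet in older Kedro releases


class SnowflakeInstance:
    """Stand-in for the existing in-house client class."""

    def __init__(self, account: str, database: str, **credentials: Any):
        self.account = account
        self.database = database
        self._credentials = credentials


class SnowflakeInstanceDataset(AbstractDataset):
    """Exposes a configured SnowflakeInstance client as a catalog entry."""

    def __init__(
        self,
        account: str,
        database: str,
        credentials: Optional[Dict[str, Any]] = None,
    ):
        self._account = account
        self._database = database
        self._credentials = credentials or {}

    def _load(self) -> SnowflakeInstance:
        # Called when a node lists this catalog entry as an input.
        return SnowflakeInstance(self._account, self._database, **self._credentials)

    def _save(self, data: Any) -> None:
        raise NotImplementedError("This dataset only provides a client; it is read-only.")

    def _describe(self) -> Dict[str, Any]:
        # Keep credentials out of logs and descriptions.
        return {"account": self._account, "database": self._database}
```
The catalog entry then points at the class path, and the `credentials` key is resolved by name from `credentials.yml` (paths and names illustrative):
```yaml
# conf/base/catalog.yml
snowflake_client:
  type: my_project.datasets.SnowflakeInstanceDataset
  account: my_account
  database: analytics
  credentials: snowflake_creds  # looked up in conf/local/credentials.yml
```
A node can then take `snowflake_client` as a regular input and receive the loaded `SnowflakeInstance`.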
p
Thanks! So I guess one can only load simple objects like strings, integers, and floats from the params or other config files, and custom objects have to come through the catalog as custom datasets. Is that correct?
👍 1