Hi we are getting ready to deploy our first kedro project bu Kedro #questions

Christoph Imler

01/22/2024, 12:05 PM

Hi, we are getting ready to deploy our first kedro project, but we see one challenge ahead. We get our initial raw dataset from a data service that requires an authorized client. When we run our pipeline locally, we authorize the client ourselves in the first node of the pipeline and store it in our catalog, no problems there. In the service where we are going to host the pipeline, an authorized client is provided, but how do we pass it in to our packaged pipeline? Can we pass it as an argument somehow?

from <package_name>.__main__ import main
from service import client
main(
    ["--pipeline", "__default__", client]
)  # or simply main() if you don't want to provide any arguments

Nok Lam Chan

01/22/2024, 12:13 PM

Hi @Christoph Imler, this is a great question. Generally, authorisation/connections are handled with hooks; occasionally they are created as an object and passed between nodes. Is it possible to rewrite this as a hook? What does it look like when it runs locally? You mention this gets stored in the catalog, can you show the relevant entry in your `catalog.yml`? Is this an object that can be pickled?

Christoph Imler

01/22/2024, 12:30 PM

Here are snippets of the code below. The client is stored as a MemoryDataset. I have not tried using pickle, but it might also work. I have not worked with hooks before, will look into that as well. As I am writing, I realise that we are not able to recreate the folder hierarchy used by the data catalog in the service we are going to use. Is it possible to change the catalog to run everything in memory when we deploy it and still have the folders locally?

nodes.py

def get_client():
    client = CogniteClient(
        ClientConfig(
            credentials=oauth_provider,
            project=COGNITE_PROJECT,
            base_url=f"https://{CDF_CLUSTER}.cognitedata.com",
            client_name="test",  # a name to identify your session
            debug=False,
            max_workers=100,
            timeout=60 * 5,  # 5 minutes
        )
    )
    _ = client.iam.token.inspect()
    return client

catalog.yml

### Pipeline 1
client:
  type: MemoryDataset

Nok Lam Chan

01/22/2024, 12:49 PM

It's best to separate your environments, i.e. you can have a `prod` environment where everything runs in memory.

Nok Lam Chan

01/22/2024, 12:49 PM

So you will be running something like

kedro run --env=prod

Christoph Imler

01/22/2024, 12:51 PM

But then a Hook is my best chance of getting the client in?

Nok Lam Chan

01/22/2024, 12:53 PM

Correct me if I am wrong, but I think it's not so straightforward to pass this into the DataCatalog, because it expects something saved on disk. Allow injecting data into a `KedroSession` run #2169 is a ticket that discusses this issue, but we don't have anything implemented yet. @Takieddine Kadiri may have an opinion on this, though it may diverge from standard kedro since it comes from a plugin perspective.

Nok Lam Chan

01/22/2024, 12:54 PM

Are you accessing this `client` object in your node? Or do you just need to initialise some connections?

Christoph Imler

01/22/2024, 12:55 PM

I am accessing the client in some of the nodes. It's not saved on disk, only in memory, but it still needs to be passed in somehow.

Nok Lam Chan

01/22/2024, 12:55 PM

If you actually need this in your node, the best option that I can think of is a custom dataset. Essentially you want to create the connection conditionally based on the runtime environment. Pseudocode:

if env == "prod":
    from service import client
else:
    from local import client
    ...

Nok Lam Chan

01/22/2024, 1:02 PM

Or you just have two different custom datasets (I think I prefer this option since it's cleaner). Creating a custom dataset may sound intimidating, but it's actually quite easy. Since this is just a client, you don't need any versioning feature; implementing the `AbstractDataset` interface should be sufficient.

# catalog.yml in base
client:
  type: LocalClientDataset

# catalog.yml in prod
client:
  type: ExternalClientDataset

👍 1
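A minimal sketch of what the `ExternalClientDataset` suggested above might look like, assuming Kedro 0.19.x and a hosting service that exposes a ready-made client as `service.client` (as in the first message). The `module` and `attribute` parameters are hypothetical names chosen for illustration, not part of Kedro. The fallback base class exists only so the sketch can run where Kedro is not installed.

```python
import importlib
from typing import Any

try:
    from kedro.io import AbstractDataset  # assumes Kedro 0.19.x
except ImportError:
    # Stand-in so the sketch runs without Kedro; not the real base class.
    class AbstractDataset:
        def load(self) -> Any:
            return self._load()

        def save(self, data: Any) -> None:
            self._save(data)


class ExternalClientDataset(AbstractDataset):
    """Read-only dataset that hands nodes a client provided by the host.

    `module` and `attribute` are hypothetical parameters: the module to
    import and the name of the client object inside it.
    """

    def __init__(self, module: str, attribute: str = "client"):
        self._module = module
        self._attribute = attribute

    def _load(self) -> Any:
        # Import lazily so the module is only needed where it exists.
        return getattr(importlib.import_module(self._module), self._attribute)

    def _save(self, data: Any) -> None:
        # A client is never written back to the catalog.
        raise NotImplementedError("ExternalClientDataset is read-only")

    def _describe(self) -> dict[str, Any]:
        return {"module": self._module, "attribute": self._attribute}
```

With something like this, the `client` entry in `conf/prod/catalog.yml` would set `type` to the class's full import path and `module: service`, while the base environment keeps a dataset that authorizes the client locally.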

Christoph Imler

01/22/2024, 5:05 PM

What about Kedro-boot? Have you tried it?

Nok Lam Chan

01/22/2024, 5:53 PM

It's a community-developed plugin. I haven't tried it myself, but I have tagged the author in this thread.

Takieddine Kadiri

01/22/2024, 8:29 PM

Hi Christoph, if I understand correctly, you need to inject data into your kedro project at runtime, and this data is actually a client object. Here is a solution path that you can explore:
• As advised by @Nok Lam Chan, develop an AbstractDataset that loads the client object. The client object can then be set as a dataset.
• Take a look at kedro-boot-examples. It demonstrates, through some examples, how you can use kedro-boot to inject data into a kedro project and perform multiple pipeline runs.

👍🏼 1

Christoph Imler

01/22/2024, 8:55 PM

Thanks, I was looking into that earlier, but it's nice to know that this is the right track. It gets complicated fast.

Christoph Imler

01/24/2024, 1:00 PM

@Takieddine Kadiri Hi, I am building a pipeline to use with Kedro-boot, but get this error:

ImportError: cannot import name 'ConfigLoader' from 'kedro.config' (/opt/homebrew/Caskroom/miniforge/base/envs/cogniteds/lib/python3.11/site-packages/kedro/config/__init__.py)
Traceback (most recent call last):

According to the new release notes, ConfigLoader has been replaced with OmegaConfigLoader. Are you planning to release a new version? Can you open the repo for pull requests?

Takieddine Kadiri

01/25/2024, 9:15 AM

You can open an issue in the kedro-boot repo. We plan to release a new version in a week or two that supports kedro 0.19.x

👍 1

Christoph Imler

01/25/2024, 9:25 AM

Great! Thanks
