# questions
f
Hi everyone, are Kedro objects such as `session` or `context` “aware” that they were initialised using `kedro ipython`? As in, is there any attribute that can identify this?
d
good question - I don’t think so, but you could try this https://stackoverflow.com/a/39662359/2010808
what are you trying to achieve?
n
I don’t think so either - but would like to understand more to see if there are alternatives
That would come in handy
tqdm uses it to resolve notebook vs terminal rendering; it “mostly” works 😄
👏 1
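The StackOverflow approach linked above boils down to inspecting the class of the active IPython shell. A minimal sketch, assuming that approach (the function name `detect_execution_context` is my own; the shell class names are the ones that check relies on):

```python
def detect_execution_context() -> str:
    """Best-effort guess at how the current process was started.

    IPython injects `get_ipython` into the interactive namespace,
    so a NameError means we are in a plain Python interpreter.
    """
    try:
        shell = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return "python"  # plain `python` / `kedro run`
    if shell == "ZMQInteractiveShell":
        return "notebook"  # Jupyter notebook / lab kernel
    if shell == "TerminalInteractiveShell":
        return "ipython"  # terminal IPython, e.g. `kedro ipython`
    return "other"
```

As noted, this “mostly” works: other Jupyter frontends also report `ZMQInteractiveShell`, so it cannot distinguish them.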
f
So with the help of @marrrcin we just released a new version of kedro-azureml that uses hooks to allow pipeline execution locally and remotely. I realised that the hooks won’t support the case of using it in `kedro ipython` sessions, because the hook we use to update the catalog won’t be executed.
🤔 1
So I want to update the package to support integration with EDA using `kedro ipython`/`jupyter`
d
Also @Nok Lam Chan, it’s trivial for us to add an `execution_context` property to the session, right?
m
Nice catch @Florian d
f
I was thinking we could use the `after_context_created` hook and expand it for the IPython use case, if the context is aware (or can be made aware) that it is an IPython session. That way, in the IPython case, we can support downloading datasets that only exist remotely
d
yeah that would work
I think
n
> I realised that the hooks won’t support the case of using it in `kedro ipython` sessions because the hook we use to update the catalog won’t be executed.
Can you explain a little why this is the case?
f
Sure, we modify the catalog in the `before_pipeline_run` hook (setting a `download` flag to True), because for pipeline runs we need to know which datasets are “root” datasets. We obtain this information from the `pipeline` argument in that hook spec. If users use `kedro run`, it will download the “root” datasets but leave intermediate and output datasets untouched (some other things happen too). However, for `kedro ipython`, if the dataset was not downloaded in a previous pipeline run, it does not exist locally and the catalog/dataset does not know it should be downloaded
hope that makes sense 😓
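The “root” dataset computation described above can be sketched with plain sets. The node representation here is simplified to (inputs, outputs) pairs; my assumption is that the real hook gets the same information from the `pipeline` argument (Kedro’s `Pipeline.inputs()` computes exactly the free inputs):

```python
def root_datasets(nodes: list) -> set:
    """Datasets consumed by some node but produced by none — the
    'root' datasets that must be downloaded before a pipeline run.

    Each node is modelled as an (inputs, outputs) pair of dataset-name sets.
    """
    consumed, produced = set(), set()
    for inputs, outputs in nodes:
        consumed |= inputs
        produced |= outputs
    return consumed - produced


# e.g. raw -> clean -> model: only "raw" is a root dataset
nodes = [({"raw"}, {"clean"}), ({"clean"}, {"model"})]
```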
@marrrcin I’ll try the snippet you posted. @datajoely if the session could be aware of this, even better
n
FYI we do something similar for Spark to detect the IPython session for Databricks
Did I understand correctly that this is for two different use cases? 1. Your catalog is coupled with your pipeline, which determines which datasets to download 2. using the `kedro ipython` catalog alone?
f
This would be to allow `kedro ipython`; the pipeline use case already works
n
What would the `after_context_created` hook look like? I am unsure why IPython is important here. If a user starts from a terminal (not IPython), wouldn’t you still want it to be loaded?
f
hmm so you mean also when creating session/catalog etc from just `python`?
n
yes
f
I rarely do that but you’re right, that might actually also solve another use-case I was pondering
👍🏼 1