# questions
f
Hi everyone, are Kedro objects such as `session` or `context` “aware” that they were initialised using `kedro ipython`? As in, is there any attribute that can identify this?
d
good question - I don’t think so, but you could try this https://stackoverflow.com/a/39662359/2010808
what are you trying to achieve?
n
I don’t think so either - but would like to understand more to see if there are alternatives
That would come in handy
tqdm uses it to resolve notebook vs terminal rendering; it “mostly” works 😄
👏 1
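The StackOverflow approach linked above boils down to inspecting the class of the active IPython shell. A minimal sketch, assuming that approach (the function name `detect_execution_context` is my own; the shell class names are the ones that check relies on):

```python
def detect_execution_context() -> str:
    """Best-effort guess at how the current process was started.

    IPython injects `get_ipython` into the interactive namespace,
    so a NameError means we are in a plain Python interpreter.
    """
    try:
        shell = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return "python"  # plain `python` / `kedro run`
    if shell == "ZMQInteractiveShell":
        return "notebook"  # Jupyter notebook / lab kernel
    if shell == "TerminalInteractiveShell":
        return "ipython"  # terminal IPython, e.g. `kedro ipython`
    return "other"
```

As noted, this “mostly” works: other Jupyter frontends also report `ZMQInteractiveShell`, so it cannot distinguish them.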
f
So with the help of @marrrcin we just released a new version of kedro-azureml that uses hooks to allow pipeline execution locally and remotely. I realised that the hooks won’t support the case of using it in `kedro ipython` sessions, because the hook we use to update the catalog won’t be executed.
🤔 1
So I want to update the package to support integration with EDA using `kedro ipython`/`jupyter`
d
Also @Nok Lam Chan, it’s trivial for us to add an `execution_context` property to the session, right?
m
Nice catch @Florian d
f
I was thinking we could use the `after_context_created` hook and expand it for the IPython use case, if the context is aware (or can be made aware) that it is an IPython session. That way, in the IPython case, we can support downloading datasets that only exist remotely
d
yeah that would work
I think
n
> I realised that the hooks won’t support the case of using it in `kedro ipython` sessions because the hook we use to update the catalog won’t be executed.
Can you explain a little why this is the case?
f
Sure, we modify the catalog in the `before_pipeline_run` hook (setting a `download` flag to True), because for pipeline runs we need to know which datasets are “root” datasets. We obtain this information from the `pipeline` argument in that hook spec. If users use `kedro run`, it will download the “root” datasets but leave intermediate and output datasets untouched (some other things happen too). However, for `kedro ipython`, if the dataset was not downloaded in a previous pipeline run, it does not exist locally and the catalog/dataset does not know it should be downloaded
hope that makes sense 😓
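The “root” dataset computation described above can be sketched with plain sets. The node representation here is simplified to (inputs, outputs) pairs; my assumption is that the real hook gets the same information from the `pipeline` argument (Kedro’s `Pipeline.inputs()` computes exactly the free inputs):

```python
def root_datasets(nodes: list) -> set:
    """Datasets consumed by some node but produced by none — the
    'root' datasets that must be downloaded before a pipeline run.

    Each node is modelled as an (inputs, outputs) pair of dataset-name sets.
    """
    consumed, produced = set(), set()
    for inputs, outputs in nodes:
        consumed |= inputs
        produced |= outputs
    return consumed - produced


# e.g. raw -> clean -> model: only "raw" is a root dataset
nodes = [({"raw"}, {"clean"}), ({"clean"}, {"model"})]
```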
@marrrcin I’ll try the snippet you posted. @datajoely if the session could be aware of this, even better
n
FYI we do something similar for Spark to detect the IPython session for Databricks
Did I understand correctly that this is for two different use cases? 1. Your catalog is coupled with your pipeline, which determines which datasets to download 2. using the `kedro ipython` catalog alone?
f
This would be to allow `kedro ipython`; the pipeline use case already works
n
What would the `after_context_created` hook look like? I am unsure why IPython is important here. If a user starts from a terminal (not IPython), wouldn’t you still want it to be loaded?
f
hmm so you mean also when creating session/catalog etc from just `python`?
n
yes
f
I rarely do that but you’re right, that might actually also solve another use-case I was pondering
👍🏼 1