# plugins-integrations
a
Hey there! Quick question about kedro-azureml. We are using AzureML, and we'd like to use AzureMLAssetDataset with dataset factories. After a lot of headaches and debugging, it seems impossible to use both. Credentials are passed to AzureMLAssetDataset through a hook (after_catalog_created), but if you use dataset patterns (as in, declare your dataset as "{name}.csv" or something similar), the hook is called before the patterned dataset is instantiated. Later, before_node_run is called, and then AzureMLAssetDataset._load() is called, but the AzureMLAssetDataset.azure_config setter hasn't been called yet (it is only called in the after_catalog_created hook). At first glance, it seems like a kedro-azureml issue, as AzureMLAssetDataset._load() can be called without the setter having been called when the dataset comes from a dataset factory. But it might also be a kedro issue, as I think there should be an obvious way to set up credentials in that specific scenario, and I don't quite see one in the docs on hooks either.
Trying to make it slightly clearer:
Regular AzureMLAssetDataset: it is instantiated, then after_catalog_created is called and the setter for the AzureML credentials is called, and eventually _load() is called.
Dataset factory: after_catalog_created is called, then the dataset is instantiated, then _load() is called, and I can't find a good hook in between to set up credentials.
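To illustrate the ordering problem described above, here is a toy model in plain Python. It is not Kedro or kedro-azureml code: ToyDataset, ToyCatalog, and the suffix-based pattern matching are made-up stand-ins, and the only thing the sketch tries to show is that a hook running right after catalog creation cannot reach datasets that are only instantiated later from a pattern.

```python
class ToyDataset:
    """Stand-in for a dataset whose config is injected by a hook (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.azure_config = None  # expected to be set later by a hook

    def load(self):
        if self.azure_config is None:
            raise RuntimeError(f"{self.name}: azure_config was never set")
        return f"loaded {self.name}"


class ToyCatalog:
    """Stand-in catalog: explicit entries exist up front, patterns resolve lazily."""

    def __init__(self, explicit_names, pattern_suffixes):
        self.datasets = {name: ToyDataset(name) for name in explicit_names}
        self.pattern_suffixes = pattern_suffixes

    def get(self, name):
        if name not in self.datasets:
            if not any(name.endswith(sfx) for sfx in self.pattern_suffixes):
                raise KeyError(name)
            # Pattern match: the dataset is created only now, on first access.
            self.datasets[name] = ToyDataset(name)
        return self.datasets[name]


def after_catalog_created(catalog, azure_config):
    """Toy hook: only sees datasets that already exist in the catalog."""
    for dataset in catalog.datasets.values():
        dataset.azure_config = azure_config


catalog = ToyCatalog(explicit_names=["explicit_data"], pattern_suffixes=[".csv"])
after_catalog_created(catalog, azure_config={"workspace": "<workspace>"})

explicit_result = catalog.get("explicit_data").load()  # config was injected

try:
    catalog.get("sales.csv").load()  # created after the hook already ran
    pattern_error = None
except RuntimeError as exc:
    pattern_error = str(exc)
```

In this toy model the explicit dataset loads fine, while the pattern-resolved one fails because nothing ran between its creation and its load.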
👀 1
@Artur Dobrogowski I did not see anything like that reported on kedro-azureml's GitHub. Is that something you are aware of, or does it need to be reported as an issue?
j
hi @Alexandre Ouellet, sorry you had a bumpy experience. looks like this might be an issue with dataset factories in general. maybe an alternative to after_catalog_created for passing credentials would work? cc @Ravi Kumar Pilla
where have you read the recommendation for using the hook for the credentials?
r
I can see a use case mentioned for regular datasets: https://docs.kedro.org/en/stable/hooks/common_use_cases.html#use-hooks-to-load-external-credentials But for dataset factories, as @Alexandre Ouellet mentioned, the execution flow is different (dataset resolution happens later). I think there was some work around this by @Elena Khaustova. I am not sure if the resolution happens the same way as for regular datasets.
e
Hi @Alexandre Ouellet, credentials are resolved when the catalog is instantiated, regardless of whether you use dataset patterns. So, when you use after_catalog_created, credentials are already resolved by this time, and you cannot pass them to the catalog.
The right way to do that is to follow the example that @Ravi Kumar Pilla shared and use the after_context_created hook: https://docs.kedro.org/en/stable/hooks/common_use_cases.html#use-hooks-to-load-external-credentials
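A minimal sketch of that approach, following the pattern from the linked docs page: the hook merges extra entries into the config loader's credentials before the catalog is built. fetch_azureml_credentials, its placeholder return values, and the AzureMLCredentialsHook class name are hypothetical; the "azureml" key is simply the one mentioned later in this thread. The import fallback is only there so the sketch can run without Kedro installed.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be run without Kedro installed.
    def hook_impl(func):
        return func


def fetch_azureml_credentials() -> dict:
    """Hypothetical helper: replace with however you obtain your AzureML settings."""
    return {"subscription_id": "<subscription-id>", "workspace_name": "<workspace>"}


class AzureMLCredentialsHook:
    @hook_impl
    def after_context_created(self, context) -> None:
        # Merge the new entry into the credentials that are already configured,
        # so catalog entries can refer to it by key when the catalog is created.
        context.config_loader["credentials"] = {
            **context.config_loader["credentials"],
            "azureml": fetch_azureml_credentials(),
        }
```

In a Kedro project the hook would then be registered in settings.py, for example HOOKS = (AzureMLCredentialsHook(),).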
a
right, in that case, I'll open a kedro-azureml issue, as the AzureMLAssetDataset class expects to receive its credentials in after_catalog_created (through a setter called on every dataset of type AzureMLAssetDataset), and expects after_catalog_created to be called before "_load()". With a dataset factory, the AzureMLAssetDataset is created and then _load() is called right after, without its setter ever being called. It seems like AzureMLAssetDataset needs a bit of rework in how it gathers its credentials
👍 1
https://github.com/getindata/kedro-azureml/pull/161 has been created as a fix for kedro-azureml
Essentially, I explicitly set up a "credentials" parameter in the catalog for AzureMLAssetDataset, and implicitly inject the AzureML credentials under an "azureml" key in the context in after_context_created. Thus, the credentials are injected at init time, and not in "after_catalog_created", which fixes my issue with dataset factories
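A rough sketch of the idea in plain Python, not the code from the PR: if the dataset takes its credentials as a constructor argument, a pattern-resolved dataset gets them the moment it is created, however late that happens. CredentialedDataset, resolve_from_pattern, and the "subscription_id" field are hypothetical names used only for illustration.

```python
class CredentialedDataset:
    """Toy dataset that receives credentials at init time (hypothetical)."""

    def __init__(self, name, credentials):
        self.name = name
        self._credentials = credentials

    def load(self):
        # No setter needed: credentials were available from construction.
        return f"loaded {self.name} using {self._credentials['subscription_id']}"


def resolve_from_pattern(name, pattern_config, credentials_store):
    """Toy resolution step: swap the credentials key for the stored values."""
    credentials = credentials_store[pattern_config["credentials"]]
    return CredentialedDataset(name, credentials=credentials)


# Filled in early, e.g. by an after_context_created hook.
credentials_store = {"azureml": {"subscription_id": "<subscription-id>"}}

# A pattern entry that names the credentials key it needs.
pattern_config = {"credentials": "azureml"}

late_dataset = resolve_from_pattern("sales.csv", pattern_config, credentials_store)
late_result = late_dataset.load()
```

Because the credentials travel with the catalog entry, the order in which datasets are instantiated no longer matters in this toy version.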
👏 1
PR is coming soon. In the meantime, I've found a way to also make it work without this PR, as I don't know when it will be integrated
for those wondering, it involves three things: a custom AzureMLAssetDataset that simply picks up the credentials injected into the context, an "after_context_created" hook to inject the credentials into the context, and a "before_pipeline_run" hook that handles not only regular datasets, but also dataset factory/pattern datasets.
👀 1
Technically, after that, there shouldn't be a need for the "after_catalog_created" hook
👍 1
I've commented my workaround here : https://github.com/getindata/kedro-azureml/issues/160
it should be fairly straightforward (custom dataset + hook), but let me know in the issue if you have any problems, and I'll try to find some time to have a look at it. The main issues were passing credentials, and properly detecting when running remotely (within AzureML's compute) when using dataset patterns (dataset factories).