Mattis
06/16/2025, 12:55 PM

Sajid Alam
06/16/2025, 1:36 PM

Mattis
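06/23/2025, 6:45 AM
```python
import logging

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog

logger = logging.getLogger(__name__)


class DynamicCatalogHook:
    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog, **kwargs) -> None:
        # BinaryBlobDataSet, raw_path, connection_string and container_name
        # come from the project / the elided loop below
        for ....
            for n in [name, processed_name, valid_name, warp_name]:
                logger.info(f"Registering dataset: {n}")
                # register each derived name `n` (the paste passed `name` every time)
                catalog.add(
                    n,
                    BinaryBlobDataSet(
                        filepath=raw_path,
                        connection_string=connection_string,
                        container=container_name,
                    ),
                )
```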
And in pipeline.py I call the following:
```python
from kedro.pipeline import Pipeline, node

# VALID_FILES, node1_check_and_process and the _name_from_blob_* helpers
# are defined elsewhere in the project.


def create_pipeline(**kwargs) -> Pipeline:
    pipeline_nodes = []
    for input_name, output_name in VALID_FILES:
        pipeline_nodes.append(
            node(
                func=node1_check_and_process,
                inputs=_name_from_blob_processed(input_name),
                outputs=_name_from_blob_valid(output_name),
                name=f"check_validity_{_name_from_blob_valid(input_name)[-10:]}_node1",
                # name=f"valid_node1",
                tags=["multi_test_1"],
            )
        )
    return Pipeline(pipeline_nodes, tags="stl_multi_pip")
```

Ravi Kumar Pilla
06/24/2025, 2:51 PM
create_pipeline() is called in AzureML?
• Are your HOOKS registered in settings.py?
I am not very familiar with AzureML environments, but I will try to read through this. Thanks for your patience.

Mattis
06/25/2025, 7:14 AM

Mattis
06/25/2025, 2:13 PM
```dockerfile
CMD ["kedro", "run", "-r", "SequentialRunner"]
```
And when executing it on AzureML:
```
kedro azureml run -p multi_pip -s 12341234ce-1234-123r-23ff-1234f231234 --aml-env kedro_env
```
It still runs the pipelines in parallel rather than sequentially, so the datasets they reference are not available yet when they are called.
Is there a way to force AzureML to run it sequentially with the azureml plugin?
Passing the runner directly does not seem to be supported:
```
kedro azureml run -r SequentialRunner
```

Ravi Kumar Pilla
06/25/2025, 2:46 PM
DatasetNotFoundError or DatasetError, as you said it's a race condition. Since you said it works fine locally, I don't think Kedro runners would help here.
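One way to narrow down a DatasetNotFoundError that only shows up in AzureML is to compare the dataset names each node uses with the names the hook actually registers when the catalog is built. The plain-Python sketch below assumes you can produce a mapping from node name to (inputs, outputs), for example from each Kedro node's `inputs` and `outputs`, plus a set of registered names; `unregistered_datasets` and the example names are hypothetical and not part of Kedro.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical shape: node name -> (input dataset names, output dataset names)
NodeIO = Dict[str, Tuple[Iterable[str], Iterable[str]]]


def unregistered_datasets(node_io: NodeIO, registered: Set[str]) -> Dict[str, Set[str]]:
    """Map each node to the dataset names it uses that are not in `registered`.

    Nodes whose inputs and outputs are all registered are left out.
    """
    report: Dict[str, Set[str]] = {}
    for node_name, (inputs, outputs) in node_io.items():
        missing = (set(inputs) | set(outputs)) - registered
        if missing:
            report[node_name] = missing
    return report


if __name__ == "__main__":
    # Illustrative names only; they do not come from the original project.
    node_io = {
        "check_validity_file_a_node1": (["processed_file_a"], ["valid_file_a"]),
        "check_validity_file_b_node1": (["processed_file_b"], ["valid_file_b"]),
    }
    registered = {"processed_file_a", "valid_file_a", "processed_file_b"}
    print(unregistered_datasets(node_io, registered))
```

Running the check inside the container, with the names the hook sees there, is what makes it useful: the set registered in AzureML may differ from the one registered locally.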
I am not familiar with the azureml plugin. cc: @marrrcin, have you seen something like this? Thank you

Mattis
06/25/2025, 2:50 PM

Ravi Kumar Pilla
06/25/2025, 10:12 PM
node AA in AzureML states "pipeline does not contain that AA node", is completely new to me. Can we get on a call tomorrow when you have some time? (I work in the CT timezone.) Thank you

marrrcin
06/26/2025, 6:19 AM
`kedro run --pipeline=<pipeline name> --node=<name of the node>` under the hood. Check whether running your Docker image locally with `kedro run --pipeline=<pipeline name> --node=<name of the node that fails in Azure>` works for you. If it does not, then the approach you took for dynamic pipelines is not correct (Kedro in general does not support dynamic pipelines, although there are some workarounds).
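When there are many nodes, this local check can be scripted. The sketch below is a hypothetical helper (`node_commands` and `run_each_node` are not part of Kedro or kedro-azureml) that starts a fresh `kedro run` process per node, in the order given, so nothing created in memory by one run is visible to the next; that roughly mimics each node running as its own AzureML step. The `--node` spelling follows the message above; depending on your Kedro version the option may be `--nodes`, so check `kedro run --help` and pass `node_flag` accordingly.

```python
import shlex
import subprocess
from typing import List, Sequence


def node_commands(
    pipeline_name: str, node_names: Sequence[str], node_flag: str = "--node"
) -> List[List[str]]:
    """Build one `kedro run` argument list per node, in the given order."""
    return [
        ["kedro", "run", f"--pipeline={pipeline_name}", f"{node_flag}={name}"]
        for name in node_names
    ]


def run_each_node(
    pipeline_name: str, node_names: Sequence[str], node_flag: str = "--node"
) -> None:
    """Run every node in a fresh process; stop at the first failing node."""
    for argv in node_commands(pipeline_name, node_names, node_flag):
        print("Running:", shlex.join(argv))
        # check=True raises CalledProcessError if this node's run fails
        subprocess.run(argv, check=True)
```

Inside the container this would be called as, for example, `run_each_node("multi_pip", [...])` with the node names listed in the order AzureML shows the steps.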
There is no way to set SequentialRunner in AzureML. Again: each node pushed to AzureML is executed with a command similar to `kedro run --pipeline=... --node=...`, and the ordering is determined by Kedro itself (a topological sort based on the nodes' inputs and outputs, as Ravi mentioned).
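The ordering described here can be illustrated with Python's standard-library `graphlib` (Python 3.9+). This is not Kedro's own implementation; `execution_order` and the `node_io` mapping are hypothetical. A node is placed after whichever nodes produce its inputs, and nodes that share no dataset names get no edge between them, so nothing in the graph orders them relative to each other and a scheduler that runs each node as a separate step may start them at the same time.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical shape: node name -> (input dataset names, output dataset names)
NodeIO = Dict[str, Tuple[Iterable[str], Iterable[str]]]


def execution_order(node_io: NodeIO) -> List[str]:
    """Return node names so every node comes after the nodes producing its inputs."""
    # Which node writes each dataset
    producer: Dict[str, str] = {}
    for node_name, (_, outputs) in node_io.items():
        for dataset in outputs:
            producer[dataset] = node_name
    # Each node depends on the producers of its inputs; inputs that no node
    # produces (raw data, parameters) add no edge.
    graph: Dict[str, Set[str]] = {
        node_name: {producer[d] for d in inputs if d in producer}
        for node_name, (inputs, _) in node_io.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    # A three-node chain: raw -> a -> b -> c
    chain = {
        "c_node": (["b"], ["c"]),
        "a_node": (["raw"], ["a"]),
        "b_node": (["a"], ["b"]),
    }
    print(execution_order(chain))
```

Under this model, if one node really has to wait for another, that dependency has to appear in the graph, for example by having the later node take one of the earlier node's outputs as an input.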