Riley Brady
06/02/2023, 4:44 PM
PIPELINE1
node1
  tags=[
    "task1",
    variable,
    model,
    region
  ]
node2
  tags=[
    "task2",
    variable,
    model,
    region
  ]
I want to run all `node1`s under `PIPELINE1` for a certain variable and model, but over all regions (working on geospatial data).
We run from the kedro CLI, launching AWS batch jobs. I found that I could run jobs from a config spec here. So I set up the following `config.yml`:
run:
  tags: task1, temperature, GFDL-ESM4 # don't declare region so all regions are run
  pipeline: PIPELINE1
  env: dev
  runner: cra_data_pipelines.runner.BatchRunner
Then I run `kedro run --config=config.yml`.
RESULT: It ends up launching all 700 jobs from PIPELINE1 without any distinction for the listed tags above. I of course just want the 20 or so that meet the AND conditions of those three tags.
I recall having this issue back in the fall and asked about it, and at the time I don’t think there was any way to run tags with AND logic. I was told that recent versions of kedro updated this, and saw on the config page that it listed multiple tags, so I assumed that’s how it should work.
Any help would be great here! Would prefer a simple solution like this rather than looping through each node manually in a shell script.
Deepyaman Datta
06/02/2023, 8:59 PM
> I was told that recent versions of kedro updated this, and saw on the config page that it listed multiple tags, so I assumed that’s how it should work.
I'm not personally aware of such a change, and the code doesn't seem to reflect it, but I could be wrong. You have a link to the example? For reference: https://github.com/kedro-org/kedro/blob/main/kedro/pipeline/pipeline.py#L665
Riley Brady
06/02/2023, 9:01 PM
Deepyaman Datta
06/02/2023, 9:03 PM
`only_nodes_with_tags` function; either extend the syntax or switch it to intersection
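To make the union-versus-intersection distinction in this thread concrete, here is a minimal sketch in plain Python. It does not import kedro: `TaggedNode`, `select_any`, `select_all`, and the toy tag values (other than `task1`, `task2`, `temperature`, and `GFDL-ESM4` from the question) are hypothetical names made up for illustration. The assumption is that tag selection keeps a node when it carries any of the requested tags, which would explain why nearly everything gets launched.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TaggedNode:
    """Hypothetical stand-in for a pipeline node: a name plus a set of tags."""
    name: str
    tags: frozenset


def select_any(nodes, tags):
    """Union (OR): keep nodes carrying at least one requested tag."""
    wanted = set(tags)
    return [n for n in nodes if wanted & n.tags]


def select_all(nodes, tags):
    """Intersection (AND): keep nodes carrying every requested tag."""
    wanted = set(tags)
    return [n for n in nodes if wanted <= n.tags]


# Toy data: 2 tasks x 2 variables x 2 models x 3 regions = 24 nodes.
tasks = ["task1", "task2"]
variables = ["temperature", "precipitation"]
models = ["GFDL-ESM4", "MODEL-B"]
regions = ["region-a", "region-b", "region-c"]

nodes = [
    TaggedNode(f"{t}-{v}-{m}-{r}", frozenset({t, v, m, r}))
    for t, v, m, r in product(tasks, variables, models, regions)
]

requested = ["task1", "temperature", "GFDL-ESM4"]
print(len(select_any(nodes, requested)))  # 21: almost every node shares a tag
print(len(select_all(nodes, requested)))  # 3: one node per region
```

If the real pipeline object exposes its nodes and each node exposes a set of tags (kedro's `Pipeline.nodes` and `Node.tags` appear to do so), the same subset check could probably be used to build a filtered pipeline from the matching nodes, though that has not been tested against the `BatchRunner` setup described above.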