I’m having trouble designating multiple AND tags t...
# questions
r
I’m having trouble designating multiple AND tags to run a subset of a pipeline. We have a large pipeline with ~700 nodes and I only want to run 20 or so. Each node has tags like:
PIPELINE1
node1
tags=[
    "task1",
    variable,
    model,
    region
]

node2
tags=[
    "task2",
    variable,
    model,
    region
]
I want to run all `node1`s under `PIPELINE1` for a certain variable and model, but over all regions (working on geospatial data). We run from the kedro CLI, launching AWS batch jobs. I found that I could run jobs from a config spec here. So I set up the following `config.yml`:
run:
  tags: task1, temperature, GFDL-ESM4 # don't declare region so all regions are run
  pipeline: PIPELINE1
  env: dev
  runner: cra_data_pipelines.runner.BatchRunner
Then I run `kedro run --config=config.yml`. RESULT: It ends up launching all 700 jobs from PIPELINE1 without any distinction for the listed tags above. I of course just want the 20 or so that meet the AND conditions of those three tags. I recall having this issue back in the fall and asked about it, and at the time I don’t think there was any way to run tags with AND logic. I was told that recent versions of kedro updated this, and saw on the config page that it listed multiple tags, so I assumed that’s how it should work. Any help would be great here! Would prefer a simple solution like this rather than looping through each node manually in a shell script.
kedro version is 0.18.3
I’m realizing it’s up to 0.18.9 now, but I’m not seeing anything in the release notes about declaring multiple tags.
d
I was told that recent versions of kedro updated this, and saw on the config page that it listed multiple tags, so I assumed that’s how it should work.
I'm not personally aware of such a change, and the code doesn't seem to reflect it, but I could be wrong. Do you have a link to the example? For reference: https://github.com/kedro-org/kedro/blob/main/kedro/pipeline/pipeline.py#L665
r
Yes unfortunately it still seems like it’s ANY instead of ALL. Is there any hacky way to get the intersection of a bunch of tags? It seems like it could be a pretty standard use case.
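To make the ANY versus ALL distinction concrete, here is a small plain-Python sketch that does not use kedro at all. The node names and tag values are made up for illustration; it only shows why a union match on three tags selects far more nodes than an intersection match would.

```python
# Toy model: node name -> set of tags (all names and tags are hypothetical).
nodes = {
    "task1_temperature_GFDL-ESM4_europe": {"task1", "temperature", "GFDL-ESM4", "europe"},
    "task1_temperature_GFDL-ESM4_asia": {"task1", "temperature", "GFDL-ESM4", "asia"},
    "task1_precipitation_GFDL-ESM4_europe": {"task1", "precipitation", "GFDL-ESM4", "europe"},
    "task2_temperature_GFDL-ESM4_europe": {"task2", "temperature", "GFDL-ESM4", "europe"},
    "task2_precipitation_UKESM1_asia": {"task2", "precipitation", "UKESM1", "asia"},
}

requested = {"task1", "temperature", "GFDL-ESM4"}

# ANY (union) semantics: keep a node if it shares at least one requested tag.
any_match = sorted(name for name, tags in nodes.items() if tags & requested)

# ALL (intersection) semantics: keep a node only if it carries every requested tag.
all_match = sorted(name for name, tags in nodes.items() if requested <= tags)

print(len(any_match), "nodes match ANY requested tag")
print(len(all_match), "nodes match ALL requested tags")
```

In this toy data, only the two `task1` / `temperature` / `GFDL-ESM4` nodes (one per region) survive the ALL filter, which is the behaviour asked for above.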
d
You could patch the `only_nodes_with_tags` function; either extend the syntax or switch it to intersection
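A minimal sketch of the "switch it to intersection" idea, written as a standalone helper rather than an actual patch to kedro. `FakeNode` and `nodes_with_all_tags` are hypothetical names introduced here; the helper only assumes each node object exposes a `tags` collection. Whether the result can be wrapped back into a pipeline exactly as the comment suggests has not been checked against 0.18.3.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for a pipeline node; only `name` and `tags` matter here."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def nodes_with_all_tags(nodes, *tags):
    """Return the nodes that carry every one of `tags` (AND logic).

    With no tags given, every node is returned, since the empty set is a
    subset of any tag set.
    """
    required = set(tags)
    return [n for n in nodes if required <= set(n.tags)]


# Made-up nodes mirroring the task / variable / model / region tagging scheme.
nodes = [
    FakeNode("n1", frozenset({"task1", "temperature", "GFDL-ESM4", "europe"})),
    FakeNode("n2", frozenset({"task1", "temperature", "GFDL-ESM4", "asia"})),
    FakeNode("n3", frozenset({"task1", "precipitation", "GFDL-ESM4", "europe"})),
    FakeNode("n4", frozenset({"task2", "temperature", "GFDL-ESM4", "europe"})),
]

selected = nodes_with_all_tags(nodes, "task1", "temperature", "GFDL-ESM4")
print([n.name for n in selected])

# Assumption, not verified here: with a real kedro pipeline one would pass
# `pipeline.nodes` to the helper and build a new Pipeline from the result.
```

Keeping the filter outside kedro avoids monkey-patching library code, at the cost of needing a custom entry point to apply it before the run starts.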