#questions
a

Artur Dobrogowski

03/20/2024, 2:53 PM
Are there any special characters that should be avoided in tag names (nodes, etc)? I can't find relevant info in the docs
d

datajoely

03/20/2024, 2:54 PM
I think tag names are pretty generic, they just need to be valid YAML
are you running into issues?
a

Artur Dobrogowski

03/20/2024, 2:54 PM
no, but I got a PR claiming that colons cause problems; it changes the separator from colons to dots
and I wonder what the issue is
because I got no further comments
d

datajoely

03/20/2024, 2:55 PM
colons may break the YAML validity
so yes dots or dashes are better
a

Artur Dobrogowski

03/20/2024, 2:55 PM
huh... you just use quotes and it's okay
but I guess
d

datajoely

03/20/2024, 2:55 PM
dots in key names may conflict with namespaces
a

Artur Dobrogowski

03/20/2024, 2:56 PM
yes
my intuition is that it feels wrong to use dots, as they are reserved for namespaces
d

datajoely

03/20/2024, 2:56 PM
I think it’s safest to use dashes / underscores here
a

Artur Dobrogowski

03/20/2024, 2:56 PM
😕
I don't like it
I think I'll use slash
It's for tag grouping feature
d

datajoely

03/20/2024, 2:59 PM
image.png
I think I’m okay with it
👍 1
j

Juan Luis

03/20/2024, 3:00 PM
hmmm so we haven't documented anywhere what the valid chars for names are? 🤔 I recall that it's written somewhere
a

Artur Dobrogowski

03/20/2024, 3:00 PM
maybe, I was looking specifically for tags info and there's nothing about it
d

datajoely

03/20/2024, 3:01 PM
for tag in self._tags:
    if not re.match(r"[\w\.-]+$", tag):
        raise ValueError(
            f"'{tag}' is not a valid node tag. It must contain only "
            f"letters, digits, hyphens, underscores and/or fullstops."
        )
in node.py
so actually I don’t think slashes are allowed
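To see what that check accepts, here is a minimal standalone sketch that reuses only the regex quoted above. The helper name `is_valid_tag` and the sample tags are made up for illustration and are not part of Kedro.

```python
import re

# Same pattern as in the snippet quoted from node.py.
TAG_PATTERN = r"[\w\.-]+$"


def is_valid_tag(tag: str) -> bool:
    """Hypothetical helper: True if `tag` passes the quoted regex check."""
    return re.match(TAG_PATTERN, tag) is not None


# Try the separators discussed in this thread.
for candidate in ["group.train", "group-train", "group_train", "group:train", "group/train"]:
    print(f"{candidate!r}: {is_valid_tag(candidate)}")
```

With this pattern, dots, hyphens and underscores pass, while colons and slashes are rejected.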
a

Artur Dobrogowski

03/20/2024, 3:01 PM
yeah neither are colons or dots
weird, it must be a new limitation, as it was working before
d

datajoely

03/20/2024, 3:02 PM
namespaces are the way we intended grouping to be noted, is there any reason that doesn’t work for your purposes?
it’s the same in 0.18.0
a

Artur Dobrogowski

03/20/2024, 3:03 PM
yes, it's for a plugin that groups nodes during node translation for an execution environment, e.g. kedro -> vertexai nodes
d

datajoely

03/20/2024, 3:04 PM
and do namespaces not work for you there?
It would be incredibly helpful to get your thoughts here https://github.com/kedro-org/kedro/issues/3094
this falls under the first point, "Deciding on granularity when translating to orchestrator DSL"
a

Artur Dobrogowski

03/20/2024, 3:07 PM
I already commented there
I'm Lasica on github
❤️ 1
and namespaces are not enough imho
d

datajoely

03/20/2024, 3:08 PM
it would still be very helpful for you to set out why namespaces aren’t enough
a

Artur Dobrogowski

03/20/2024, 3:08 PM
yeah, I need to gather my thoughts, but that was my impression when I was dealing with it last time, hence the feature to group nodes via tags
d

datajoely

03/20/2024, 3:10 PM
this is genuinely incredibly helpful
equally, if we need to relax the tag validation this would help make the argument
j

Juan Luis

03/20/2024, 3:13 PM
to Artur's point, I also don't have an articulate opinion on namespaces yet but I perceive them as "heavy"
after 1.5 years of using Kedro I'm still not sure how to use them correctly
☝️ 1
d

datajoely

03/20/2024, 3:13 PM
I should rephrase - @Ivan Danov designed them for this purpose so it’s helpful to articulate where the friction is
a

Artur Dobrogowski

03/20/2024, 3:15 PM
well, they are hierarchical and a bit cumbersome because of that: once you start using them you need to use them everywhere in the pipeline
say I have 5 nodes (1, 2, 3, 4, 5) and I want to group nodes 1-2 and 4-5.
if I use namespaces, the best option would be to namespace the whole pipeline and then add sub-namespaces for 1-2 and 4-5
and when I want to run the pipeline I need to provide an extra parameter, the namespace, which gets longer because of the extra nesting
that's one point of friction
but maybe it's only in my head
I think I didn't properly consider using namespaces for that because they have some more restrictions and take some getting used to
I think you can't run nodes without a namespace together with namespaced nodes
image.png
looks like this limitation is quite recent; I implemented that feature about 6 months ago
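The tag-based alternative for the five-node example could look roughly like the sketch below. It is only an illustration: the `group.` tag prefix, the `group_nodes_by_tag` helper and the node names are assumptions, not the actual plugin's API, and no Kedro objects are involved.

```python
from collections import defaultdict

# Hypothetical convention: a tag starting with this prefix names the group.
GROUP_PREFIX = "group."


def group_nodes_by_tag(node_tags: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each group name to its nodes; untagged nodes form their own group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, tags in node_tags.items():
        group_tags = sorted(t for t in tags if t.startswith(GROUP_PREFIX))
        if len(group_tags) > 1:
            raise ValueError(f"Node {name!r} belongs to several groups: {group_tags}")
        # A node without a grouping tag stays a standalone task.
        key = group_tags[0][len(GROUP_PREFIX):] if group_tags else name
        groups[key].append(name)
    return dict(groups)


# Nodes 1-2 and 4-5 are grouped; node 3 needs no extra annotation.
node_tags = {
    "node1": {"group.prep"},
    "node2": {"group.prep", "gpu"},
    "node3": set(),
    "node4": {"group.train"},
    "node5": {"group.train"},
}
print(group_nodes_by_tag(node_tags))
```

In this sketch only the nodes that need grouping carry an extra tag, and other tags such as `gpu` can coexist with the grouping tag on the same node.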
i

Ivan Danov

03/20/2024, 3:26 PM
• Namespaces were designed to group nodes in an exclusive fashion, i.e. if you want to run a group of nodes as one task in VertexAI/Airflow/etc.
• Any solution for this will be by nature hierarchical. Tags on the other hand are inclusive, e.g. you might tag a node as both a gpu and a largemem node.
• How you name your tags has no influence over YAML or namespaces.
• Only nodes have tags, but both nodes and datasets have namespaces.
• The namespace of a dataset doesn't decide anything in terms of scheduling, but is only needed in order to avoid duplicate dataset names if you are reusing the same pipeline twice in a bigger pipeline.
It seems that for your use case, you'd be better served by namespaces. Not sure what restrictions are preventing you from using them - in fact a namespace is nothing more than just a prefix to your node name, and that's how they are implemented internally. In the future we are considering adding more restrictions to avoid accidentally creating loops between namespaces, since such loops prevent you from scheduling your namespaces as separate tasks (currently people create them a lot, since it's not prohibited by Kedro).
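As a rough illustration of the "a namespace is just a prefix" point, the sketch below groups plain node-name strings by their top-level prefix, assuming a dot separates the namespace from the node name. It does not use Kedro itself; the `top_level_namespace` and `group_by_namespace` helpers and the node names are invented for the example.

```python
from collections import defaultdict
from typing import Optional


def top_level_namespace(node_name: str) -> Optional[str]:
    """Return the part before the first dot, or None for an un-namespaced name."""
    prefix, sep, _ = node_name.partition(".")
    return prefix if sep else None


def group_by_namespace(node_names: list[str]) -> dict[Optional[str], list[str]]:
    """Group node names by top-level prefix; each name lands in exactly one group."""
    groups: dict[Optional[str], list[str]] = defaultdict(list)
    for name in node_names:
        groups[top_level_namespace(name)].append(name)
    return dict(groups)


# Hypothetical node names: two namespaced groups plus one plain node.
names = ["prep.node1", "prep.node2", "node3", "train.node4", "train.node5"]
print(group_by_namespace(names))
```

Because the group is derived from the name itself, a node can belong to only one group here, in contrast to tags, where one node can carry several.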
a

Artur Dobrogowski

03/20/2024, 4:05 PM
I don't disagree that namespaces could probably also handle my case, but I was searching for the reasons why I didn't decide to use them, or what friction made me decide to use tags. When I think about it, it can be summed up with the following points:
• I wasn't very familiar with namespaces yet and didn't know their full purpose
• starting to use namespaces has friction in the fact that you need to start using them widely, everywhere. The fact that I need to add a namespace to datasets in this case is more of a pain point than a benefit, especially as I was doing it before I was familiar with dataset factories, or before the feature was released
• we were already using tags to steer the behavior of certain nodes in the translation process (say, assign gpu), so it made sense to expand that functionality instead of introducing a new mechanism that requires learning
• namespaces felt cumbersome in that you can't start using them partially, as you can't run the default namespace + some other namespace (at least I don't know how)
Since the grouping feature was made to be modular/swappable, it makes total sense to me now to add a namespace grouping feature as an alternative.
anyways @datajoely I believe it would be helpful to have information about tag name limits/conventions in the following places in the docs:
• https://docs.kedro.org/en/stable/api/kedro.pipeline.node.html - implies any string is fine
• https://docs.kedro.org/en/stable/nodes_and_pipelines/nodes.html#how-to-tag-a-node - info box here
j

Juan Luis

03/20/2024, 4:12 PM
absolutely yes - @Artur Dobrogowski do you have a moment to open an issue in https://github.com/kedro-org/kedro/issues/ ?
a

Artur Dobrogowski

03/20/2024, 4:14 PM
can do
i

Ivan Danov

03/20/2024, 4:15 PM
Totally makes sense @Artur Dobrogowski. 1 and 3 seem like situational reasons not to use them, i.e. specific to your project. But it would be useful if you could give a couple of examples for 2 and 4, which will help us address them, be it through documentation, a blog post or an additional feature. Do you mind sharing more about those two points of yours?
a

Artur Dobrogowski

03/20/2024, 4:16 PM
sure I think I already made an issue about #4 with Marcin
and I can elaborate on #2 another time
i

Ivan Danov

03/20/2024, 4:17 PM
sounds great, thanks a lot 🙇
a

Artur Dobrogowski

03/20/2024, 4:22 PM
n

Nok Lam Chan

03/20/2024, 4:52 PM
Tangentially related to this: Namespaced Nodes #3679