Are there any special characters that should be av...
# questions
a
Are there any special characters that should be avoided in tag names (nodes, etc)? I can't find relevant info in the docs
d
I think tag names are pretty generic, just needs to be valid YAML
are you running into issues?
a
no, but I got a PR claiming that colons cause problems; it changes the separator from colons to dots
and I wonder what the issue is
because I got no further comment
d
colons may break YAML validity
so yes dots or dashes are better
a
huh... you just use quotes and it's okay
but I guess
d
dots in key names may conflict with namespaces
a
yes
that's my intuition that it feels wrong to use dots as they are reserved for namespaces
d
I think it’s safest to use dashes / underscores here
a
😕
I don't like it
I think I'll use slash
It's for tag grouping feature
d
image.png
I think I’m okay with it
j
hmmm so we don't have it documented anywhere what the valid chars for names are? 🤔 I recall that it's written somewhere
a
maybe, I was looking specifically for tags info and there's nothing about it
d
for tag in self._tags:
    if not re.match(r"[\w\.-]+$", tag):
        raise ValueError(
            f"'{tag}' is not a valid node tag. It must contain only "
            f"letters, digits, hyphens, underscores and/or fullstops."
        )
in node.py
so actually I don’t think slashes are allowed
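One way to check which separators the quoted validation accepts is to run the same regex against a few candidate tag names. This is a minimal sketch: the pattern is copied from the snippet above, while `is_valid_tag` and the sample tag names are made up for illustration and are not part of Kedro.

```python
import re

# Same pattern as the check quoted above from node.py.
TAG_PATTERN = re.compile(r"[\w\.-]+$")


def is_valid_tag(tag: str) -> bool:
    """Return True if the tag passes the quoted regex check (hypothetical helper)."""
    return TAG_PATTERN.match(tag) is not None


# Hypothetical tag names, one per separator discussed in the thread.
candidates = [
    "group.preprocessing",
    "group-preprocessing",
    "group_preprocessing",
    "group/preprocessing",
    "group:preprocessing",
]

results = {tag: is_valid_tag(tag) for tag in candidates}
for tag, ok in results.items():
    print(f"{tag!r}: {'accepted' if ok else 'rejected'}")
```

With this pattern the dot, hyphen and underscore variants pass, while the slash and colon variants are rejected.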
a
yeah neither are colons or dots
weird it must be a new feature limitation as it was working before
d
namespaces are the way we intended grouping to be denoted, is there any reason that doesn’t work for your purposes?
it’s the same in 0.18.0
a
yes, it's for a plugin that groups nodes when translating them for an execution environment, e.g. kedro->vertexai nodes
d
and do namespaces not work for you there?
It would be incredibly helpful to get your thoughts here https://github.com/kedro-org/kedro/issues/3094
this falls under the first point, "Deciding on granularity when translating to orchestrator DSL"
a
I already commented there
I'm Lasica on github
and namespaces are not enough imho
d
it would still be very helpful for you to set out why namespaces aren’t
a
yeah I need to gather my thoughts but that was my impression when I was dealing with it last time hence the feature to group nodes via tags
d
this is genuinely incredibly helpful
equally, if we need to relax the tag validation, this would help make the argument
j
to Artur's point, I also don't have an articulate opinion on namespaces yet but I perceive them as "heavy"
after 1.5 years of using Kedro I'm still not sure how to use them correctly
d
I should rephrase - @Ivan Danov designed them for this purpose so it’s helpful to articulate where the friction is
a
well they are hierarchical and a bit cumbersome because of that: once you start using them you need to use them everywhere in the pipeline
say I got 5 nodes - 1, 2, 3, 4, 5 and I want to group nodes 1-2, and 4-5.
if I use namespaces then the best would be to namespace whole pipeline and then add subnamespaces for 1,2 and 4,5
and when I want to run the pipeline I need to provide extra parameters - the namespace, which gets longer because I need to add extra steps
that's one point of friction
but maybe it's only in my head
I think I didn't properly consider using namespaces for that because they have some more restrictions and take more getting used to
I think you can't run nodes without namespace together with namespaced nodes
image.png
looks like this limitation is quite fresh, I implemented that feature like 6 months ago
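The five-node example above (group nodes 1-2 and 4-5, leave node 3 alone) can be sketched with tags in plain Python. This is only an illustration under stated assumptions: it does not use the Kedro API or the plugin's real implementation, and the `group.` tag prefix, `SketchNode` and `group_nodes` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical convention: a tag starting with this prefix names the group.
GROUP_PREFIX = "group."


@dataclass
class SketchNode:
    """Stand-in for a pipeline node; not the Kedro Node class."""
    name: str
    tags: set = field(default_factory=set)


def group_nodes(nodes):
    """Map each group name to its node names; ungrouped nodes form their own group."""
    groups = defaultdict(list)
    for node in nodes:
        group_tags = sorted(t for t in node.tags if t.startswith(GROUP_PREFIX))
        if len(group_tags) > 1:
            raise ValueError(f"{node.name} belongs to more than one group: {group_tags}")
        key = group_tags[0][len(GROUP_PREFIX):] if group_tags else node.name
        groups[key].append(node.name)
    return dict(groups)


nodes = [
    SketchNode("node1", {"group.prep"}),
    SketchNode("node2", {"group.prep", "gpu"}),  # other tags can coexist
    SketchNode("node3"),
    SketchNode("node4", {"group.train"}),
    SketchNode("node5", {"group.train"}),
]

grouped = group_nodes(nodes)
print(grouped)
```

Node 3 needs no extra annotation here, which is the partial-adoption property the tag approach is being chosen for in this thread.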
i
• Namespaces were designed to group nodes in an exclusive fashion, i.e. if you want to run a group of nodes as one task in VertexAI/Airflow/etc.
• Any solution for this will be by nature hierarchical. Tags on the other hand are inclusive, e.g. you might tag a node as both a gpu and a largemem node.
• How you name your tags has no influence over yaml or namespaces.
• Only nodes have tags, but both nodes and datasets have namespaces.
• The namespace of a dataset doesn't decide anything in terms of scheduling, but is only needed in order to avoid duplicate dataset names if you are reusing the same pipeline twice in a bigger pipeline.
It seems that for your use case, you'd be better served by namespaces. Not sure what restrictions are preventing you from using them - in fact a namespace is nothing more than just a prefix to your node name, and that's how they are implemented internally. In the future we are considering adding more restrictions to avoid accidentally creating loops between namespaces, since such loops would prevent you from scheduling your namespaces as separate tasks (currently people do that a lot, since it's not prohibited by Kedro).
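The "a namespace is just a prefix to your node name" description can be illustrated in plain Python. This is a rough sketch, not Kedro's internal code: it assumes dot-separated prefixes (as mentioned earlier in the thread), and `namespace_of` and `tasks_by_top_level_namespace` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Optional


def namespace_of(qualified_name: str) -> Optional[str]:
    """Return everything before the last dot, or None for an un-namespaced name."""
    prefix, sep, _ = qualified_name.rpartition(".")
    return prefix if sep else None


def tasks_by_top_level_namespace(names):
    """Group node names into one task per top-level namespace.

    Names without a namespace become single-node tasks.
    """
    tasks = defaultdict(list)
    for name in names:
        namespace = namespace_of(name)
        key = namespace.split(".")[0] if namespace else name
        tasks[key].append(name)
    return dict(tasks)


# Hypothetical node names following the five-node example from the thread.
names = ["prep.node1", "prep.node2", "node3", "train.node4", "train.node5"]
tasks = tasks_by_top_level_namespace(names)
print(tasks)
```

Because the prefix is part of the name, each node can sit in exactly one namespace, whereas a node can carry several tags at once.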
a
I don't disagree that namespaces could probably also handle my case, but I was searching for the reasons why I didn't decide to use them, or what friction made me decide to use tags. When I think about it, it can be summed up with the following points:
• I wasn't very familiar with namespaces yet and didn't know their full purpose
• starting to use namespaces has friction in that you need to start using them widely, everywhere. The fact that I need to add a namespace to datasets in this case is more of a pain point than a benefit, especially as I was doing it before I was familiar with dataset factories, or before the feature was released
• we were already using tags to steer the behavior of certain nodes in the translation process (say, assign gpu), so it made sense to expand that functionality instead of introducing a new mechanism that requires learning
• namespaces felt cumbersome in that you can't start using them partially, as you can't run the default namespace + some other namespace (at least I don't know how).
Since the grouping feature was made to be modular/swappable, it makes total sense to me now to add namespace grouping as an alternative.
anyways @datajoely I believe it would be helpful to have information about tag name limits/conventions in the following places in the docs:
https://docs.kedro.org/en/stable/api/kedro.pipeline.node.html - implies any string is fine
https://docs.kedro.org/en/stable/nodes_and_pipelines/nodes.html#how-to-tag-a-node - info box here
j
absolutely yes - @Artur Dobrogowski do you have a moment to open an issue in https://github.com/kedro-org/kedro/issues/ ?
a
can do
i
Totally makes sense @Artur Dobrogowski. 1 and 3 seem like situational reasons not to use them, i.e. specific for your project. But it would be useful if you give a couple of examples for 2 and 4, which will help us address that, be it through documentation, a blog post or additional feature. Do you mind sharing more about those two points of yours?
a
sure I think I already made an issue about #4 with Marcin
and I can elaborate on #2 another time
i
sounds great, thanks a lot 🙇
a
n
Related to this tangentially: Namespaced Nodes #3679