Good morning I am looking into dataset factories ...
# questions
i
Good morning. I am looking into dataset factories and I have a question. I have dataset names which I define like so: country_technology_granularity__model___name__model_object. I am trying to capture this in a dataset factory, and (I haven't tried it yet, but I believe) this should work:
"{signal_name}__{model_name}__model_object":
  filepath: abfs://container/{signal_name}/{model_name}/fitted_model.pkl
My question is: would it be possible to transform that signal name? I would like to replace the "_"s with "/"s:
signal_name.replace("_", "/")
Is this something that can somehow be done using the omegaconf capabilities?
At worst I could set a group capture for each individual level (country, technology, etc.), but if possible I would like to keep that flexible so I can add or remove levels of granularity.
a
So omegaconf loads the config as it is; it doesn’t fill in the dataset factory placeholders. That happens later in the catalog, when the datasets are being loaded. My guess is that you could try doing this with a hook?
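To make the two stages concrete, here is a minimal plain-Python sketch. It does not use Kedro or omegaconf: the regex and the `resolve` helper are hypothetical stand-ins for the catalog's pattern matching, the dataset name `es_wind_hourly__xgb__model_object` is made up, and the `transform` argument only marks where a custom step (for example, one wired in through a hook) would have to rewrite the captured value before substitution.

```python
import re

# Stage 1: the config loader hands over the entry unchanged, placeholders included.
entry = {"filepath": "abfs://container/{signal_name}/{model_name}/fitted_model.pkl"}

# Hypothetical stand-in for matching "{signal_name}__{model_name}__model_object".
PATTERN = re.compile(r"^(?P<signal_name>.+?)__(?P<model_name>.+?)__model_object$")


def resolve(dataset_name, entry, transform=None):
    """Stage 2: fill the placeholders once a concrete dataset name is requested."""
    match = PATTERN.match(dataset_name)
    if match is None:
        raise KeyError(f"No pattern matches '{dataset_name}'")
    captured = match.groupdict()
    if transform is not None:
        # Any rewriting of captured values has to happen here, before formatting.
        captured = transform(captured)
    return {key: value.format_map(captured) for key, value in entry.items()}


def slashes(captured):
    # The transformation asked about in this thread: "_" -> "/" in signal_name only.
    return {**captured, "signal_name": captured["signal_name"].replace("_", "/")}


print(resolve("es_wind_hourly__xgb__model_object", entry))
print(resolve("es_wind_hourly__xgb__model_object", entry, transform=slashes))
```

The first call keeps the underscores in the path; the second shows the desired result, but only because the replacement runs on the captured values outside the template string.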
i
Thanks Ankita. I think hooks are gonna be overkill for what I want to do, I will probably just stick to setting the individual group capture for the time being if I'm not able to somehow do the replacing from the catalog itself
"{signal_name}__{model_name}__model_obj":
  filepath: 'abfs://container/{signal_name.replace("_", "/")}/{model_name}/fitted_model.pickle'
This raised
│   460 │   │   │   ]                                                                              │
│   461 │   │   elif isinstance(config, str) and "}" in config:                                    │
│   462 │   │   │   try:                                                                           │
│ ❱ 463 │   │   │   │   config = str(config).format_map(result.named)                              │
│   464 │   │   │   except KeyError as exc:                                                        │
│   465 │   │   │   │   raise DatasetError(                                                        │
│   466 │   │   │   │   │   f"Unable to resolve '{config}' from the pattern '{matched_pattern}'.   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'str' object has no attribute 'replace("_", "/")'
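The error comes from Python's format-string syntax rather than from anything specific to the catalog: inside `{...}`, whatever follows a dot is treated as an attribute name, not as a method call. A minimal reproduction in plain Python, where the captured value `es_wind_hourly` is a made-up example:

```python
captured = {"signal_name": "es_wind_hourly"}  # made-up captured value

template = '{signal_name.replace("_", "/")}'

try:
    template.format_map(captured)
except AttributeError as exc:
    # format_map looks up an attribute literally named 'replace("_", "/")'.
    message = str(exc)
    print(message)

# Attribute and index lookups are the only operations a replacement field supports.
first_char = "{signal_name[0]}".format_map(captured)
print(first_char)
```

Because replacement fields cannot call methods, any string manipulation has to be applied to the captured value before it reaches `format_map`.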
a
Could you try -
filepath: 'abfs://container/{signal_name}.replace("_", "/")/{model_name}/fitted_model.pickle'
Or something like -
'abfs://container/' + '{signal_name}'.replace("_", "/") + '/{model_name}/fitted_model.pickle'
I’m not sure it would work but right now the format_map function is not recognising
{signal_name.replace("_", "/")}
as a placeholder
i
1 results in "container/signal_name.replace("_", "/")/.../..." 😆
2 says invalid YAML.
Thanks again Ankita 🙂 For now I will just add the individual levels as placeholders, which gets me 90% of the way there. I'll leave this up in case somebody else has any ideas, but that solution works for me.
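If the number of levels needs to stay flexible, one option is to generate the per-level pattern instead of typing it out for each depth. The sketch below is only an illustration under stated assumptions: the level names and the `build_factory_entry` helper are hypothetical, and it only builds the strings. Whether single-underscore separators between placeholders match unambiguously in the catalog has not been verified here.

```python
def build_factory_entry(levels):
    """Return (pattern, filepath) with one placeholder per granularity level."""
    placeholders = ["{" + level + "}" for level in levels]
    # Levels are joined with "_" in the dataset name and with "/" in the path.
    pattern = "_".join(placeholders) + "__{model_name}__model_object"
    filepath = (
        "abfs://container/"
        + "/".join(placeholders)
        + "/{model_name}/fitted_model.pkl"
    )
    return pattern, filepath


pattern, filepath = build_factory_entry(["country", "technology", "granularity"])
print(pattern)
print(filepath)
```

Adding or removing a level then means changing the list passed to the helper and regenerating the catalog entry.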
a
haha I tried it too, doesn’t work. Might be messy naming but you can have
'/'
in your dataset names 😅
i
you can have
'/'
in your dataset names
Oh that's something new from 0.18, right? In 0.17 we couldn't, there was an explicit regex check at some point for dataset names
a
Not 100% sure when it changed, but I did try it with the latest version and I believe it should be possible with 0.18 too.