# questions
k
Hi, is it possible to use a dataset factory in config resolver? An example:
"{name}_feature":
  type: pandas.ParquetDataset
  filepath: data/04_feature/{name}_feature.parquet
  metadata:
    pandera:
      schema: ${pa.python:my_kedro_project.pipelines.feature_preprocessing.schemas.{name}_feature_schema}
The above gives me this:
omegaconf.errors.GrammarParseError: mismatched input '{' expecting BRACE_CLOSE
    full_key: {name}_feature.metadata.pandera.schema
    object_type=dict
d
it’s a good question - the error would suggest not, @Ankita Katiyar any ideas?
a
no, unfortunately, the config is resolved by the `OmegaConfigLoader` before the dataset factory values are filled in by the `DataCatalog`, so this probably isn’t possible
m
"{name}_feature":
  type: pandas.ParquetDataset
  filepath: data/04_feature/{name}_feature.parquet
  metadata:
    schema_name: "{name}"
    pandera:
      schema: ${custom_resolver:${..schema_name}}
I'm wondering whether this kind of hack would work 🤔
a
I doubt it, since when OCL tries to resolve the value of the custom resolver, it will still be the string "{name}"
d
I think it’s an order of execution thing: OCL -> DC -> rendered config, and the dataset factory happens midway
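To make the ordering concrete, here is a toy model in plain Python. It is only a sketch of the two stages described above, not the real `OmegaConfigLoader` or `DataCatalog` code: `resolve_interpolations`, `fill_factory_placeholders`, and the sample mapping value are hypothetical names made up for illustration. The toy is also more lenient than OmegaConf, which rejects the nested `{` outright with the `GrammarParseError` shown earlier.

```python
import re

# Toy stand-in for "${resolver:arg}" interpolation; tolerates one level of
# nested braces so the ordering problem is visible instead of a parse error.
INTERPOLATION = re.compile(r"\$\{(\w+):((?:[^{}]|\{[^{}]*\})*)\}")


def resolve_interpolations(value, resolvers):
    """Stage 1 (config loader): call the named resolver on each interpolation."""
    def _call(match):
        resolver = resolvers[match.group(1)]
        return str(resolver(match.group(2)))
    return INTERPOLATION.sub(_call, value)


def fill_factory_placeholders(value, **placeholders):
    """Stage 2 (data catalog): fill {name}-style dataset factory placeholders."""
    return value.format(**placeholders)


# Hypothetical id -> path mapping, purely for illustration.
content_mapping = {"42": "intro_video"}
seen = []  # records every argument the resolver is actually called with


def content_mapping_resolver(arg):
    seen.append(arg)
    return content_mapping.get(arg, "<unresolved>")


template = "data/bronze/contents/${content_mapping:{content_id}}.json"

# Stage 1 runs first, so the resolver only ever sees the literal placeholder.
after_stage_1 = resolve_interpolations(
    template, {"content_mapping": content_mapping_resolver}
)
# Stage 2 runs afterwards, when there is no placeholder left to fill.
after_stage_2 = fill_factory_placeholders(after_stage_1, content_id="42")

print(seen)           # ['{content_id}']
print(after_stage_2)  # data/bronze/contents/<unresolved>.json
```

By the time the factory value `42` is known, the resolver has already run with the literal text `{content_id}`, which is why neither the direct approach nor the `schema_name` indirection can work.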
m
:sadcat2:
k
ah, that's a pity. Thanks everyone!
d
It’s not super elegant, but what you could do is use an `after_catalog_created` hook to mutate / replace the datasets dynamically
k
and use Marcin's solution with that?
d
no it would have to be quite custom
I’m only 95% sure it will work too
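A custom hook like that would still need some way to get from a resolved dataset name back to its schema. Below is a minimal sketch of just that lookup in plain Python, assuming schemas are module-level attributes named `<dataset_name>_schema`, as in the original example. `schema_for_dataset` is a hypothetical helper, not part of Kedro, and the hook wiring itself is not shown or tested here.

```python
import importlib


def schema_for_dataset(dataset_name, schemas_module, suffix="_feature"):
    """Return the `<dataset_name>_schema` attribute of `schemas_module`.

    Returns None when the dataset name does not end with `suffix` or when
    the module has no matching attribute.
    """
    if not dataset_name.endswith(suffix):
        return None
    module = importlib.import_module(schemas_module)
    # e.g. "sales_feature" -> attribute "sales_feature_schema"
    return getattr(module, f"{dataset_name}_schema", None)
```

With the module path from the question, a call might look like `schema_for_dataset("sales_feature", "my_kedro_project.pipelines.feature_preprocessing.schemas")`, where `sales_feature` is a made-up dataset name.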
k
cool, I will come back with more info when I come around to it
m
@Kacper Leśniara did you find an elegant solution for this? I've found myself in the same situation and it would be great to know how you approached your solution in the end
k
I didn't unfortunately 😔
n
Too bad, I also have this use case where I've got a map of id -> path, and I thought I could register my map as a dict and then do:
"contents_{content_id}_data#json":
  type: kedro_datasets.json.json_dataset.JSONDataset
  versioned: True
  filepath: data/bronze/contents/${content_mapping:{content_id}}.json
but got the same error 😞
m
@datajoely and Kedro team: I think this is a useful discussion, and since Slack hides messages after 90 days it would be good to keep track of it to help future people who struggle with the same thing. Would it make sense to open a GitHub issue to have this documented in the repo?
d
It should be available on our searchable backup https://linen-slack.kedro.org/c/questions
But an issue is also a good idea
m
I added it here; I think it's the issue most closely related to this topic, and this way people who are on GitHub but not Slack can track it too