# questions
j
Good morning, we have a question about Kedro dataset factories and are hoping you can help. I will put the details in the thread to keep this channel tidy 🙂
j
We have a custom dataset defined as
class MyDataset(SparkDataset):

    def __init__(  # noqa: PLR0913
        self,
        *,
        filepath: str,
        table: str
    ):
        ...
We then tried to use it in our catalog, but this entry failed:
integration.int.{source}.data1:
  type: MyDataset
  filepath: ${globals:integration_source_path}/int/{source}/data1
  table: {source}_data1
with the following error pointing to the `table: {source}_data1` line:
An error has occurred: Invalid YAML or JSON file .../catalog.yml, unable to read line 20, position 17.
                    ERROR    An error has occurred: Invalid YAML or   ....py:212
                             JSON file                                          
                             .../catalog.yml,           
                             unable to read line 20, position 17.
We managed to solve it by putting `{source}` at the end of the table name, like this:
integration.int.{source}.data1:
  type: MyDataset
  filepath: ${globals:integration_source_path}/int/{source}/data1
  table: data1_{source}
Is this expected behaviour, or should we raise it as an issue?
j
Hi Jacques, YAML gets confused because it sees the leading `{` and tries (and fails) to parse the value as a flow mapping. So either `table: data1_{source}` or `table: "{source}_data1"` should work, and I don't think there's any need to raise an issue.
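If you want to reproduce this outside Kedro, here is a minimal sketch. It assumes PyYAML is installed and that it behaves like the YAML parser behind your config loader. The helper name `try_parse` is made up for illustration.

```python
import yaml  # PyYAML; assumed to behave like the parser used when loading catalog.yml


def try_parse(snippet: str):
    """Return the parsed document, or the YAML error if parsing fails."""
    try:
        return yaml.safe_load(snippet)
    except yaml.YAMLError as exc:
        return exc


# A leading "{" opens a flow mapping, so the trailing "_data1" is unexpected.
unquoted_leading = try_parse("table: {source}_data1")

# A "{" in the middle of an unquoted value is treated as plain text.
unquoted_trailing = try_parse("table: data1_{source}")

# Quoting forces a string, whatever the first character is.
quoted_leading = try_parse('table: "{source}_data1"')

print(repr(unquoted_leading))
print(unquoted_trailing)
print(quoted_leading)
```

The first case should come back as a YAML error, while the other two should load as plain strings with the `{source}` placeholder intact for the dataset factory to fill in.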
j
Thanks a lot, I'll try with the double quotes to see if it works!
it worked, thanks again 🙂