Hi team is it possible to use dataset factories on any keys Kedro #questions

Melvin Kok

02/27/2024, 8:21 AM

Hi team, is it possible to use dataset factories on any keys other than `filepath`? I am using `spark.SparkJDBCDataSet`, so the data location is specified using `table`, not `filepath`. I tried to use dataset factories and got this error:

File "/Users/user/.pyenv/versions/3.9.18/envs/env-name/lib/python3.9/site-packages/kedro/framework/cli/catalog.py", line 263, in resolve_patterns
    str(context.project_path) + "/", ds_config["filepath"]
KeyError: 'filepath'

Example of what I'm trying to do:

raw.{table_name}:
  <<: *spark_table
  table: dbo.{table_name}

Ankita Katiyar

02/27/2024, 8:26 AM

Hey Melvin, you can use dataset factories with any keys. This is a bug in the `kedro catalog resolve` CLI command; the fix will be out in Kedro 0.19.3. The resolution should still work properly when you run the pipelines.
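To illustrate the point that a factory pattern can fill placeholders under any key, here is a minimal standard-library sketch. It is not Kedro's implementation: the helper names `match_pattern` and `resolve` are hypothetical, and the config dict only mirrors the `raw.{table_name}` entry from the question.

```python
import re


def match_pattern(pattern, name):
    """Return the placeholder values if `name` matches `pattern`, else None."""
    # Splitting on "{placeholder}" puts literal text at even indexes
    # and placeholder names at odd indexes.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    match = re.fullmatch(regex, name)
    return match.groupdict() if match else None


def resolve(config, values):
    """Substitute placeholder values into every string, whatever its key."""
    if isinstance(config, dict):
        return {key: resolve(value, values) for key, value in config.items()}
    if isinstance(config, list):
        return [resolve(value, values) for value in config]
    if isinstance(config, str):
        for key, value in values.items():
            config = config.replace("{" + key + "}", value)
    return config


# Hypothetical entry mirroring the catalog snippet in the question.
pattern = "raw.{table_name}"
config = {"type": "spark.SparkJDBCDataSet", "table": "dbo.{table_name}"}

values = match_pattern(pattern, "raw.customers")
resolved = resolve(config, values)
print(resolved)
```

Nothing in the substitution step treats `filepath` specially, which is consistent with the error above coming from the CLI command rather than from the resolution itself.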

Melvin Kok

02/27/2024, 8:28 AM

I see. Is there any other way I can resolve the table names without that command, then? I'm on 0.18.14, so upgrading to 0.19.3 is not an option, unfortunately.

Melvin Kok

02/27/2024, 8:33 AM

`catalog.list()` in a notebook doesn't show the datasets either 😞

Ankita Katiyar

02/27/2024, 8:34 AM

The command is just to see what your catalog will resolve to, and the bug is only in the CLI command. The dataset itself will be properly resolved when you try to load it.

Ankita Katiyar

02/27/2024, 8:35 AM

The dataset factory datasets are loaded lazily, so if you do `catalog.load()` for the dataset you should see it loaded properly, and then `catalog.list()` will have it in the list.
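The lazy behaviour described here can be sketched with a toy model. This is not Kedro's `DataCatalog`: the `LazyCatalog` class is hypothetical, and its `load()` returns the resolved config instead of reading any data, since no database is involved.

```python
import re


class LazyCatalog:
    """Toy catalog: pattern entries are only registered on first load."""

    def __init__(self, patterns):
        self._patterns = patterns  # pattern -> config template
        self._datasets = {}        # entries that have been resolved so far

    def list(self):
        # Only resolved entries are listed, so patterns stay invisible
        # until something asks for a matching dataset name.
        return sorted(self._datasets)

    def load(self, name):
        if name not in self._datasets:
            self._datasets[name] = self._resolve(name)
        return self._datasets[name]

    def _resolve(self, name):
        for pattern, template in self._patterns.items():
            # Literal text sits at even indexes, placeholder names at odd ones.
            parts = re.split(r"\{(\w+)\}", pattern)
            regex = "".join(
                f"(?P<{part}>.+)" if i % 2 else re.escape(part)
                for i, part in enumerate(parts)
            )
            match = re.fullmatch(regex, name)
            if match:
                return {
                    key: self._fill(value, match.groupdict())
                    for key, value in template.items()
                }
        raise KeyError(name)

    @staticmethod
    def _fill(value, values):
        # Replace "{placeholder}" in string values; leave other types alone.
        if isinstance(value, str):
            for key, replacement in values.items():
                value = value.replace("{" + key + "}", replacement)
        return value


catalog = LazyCatalog({"raw.{table_name}": {"table": "dbo.{table_name}"}})
before = catalog.list()                # nothing resolved yet
entry = catalog.load("raw.customers")  # first load resolves the pattern
after = catalog.list()                 # the dataset name is now listed
print(before, entry, after)
```

In this model the listing is empty until the first load, which matches the notebook observation earlier in the thread.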

👍 1

Melvin Kok

02/27/2024, 8:35 AM

Got it. Thank you very much for the help!

🙌 1
