# questions
m
Hi team, is it possible to use dataset factories on any keys other than `filepath`? I am using `spark.SparkJDBCDataSet`, so the data location is specified using `table`, not `filepath`. I tried to use dataset factories and got this error:
File "/Users/user/.pyenv/versions/3.9.18/envs/env-name/lib/python3.9/site-packages/kedro/framework/cli/catalog.py", line 263, in resolve_patterns
    str(context.project_path) + "/", ds_config["filepath"]
KeyError: 'filepath'
Example of what I'm trying to do:
raw.{table_name}:
  <<: *spark_table
  table: dbo.{table_name}
a
Hey Melvin, you can use dataset factories with any keys. This is a bug in the `kedro catalog resolve` CLI command, and the fix will be out in Kedro 0.19.3. The resolution itself should work properly when you run the pipelines.
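To illustrate why the key name should not matter, here is a minimal standard-library sketch of how a factory pattern could be resolved. It is not Kedro's actual implementation: `resolve_pattern` is a hypothetical helper, `raw.customers` is a made-up dataset name, and the contents of the `*spark_table` anchor are left out because they are not shown in the thread.

```python
import re


def resolve_pattern(pattern, config, dataset_name):
    """Fill the placeholders of a factory entry, or return None if the name does not match.

    Hypothetical helper for illustration only; Kedro's own resolution logic may differ.
    """
    # Turn "raw.{table_name}" into the regex "raw\.(?P<table_name>.+)"
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern))
    match = re.fullmatch(regex, dataset_name)
    if match is None:
        return None
    values = match.groupdict()
    # Every string value is formatted, whatever its key is called
    # ("table" here, but "filepath" would be treated the same way).
    return {
        key: value.format(**values) if isinstance(value, str) else value
        for key, value in config.items()
    }


factory_config = {"type": "spark.SparkJDBCDataSet", "table": "dbo.{table_name}"}
resolved = resolve_pattern("raw.{table_name}", factory_config, "raw.customers")
print(resolved)
```

In this sketch the placeholder captured from the dataset name is substituted into every string value, so nothing depends on a `filepath` key being present.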
m
I see. Is there any other way to resolve the table names without that command, then? I'm on 0.18.14, so upgrading to 0.19.3 is unfortunately not an option.
`catalog.list()` in a notebook doesn't show the datasets either 😞
a
The command only shows what your catalog will resolve to, and the bug is limited to that CLI command. The dataset itself will be resolved properly when you try to load it.
Datasets that come from a dataset factory are loaded lazily, so if you call `catalog.load()` for the dataset, it should load properly, and `catalog.list()` will then include it.
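The lazy behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `LazyCatalog` is a hypothetical class, not part of Kedro, it handles a single prefix-style pattern, and its `load` returns the resolved config rather than real data.

```python
class LazyCatalog:
    """Toy stand-in for a catalog with one factory pattern (hypothetical, not Kedro code)."""

    def __init__(self, prefix):
        self._prefix = prefix  # e.g. "raw." for the pattern raw.{table_name}
        self._datasets = {}    # only entries that have been materialised live here

    def list(self):
        # Pattern-based datasets are invisible until they have been loaded once
        return sorted(self._datasets)

    def load(self, name):
        if name not in self._datasets:
            if not name.startswith(self._prefix):
                raise KeyError(name)
            # First access: resolve the pattern and register the concrete entry
            table_name = name[len(self._prefix):]
            self._datasets[name] = {"table": "dbo." + table_name}
        return self._datasets[name]


catalog = LazyCatalog("raw.")
before = catalog.list()                 # nothing registered yet
entry = catalog.load("raw.customers")   # resolves and registers the entry
after = catalog.list()                  # the loaded name now appears
print(before, entry, after)
```

In a real project, the equivalent check would be to call `catalog.load()` with the dataset name in a notebook and then call `catalog.list()` again.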
👍 1
m
Got it. Thank you very much for the help!
🙌 1