# questions
c
Hello Kedro Team, I am using SQLTableDataSet to save data to my DB, but in my pipeline, if I use the same variable to send it to the next node, it loads the data again from the DB instead of using it from MemoryDataSet. catalog.yml:
"{NAMESPACE}.ballbyball_final":
type: pandas.SQLTableDataSet
table_name: MODEL_{NAMESPACE}_BALLBYBALL_V2
credentials: db2
save_args:
if_exists: replace
chunksize: 10000
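For context, an entry whose key contains placeholders like this is a dataset factory pattern. Illustratively (with a hypothetical namespace `team1`, not from the thread), it would resolve to a concrete entry like:

```yaml
team1.ballbyball_final:
  type: pandas.SQLTableDataSet
  table_name: MODEL_team1_BALLBYBALL_V2
  credentials: db2
  save_args:
    if_exists: replace
    chunksize: 10000
```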
pipelines.py:
node(
    func=total_balls_done,
    inputs=["ballbyball_final_1", "params:MIN_TOTAL_BALLS_MATCH"],
    outputs="ballbyball_final",  # ---> data is saved to the DB here
    name="total_balls_done",
    tags="ballbyball_preprocessing",
),
node(
    func=lambda x: x,
    inputs="ballbyball_final",  # ----> data is loaded from SQLTableDataSet instead of MemoryDataSet
    outputs="cache_ballbyball_final",
    name="cache_ballbyball_final",
    tags="ballbyball_preprocessing",
),
This is how it looks when the pipeline is running:
a
Try a different name for the MemoryDataset -> it's still getting matched to the dataset factory pattern
MemoryDataset is only used for datasets that are not mentioned in the catalog
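To illustrate the matching behaviour described above: any dataset name that still matches a catalog factory pattern resolves to that catalog entry (here the SQLTableDataSet), not to a MemoryDataset. Kedro's real matching uses the `parse` library; the regex-based function below is only an illustrative stand-in, not Kedro's implementation.

```python
import re

def match_factory_pattern(pattern: str, name: str):
    """Illustrative sketch: does `name` match a factory pattern like
    "{NAMESPACE}.ballbyball_final"? Returns the captured placeholders,
    or None when there is no match (so a MemoryDataset would be used)."""
    # Escape literal characters, then turn each {PLACEHOLDER} into a
    # named capture group that matches anything except a dot.
    regex = "^" + re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>[^.]+)", re.escape(pattern)) + "$"
    m = re.match(regex, name)
    return m.groupdict() if m else None

# A namespaced name matching the pattern resolves to the catalog entry:
print(match_factory_pattern("{NAMESPACE}.ballbyball_final", "team1.ballbyball_final"))
# -> {'NAMESPACE': 'team1'}

# A name that does not match falls through to a MemoryDataset:
print(match_factory_pattern("{NAMESPACE}.ballbyball_final", "team1.cache_ballbyball_final"))
# -> None
```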
c
I am creating a cache, but to create the cache I also need to pass the variable as input; I cannot change the input variable name
a
c
No, I was not.
"{NAMESPACE}.cache_ballbyball_final":
type: CachedDataset
dataset:
type: pandas.SQLTableDataSet
table_name: MODEL_{NAMESPACE}_BALLBYBALL_V2
credentials: db2
save_args:
if_exists: replace
chunksize: 10000
but now {NAMESPACE} is not getting resolved inside table_name
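For readers unfamiliar with CachedDataset: the idea is that saving keeps an in-memory copy alongside the write-through to the wrapped dataset, so later loads in the same run never go back to the DB. The sketch below is a minimal illustration of that idea, not Kedro's actual implementation; `FakeSQLTableDataset` is a hypothetical stand-in for `pandas.SQLTableDataSet` that just counts round-trips.

```python
class FakeSQLTableDataset:
    """Hypothetical stand-in for pandas.SQLTableDataSet; counts DB loads."""
    def __init__(self):
        self.loads = 0
        self._stored = None

    def save(self, data):
        self._stored = data

    def load(self):
        self.loads += 1
        return self._stored


class CachedDatasetSketch:
    """Minimal cache wrapper: save writes through, load prefers the cache."""
    def __init__(self, dataset):
        self._dataset = dataset
        self._cache = None

    def save(self, data):
        self._cache = data        # keep an in-memory copy
        self._dataset.save(data)  # and write through to the wrapped dataset

    def load(self):
        if self._cache is None:
            self._cache = self._dataset.load()  # cold start: fetch once
        return self._cache


db = FakeSQLTableDataset()
cached = CachedDatasetSketch(db)
cached.save({"balls": 300})
first = cached.load()
second = cached.load()
print(db.loads)  # -> 0: the backing dataset was never queried after the save
```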
a
What version of Kedro are you using?
This was a bug that we fixed in 0.18.14.
c
Wow, works now. Thank you @Ankita Katiyar 😄
🙌 1