# questions
a
I am writing my first Kedro pipeline tests and I am a little confused. I am testing a pipeline with two nodes, but the first node outputs a Spark object, which needs to be registered as a memory dataset with `copy_mode="assign"`. How can I specify that in Python rather than YAML?
```python
import logging

from kedro.io import DataCatalog
from kedro.runner import SequentialRunner

catalog = DataCatalog()
caplog.set_level(logging.DEBUG, logger="kedro")
successful_run_msg = "Pipeline execution completed successfully."
SequentialRunner().run(pipeline, catalog)
assert successful_run_msg in caplog.text
```
Do I do that using `add_feed_dict`? If so, how?
d
So, you can use Kedro this way, but it's not actually the way we recommend unless you have a specific reason to do so. I would really recommend that you follow the Spaceflights tutorial, since it covers the key concepts and abstracts away some of this complexity.
We also have a full training course on YouTube: https://www.youtube.com/playlist?list=PL-JJgymPjK5LddZXbIzp9LWurkLGgB-nY
a
But this is for integration tests for pipelines
d
ah gotcha!
that counts as a good reason
a
This is the recommended way in the Kedro documentation to write pipeline tests: https://docs.kedro.org/en/stable/tutorial/test_a_project.html
d
give me a sec
a
No worries, take your time. I'd appreciate any help I can get.
m
You might be able to get some inspiration from the Kedro code base tests!
Let me see if I can find a good example
a
Oh that's a great idea actually, I'm having Friday brain!
d
This is a little old, but relevant: https://github.com/kedro-org/kedro/discussions/1068
I've been desperate to get a `kedro-test` micro-framework off the ground, but it's been hard to prioritise.
a
If we end up using Kedro, we might be interested in making some OSS contributions, so we could maybe help.
a
Hmm, mine is similar, but my issue is that I don't know how to specify that the output of one pipeline should use `copy_mode="assign"`.
m
I guess something like:
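```python
from kedro.io import DataCatalog, MemoryDataset
dataset = MemoryDataset({"data": 42}, copy_mode="assign")
catalog = DataCatalog()  # keep a reference; add_feed_dict() returns None
catalog.add_feed_dict({"dataset": dataset})
```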
a
Yes indeed, it's:
```python
catalog = DataCatalog(
    datasets={
        "data_utility": MemoryDataset(copy_mode="assign"),
        "extract_model_features": MemoryDataset(copy_mode="assign"),
    },
)
```
I got it working now thanks!