Kedro #questions: I am writing my first Kedro pipeline tests and I am a little...

Alexis Drakopoulos

10/04/2024, 1:46 PM

I am writing my first Kedro pipeline tests and I am a little confused. I am testing a pipeline with two nodes; however, the first node outputs a Spark object, which needs to be held in a MemoryDataset with copy_mode "assign". How can I specify that in Python rather than YAML?

import logging
from kedro.io import DataCatalog
from kedro.runner import SequentialRunner

catalog = DataCatalog()
caplog.set_level(logging.DEBUG, logger="kedro")
successful_run_msg = "Pipeline execution completed successfully."
SequentialRunner().run(pipeline, catalog)
assert successful_run_msg in caplog.text

Do I do that using add_feed_dict? How?
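For context on why the copy mode matters: as I understand it, a MemoryDataset copies data when it is saved and again when it is loaded, and a deep copy fails on objects that wrap live resources (Spark objects hold handles into the JVM, for example). The sketch below is a simplified, hypothetical stand-in for that behaviour using only the standard library; copy_with_mode and SparkLikeHandle are illustrative names, not Kedro APIs, and a lock plays the role of the Spark object's uncopyable internals.

```python
import copy
import threading
from typing import Any


def copy_with_mode(data: Any, copy_mode: str) -> Any:
    """Hypothetical, simplified stand-in for how a MemoryDataset-style
    store might apply its copy mode when data goes in or out."""
    if copy_mode == "deepcopy":
        return copy.deepcopy(data)
    if copy_mode == "copy":
        return data.copy()
    if copy_mode == "assign":
        return data  # hand back the very same object, no copying at all
    raise ValueError(f"Invalid copy mode: {copy_mode!r}")


class SparkLikeHandle:
    """Hypothetical object wrapping a live resource (here a lock)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()


handle = SparkLikeHandle()

# Deep-copying fails because the lock inside cannot be copied.
try:
    copy_with_mode(handle, "deepcopy")
    deepcopy_failed = False
except TypeError:
    deepcopy_failed = True

# "assign" just passes the reference through, so it always works.
same_object = copy_with_mode(handle, "assign") is handle
```

This is why the intermediate dataset between the two nodes needs "assign": the second node receives the same object the first node produced, instead of a copy that cannot be made.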

datajoely

10/04/2024, 1:56 PM

So you can use Kedro this way, but it's not actually the way we recommend unless you have a specific reason to do so. I would really recommend that you follow the Spaceflights tutorial, since it covers the key concepts and abstracts away some of this complexity.

datajoely

10/04/2024, 1:56 PM

we also have a full training course on YouTube: https://www.youtube.com/playlist?list=PL-JJgymPjK5LddZXbIzp9LWurkLGgB-nY

Alexis Drakopoulos

10/04/2024, 1:58 PM

But this is for integration tests for pipelines

datajoely

10/04/2024, 1:58 PM

ah gotcha!

datajoely

10/04/2024, 1:58 PM

that falls into a good reason

Alexis Drakopoulos

10/04/2024, 1:59 PM

This is the recommended way in the Kedro documentation to write pipeline tests: https://docs.kedro.org/en/stable/tutorial/test_a_project.html

datajoely

10/04/2024, 1:59 PM

give me a sec

Alexis Drakopoulos

10/04/2024, 1:59 PM

No worries, take your time. I'd appreciate any help I can get.

Merel

10/04/2024, 1:59 PM

You might be able to get some inspiration from the Kedro code base tests!

Merel

10/04/2024, 1:59 PM

Let me see if I can find a good example

Alexis Drakopoulos

10/04/2024, 1:59 PM

Oh that's a great idea actually, I'm having Friday brain!

datajoely

10/04/2024, 1:59 PM

This is a little old - but relevant https://github.com/kedro-org/kedro/discussions/1068

datajoely

10/04/2024, 2:00 PM

I've been desperate to get a kedro-test micro-framework off the ground, but it's been hard to prioritise

Alexis Drakopoulos

10/04/2024, 2:00 PM

If we end up using Kedro, we might be interested in making some OSS contributions, so we could maybe help.

Merel

10/04/2024, 2:02 PM

Maybe this one helps? https://github.com/kedro-org/kedro/blob/main/tests/pipeline/test_pipeline_integration.py

Alexis Drakopoulos

10/04/2024, 2:03 PM

Hmmm, mine is similar, but I don't know how to specify that the output of the first node should use copy_mode "assign"

Merel

10/04/2024, 2:16 PM

I guess something like:

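from kedro.io import DataCatalog, MemoryDataset
dataset = MemoryDataset({"data": 42}, copy_mode="assign")  # stored and returned by reference
catalog = DataCatalog()
catalog.add_feed_dict({"dataset": dataset})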

Alexis Drakopoulos

10/04/2024, 2:21 PM

Yes indeed, it's:

from kedro.io import DataCatalog, MemoryDataset

catalog = DataCatalog(
    datasets={
        "data_utility": MemoryDataset(copy_mode="assign"),
        "extract_model_features": MemoryDataset(copy_mode="assign"),
    },
)

Alexis Drakopoulos

10/04/2024, 2:21 PM

I got it working now thanks!
