#questions

Iñigo Hidalgo

11/16/2023, 10:50 AM
This is kind of specific, but I am writing some E2E tests for some pipelines and I'd like to check things in intermediate datasets after a pipeline has run. I could write these datasets to a file, but I am trying to do everything in the code API, so I am trying to use MemoryDataset to access these datasets. I am adding a MemoryDataset called "dummy_model__train.aligned_labels" to my test_catalog. When I run the whole pipeline, that dataset isn't accessible from the catalog at the end. This is because of this block of code, which releases any datasets not in pipeline.inputs() or pipeline.outputs(). Any workarounds you can think of?
I guess I could monkeypatch MemoryDataset's _release method?
import pytest
from kedro.io import MemoryDataset


@pytest.fixture
def mock_release_memorydataset(monkeypatch):
    # Make _release a no-op so the runner cannot free
    # intermediate datasets after the pipeline run.
    def mock_release(self):
        pass

    monkeypatch.setattr(MemoryDataset, "_release", mock_release)


def test_pipeline(mock_release_memorydataset):
    assert ...

Nok Lam Chan

11/16/2023, 2:23 PM
Do you want to keep all intermediate datasets?
If you are adding the dataset yourself anyway, maybe you can create a custom persistent MemoryDataset that has a no-op release method?

Iñigo Hidalgo

11/16/2023, 2:27 PM
technically I only wanted to keep one specific dataset, but it's a relatively small pipeline, and anyway the catalog is discarded immediately after, so I'm happy to just mock it the way I did. Though I guess subclassing would be as simple as overriding the _release method and adding that dataset instance to the catalog explicitly
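[Editor's note: the subclass discussed above could be sketched roughly as below. PersistentMemoryDataset is a hypothetical name, and the MemoryDataset base class here is a minimal stand-in that only mimics the save/load/release behaviour of kedro.io.MemoryDataset, so the sketch runs without Kedro installed; in a real test you would subclass kedro.io.MemoryDataset directly.]

```python
class MemoryDataset:
    # Stand-in for kedro.io.MemoryDataset: stores data in memory
    # and wipes it when released.
    _EMPTY = object()

    def __init__(self, data=_EMPTY):
        self._data = data

    def save(self, data):
        self._data = data

    def load(self):
        if self._data is self._EMPTY:
            raise ValueError("Data for MemoryDataset has not been saved yet.")
        return self._data

    def release(self):
        self._release()

    def _release(self):
        self._data = self._EMPTY


class PersistentMemoryDataset(MemoryDataset):
    """A MemoryDataset whose release is a no-op, so the runner
    cannot free it after the pipeline finishes."""

    def _release(self):
        pass


if __name__ == "__main__":
    ds = PersistentMemoryDataset()
    ds.save([1, 2, 3])
    ds.release()  # would wipe a plain MemoryDataset
    print(ds.load())  # data survives: [1, 2, 3]
```

Adding an instance of such a subclass to the test catalog under the intermediate dataset's name avoids patching anything globally, at the cost of one extra class.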
👍🏼 1