Stanley
04/10/2024, 2:54 PM
The output of one node (generate_poem_image_node) isn’t being passed correctly to another (generate_poem_video_node) despite the configuration seeming correct. Specifically, the image_path output isn’t accessible from my session object, although other outputs are fine.
When attempting to access image_path in my session with:
result = session.run("run_generation_pipeline")
...
"image_path": Path(result.get("image_path", ""))
I notice image_path is missing in the result dictionary, unlike other paths.
{'image_prompt_path': 'generated_image_prompts/user_/2024-04-10/prompt_image_14-49-49-442418.txt', 'poem_path': 'generated_poetry/user_/2024-04-10/poetry_14-49-47-295838.txt', 'audio_path': 'generated_audio/user_/2024-04-10/audio_14-50-02-478601.mp3', 'video_path': 'generated_videos/user_/2024-04-10/video_14-51-25-185057.mp4'}
Here’s the relevant snippet from pipeline.py showing the node configurations:
from kedro.pipeline import Pipeline, node, pipeline
...
def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        ...
        node(func=generate_poem_image, outputs="image_path", ...),
        node(func=generate_poem_video, inputs=["image_path", ...], ...)
    )
Nok Lam Chan
04/10/2024, 2:58 PM
catalog.yml?
Stanley
04/10/2024, 3:01 PM
Nok Lam Chan
04/10/2024, 3:05 PM
“also there are no datasets in my catalog.yml, I’m making API calls to get each generated file”
Does that mean your nodes are taking path directly as input and the node outputs are just a dictionary of paths?
Nok Lam Chan
04/10/2024, 3:06 PM
result object didn't have what you expected?
Stanley
04/10/2024, 3:09 PM
Nok Lam Chan
04/10/2024, 3:10 PM
Can you try kedro catalog create <pipeline>? This will create some entries in your catalog.yml as MemoryDataset.
The reason this happens is that this deviates slightly from what Kedro expects. There is a concept of a "free dataset", which means we throw away datasets that are no longer needed during a pipeline run. We only return what is already in memory.
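As a rough illustration of why image_path drops out of result, here is a simplified model of the idea, not Kedro's actual implementation. It only covers the "output is consumed by another node" part and ignores catalog registration. NodeSpec and terminal_outputs are hypothetical helpers, and the node wiring is an assumption, since the real inputs are elided in the thread.

```python
# Simplified model: an output is "kept" only if no other node consumes it.
# This is an illustration, not Kedro's runner code.
from typing import NamedTuple


class NodeSpec(NamedTuple):
    """Hypothetical stand-in for a pipeline node: a name plus dataset names."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def terminal_outputs(nodes: list[NodeSpec]) -> set[str]:
    """Return dataset names that are produced but never used as an input."""
    produced = {out for n in nodes for out in n.outputs}
    consumed = {inp for n in nodes for inp in n.inputs}
    return produced - consumed


# Assumed wiring: only the image_path link between the two nodes comes from
# the thread; the other input and output names are placeholders.
nodes = [
    NodeSpec("generate_poem_image_node", ("image_prompt",), ("image_path",)),
    NodeSpec("generate_poem_video_node", ("image_path",), ("video_path",)),
]

kept = terminal_outputs(nodes)
print(sorted(kept))  # image_path is absent because the video node consumes it
```

Under this model, image_path is an intermediate dataset, so it would have to be persisted somewhere to be read back after the run.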
https://github.com/kedro-org/kedro/pull/3475
Nok Lam Chan
04/10/2024, 3:11 PM
result object
Stanley
04/10/2024, 3:13 PM
{'image_prompt_path': 'generated_image_prompts/user_/2024-04-10/prompt_image_14-49-49-442418.txt', 'poem_path': 'generated_poetry/user_/2024-04-10/poetry_14-49-47-295838.txt', 'audio_path': 'generated_audio/user_/2024-04-10/audio_14-50-02-478601.mp3', 'video_path': 'generated_videos/user_/2024-04-10/video_14-51-25-185057.mp4'}
so is it that there’s a limited number of datasets that result can store at any one time…and having hit the limit, it’s dropping image_path?
Stanley
04/10/2024, 3:15 PM
Nok Lam Chan
04/10/2024, 3:18 PM
“Can you try kedro catalog create <pipeline>?”
Have you tried this already?
Nok Lam Chan
04/10/2024, 3:18 PM
“so is it that there’s a limited number of datasets that result can store at any one time…and having hit the limit, it’s dropping image_path?”
We don't do anything special to limit the number of datasets.
Stanley
04/10/2024, 3:22 PM
audio_path:
  type: MemoryDataset
image_path:
  type: MemoryDataset
image_prompt:
  type: MemoryDataset
image_prompt_path:
  type: MemoryDataset
poem:
  type: MemoryDataset
poem_path:
  type: MemoryDataset
video_path:
  type: MemoryDataset
Nok Lam Chan
04/10/2024, 3:26 PM
result?
Stanley
04/10/2024, 3:34 PM
result = {
    'image_prompt_path': 'xyz.txt',
    'audio_path': 'abc.mp3',
    'video_path': 'abc.mp4',
    'poem_path': 'def.txt'
}
This is the kedro viz of the pipeline. Looking at it, I think the issue is that only node outputs which are not passed along to other nodes are kept.
Nok Lam Chan
04/10/2024, 3:37 PM
Nok Lam Chan
04/10/2024, 3:39 PM
pickle.PickleDataset and, instead of getting them from result, use catalog.load("dataset_name").
I have a custom Runner implementation that was designed for debugging purposes. This would work too, but you need to use a custom runner.
https://github.com/kedro-org/kedro/issues/1802#issuecomment-1270096651
Stanley
04/10/2024, 3:47 PM
Nok Lam Chan
04/10/2024, 3:49 PM
path attributes as a dataset attribute instead of a node input.
For reference: APIDataset. The more Kedro way of doing things would be to create an OpenAPIDataset, but I don't have time to look into the details of the pipeline to see if it's possible.
Nok Lam Chan
04/10/2024, 3:50 PM
dataset factories, partitionDataset, etc.
Stanley
04/10/2024, 3:53 PM
Nok Lam Chan
04/10/2024, 3:54 PM
kedro-plugins
Stanley
04/10/2024, 3:56 PM