# questions
m
Hello everyone, our team is relatively new to Kedro, and we aim to use it primarily for data processing; in particular, we are working with large images. Each pipeline is designed to process a single image. However, when moving to production, we would like to apply these pipelines to a whole set of images.

Our current approach is to parametrize the filepath in the data catalog and pass it as a CLI argument. A wrapper script then applies the same pipeline to a whole folder by simply calling kedro with different arguments. Unfortunately, this method is very inefficient and seems suboptimal. We are considering an alternative: wrapping the pipeline in another pipeline that loads the filenames in a directory and calls the main pipeline with each filename as an argument. However, this approach seems to lack scalability and may be confusing.

Ideally, we envision some kind of "collection" dataset that would dynamically generate pipelines for a given image at runtime. This would also allow execution on either a single image or a whole set simply by changing the dataset type in the data catalog. While this seems promising, we aren’t sure whether Kedro supports such an implementation. Any suggestions on how to handle this scenario in a scalable and reproducible manner would be appreciated.
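For reference, the wrapper-script approach described above boils down to something like the sketch below. It assumes Kedro 0.19's `--params key=value` syntax (0.18 used `key:value`) and a catalog entry that reads its filepath from that runtime parameter; the pipeline name `image` and the parameter `image_path` are hypothetical placeholders. Each command starts a new Python process and Kedro session, which is likely a large part of the overhead.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical placeholders, not taken from the original project.
PIPELINE_NAME = "image"
PARAM_KEY = "image_path"


def list_images(image_dir: str, pattern: str = "*.tif") -> list[Path]:
    """Collect the image files the wrapper script would loop over."""
    return sorted(Path(image_dir).glob(pattern))


def build_commands(image_paths: list[str | Path]) -> list[list[str]]:
    """Build one `kedro run` invocation per image."""
    return [
        [
            "kedro",
            "run",
            f"--pipeline={PIPELINE_NAME}",
            # Assumes Kedro 0.19 `key=value` syntax; 0.18 used `key:value`.
            f"--params={PARAM_KEY}={Path(path).as_posix()}",
        ]
        for path in image_paths
    ]


def run_all(image_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run=True, only list) one Kedro process per image."""
    commands = build_commands(list_images(image_dir))
    if not dry_run:
        for command in commands:
            # Every iteration pays again for interpreter start-up,
            # project loading and catalog creation.
            subprocess.run(command, check=True)
    return commands
```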
t
You can look at the `PartitionedDataset`.
n
> Our current approach involves parametrizing the filepath in the data catalog and passing it as a CLI argument. This is then executed by a wrapper script which applies the same pipeline to a whole folder by simply calling kedro with different arguments. Unfortunately, this method is very inefficient and seems suboptimal.
Is that a second-order pipeline, i.e., do you use a Kedro pipeline to generate another Kedro pipeline? I agree with @Takieddine Kadiri: `PartitionedDataset`, and additionally dataset factories, may help (docs.kedro.org).
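A minimal sketch of how a `PartitionedDataset` is usually consumed: Kedro loads it as a dictionary mapping partition IDs to load functions, and when a node returns a dictionary of partition IDs to callables, the outputs are saved lazily. Kedro itself is not imported here; the input is simulated with plain lambdas, and `process_single_image` is a hypothetical stand-in for the existing single-image logic.

```python
from typing import Any, Callable, Dict, List


def process_single_image(image: List[int]) -> List[int]:
    """Hypothetical stand-in for the existing single-image logic."""
    return [pixel * 2 for pixel in image]


def process_all_images(
    partitions: Dict[str, Callable[[], Any]],
) -> Dict[str, Callable[[], Any]]:
    """Node body for a PartitionedDataset input and output.

    The input maps partition IDs to load functions. Returning callables
    instead of data lets Kedro load, process and save one partition at a time.
    """
    outputs: Dict[str, Callable[[], Any]] = {}
    for partition_id, load in sorted(partitions.items()):
        # The default argument pins the current `load`; a bare closure would
        # see only the last value of the loop variable.
        outputs[partition_id] = lambda load=load: process_single_image(load())
    return outputs


# Simulated PartitionedDataset input: two "images" as flat pixel lists.
fake_partitions = {
    "img_001": lambda: [1, 2, 3],
    "img_002": lambda: [10, 20],
}
result = process_all_images(fake_partitions)
print({partition_id: save() for partition_id, save in result.items()})
# {'img_001': [2, 4, 6], 'img_002': [20, 40]}
```

The node body stays the same whether the folder holds one image or thousands; only the catalog entry decides which partitions exist.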
m
Thanks for your suggestions, but I think I wasn't quite clear about our scenario, so I drew a little sketch. We have the "Tile Pipeline", which we want to run either independently or inside the "Image Pipeline" (we're not quite sure how to do this properly). The second part is being able to execute the "Image Pipeline" for a set of images. Currently, we achieve this by parameterizing the data catalog entry and executing Kedro with the file name as a CLI parameter. However, this is very inefficient. We have multiple pipelines like this, so we are looking for a general solution. I don't see how `PartitionedDataset` could help us. We are likely dealing with second-order pipelines, but I can't find any information about that in the docs.
[Attached sketch: Group 11.png]
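One way to express the sketched structure in code: if the split step returns a dictionary of tiles (which could be stored in a `PartitionedDataset` between nodes), the image pipeline stays a fixed chain of three nodes even though the number of tiles is only known at run time, and the tile logic remains callable on its own. This is a plain-Python sketch with toy nested-list "images"; all function names are hypothetical, and each function is meant to correspond to one node.

```python
from typing import Dict, List

Image = List[List[int]]  # toy "image": a list of pixel rows


def split_into_tiles(image: Image, tile_rows: int) -> Dict[str, Image]:
    """Node 1: cut the image into horizontal strips.

    The number of tiles depends on the image, so it is only known at run time;
    returning a dict keeps the node's output a single dataset regardless.
    """
    return {
        f"tile_{index:04d}": image[start:start + tile_rows]
        for index, start in enumerate(range(0, len(image), tile_rows))
    }


def tile_pipeline(tile: Image) -> Image:
    """Stand-in for the tile logic; still callable on a single tile."""
    return [[pixel + 1 for pixel in row] for row in tile]


def process_tiles(tiles: Dict[str, Image]) -> Dict[str, Image]:
    """Node 2: apply the tile logic to every tile, whatever N turns out to be."""
    return {tile_id: tile_pipeline(tile) for tile_id, tile in tiles.items()}


def concat_tiles(tiles: Dict[str, Image]) -> Image:
    """Node 3: reassemble; zero-padded IDs make sorted() restore the order
    (up to 10,000 tiles)."""
    merged: Image = []
    for tile_id in sorted(tiles):
        merged.extend(tiles[tile_id])
    return merged


image = [[0, 1], [2, 3], [4, 5]]
print(concat_tiles(process_tiles(split_into_tiles(image, tile_rows=2))))
# [[1, 2], [3, 4], [5, 6]]
```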
n
Are you currently doing this in 2 steps?

- Step 1: run the pipeline up to `create N tiles`.
- Step 2: run the individual “tile pipeline” and concatenate the results.
> Currently, we achieve this by parameterizing the data catalog entry and executing Kedro with the file name as CLI parameter. However, this is very inefficient. We have multiple pipelines like this so we are seeking for a general solution.
Could you give an example of how exactly you do this?
In my mind, Kedro deals mostly with static pipelines. I think this inevitably requires 2 steps, because N is not determined until the node gets executed. Cc @marrrcin, do you have any ideas? I know you have experience with image-processing pipelines!
t
You can explicitly compose your pipelines' execution order, and solve your second-order (or dynamic) pipeline problem by using kedro-boot. Here is an example of using kedro-boot for a similar problem (Monte Carlo simulation): https://github.com/takikadiri/kedro-boot-examples/tree/main#bonus-example--monte-carlo-simulation You can declare your image pipeline and tile pipeline as an AppPipeline, then use them in a kedro-boot app, where you define your orchestration logic explicitly.
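The explicit orchestration described above can be pictured roughly as follows: an application loop runs the split step, then the tile pipeline once per tile, then the concatenation, all inside one process instead of one `kedro run` per image. `run_pipeline(name, inputs)` is a hypothetical stand-in for the kedro-boot call that runs a declared app pipeline with injected inputs (its real name and signature are not reproduced here; see the kedro-boot README), and `fake_runner` exists only so the sketch runs without Kedro.

```python
from typing import Any, Callable, Dict, List

# Hypothetical: run_pipeline(name, inputs) runs one declared pipeline with the
# given in-memory inputs and returns its outputs as a dict.
RunPipeline = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def process_image(image: Any, run_pipeline: RunPipeline) -> Any:
    """Split, run the tile pipeline once per tile, then concatenate."""
    # The number of tiles is only known after this call returns.
    tiles: List[Any] = run_pipeline("split", {"image": image})["tiles"]
    processed = [run_pipeline("tile", {"tile": tile})["tile_result"] for tile in tiles]
    return run_pipeline("concat", {"tiles": processed})["image_result"]


def process_folder(images: Dict[str, Any], run_pipeline: RunPipeline) -> Dict[str, Any]:
    """Apply the image pipeline to every image inside a single process."""
    return {name: process_image(image, run_pipeline) for name, image in sorted(images.items())}


def fake_runner(name: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Toy stand-in so the sketch runs without Kedro: images are flat lists."""
    if name == "split":
        image = inputs["image"]
        return {"tiles": [image[i:i + 2] for i in range(0, len(image), 2)]}
    if name == "tile":
        return {"tile_result": [value * 10 for value in inputs["tile"]]}
    if name == "concat":
        return {"image_result": [value for tile in inputs["tiles"] for value in tile]}
    raise ValueError(f"unknown pipeline: {name}")


print(process_folder({"a": [1, 2, 3], "b": [4]}, fake_runner))
# {'a': [10, 20, 30], 'b': [40]}
```

Passing the runner in as an argument keeps the orchestration logic testable on its own, independent of how the pipelines are actually executed.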
l
Here's another approach, but it relies on knowing the number of sub-pipelines in advance: https://getindata.com/blog/kedro-dynamic-pipelines/ (courtesy of @marrrcin).
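Approaches of this kind typically build the whole graph up front, instantiating a base pipeline several times under different namespaces when the project's pipelines are registered, which is why the number of copies must be known in advance. The plain-Python imitation below only shows the naming scheme (prefixing node and dataset names with a namespace, as Kedro's modular `pipeline(..., namespace=...)` does); it does not use Kedro, and the template nodes and datasets are hypothetical.

```python
from typing import List, Set, Tuple

NodeSpec = Tuple[str, List[str], List[str]]  # (node name, inputs, outputs)

# Hypothetical template pipeline for one tile; "model" is meant to be shared.
TILE_TEMPLATE: List[NodeSpec] = [
    ("preprocess", ["raw_tile"], ["clean_tile"]),
    ("segment", ["clean_tile", "model"], ["mask"]),
]


def namespaced_copies(n_copies: int, shared: Set[str]) -> List[NodeSpec]:
    """Return n_copies of TILE_TEMPLATE with names prefixed by 'tile_<i>.'.

    Datasets listed in `shared` keep their global name, similar to listing
    them in `inputs=`/`outputs=` when namespacing a Kedro pipeline.
    """

    def prefix(namespace: str, name: str) -> str:
        return name if name in shared else f"{namespace}.{name}"

    nodes: List[NodeSpec] = []
    for index in range(n_copies):
        namespace = f"tile_{index}"
        for node_name, inputs, outputs in TILE_TEMPLATE:
            nodes.append(
                (
                    f"{namespace}.{node_name}",
                    [prefix(namespace, name) for name in inputs],
                    [prefix(namespace, name) for name in outputs],
                )
            )
    return nodes


# The count is fixed when the graph is built, before any data is read.
for spec in namespaced_copies(2, shared={"model"}):
    print(spec)
```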