Hello, I'm evaluating Kedro for my company; it is currently one of the closest tools to what we need. But I have a question about something very common in our workflow, and I'm not sure how we would implement it in Kedro. Some of our pipelines start with something like this:

- Download a dataset (between 20 and 100 GB)
- Create a local index of the data in a temporary folder (with Lucene, for example) using a bash command
- Use the index to extract a dataset, again using a bash command
- Remove the temporary local index
- Use the dataset in subsequent steps (after that step, Kedro seems to handle our needs)

It is similar in some ways to this: https://docs.dagster.io/tutorial/assets/non-argument-deps

To summarize, I would like to know how to handle:

- Operations that happen outside the graph, using the local filesystem (see the first sketch below)
- Additionally, instead of loading the data in memory and letting Kedro serialize it to store it on S3 (for example), being able to give Kedro a local path where the data is stored and have it pick up that path and upload it to S3 (see the second sketch below)
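For concreteness, here is a rough sketch of how I imagine the first part could look as a single Kedro node. It is untested, and `build-lucene-index` / `extract-with-index` are placeholders for our real bash commands:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

from kedro.pipeline import Pipeline, node


def build_index_and_extract(raw_data_path: str) -> str:
    """Build a temporary Lucene index, extract a dataset with it,
    then remove the index. Only the extracted file's path is returned."""
    index_dir = Path(tempfile.mkdtemp(prefix="lucene_index_"))
    output_path = Path(tempfile.gettempdir()) / "extracted_dataset.parquet"
    try:
        # Placeholder CLI calls standing in for our real bash commands.
        subprocess.run(
            ["build-lucene-index", "--source", raw_data_path, "--index", str(index_dir)],
            check=True,
        )
        subprocess.run(
            ["extract-with-index", "--index", str(index_dir), "--out", str(output_path)],
            check=True,
        )
    finally:
        # The temporary index is cleaned up and never enters the catalog.
        shutil.rmtree(index_dir, ignore_errors=True)
    return str(output_path)


def create_pipeline() -> Pipeline:
    return Pipeline(
        [
            node(
                build_index_and_extract,
                inputs="raw_data_path",         # catalog entry holding the local path
                outputs="extracted_data_path",  # consumed by downstream nodes
                name="index_and_extract",
            )
        ]
    )
```

The idea is that the temporary index lives entirely inside the node and never appears in the catalog; only the extracted dataset's path flows through the graph. Is that the intended way to do this, or is there a better pattern?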
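For the second point, I imagine a custom dataset could work, though I'm not sure it's the intended approach. A rough sketch, assuming the `kedro.io.AbstractDataSet` API (`AbstractDataset` in recent releases) and `fsspec` for the transfer:

```python
from typing import Any, Dict

import fsspec
from kedro.io import AbstractDataSet


class LocalPathToS3DataSet(AbstractDataSet):
    """Passes local file paths through the graph; `save` streams the file
    to S3, `load` downloads it back to a local scratch path."""

    def __init__(self, s3_path: str, local_path: str):
        self._s3_path = s3_path        # e.g. "s3://my-bucket/data/dataset.parquet"
        self._local_path = local_path  # scratch location on the worker

    def _save(self, data: str) -> None:
        # `data` is the local path produced by the upstream node; the file
        # is uploaded directly instead of being serialized by Kedro.
        fs = fsspec.filesystem("s3")
        fs.put(data, self._s3_path)

    def _load(self) -> str:
        fs = fsspec.filesystem("s3")
        fs.get(self._s3_path, self._local_path)
        return self._local_path

    def _describe(self) -> Dict[str, Any]:
        return {"s3_path": self._s3_path, "local_path": self._local_path}
```

In the catalog this would presumably be registered like any custom dataset, e.g. `type: my_project.extras.datasets.LocalPathToS3DataSet`. Is there a built-in way to achieve this instead?

Thanks!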