hi all, Actually, I am very new to Kedro and have ...
# questions
m
Hi all, I am very new to Kedro and have a few questions. If someone can help me out, I would really appreciate it.
1. I am working on a project whose input is a single GeoTIFF file containing all the image data for the project. How do I pass this file as input to my first node?
2. How can I make Kedro work across multiple nodes and processes when I have a hyperspectral image dataset where a single file is gigabytes in size?
3. Is there a project or tutorial where Kedro is used with an image dataset as input? It would really help me understand it.
4. Is it mandatory for the first node's input to be a CSV file? If I am not working with CSV, how can I provide my dataset as input?
5. If every node in my pipeline writes an output file, those files could cause storage or memory issues. How does Kedro handle this, and how can I solve it?
d
Hi Mohammed, thank you for your question. Kedro can use various datasets as input, not only CSV. You can find more information in the documentation: https://docs.kedro.org/en/stable/data/data_catalog.html#the-basics-of-catalog-yml
If you are dealing with very large image data, it's essential to carefully consider how and where you want to store and process it. Depending on your specific requirements, you may find it beneficial to use tools like Spark or Polars for handling such large datasets.
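For what it's worth, one common way to keep memory bounded with multi-GB rasters is to process them in fixed-size windows instead of loading the whole array at once. Here is a minimal sketch in plain Python. It is not Kedro-specific, `Window` and `iter_windows` are hypothetical names made up for illustration, and the tile size is an assumption you would tune for your data.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    """A rectangular block of a raster, in pixel coordinates (hypothetical helper)."""

    row_off: int
    col_off: int
    height: int
    width: int


def iter_windows(height: int, width: int, tile: int) -> Iterator[Window]:
    """Yield windows of at most tile x tile pixels that cover a height x width raster."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            # Tiles on the bottom and right edges are clipped to the raster bounds.
            yield Window(
                row_off,
                col_off,
                min(tile, height - row_off),
                min(tile, width - col_off),
            )


# Example: a 5 x 4 raster split into 2 x 2 tiles.
windows = list(iter_windows(height=5, width=4, tile=2))
```

Each window could then be handed to a raster reader that supports windowed reads, so a node only ever holds one tile in memory at a time.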
j
On top of what @Dmitry Sorokin said, we are working on a GeoTIFF dataset (https://github.com/kedro-org/kedro-plugins/pull/355). Have a look, @Mohammed Fazal, and if you end up trying it, let us know how it went.
m
Thanks for the feedback, I'll look into that.