Hi team. Does anyone have experience using Kedro for ETL pipelines (not ML) that incrementally load unstructured documents? e.g. extracting, parsing, and processing PDF, Word, etc.
I'm not sure the whole idea of Kedro datasets is designed for this use case, since we'd be working with a bunch of files that need to be loaded one by one, rather than relational-like data read from parquet/csv etc. Of course we could do extraction, then parsing, and afterwards combine the data from all files into a single dataframe [document name; document text; ...] to be processed in the usual Kedro-dataset fashion (roughly the sketch below). But I'm not sure if that's trying to force a tool into a use case it wasn't designed for.
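To make that concrete, here's a minimal sketch of the "combine into one dataframe" node I have in mind, assuming the parsed text arrives as a dict keyed by file name (roughly how Kedro's PartitionedDataset hands partitions to a node, as lazy load callables); the function and column names are just placeholders:

```python
import pandas as pd

def combine_parsed_documents(partitions: dict) -> pd.DataFrame:
    """Hypothetical node: flatten per-file parse results into one dataframe."""
    records = []
    for name, value in partitions.items():
        # PartitionedDataset provides lazy load functions; a plain
        # name -> text dict from an upstream node works the same way.
        text = value() if callable(value) else value
        records.append({"document_name": name, "document_text": text})
    return pd.DataFrame(records)
```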
Any experiences or best practices would be much appreciated.