# questions
v
Hello everyone! Is there a way to load a flat file from S3 based on some conditions, like pulling the latest file from the mentioned bucket?
l
Split it into two problems:
• Listing files in the bucket based on your conditions
  ā—¦ Listing metadata can help you filter the one you'd like to load
• Loading the correct file from that list
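For the listing half, a rough sketch with boto3 (the bucket and prefix names are placeholders, and a real implementation would also handle pagination and empty results):

```python
import boto3

s3 = boto3.client("s3")
# List the objects under a prefix; each entry carries metadata such as
# Size and LastModified that can be used for filtering.
resp = s3.list_objects_v2(Bucket="your_bucket", Prefix="data/02_intermediate/company/")
# Pick the most recently modified file.
latest = max(resp.get("Contents", []), key=lambda obj: obj["LastModified"])
print(latest["Key"])  # the key of the newest file, ready to be loaded
```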
v
But how is that done? The example below expects the exact name of the file:
```yaml
motorbikes:
  type: pandas.CSVDataset
  filepath: s3://your_bucket/data/02_intermediate/company/motorbikes.csv
  credentials: dev_s3
  load_args:
    sep: ','
    skiprows: 5
    skipfooter: 1
    na_values: ['#NA', NA]
```
I just know the name of the bucket; we need to fetch the files based on some conditions, right?
l
in that case you'll have to implement a custom dataset
v
I see
l
you can expand the behaviour of the PandasDataset and override the `load` method
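As a rough illustration of that idea (the class and names here are hypothetical, and depending on your kedro-datasets version the hook to override may be `_load` rather than `load`):

```python
from kedro_datasets.pandas import CSVDataset


class FilteredCSVDataset(CSVDataset):
    """Hypothetical subclass that decides which concrete file to read."""

    def load(self):
        # ...resolve the file to read here (e.g. the newest file in a
        # folder), then delegate to the stock CSV loading behaviour:
        return super().load()
```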
v
understood
l
this is an example that adds additional functionality for sheets
v
how challenging is it going to be?
l
but in the same manner you can add a dataset, have filtering args in the constructor, and use those args in the `load` method
šŸ‘ 1
v
cannot open the above link
l
my bad, updated link
not hard, datasets are just classes with load and save methods really
šŸ‘ 1
v
I just found that PartitionedDataset provides a way of iterating over each file present in a bucket/folder: https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html#partitioned-dataset-load @Laurens Vijnck
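Per those docs, a PartitionedDataset loads as a dict mapping each partition id to a loader callable, so the selection logic would then live in a node; a hedged sketch (the function name is hypothetical, and taking `max` assumes file names that sort chronologically):

```python
from typing import Callable

import pandas as pd


def load_latest(partitions: dict[str, Callable[[], pd.DataFrame]]) -> pd.DataFrame:
    # Partition ids are paths relative to the dataset's `path`; taking the
    # max only works if the file names sort chronologically.
    latest_id = max(partitions)
    return partitions[latest_id]()  # call the loader to actually read the file
```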
l
ah yes, that one exists, though then you'll be implementing the conditions in the node
šŸ‘ 1
I usually prefer to have the dataset internals handling the selection
šŸ‘ 1
you might even be able to extend the PartitionedDataset and overload the `load` method there to call super and thereafter do the filtering
šŸ‘ 1