Hello is there a way to get Kedro to create folders in paths Kedro #questions

Nicolas Betancourt Cardona

01/16/2025, 5:08 PM

Hello, is there a way to get Kedro to create folders in paths if they do not exist? For instance, if my data structure is

data:
  outputs:

And I have the catalog entry

data@pandas:
  type: pandas.CSVDataset
  filepath: data/outputs/csv_files/data.csv

It would be nice for Kedro to automatically create csv_files inside data and store data.csv afterwards. Or, if I'm working with a partitioned dataset and my node returns a dictionary with the structure

{'non_existent_subfolder/file_name': data}

to be saved under the catalog entry

data@PartitionedDataset:
  type: partitions.PartitionedDataset
  path: data/outputs
  dataset: 
    type: pandas.ExcelDataset
    save_args:
      sheet_name: Sheet1
    load_args:
      sheet_name: Sheet1
  filename_suffix: ".xlsx"

It would be nice for Kedro to create non_existent_subfolder automatically inside data/outputs. I already tried it, and Kedro does not create folders when they don't exist. Is there a way of changing this default behaviour? Thank you all in advance :)

Nok Lam Chan

01/16/2025, 5:44 PM

"It would be nice for Kedro to automatically create csv_files inside data and store data.csv afterwards."

Pretty sure this has been the case for many years. Did you get an error?

👍 1

Nicolas Betancourt Cardona

01/16/2025, 7:03 PM

@Nok Lam Chan yes! The error I get is that Kedro could not find the path data/outputs/non_existen_subfolder

Nicolas Betancourt Cardona

01/16/2025, 9:09 PM

@Nok Lam Chan specifically, I get

FileNotFoundError: [Errno 2] No such file or directory:

My node is a generator, and the data catalog output is a partitioned dataset:

def generator_node(df):
    for idx, row in df.iterrows():
        ...
        yield {f'non_existen_subfolder/{file_name}': data}

And my catalog entry looks like this:

partitioned_data:
  type: partitions.PartitionedDataset
  path: data/outputs
  dataset:
    type: kedro_pamflow.datasets.custom_dataset.CustomDataset

However, when I yield {file_name: data} instead of {f'non_existen_subfolder/{file_name}': data}, it works perfectly.

Nicolas Betancourt Cardona

01/16/2025, 10:04 PM

@Nok Lam Chan I already solved it by handling it in the _save method of my custom class 😄 thx

🙏🏼 1
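
For anyone hitting the same FileNotFoundError, the fix described in the last message (creating the missing folders inside _save) might look roughly like the sketch below. The thread does not show the actual kedro_pamflow CustomDataset, so the class name, constructor, text-file format, and the save_demo helper are all assumptions made for illustration. A real Kedro dataset would subclass kedro.io.AbstractDataset; a plain class is used here only so the sketch runs without Kedro installed.

```python
import tempfile
from pathlib import Path


class FolderCreatingTextDataset:
    """Hypothetical stand-in for a custom Kedro dataset.

    A real implementation would subclass kedro.io.AbstractDataset; a plain
    class is used here so the sketch runs without Kedro installed.
    """

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _save(self, data: str) -> None:
        # Create any missing parent folders before writing the file.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_text(data)

    def _load(self) -> str:
        return self._filepath.read_text()


def save_demo(root: str) -> Path:
    """Save one file under a subfolder that does not exist yet (hypothetical helper)."""
    target = Path(root) / "non_existen_subfolder" / "file_name.txt"
    FolderCreatingTextDataset(str(target))._save("some data")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_demo(tmp)
        print(saved.exists())
```

The key line is the mkdir call with parents=True and exist_ok=True, which creates every missing folder in the path and does nothing if they already exist. Whether the same approach fits a given custom dataset depends on how it writes its files, so treat this as a starting point rather than the poster's actual code.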
