# questions
s
Hey! I have a question about best practice when dealing with several ways of splitting the same dataset. I was thinking of using the following structure:

    05_model_input/
    └── master_table_1/
        ├── master_table_1.csv
        ├── split_1/
        │   ├── X_train.csv
        │   ├── X_test.csv
        │   ├── y_train.csv
        │   └── y_test.csv
        └── split_2/
            ├── X_train.csv
            ├── X_test.csv
            ├── y_train.csv
            └── y_test.csv

Would you say this is good practice? Or would you advise against saving the splits, and instead parametrise the split method (in parameters.yml, for instance)? Thanks a lot for your help!
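Parametrising the split works because a seeded split is deterministic: the same parameters always select the same rows. Below is a minimal stdlib sketch of that idea, assuming a hypothetical `SPLIT_PARAMS` dict standing in for parameters.yml and a hypothetical `train_test_split_rows` helper; in practice scikit-learn's `train_test_split` (which takes the same `test_size` and `random_state` arguments) would usually do this job.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical stand-in for parameters.yml: one entry per splitting method,
# each fully determined by its settings (fraction held out and RNG seed).
SPLIT_PARAMS: Dict[str, Dict[str, Any]] = {
    "split_1": {"test_size": 0.2, "random_state": 42},
    "split_2": {"test_size": 0.3, "random_state": 7},
}


def train_test_split_rows(
    X: Sequence[List[float]],
    y: Sequence[float],
    test_size: float,
    random_state: int,
) -> Tuple[list, list, list, list]:
    """Hypothetical helper: shuffle row indices with a seeded RNG, then
    take the first `test_size` fraction as the test set."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # same seed -> same order
    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index X and y with the same positions so features and labels stay aligned.
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


if __name__ == "__main__":
    X = [[float(i)] for i in range(10)]
    y = [float(i % 2) for i in range(10)]
    X_train, X_test, y_train, y_test = train_test_split_rows(X, y, **SPLIT_PARAMS["split_1"])
    print(len(X_train), len(X_test))
```

Persisting the split files, as in the proposed layout, keeps the exact rows inspectable even if the shuffling logic changes between library versions; regenerating them from parameters avoids duplicating the data. Either approach seems reasonable.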
b
This seems fine to me -- I would be more interested in how you are generating these splits in code. Are you using modular pipelines?
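With modular pipelines, the split-and-train steps are defined once and reused per split under a namespace such as `split_1`; in Kedro, the namespace prefixes the pipeline's dataset names, so outputs become `split_1.X_train`, `split_2.X_train`, and so on. The minimal stdlib sketch below shows how those namespaced names could line up with the folder layout from the question. `namespaced_catalog`, `BASE_DIR` and `SPLIT_OUTPUTS` are hypothetical helpers, not Kedro API.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable

# Hypothetical constants mirroring the layout in the question (not Kedro API).
BASE_DIR = PurePosixPath("05_model_input/master_table_1")
SPLIT_OUTPUTS = ("X_train", "X_test", "y_train", "y_test")


def namespaced_catalog(split_names: Iterable[str]) -> Dict[str, str]:
    """Map '<namespace>.<dataset>' names to CSV paths, one folder per split."""
    catalog: Dict[str, str] = {}
    for split in split_names:
        for dataset in SPLIT_OUTPUTS:
            catalog[f"{split}.{dataset}"] = str(BASE_DIR / split / f"{dataset}.csv")
    return catalog


if __name__ == "__main__":
    for name, path in namespaced_catalog(["split_1", "split_2"]).items():
        print(f"{name:16} -> {path}")
```

In an actual Kedro project these entries would be declared in catalog.yml (recent versions also support dataset factory patterns such as `{namespace}.X_train`) rather than built in Python. Adding a new split then only means adding its parameters and one more namespaced instance of the pipeline.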