Galen Seilis
08/18/2023, 2:06 PM
wget --random-wait -r -l 2 -p -np -nH -e robots=off -P ../../../data/external/ -U mozilla https://ftp.maps.canada.ca/pub/nrcan_rncan/vector/geobase_nrn_rrn/
There are a couple of aspects that I am hoping can be improved about this approach.
The first is that it requires wget, which is no problem on my Linux system, but I think it requires additional configuration to make it available on the command line on Windows.
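One way to drop the wget dependency might be a small crawler built only on the Python standard library, which would behave the same on Linux and Windows. The following is a minimal sketch under stated assumptions, not a drop-in replacement: the names LinkParser, child_links, local_path and mirror are hypothetical, it assumes the server returns plain HTML directory listings, it only approximates -r, -l 2, -np, -nH, -P, -U and --random-wait, and it does not attempt -p (page requisites). It never reads robots.txt, which roughly corresponds to -e robots=off.

```python
import random
import shutil
import time
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def child_links(html, base_url, root):
    """Return absolute links that stay under root (roughly wget -np)."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        # Skip parent directories and the "?C=N;O=D" style sort links
        # that some directory listings include.
        if absolute.startswith(root) and not urlparse(absolute).query:
            links.append(absolute)
    return links


def local_path(url, dest):
    """Map a URL to a file under dest, dropping the host name (like -nH)."""
    relative = unquote(urlparse(url).path).lstrip("/")
    if not relative or relative.endswith("/"):
        relative += "index.html"
    return Path(dest) / relative


def mirror(url, dest, depth=2, root=None, seen=None,
           user_agent="mozilla", max_wait=1.0):
    """Download url into dest and follow links up to depth levels down."""
    root = root or url
    seen = set() if seen is None else seen
    if url in seen:
        return
    seen.add(url)

    target = local_path(url, dest)
    target.parent.mkdir(parents=True, exist_ok=True)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Stream to disk so large archives are not held in memory.
    with urllib.request.urlopen(request) as response, target.open("wb") as out:
        content_type = response.headers.get("Content-Type", "")
        shutil.copyfileobj(response, out)

    if depth > 0 and "html" in content_type:
        html = target.read_text(encoding="utf-8", errors="replace")
        for link in child_links(html, url, root):
            # Pause for a random interval between requests (like --random-wait).
            time.sleep(random.uniform(0, max_wait))
            mirror(link, dest, depth - 1, root, seen, user_agent, max_wait)


# Example call (performs real network requests, so it is left commented out):
# mirror("https://ftp.maps.canada.ca/pub/nrcan_rncan/vector/geobase_nrn_rrn/",
#        "../../../data/external/")
```

The depth handling here is only an approximation of wget's -l 2, so the set of files fetched may differ slightly from what the original command retrieves.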
The second is that I'm not sure how to migrate something like this to Kedro. I see that there are a variety of classes available to put in the data catalog, which is great, but I am not sure whether any of them can provide something functionally equivalent to this call. I see there is kedro_datasets.api.APIDataSet, but I don't know if it supports what I need. If not, I am guessing I could learn how the class structure works for Kedro datasets and develop my own.
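For reference, a custom Kedro dataset is a subclass of the abstract dataset base class in kedro.io that implements _load, _save and _describe. The sketch below is only an illustration under assumptions: RemoteFilesDataSet and its parameters (urls, download_dir, user_agent) are hypothetical names, it downloads a fixed list of URLs rather than crawling a directory, and the fallback to object exists only so the sketch can be tried without Kedro installed.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

try:
    from kedro.io import AbstractDataset  # spelling used by newer Kedro releases
except ImportError:
    try:
        from kedro.io import AbstractDataSet as AbstractDataset  # Kedro 0.18 spelling
    except ImportError:
        # Assumption: allows trying the sketch without Kedro installed.
        AbstractDataset = object


class RemoteFilesDataSet(AbstractDataset):
    """Hypothetical read-only dataset that downloads a fixed list of URLs."""

    def __init__(self, urls, download_dir, user_agent="mozilla"):
        self._urls = list(urls)
        self._download_dir = Path(download_dir)
        self._user_agent = user_agent

    def _load(self):
        """Download every URL and return the list of local file paths."""
        self._download_dir.mkdir(parents=True, exist_ok=True)
        paths = []
        for url in self._urls:
            # Name the local file after the last segment of the URL path.
            target = self._download_dir / Path(urlparse(url).path).name
            request = urllib.request.Request(
                url, headers={"User-Agent": self._user_agent}
            )
            with urllib.request.urlopen(request) as response, target.open("wb") as out:
                shutil.copyfileobj(response, out)
            paths.append(target)
        return paths

    def _save(self, data):
        # The remote server is treated as a read-only source.
        raise NotImplementedError("RemoteFilesDataSet is read-only")

    def _describe(self):
        """Return the parameters that identify this dataset instance."""
        return {"urls": self._urls, "download_dir": str(self._download_dir)}
```

A class like this could then be referenced from the data catalog by giving its full import path as the type of an entry, with urls and download_dir supplied as the entry's arguments.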
Any suggestions for this noob?
Deepyaman Datta
08/18/2023, 2:11 PM
Juan Luis
08/18/2023, 2:20 PM
KaggleDataset
https://github.com/astrojuanlu/kedro-kaggle-dataset/tree/kaggle-fs
APIDataSet will probably not offer the degree of flexibility that you need.
I'm adding this example to our long issue https://github.com/kedro-org/kedro/issues/1936, feel free to chime in there.
Galen Seilis
08/18/2023, 2:30 PM
datajoely
08/18/2023, 2:44 PM
IncrementalDataSet may be helpful here too