Sergei Benkovich
01/11/2023, 7:14 AM
observations:
  type: pandas.CSVDataSet
  filepath: "${s3.raw_observations_path}/observations.csv"
  credentials: dev_s3
datajoely
01/11/2023, 9:00 AM
Sergei Benkovich
01/11/2023, 9:01 AM
{
  "observations": {
    "type": "pandas.CSVDataSet",
    "filepath": "${s3.raw_observations_path}/observations.csv",
    "credentials": "dev_s3"
  }
}
When I use local paths, everything works fine:
observations:
  type: pandas.CSVDataSet
  filepath: "${folders.raw}/observations.csv"
datajoely
01/11/2023, 10:21 AM
importlib.import_module and importing, which is equivalent to from kedro.extras.datasets.pandas.csv_dataset import CSVDataSet, and putting the filepath and credentials arguments into that constructor
datajoely
01/11/2023, 10:22 AM
datajoely
01/11/2023, 10:22 AM
s3.raw_observations_path string
datajoely
01/11/2023, 10:22 AM
Sergei Benkovich
01/11/2023, 10:29 AM
runner = SequentialRunner()
project_path = Path(__file__).parent.parent.parent
conf_path = f'{project_path}/{settings.CONF_SOURCE}'
conf_loader = CONFIG_LOADER_CLASS(conf_source=conf_path, env="local", globals_pattern='globals*')
parameters = conf_loader['parameters']
credentials = conf_loader['credentials']
catalog = conf_loader['catalog']
data_catalog = DATA_CATALOG_CLASS(
    data_sets={
        'observations': CSVDataSet.from_config('observations', catalog['observations']),
        'processed_observations': CSVDataSet.from_config('processed_observations', catalog['processed_observations']),
        'train_set': CSVDataSet.from_config('train_set', catalog['train_set']),
        'test_set': CSVDataSet.from_config('test_set', catalog['test_set']),
        'model': PickleDataSet.from_config('model', catalog['model']),
        'inference_results': PickleDataSet.from_config('inference_results', catalog['inference_results']),
    },
    feed_dict={'params': parameters},
)
result = runner.run(data_extraction.create_pipeline(), data_catalog)
return result
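For later readers, here is a minimal sketch of the mechanism described earlier in the thread: import the dataset class by name, then pass filepath and credentials into its constructor. It is not Kedro's code. load_class and build_dataset are hypothetical helpers, types.SimpleNamespace stands in for CSVDataSet so the snippet runs without Kedro, and the bucket name and key values are placeholders. One thing it suggests checking: when from_config is called on each dataset directly, the credentials entry may reach the constructor as the literal string dev_s3 instead of the values stored under that name. As far as I know, DataCatalog.from_config(catalog, credentials) performs that lookup.

```python
import importlib
from types import SimpleNamespace
from typing import Any, Dict, Optional


def load_class(dotted_path: str) -> type:
    """Import a class from a full dotted path such as 'package.module.ClassName'."""
    module_path, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), class_name)


def build_dataset(entry: Dict[str, Any],
                  credentials: Optional[Dict[str, Any]] = None) -> Any:
    """Hypothetical helper: instantiate the class named in a catalog-style entry.

    If a credentials mapping is supplied, the credentials *name* in the entry
    is swapped for the values stored under that name before the constructor runs.
    """
    kwargs = dict(entry)  # copy so the caller's entry is left untouched
    cls = load_class(kwargs.pop("type"))
    cred_name = kwargs.get("credentials")
    if credentials is not None and isinstance(cred_name, str):
        kwargs["credentials"] = credentials[cred_name]
    return cls(**kwargs)


entry = {
    "type": "types.SimpleNamespace",  # stand-in for the real dataset class (assumption)
    "filepath": "s3://example-bucket/raw/observations.csv",  # placeholder path
    "credentials": "dev_s3",
}
creds = {"dev_s3": {"key": "EXAMPLE_KEY", "secret": "EXAMPLE_SECRET"}}  # placeholders

unresolved = build_dataset(entry)       # credentials stays the string "dev_s3"
resolved = build_dataset(entry, creds)  # credentials becomes the dict of values

print(type(unresolved.credentials).__name__)  # str
print(type(resolved.credentials).__name__)    # dict
```

A constructor that expects a mapping of key and secret would likely behave differently in the two cases, which would also fit the local-path entry working, since it has no credentials key at all.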
datajoely
01/11/2023, 12:48 PM
Sergei Benkovich
01/11/2023, 12:50 PM
datajoely
01/11/2023, 12:51 PM
CONFIG_LOADER_CLASS is TemplatedConfigLoader, right?
Sergei Benkovich
01/11/2023, 12:52 PM
CONF_SOURCE = "conf"
# Class that manages how configuration is loaded.
from kedro.config import TemplatedConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",
}
# Class that manages the Data Catalog.
from kedro.io import DataCatalog
DATA_CATALOG_CLASS = DataCatalog
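For reference, a rough sketch of what the ${...} templating in the catalog entries amounts to: each placeholder is looked up in the values loaded from the globals file, with dots walking into nested keys. This is an illustration only, not TemplatedConfigLoader's implementation. render and lookup are hypothetical names, and the globals values below are made up.

```python
import re
from typing import Any, Dict

# Matches ${some.dotted.key} and captures the key inside the braces.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(globals_dict: Dict[str, Any], dotted_key: str) -> Any:
    """Walk a nested dict using a dotted key such as 's3.raw_observations_path'."""
    value: Any = globals_dict
    for part in dotted_key.split("."):
        value = value[part]  # raises KeyError if the key is missing
    return value


def render(text: str, globals_dict: Dict[str, Any]) -> str:
    """Replace every ${key} placeholder in text with its value from globals_dict."""
    return PLACEHOLDER.sub(lambda m: str(lookup(globals_dict, m.group(1))), text)


# Made-up contents of a globals file (assumption, not taken from the thread).
globals_dict = {
    "folders": {"raw": "data/01_raw"},
    "s3": {"raw_observations_path": "s3://example-bucket/raw"},
}

local_path = render("${folders.raw}/observations.csv", globals_dict)
s3_path = render("${s3.raw_observations_path}/observations.csv", globals_dict)

print(local_path)  # data/01_raw/observations.csv
print(s3_path)     # s3://example-bucket/raw/observations.csv
```

Printing the rendered filepath like this is a quick way to confirm that s3.raw_observations_path resolves to the string you expect before the dataset is built.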
datajoely
01/11/2023, 12:54 PM
datajoely
01/11/2023, 12:54 PM
from_config stuff above: is that how you’re doing it normally?
datajoely
01/11/2023, 12:54 PM
Sergei Benkovich
01/11/2023, 12:56 PM
datajoely
01/11/2023, 12:56 PM
kedro new we generate the desired repo structure
datajoely
01/11/2023, 12:57 PM
Sergei Benkovich
01/11/2023, 1:11 PM