#questions

Sergei Benkovich

01/11/2023, 7:14 AM
having issues with credentials, i get the following error: “‘str’ object is not a mapping. DataSet ‘observations’ must only contain arguments valid for the constructor of ‘kedro.extras.datasets.pandas.csv_dataset.CSVDataSet’.” when i try to use this catalog entry:
observations:
  type: pandas.CSVDataSet
  filepath: "${s3.raw_observations_path}/observations.csv"
  credentials: dev_s3

datajoely

01/11/2023, 9:00 AM
so this looks right, but I’m wondering if there is any weird whitespace issue in your YAML. If you paste your catalog here, does it look right as a json/python dict? https://onlineyamltools.com/convert-yaml-to-json
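
The same whitespace check can be done locally instead of through an online converter. A minimal sketch, assuming the catalog has already been parsed into a plain dict (typed by hand here). The helper `find_whitespace_issues`, the `bad_globals` dict, and the bucket path are hypothetical names made up for illustration:

```python
def find_whitespace_issues(config, path=""):
    """Return (key_path, repr(value)) pairs for strings with stray whitespace."""
    issues = []
    for key, value in config.items():
        key_path = f"{path}.{key}" if path else str(key)
        if isinstance(value, dict):
            # Recurse into nested mappings such as the `s3:` globals section.
            issues.extend(find_whitespace_issues(value, key_path))
        elif isinstance(value, str):
            # Flag leading/trailing whitespace and embedded newlines or tabs.
            if value != value.strip() or "\n" in value or "\t" in value:
                issues.append((key_path, repr(value)))
    return issues


catalog = {
    "observations": {
        "type": "pandas.CSVDataSet",
        "filepath": "${s3.raw_observations_path}/observations.csv",
        "credentials": "dev_s3",
    }
}

# Hypothetical globals entry with a trailing space and newline.
bad_globals = {"s3": {"raw_observations_path": "s3://example-bucket/raw \n"}}

print(find_whitespace_issues(catalog))
print(find_whitespace_issues(bad_globals))
```

Printing `repr(value)` makes invisible characters such as `\n` visible, which is the point of the check.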

Sergei Benkovich

01/11/2023, 9:01 AM
yep
{
  "observations": {
    "type": "pandas.CSVDataSet",
    "filepath": "${s3.raw_observations_path}/observations.csv",
    "credentials": "dev_s3"
  }
}
when i use local paths everything works fine
observations:
  type: pandas.CSVDataSet
  filepath: "${folders.raw}/observations.csv"

datajoely

01/11/2023, 10:21 AM
so this is surprising - all we’re doing is using `importlib.import_module` and importing, which is equivalent to `from kedro.extras.datasets.pandas.csv_dataset import CSVDataSet`, and putting the `filepath` and `credentials` arguments into that constructor
ah i saw your message
is there any whitespace in your `s3.raw_observations_path` string? perhaps that’s got a weird newline or space in there which is breaking the yaml
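
One plausible reading of the error message, sketched without Kedro: if the constructor receives the string `"dev_s3"` as `credentials` and tries to unpack it like a dict, Python raises exactly "'str' object is not a mapping". `SketchCSVDataSet` and `build` below are hypothetical stand-ins written for illustration, not Kedro's implementation, and the `key`/`secret` values are placeholders:

```python
from copy import deepcopy


class SketchCSVDataSet:
    """Hypothetical stand-in for a dataset class; not Kedro's own code."""

    def __init__(self, filepath, credentials=None):
        self.filepath = filepath
        # Unpacking with ** requires a mapping; a plain string fails here.
        self.storage_options = {**(deepcopy(credentials) or {})}


def build(config):
    """Pass every catalog key except `type` to the constructor."""
    kwargs = {k: v for k, v in config.items() if k != "type"}
    try:
        return SketchCSVDataSet(**kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Credentials given as a name (a string), as in the catalog entry above.
by_name = build({
    "type": "pandas.CSVDataSet",
    "filepath": "s3://example-bucket/raw/observations.csv",
    "credentials": "dev_s3",
})
print(by_name)

# Credentials given as a mapping (placeholder values).
by_mapping = build({
    "type": "pandas.CSVDataSet",
    "filepath": "s3://example-bucket/raw/observations.csv",
    "credentials": {"key": "EXAMPLE_KEY", "secret": "EXAMPLE_SECRET"},
})
print(by_mapping.storage_options)
```

This would also fit the observation that local paths work: those entries carry no `credentials` key, so nothing is unpacked.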

Sergei Benkovich

01/11/2023, 10:29 AM
tried formatting many times but can’t find any issues… this is my main function, creating the dataset, maybe i’m missing something here? i don’t inject credentials anywhere
    runner = SequentialRunner()

    project_path = Path(__file__).parent.parent.parent
    conf_path = f'{project_path}/{settings.CONF_SOURCE}'
    conf_loader = CONFIG_LOADER_CLASS(conf_source=conf_path, env="local", globals_pattern='globals*')

    parameters = conf_loader['parameters']
    credentials = conf_loader['credentials']
    catalog = conf_loader['catalog']

    data_catalog = DATA_CATALOG_CLASS(data_sets={
        'observations': CSVDataSet.from_config('observations',
                                               catalog['observations']
                                               ),
        'processed_observations': CSVDataSet.from_config('processed_observations',
                                                         catalog['processed_observations']
                                                         ),
        'train_set': CSVDataSet.from_config('train_set',
                                            catalog['train_set']
                                            ),
        'test_set': CSVDataSet.from_config('test_set',
                                           catalog['test_set']
                                           ),
        'model': PickleDataSet.from_config('model',
                                           catalog['model']
                                           ),
        'inference_results': PickleDataSet.from_config('inference_results',
                                                       catalog['inference_results']
                                                       ),
    },
        feed_dict={'params': parameters})
    result = runner.run(data_extraction.create_pipeline(), data_catalog)
    return result

datajoely

01/11/2023, 12:48 PM
do any of those CSV constructors work or is it just one of them?

Sergei Benkovich

01/11/2023, 12:50 PM
all of them work when i’m not using s3 but only local paths.

datajoely

01/11/2023, 12:51 PM
and `CONFIG_LOADER_CLASS` is TemplatedConfigLoader, right?

Sergei Benkovich

01/11/2023, 12:52 PM
yes this is my settings.py
CONF_SOURCE = "conf"

# Class that manages how configuration is loaded.
from kedro.config import TemplatedConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",
}

# Class that manages the Data Catalog.
from kedro.io import DataCatalog
DATA_CATALOG_CLASS = DataCatalog

datajoely

01/11/2023, 12:54 PM
so I’m struggling to work this out
when you posted your `from_config` stuff above, is that how you’re doing it normally?
are you not using the Kedro template to read the YAML normally?

Sergei Benkovich

01/11/2023, 12:56 PM
i’m usually using from_config, but maybe (and probably) i’m missing something… what is the template you are referring to and where can i find examples of its usage?

datajoely

01/11/2023, 12:56 PM
if you run `kedro new` we generate the desired repo structure
This tutorial is the best introduction to the key concepts https://kedro.readthedocs.io/en/stable/tutorial/spaceflights_tutorial.html

Sergei Benkovich

01/11/2023, 1:11 PM
yeah i have been through that, but i guess i’m still missing something. i’m not running through the cli, but defining the runner and data_catalog myself. i didn’t see any way to read the yamls into the datacatalog other than how i did it, but i guess there is a cleaner way to load the catalog.yml file.
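
A possible missing step when building datasets by hand: the catalog only names its credentials (`credentials: dev_s3`), so something has to swap that name for the matching entry from the loaded credentials before the dataset is constructed. A minimal sketch of that lookup in plain Python. `resolve_credentials` is a hypothetical helper, and the bucket path and `key`/`secret` values are placeholders standing in for `conf_loader['catalog']` and `conf_loader['credentials']`:

```python
from copy import deepcopy


def resolve_credentials(catalog, credentials):
    """Return a copy of `catalog` with credential names swapped for mappings.

    Hypothetical helper for illustration; not part of Kedro.
    """
    resolved = deepcopy(catalog)
    for dataset_name, entry in resolved.items():
        reference = entry.get("credentials")
        if isinstance(reference, str):
            if reference not in credentials:
                raise KeyError(
                    f"Dataset '{dataset_name}' references unknown "
                    f"credentials '{reference}'"
                )
            entry["credentials"] = deepcopy(credentials[reference])
    return resolved


# Stand-ins for conf_loader['catalog'] and conf_loader['credentials'].
# The path, key and secret values are placeholders, not real settings.
catalog = {
    "observations": {
        "type": "pandas.CSVDataSet",
        "filepath": "s3://example-bucket/raw/observations.csv",
        "credentials": "dev_s3",
    },
    "train_set": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/train_set.csv",
    },
}
credentials = {"dev_s3": {"key": "EXAMPLE_KEY", "secret": "EXAMPLE_SECRET"}}

resolved = resolve_credentials(catalog, credentials)
print(resolved["observations"]["credentials"])
```

If the 0.18.x API is as I recall it, `DataCatalog.from_config(catalog, credentials)` performs this lookup itself and builds every dataset in one call, which would also replace the per-dataset `from_config` calls. That is worth checking against the docs for the installed Kedro version.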