having issues with credentials i get following error str obj Kedro #questions

Sergei Benkovich

01/11/2023, 7:14 AM

having issues with credentials, i get following error: “‘str’ object is not a mapping. DataSet ‘observations’ must only contain arguments valid for the constructor of ‘kedro.extras.datasets.pandas.csv_dataset.CSVDataSet’.” when i try to use

observations:
  type: pandas.CSVDataSet
  filepath: "${s3.raw_observations_path}/observations.csv"
  credentials: dev_s3

datajoely

01/11/2023, 9:00 AM

so this looks right, but I’m wondering if there is any weird whitespace issue in your YAML. If you paste your catalog here, does it look right as a json/python dict? https://onlineyamltools.com/convert-yaml-to-json

Sergei Benkovich

01/11/2023, 9:01 AM

yep

{
  "observations": {
    "type": "pandas.CSVDataSet",
    "filepath": "${s3.raw_observations_path}/observations.csv",
    "credentials": "dev_s3"
  }
}

when i use local paths everything works fine

observations:
  type: pandas.CSVDataSet
  filepath: "${folders.raw}/observations.csv"

datajoely

01/11/2023, 10:21 AM

so this is surprising - all we’re doing is using importlib.import_module and importing, which is equivalent to from kedro.extras.datasets.pandas.csv_dataset import CSVDataSet, and putting the filepath and credentials arguments into that constructor
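
A minimal sketch of why a string in the credentials slot can produce this error, assuming the constructor unpacks credentials as a mapping. FakeCSVDataSet, from_config and the S3 path are hypothetical stand-ins for illustration, not Kedro's actual code:

```python
class FakeCSVDataSet:
    """Hypothetical stand-in for a dataset constructor; not Kedro's code."""

    def __init__(self, filepath, credentials=None):
        self.filepath = filepath
        # Unpacking with ** requires a mapping, so a bare string fails here.
        self.storage_options = {**(credentials or {})}


def from_config(config):
    """Pass every key of a catalog entry except 'type' to the constructor."""
    kwargs = {key: value for key, value in config.items() if key != "type"}
    return FakeCSVDataSet(**kwargs)


entry = {
    "type": "pandas.CSVDataSet",
    "filepath": "s3://example-bucket/observations.csv",  # placeholder path
    "credentials": "dev_s3",  # still only the NAME of a credentials entry
}

try:
    from_config(entry)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Once the name is replaced by the mapping it refers to, the call succeeds.
# The key/secret fields are assumed placeholders.
resolved = {**entry, "credentials": {"key": "PLACEHOLDER", "secret": "PLACEHOLDER"}}
dataset = from_config(resolved)
```

If this is what happens here, the catalog YAML itself is fine and the name dev_s3 simply was never swapped for the matching entry from the credentials file.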

datajoely

01/11/2023, 10:22 AM

ah i saw your message

datajoely

01/11/2023, 10:22 AM

is there any whitespace in your s3.raw_observations_path string?

datajoely

01/11/2023, 10:22 AM

perhaps that’s got a weird newline or space in there which is breaking the yaml
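
One quick way to check for this kind of hidden whitespace is to print the value with repr(), which makes newlines and trailing spaces visible. The helper name and the sample path below are hypothetical:

```python
def has_stray_whitespace(value: str) -> bool:
    """Return True for leading/trailing whitespace or embedded newlines/tabs."""
    return value != value.strip() or any(ch in value for ch in "\n\r\t")


raw_path = "s3://example-bucket/raw\n"  # hypothetical value with a stray newline

print(repr(raw_path))                  # repr() shows the escape sequence
print(has_stray_whitespace(raw_path))
```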

Sergei Benkovich

01/11/2023, 10:29 AM

tried formatting many times but can’t find any issues… this is my main function, creating the dataset, maybe i’m missing something here? i don’t inject credentials anywhere

    runner = SequentialRunner()

    project_path = Path(__file__).parent.parent.parent
    conf_path = f'{project_path}/{settings.CONF_SOURCE}'
    conf_loader = CONFIG_LOADER_CLASS(conf_source=conf_path, env="local", globals_pattern='globals*')

    parameters = conf_loader['parameters']
    credentials = conf_loader['credentials']
    catalog = conf_loader['catalog']

    data_catalog = DATA_CATALOG_CLASS(data_sets={
        'observations': CSVDataSet.from_config('observations',
                                               catalog['observations']
                                               ),
        'processed_observations': CSVDataSet.from_config('processed_observations',
                                                         catalog['processed_observations']
                                                         ),
        'train_set': CSVDataSet.from_config('train_set',
                                            catalog['train_set']
                                            ),
        'test_set': CSVDataSet.from_config('test_set',
                                           catalog['test_set']
                                           ),
        'model': PickleDataSet.from_config('model',
                                           catalog['model']
                                           ),
        'inference_results': PickleDataSet.from_config('inference_results',
                                                       catalog['inference_results']
                                                       ),
    },
        feed_dict={'params': parameters})
    result = runner.run(data_extraction.create_pipeline(), data_catalog)
    return result

datajoely

01/11/2023, 12:48 PM

do any of those CSV constructors work or is it just one of them?

Sergei Benkovich

01/11/2023, 12:50 PM

all of them work when i’m not using s3 but only local paths.

datajoely

01/11/2023, 12:51 PM

and CONFIG_LOADER_CLASS is TemplatedConfigLoader right?

Sergei Benkovich

01/11/2023, 12:52 PM

yes this is my settings.py

CONF_SOURCE = "conf"

# Class that manages how configuration is loaded.
from kedro.config import TemplatedConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",
}

# Class that manages the Data Catalog.
from kedro.io import DataCatalog
DATA_CATALOG_CLASS = DataCatalog
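
For context, a simplified sketch of the kind of substitution a templated config loader performs on values such as ${s3.raw_observations_path}, using the globals file as the lookup table. The resolve_template function and the globals values are hypothetical, and the real TemplatedConfigLoader handles more cases than this:

```python
import re

# Matches placeholders of the form ${dotted.key}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_template(value: str, globals_dict: dict) -> str:
    """Replace each ${a.b} placeholder with globals_dict['a']['b']."""

    def lookup(match):
        node = globals_dict
        for part in match.group(1).split("."):
            node = node[part]  # raises KeyError if the global is missing
        return str(node)

    return PLACEHOLDER.sub(lookup, value)


# Assumed contents of a globals file; the values are placeholders.
globals_dict = {
    "s3": {"raw_observations_path": "s3://example-bucket/raw"},
    "folders": {"raw": "data/01_raw"},
}

resolved_path = resolve_template(
    "${s3.raw_observations_path}/observations.csv", globals_dict
)
print(resolved_path)
```

Note that this step only touches the filepath string. The credentials value is a plain name with no ${...} placeholder, so templating leaves it as it is.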

datajoely

01/11/2023, 12:54 PM

so I’m struggling to work this out

datajoely

01/11/2023, 12:54 PM

when you posted your from_config stuff above, is that how you’re doing it normally?

datajoely

01/11/2023, 12:54 PM

are you not using the Kedro template to read the YAML normally?

Sergei Benkovich

01/11/2023, 12:56 PM

i’m usually using from_config, but maybe (and probably) i’m missing something… what is the template you are referring to and where can i find examples of its usage?

datajoely

01/11/2023, 12:56 PM

if you run kedro new we generate the desired repo structure

datajoely

01/11/2023, 12:57 PM

This tutorial is the best introduction to the key concepts https://kedro.readthedocs.io/en/stable/tutorial/spaceflights_tutorial.html

Sergei Benkovich

01/11/2023, 1:11 PM

yeah i have been through that, but i guess i’m still missing something. i’m not running using the cli, but defining the runner and data_catalog myself. i didn’t see any way to read the yamls into the datacatalog other than how i did it, but i guess there is a cleaner way to load the catalog.yml file.
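
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the main function above loads conf_loader['credentials'] but never uses it, so each from_config call appears to receive credentials: dev_s3 as a plain string. Kedro's DataCatalog.from_config(catalog, credentials) takes both loaded dictionaries and should perform the lookup itself. The plain-Python helper below only illustrates that lookup step; resolve_credentials and the contents of the dev_s3 entry are hypothetical:

```python
import copy


def resolve_credentials(catalog: dict, credentials: dict) -> dict:
    """Return a copy of the catalog with credential names replaced by mappings."""
    resolved = copy.deepcopy(catalog)
    for dataset_name, entry in resolved.items():
        cred_name = entry.get("credentials")
        if isinstance(cred_name, str):
            if cred_name not in credentials:
                raise KeyError(
                    f"Credentials '{cred_name}' for dataset '{dataset_name}' not found"
                )
            entry["credentials"] = credentials[cred_name]
    return resolved


catalog = {
    "observations": {
        "type": "pandas.CSVDataSet",
        "filepath": "s3://example-bucket/raw/observations.csv",  # placeholder
        "credentials": "dev_s3",
    }
}
# Assumed shape of the credentials file; the values are placeholders.
credentials = {"dev_s3": {"key": "PLACEHOLDER_KEY", "secret": "PLACEHOLDER_SECRET"}}

resolved_catalog = resolve_credentials(catalog, credentials)
print(resolved_catalog["observations"]["credentials"])
```

With local paths there is no credentials key in the entry, which would explain why those datasets load without any problem.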
