# questions
l
Hey team - I think I might be missing something super simple while setting up a project, and hoping someone with more experience can help 😅 Running into the following error while doing a
kedro run --pipeline=XXX
TypeError: __init__() got an unexpected keyword argument 'datasets'
FYI, this is version 0.19.8
n
Hi @Leslie Wu, could you describe your setup? Do you get the same error if you create a fresh project with
kedro new
?
l
Hi @Nok Lam Chan - this is not through setting up a new project. I have an existing project and have been trying to configure it with new nodes / pipelines / catalog etc. Running into this error - and wondering if it might just be the way I have defined the catalog, the pipelines etc.
n
hm, let's see how I can help. Are there any custom classes implemented in this project? You can check
settings.py
to see if it defines something like
DATA_CATALOG_CLASS
One thing that I found odd is that from your stacktrace it's complaining
datasets
is not an argument but it clearly is. You can verify that by running this in an IPython shell (please copy and paste what you get)
from kedro.io import DataCatalog
??DataCatalog
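For context, one common way this exact TypeError shows up after a 0.19 upgrade is a custom catalog class (registered via settings.py) that was written against an older Kedro API, where the first constructor argument was named `data_sets` rather than `datasets`. The sketch below is hypothetical - `OldStyleCatalog` is a stand-in, not the actual project code:

```python
# Hypothetical reproduction: a custom catalog whose __init__ still uses the
# older keyword name `data_sets`. Class name is illustrative only.

class OldStyleCatalog:
    """Stand-in for a custom DataCatalog subclass written for an older Kedro."""

    def __init__(self, data_sets=None, feed_dict=None):  # stale keyword name
        self._datasets = dict(data_sets or {})
        self._feed_dict = feed_dict or {}


# Kedro 0.19.x instantiates the catalog with `datasets=...`, so the stale
# signature raises a TypeError naming the unexpected keyword:
try:
    OldStyleCatalog(datasets={"cars": object()})
except TypeError as exc:
    print(exc)
```

Any mismatch like this between the caller's keyword and the subclass's signature produces the same "unexpected keyword argument" message, regardless of which class defines it.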
l
The class definition from the above command (shortened due to word limits)
class DataCatalog:
    """``DataCatalog`` stores instances of ``AbstractDataset`` implementations
    to provide ``load`` and ``save`` capabilities from anywhere in the
    program. To use a ``DataCatalog``, you need to instantiate it with
    a dictionary of data sets. Then it will act as a single point of reference
    for your calls, relaying load and save functions
    to the underlying data sets.
    """

    def __init__(  # noqa: PLR0913
        self,
        datasets: dict[str, AbstractDataset] | None = None,
        feed_dict: dict[str, Any] | None = None,
        dataset_patterns: Patterns | None = None,
        load_versions: dict[str, str] | None = None,
        save_version: str | None = None,
        default_pattern: Patterns | None = None,
    ) -> None:
        """``DataCatalog`` stores instances of ``AbstractDataset``
        implementations to provide ``load`` and ``save`` capabilities from
        anywhere in the program. To use a ``DataCatalog``, you need to
        instantiate it with a dictionary of data sets. Then it will act as a
        single point of reference for your calls, relaying load and save
        functions to the underlying data sets.

        Args:
            datasets: A dictionary of data set names and data set instances.
            feed_dict: A feed dict with data to be added in memory.
            dataset_patterns: A dictionary of data set factory patterns
                and corresponding data set configuration. When fetched from catalog configuration
                these patterns will be sorted by:
                1. Decreasing specificity (number of characters outside the curly brackets)
                2. Decreasing number of placeholders (number of curly bracket pairs)
                3. Alphabetically
                A pattern of specificity 0 is a catch-all pattern and will overwrite the default
                pattern provided through the runners if it comes before "default" in the alphabet.
                Such an overwriting pattern will emit a warning. The `"{default}"` name will
                not emit a warning.
            load_versions: A mapping between data set names and versions
                to load. Has no effect on data sets without enabled versioning.
            save_version: Version string to be used for ``save`` operations
                by all data sets with enabled versioning. It must: a) be a
                case-insensitive string that conforms with operating system
                filename limitations, b) always return the latest version when
                sorted in lexicographical order.
            default_pattern: A dictionary of the default catch-all pattern that overrides the default
                pattern provided through the runners.

        Example:
        ::

            >>> from kedro_datasets.pandas import CSVDataset
            >>>
            >>> cars = CSVDataset(filepath="cars.csv",
            >>>                   load_args=None,
            >>>                   save_args={"index": False})
            >>> catalog = DataCatalog(datasets={'cars': cars})
        """
        self._datasets = dict(datasets or {})
        self.datasets = _FrozenDatasets(self._datasets)
        # Keep a record of all patterns in the catalog.
        # {dataset pattern name : dataset pattern body}
        self._dataset_patterns = dataset_patterns or {}
        self._load_versions = load_versions or {}
        self._save_version = save_version
        self._default_pattern = default_pattern or {}
        self._use_rich_markup = _has_rich_handler()

        if feed_dict:
            self.add_feed_dict(feed_dict)

    def __repr__(self) -> str:
        return self.datasets.__repr__()

    @property
    def _logger(self) -> logging.Logger:
        return logging.getLogger(__name__)

    @classmethod
    def from_config(
        cls,
        catalog: dict[str, dict[str, Any]] | None,
        credentials: dict[str, dict[str, Any]] | None = None,
        load_versions: dict[str, str] | None = None,
        save_version: str | None = None,
    ) -> DataCatalog:
        """Create a ``DataCatalog`` instance from configuration. This is a
        factory method used to provide developers with a way to instantiate
        ``DataCatalog`` with configuration parsed from configuration files.

        Args:
            catalog: A dictionary whose keys are the data set names and
                the values are dictionaries with the constructor arguments
                for classes implementing ``AbstractDataset``. The data set
                class to be loaded is specified with the key ``type`` and their
                fully qualified class name. All ``kedro.io`` data sets can be
                specified by their class name only, i.e. their module name
                can be omitted.
            credentials: A dictionary containing credentials for different
                data sets. Use the ``credentials`` key in a ``AbstractDataset``
                to refer to the appropriate credentials as shown in the example
                below.
            load_versions: A mapping between dataset names and versions
                to load. Has no effect on data sets without enabled versioning.
            save_version: Version string to be used for ``save`` operations
                by all data sets with enabled versioning. It must: a) be a
                case-insensitive string that conforms with operating system
                filename limitations, b) always return the latest version when
                sorted in lexicographical order.

        Returns:
            An instantiated ``DataCatalog`` containing all specified
            data sets, created and ready to use.
n
And what's in your settings.py?
l
A bit delayed response here, but the issue has since been resolved! Exactly your point on a custom class being defined for the Data Catalog. Good spot!! Thanks @Nok Lam Chan
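For anyone hitting the same thing: the usual fix is to bring the custom subclass's signature in line with the current DataCatalog and forward everything else via `**kwargs`. A minimal sketch - `BaseCatalog` stands in for `kedro.io.DataCatalog` so this runs without Kedro installed, and all names are hypothetical:

```python
# Stand-in for kedro.io.DataCatalog's 0.19-style signature; illustrative only.
class BaseCatalog:
    def __init__(self, datasets=None, **kwargs):
        self._datasets = dict(datasets or {})


class MyCatalog(BaseCatalog):
    """Custom catalog that accepts the 0.19 keywords unchanged."""

    def __init__(self, datasets=None, **kwargs):
        # Accept `datasets` (not the older `data_sets`) and forward the rest,
        # so keyword arguments added upstream don't break the subclass.
        super().__init__(datasets=datasets, **kwargs)
        self.hook_ran = True  # project-specific behaviour would go here


catalog = MyCatalog(datasets={"cars": object()}, save_version="2024-01-01")
```

The subclass would then be registered via the `DATA_CATALOG_CLASS` setting in settings.py, as mentioned earlier in the thread.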
👍🏼 1
n
Glad it is resolved! Curious, what does the custom data catalog do?