# questions
g
Hello! I am trying to use dataset factories with credentials:
"{client_name}.my_partition":
  type: partitions.PartitionedDataset
  path: <s3://xxxx>-{client_name}/
  dataset:
  credentials: "dev_s3_{client_name}"
Where I define credentials for each client in credentials.yml. However, I get:
KeyError: "Unable to find credentials 'dev_s3_{client_name}': check your data catalog and credentials configuration. See <https://kedro.readthedocs.io/en/stable/kedro.io.DataCatalog.html> for an example."
Is it possible to also point to credentials like this with the factory pattern? Thanks 🙂
d
So credentials don't work like this. I'm lobbying for it, but credentials lookup is actually a very old feature, and it can only retrieve an exact string match.
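To make the exact-match point concrete, here is a simplified illustration. It is not Kedro's actual code: `lookup_credentials` and the sample `credentials` dict are hypothetical, and it assumes the pattern's literal `credentials` value is looked up before `{client_name}` is filled in, which is what the error message in the question suggests.

```python
# Simplified illustration only; NOT Kedro's real implementation.
# `lookup_credentials` and the sample data are hypothetical.
credentials = {
    "dev_s3_acme": {"key": "example-key", "secret": "example-secret"},
}


def lookup_credentials(name: str, available: dict) -> dict:
    """Exact-match lookup: the name must equal a key in credentials.yml."""
    try:
        return available[name]
    except KeyError as exc:
        raise KeyError(f"Unable to find credentials '{name}'") from exc


# The factory pattern still contains the raw placeholder at lookup time,
# so the literal string is used as the key and nothing matches.
pattern_value = "dev_s3_{client_name}"
try:
    lookup_credentials(pattern_value, credentials)
    found = True
except KeyError:
    found = False

# Only after the placeholder is substituted does an exact match exist.
resolved = lookup_credentials(pattern_value.format(client_name="acme"), credentials)
```

In this sketch the first lookup fails and the second succeeds, which mirrors why a per-client entry in credentials.yml is not picked up by the pattern.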
I have this custom OmegaConf resolver in my project:
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any

import yaml
from omegaconf import OmegaConf, errors


@dataclass
class Credential(str):
    """
    A dataclass-based Credential that acts like a normal string
    for JSON serialization, but returns a masked representation
    in repr().
    """

    _value: Any = None

    def __new__(cls, value: Any):
        """
        We override __new__ so that the final object is truly a string
        (so json.dumps() treats it as a string). If value is None,
        we'll store "None" as the internal string to match test expectations.
        """
        str_value = "None" if value is None else str(value)
        obj = super().__new__(cls, str_value)
        obj._value = value
        return obj

    def __repr__(self) -> str:
        """
        Return a masked version of the underlying string.
        E.g. 12345 => "*****", None => "****" (masking "None").
        """
        raw_str = "None" if self._value is None else str(self._value)
        return "*" * len(raw_str)

    def __eq__(self, other: object) -> bool:
        """
        Equality is based on the actual Python value
        rather than the masked representation.
        """
        if isinstance(other, Credential):
            return self._value == other._value
        return self._value == other  # pragma: no cover

    def __hash__(self) -> int:
        """Hash based on the internal value."""
        return hash(self._value)

    @property
    def value(self) -> Any:
        """
        Property to access the underlying value.
        This maintains encapsulation while providing access to the internal value.
        """
        return self._value


class KedroCredentialResolver:
    """
    A resolver class for fetching credentials from the Kedro project's
    'conf/local/credentials.yml' file dynamically, regardless of the
    current working directory.
    """

    def __init__(self):
        self.project_root = self.find_project_root()

    def find_project_root(self) -> Path:
        """
        Traverse upwards from the current working directory to find the Kedro project root.
        The project root is identified by the presence of a 'conf/' directory.

        Returns:
            Path: The path to the Kedro project root.

        Raises:
            FileNotFoundError: If the project root containing 'conf/' is not found.
        """
        current_path = Path.cwd()
        for parent in [current_path, *current_path.parents]:
            conf_path = parent / "conf"
            if conf_path.is_dir():
                return parent
        raise FileNotFoundError(
            "Could not find the Kedro project root containing the 'conf/' directory."
        )

    def get_credentials(self, top_level_key: str, key: str) -> Credential:
        """
        Retrieve a specific credential from 'conf/local/credentials.yml'.

        Args:
            top_level_key (str): The top-level key in the credentials YAML (e.g., 'database').
            key (str): The specific credential key to retrieve (e.g., 'password').

        Returns:
            Credential: The retrieved credential value, masked for security.

        Raises:
            FileNotFoundError: If the credentials file does not exist.
            KeyError: If the specified keys are not found in the credentials file,
                or if the file cannot be parsed as YAML.
        """
        credentials_path = self.project_root / "conf" / "local" / "credentials.yml"

        # 1. If the file doesn't exist, raise FileNotFoundError
        if not credentials_path.exists():
            raise FileNotFoundError(
                f"Credentials file not found at: {credentials_path}"
            )

        try:
            # 2. Attempt to load the YAML
            credential_data = OmegaConf.load(credentials_path)
        except yaml.parser.ParserError as e:
            # If there's an error parsing the YAML (syntax, etc.)
            raise KeyError(f"Error parsing credentials file: {e}") from e

        # 3. Attempt to retrieve the specified keys
        try:
            credential_value = credential_data[top_level_key][key]
            return Credential(credential_value)
        except errors.ConfigKeyError as e:
            # This is the typical OmegaConf error for a missing key
            raise KeyError(
                f"Credential key '{key}' not found under '{top_level_key}'."
            ) from e
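A small sketch of what the masking in `Credential` does and does not cover. `MaskedStr` is a hypothetical, stripped-down stand-in for the class above (only the `__repr__` override is kept), and the secret value is made up.

```python
import json


class MaskedStr(str):
    """Hypothetical minimal stand-in for Credential: a str with a masked repr."""

    def __repr__(self) -> str:
        return "*" * len(self)


secret = MaskedStr("hunter2")

as_repr = repr(secret)                        # masked: used by debuggers, logs of containers
as_json = json.dumps({"password": secret})    # real value: json sees an ordinary str
as_text = f"{secret}"                         # real value: str()/format bypass __repr__
```

Worth noting: only `repr()` is masked. `str()`, f-strings, and `json.dumps` still expose the real value, so the class protects against accidental display in reprs, not against deliberate printing.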
g
Thanks @datajoely. I see, it would be nice to have that in Kedro indeed! I'll use your resolver, thanks a lot for sharing it!
d
Yeah, one bug I haven't fixed in it: if the credentials.yml file doesn't exist, it will fail.
👍 1
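One possible way to handle the missing-file case mentioned above, as a sketch rather than a fix from the thread. `load_credentials_or_empty` is a hypothetical helper, and the `loader` argument stands in for whatever reads the file (for example `OmegaConf.load`). Returning an empty mapping is an assumption about the desired behaviour; raising a clearer error may suit some projects better.

```python
from pathlib import Path
from typing import Any, Callable, Mapping


def load_credentials_or_empty(
    path: Path,
    loader: Callable[[Path], Mapping[str, Any]],
) -> Mapping[str, Any]:
    """Return the loaded credentials, or an empty mapping if the file is absent.

    `loader` is any callable that reads the file at `path` and returns a
    mapping; it is only invoked when the file actually exists.
    """
    if not path.is_file():
        # Assumption: a missing credentials.yml means "no credentials defined".
        return {}
    return loader(path)
```

With this in place, a later key lookup on the empty mapping fails with the usual missing-key error instead of failing while opening the file.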