# questions
g
Hi, I have recently updated the Kedro version, but now I am getting some problems with the logging. Is there a way to set the default logging config in the settings, in addition to an environment variable? Due to platform constraints we cannot set environment variables outside of the run, and whenever I try to set the environment variable through a hook, Kedro has already loaded the standard logging config. The problem is that our platform cannot handle rich logging, and as such it causes problems.
n
Hey! Did you upgrade to 0.19? You need to set it with an environment variable, `KEDRO_LOGGING_CONFIG`. This migration guide should help you with the upgrade: https://github.com/kedro-org/kedro/blob/main/RELEASE.md#migration-guide-from-kedro-018-to-019
Sorry I didn't read the full message, what platform is this?
You are correct, a hook cannot override logging because it's a bit too late (logging needs to be initialised at the beginning of the process). @Sajid Alam can you help this user?
👍 1
g
Yes I did. The problem is that I cannot set environment variables: I am running the pipeline as a packaged job in Databricks and am not allowed to set environment variables pre-run. I even wrote a wrapper that triggers the pipeline, because we cannot trigger it from the command line:
```python
import logging
from copy import deepcopy
from pathlib import Path
from typing import Any, Optional, Union

from kedro.framework.project import LOGGING, configure_project
from kedro.framework.session import KedroSession

# `settings`, `get_root_error_logs`, and `LOGGING_ERROR_LIST_HANDLER` are
# project-specific helpers imported from elsewhere in the package.

logger = logging.getLogger(__name__)
KedroOutput = dict[str, Any]

class PipelineRunner:
    """Create and run Kedro pipelines."""

    def __init__(
        self,
        env: Union[str, Path],
        pipeline_type: str,
        conf_source: Path = settings.CONF_SOURCE,
        runtime_params: Optional[dict] = None,
    ):
        """Class to start the kedro pipeline.

        Parameters
        ----------
        env: Union[str, Path]
            Specify the environment to run the pipeline in.
        pipeline_type: str
            Name of the pipeline type. This is the pipeline name minus the
            auto or manual naming.
        conf_source: Path
            Specify the path to the configuration source directory.
        runtime_params: Optional[dict]
            Pass extra parameters to the kedro pipeline.
        """
        self.env = env if isinstance(env, str) else env.as_posix()
        self.conf_source = conf_source.as_posix()
        self.pipeline_base_name = pipeline_type

        self.runtime_params = deepcopy(runtime_params) or {}
        self._has_run = False
        self.kedro_output: Optional[KedroOutput] = None

    def run_pipeline(self, pipeline_name: str) -> None:
        """Run the kedro pipeline.

        The run_pipeline function is a wrapper around Kedro's `run` method.
        It allows you to run the pipeline from within Python rather than
        from the command line.

        Parameters
        ----------
        pipeline_name: str
            The name of the pipeline to be run.

        Raises
        ------
        RuntimeError
            Raises a runtime error if any errors were logged during the
            pipeline. This allows the data to still be sent, but for us to
            be aware that an error occurred.

        Returns
        -------
        None
        """
        if self._has_run:
            raise RuntimeError("Create a new pipeline for each run.")
        self._has_run = True
        package_name = Path(__file__).parent.name
        configure_project(package_name)
        with KedroSession.create(
            env=self.env,
            conf_source=self.conf_source,
            extra_params=self.runtime_params,
        ) as session:
            self.kedro_output = session.run(pipeline_name=pipeline_name)
        error_logs = get_root_error_logs(LOGGING_ERROR_LIST_HANDLER)
        if error_logs is not None:
            raise RuntimeError(
                f"Pipeline encountered errors during the run: {error_logs}"
            )
        return None
```
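For context, the wrapper is invoked roughly like this (the env and pipeline names here are illustrative, not from our project):

```python
# Illustrative invocation of the PipelineRunner wrapper above.
runner = PipelineRunner(env="base", pipeline_type="training")
runner.run_pipeline(pipeline_name="training_auto")
```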
👀 1
s
Hi Inger, you can try changing the handler to stop using rich in the `logging.yml` file in `conf/base`.
Oh sorry, looks like you can't use that file without the env variable; you might have to manually load it through your code.
n
It may not be ideal, but could you do `import os` and `os.environ["KEDRO_LOGGING_CONFIG"] = <your_path>` instead, before you start loading any Kedro things?
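A rough sketch of that ordering (the config path is just an example, not from the thread):

```python
import os

# Must run before anything from kedro.framework is imported, because
# kedro.framework.project configures logging at import time.
os.environ["KEDRO_LOGGING_CONFIG"] = "conf/logging.yml"  # example path

from kedro.framework.project import configure_project  # noqa: E402
from kedro.framework.session import KedroSession  # noqa: E402
```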
s
You could also try, after replacing `rich` with `console` in the `logging.yml`, loading the yml file in your code and using `logging.config.dictConfig`: https://docs.python.org/3/library/logging.config.html#logging.config.dictConfig
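A minimal sketch of that approach, assuming a rich-free config lives at `conf/logging.yml` (path illustrative):

```python
import logging.config
from pathlib import Path

import yaml  # PyYAML

# Load a logging config whose `rich` handler has been swapped for a plain
# `console` handler, then hand it to the stdlib logging machinery.
config = yaml.safe_load(Path("conf/logging.yml").read_text())
logging.config.dictConfig(config)
```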
g
Looked at this, but it seems a bit finicky, because you are dependent on the order in which things are run: if the project's init file runs before the os environment variable is set, it does not work. So import order becomes important.
👍🏼 1
Will take a look at the logging.config.dictConfig method
n
You don't have to call `logging.config.dictConfig`; instead you can call `LOGGING.configure(<dictionary_of_config>)`
👍 1
Kedro configures the logging for you, but doesn't stop you from re-configuring it if you need to.
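For instance, a minimal sketch of that call (the config path is illustrative):

```python
from pathlib import Path

import yaml
from kedro.framework.project import LOGGING

# Re-apply a rich-free logging config on top of whatever Kedro set up.
LOGGING.configure(yaml.safe_load(Path("conf/logging.yml").read_text()))
```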
g
But does this remove the RichHandler if it has already been configured by Kedro? Right now I have to use this, but it does not seem ideal:
```python
import logging
from logging import INFO, basicConfig

from rich.logging import RichHandler

# HANDLER is the project's plain (non-rich) handler, defined elsewhere.

def set_up_runtime_logger() -> None:
    """Set up the runtime logger.

    Need to remove Kedro's rich logging as DB is not happy with it.
    """
    root_logger = logging.getLogger()
    # Drop any RichHandler that Kedro attached to the root logger.
    for handler in root_logger.handlers.copy():
        if isinstance(handler, RichHandler):
            root_logger.removeHandler(handler)
    basicConfig(level=INFO, handlers=[HANDLER])
```
n
As long as your dictionary doesn't include `rich`, it should be fine.
j
> Looked at this, but it seems a bit finicky, because you are dependent on the order in which things are run: if the project's init file runs before the os environment variable is set, it does not work.

ugh, that's painful. help me understand, is this because of how stdlib `logging` works? that it somehow cannot be changed after Kedro has configured it? would be happy to continue this conversation on a GitHub issue.
g
Hi, got it to work with LOGGING.configure() 😄. I have to place it in several places to make sure that it always runs before any logging is printed. So it might still be nice to also have a settings parameter in the future 🙂
👍 1
👍🏼 1
n
> ugh, that's painful. help me understand, is this because of how stdlib `logging` works? that it somehow cannot be changed after Kedro has configured it? would be happy to continue this conversation on a GitHub issue.
@Juan Luis that's not the issue here, I believe. `KEDRO_LOGGING_CONFIG` is set before anything runs, and it gets picked up as soon as `kedro.framework.project` is imported. I don't find using `LOGGING.configure` to reconfigure logging settings wrong; having `LOGGING.configure` run before the Kedro run should be sufficient. @Guest (Inger), could you share where you placed it and why it needs to be placed several times?
g
Hi, so in theory you only need to place it once, but it has to be at the very beginning of the pipeline, because Kedro logging is loaded automatically. To be safe I have to call it at the very beginning of our entry point, for when the job runs automatically. But I also have to place it in the class that sets up the Kedro pipeline, in case someone imports the class and runs it manually. I also wanted to place it in a hook to be safe, but that did not work: the Kedro session already emits a log before any hook is called, and this caused Databricks to show only that single rich-formatted log after the pipeline had run, without any of the subsequent logs.
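For reference, a rough sketch of that double placement (the module and helper names are hypothetical, not from the thread; the config path is illustrative):

```python
# logging_setup.py -- hypothetical helper module
from pathlib import Path

import yaml
from kedro.framework.project import LOGGING

def apply_rich_free_logging() -> None:
    """Re-apply a rich-free logging config; safe to call more than once."""
    LOGGING.configure(yaml.safe_load(Path("conf/logging.yml").read_text()))

# Call sites described above:
# 1. at the very top of the packaged job's entry point, so it runs before
#    any logging is printed;
# 2. at the start of PipelineRunner.run_pipeline(), in case the class is
#    imported and run manually.
```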