# questions
g
Hi, I have recently updated the Kedro version, but now I am getting some problems with the logging. Is there a way to set the default logging config in the settings, in addition to an environment variable? Due to platform constraints we cannot set environment variables outside of the run, and whenever I try to set the environment variable through a hook, Kedro has already loaded the standard logging config. The problem is that our platform cannot handle rich logging, and as such it causes problems.
n
Hey! Did you upgrade to 0.19? You need to set it with an environment variable, `KEDRO_LOGGING_CONFIG`. This migration guide should help you with the upgrade: https://github.com/kedro-org/kedro/blob/main/RELEASE.md#migration-guide-from-kedro-018-to-019
Sorry I didn't read the full message, what platform is this?
You are correct, a hook cannot override logging because it's a bit too late (logging needs to be initialised at the beginning of the process). @Sajid Alam can you help this user?
👍 1
g
Yes I did. The problem is that I cannot set environment variables: I am running the pipeline as a packaged job in Databricks and am not allowed to set environment variables pre-run. I even wrote a wrapper that triggers the pipeline, because we cannot trigger it from the command line:
```python
import logging
from copy import deepcopy
from pathlib import Path
from typing import Any, Optional, Union

from kedro.framework.project import LOGGING, configure_project
from kedro.framework.session import KedroSession

# `settings`, `get_root_error_logs`, and `LOGGING_ERROR_LIST_HANDLER` are
# project-specific helpers imported from elsewhere in the package.

logger = logging.getLogger(__name__)
KedroOutput = dict[str, Any]

class PipelineRunner:
    """Create and run Kedro pipelines."""

    def __init__(
        self,
        env: Union[str, Path],
        pipeline_type: str,
        conf_source: Path = settings.CONF_SOURCE,
        runtime_params: Optional[dict] = None,
    ):
        """Class to start the kedro pipeline.

        Parameters
        ----------
        env: Union[str, Path]
            Specify the environment to run the pipeline in.
        pipeline_type: str
            Name of the pipeline type. This is the pipeline name minus the
            auto or manual naming.
        conf_source: Path
            Specify the path to the configuration source directory.
        runtime_params: Optional[dict]
            Pass extra parameters to the kedro pipeline.
        """
        self.env = env if isinstance(env, str) else env.as_posix()
        self.conf_source = conf_source.as_posix()
        self.pipeline_base_name = pipeline_type

        self.runtime_params = deepcopy(runtime_params) or {}
        self._has_run = False
        self.kedro_output: Optional[KedroOutput] = None

    def run_pipeline(self, pipeline_name: str) -> None:
        """Run the kedro pipeline.

        The run_pipeline function is a wrapper around Kedro's `run` method.
        It allows you to run the pipeline from within Python rather than
        from the command line.

        Parameters
        ----------
        pipeline_name: str
            The name of the pipeline to be run.

        Raises
        ------
        RuntimeError
            Raises a runtime error if any errors were logged during the
            pipeline. This allows the data to still be sent, but for us to
            be aware that an error occurred.

        Returns
        -------
        None
        """
        if self._has_run:
            raise RuntimeError("Create a new pipeline for each run.")
        self._has_run = True
        package_name = Path(__file__).parent.name
        configure_project(package_name)
        with KedroSession.create(
            env=self.env,
            conf_source=self.conf_source,
            extra_params=self.runtime_params,
        ) as session:
            self.kedro_output = session.run(pipeline_name=pipeline_name)
        error_logs = get_root_error_logs(LOGGING_ERROR_LIST_HANDLER)
        if error_logs is not None:
            raise RuntimeError(
                f"Pipeline encountered errors during the run: {error_logs}"
            )
        return None
```
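For context, the wrapper is invoked roughly like this (the env and pipeline names here are illustrative, not from our project):

```python
# Illustrative invocation of the PipelineRunner wrapper above.
runner = PipelineRunner(env="base", pipeline_type="training")
runner.run_pipeline(pipeline_name="training_auto")
```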
👀 1
s
Hi Inger, you can try changing the handler to stop using rich in the `logging.yml` file in `conf/base`.
Oh sorry, looks like you can't use that file without the env variable; you might have to manually load it through your code.
n
It may not be ideal, but could you do `import os` and `os.environ["KEDRO_LOGGING_CONFIG"] = <your_path>` instead, before you start loading any Kedro things?
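A rough sketch of that ordering (the config path is just an example, not from the thread):

```python
import os

# Must run before anything from kedro.framework is imported, because
# kedro.framework.project configures logging at import time.
os.environ["KEDRO_LOGGING_CONFIG"] = "conf/logging.yml"  # example path

from kedro.framework.project import configure_project  # noqa: E402
from kedro.framework.session import KedroSession  # noqa: E402
```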
s
You could also try, after replacing `rich` with `console` in the `logging.yml`, loading the yml file in your code and using `logging.config.dictConfig`: https://docs.python.org/3/library/logging.config.html#logging.config.dictConfig
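A minimal sketch of that approach, assuming a rich-free config lives at `conf/logging.yml` (path illustrative):

```python
import logging.config
from pathlib import Path

import yaml  # PyYAML

# Load a logging config whose `rich` handler has been swapped for a plain
# `console` handler, then hand it to the stdlib logging machinery.
config = yaml.safe_load(Path("conf/logging.yml").read_text())
logging.config.dictConfig(config)
```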
g
Looked at this, but it seems a bit finicky, because you are dependent on the order in which things are run: if the project's init file runs before the os environment variable is set, it does not work. So import order becomes important.
👍🏼 1
Will take a look at the logging.config.dictConfig method
n
You don't have to call `logging.config.dictConfig`; instead you can call `LOGGING.configure(<dictionary_of_config>)`
👍 1
Kedro configures the logging for you, but doesn't stop you from re-configuring it if you need to.
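For instance, a minimal sketch of that call (the config path is illustrative):

```python
from pathlib import Path

import yaml
from kedro.framework.project import LOGGING

# Re-apply a rich-free logging config on top of whatever Kedro set up.
LOGGING.configure(yaml.safe_load(Path("conf/logging.yml").read_text()))
```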
g
But does this remove the RichHandler if it has already been configured by Kedro? Right now I have to use this, but it does not seem ideal:
```python
import logging
from logging import INFO, basicConfig

from rich.logging import RichHandler

# HANDLER is the project's plain (non-rich) handler, defined elsewhere.

def set_up_runtime_logger() -> None:
    """Set up the runtime logger.

    Need to remove Kedro's rich logging as DB is not happy with it.
    """
    root_logger = logging.getLogger()
    # Drop any RichHandler that Kedro attached to the root logger.
    for handler in root_logger.handlers.copy():
        if isinstance(handler, RichHandler):
            root_logger.removeHandler(handler)
    basicConfig(level=INFO, handlers=[HANDLER])
```
n
As long as your dictionary doesn't include `rich`, it should be fine.
j
> Looked at this, but it seems a bit finicky, because you are dependent on the order in which things are run: if the project's init file runs before the os environment variable is set, it does not work.

ugh, that's painful. help me understand, is this because of how stdlib `logging` works? that it somehow cannot be changed after Kedro has configured it? would be happy to continue this conversation on a GitHub issue.
g
Hi, got it to work with LOGGING.configure() 😄. I have to place it in several places to make sure that it always runs before any logging is printed. So it might still be nice to also have a settings parameter in the future 🙂
👍 1
👍🏼 1
n
> ugh, that's painful. help me understand, is this because of how stdlib `logging` works? that it somehow cannot be changed after Kedro has configured it? would be happy to continue this conversation on a GitHub issue.
@Juan Luis that's not the issue here, I believe. `KEDRO_LOGGING_CONFIG` is set before anything runs, and it gets picked up as soon as `kedro.framework.project` is imported. I don't find using `LOGGING.configure` to reconfigure logging settings wrong; having `LOGGING.configure` run before the Kedro run should be sufficient. @Guest (Inger), could you share where you placed it and why it needs to be placed several times?
g
Hi, so in theory you only need to place it once, but it has to be at the very beginning of the pipeline, because Kedro logging is loaded automatically. To be safe I have to call it at the very beginning of our entry point, for when the job runs automatically. But I also have to place it in the class that sets up the Kedro pipeline, in case someone imports the class and runs it manually. I also wanted to place it in a hook to be safe, but that did not work: the Kedro session already emits a log before any hook is called, and this caused Databricks to show only that single rich-formatted log after the pipeline had run, without any of the subsequent logs.
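For reference, a rough sketch of that double placement (the module and helper names are hypothetical, not from the thread; the config path is illustrative):

```python
# logging_setup.py -- hypothetical helper module
from pathlib import Path

import yaml
from kedro.framework.project import LOGGING

def apply_rich_free_logging() -> None:
    """Re-apply a rich-free logging config; safe to call more than once."""
    LOGGING.configure(yaml.safe_load(Path("conf/logging.yml").read_text()))

# Call sites described above:
# 1. at the very top of the packaged job's entry point, so it runs before
#    any logging is printed;
# 2. at the start of PipelineRunner.run_pipeline(), in case the class is
#    imported and run manually.
```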