# questions
When a Kedro project is run from the command line or an IDE, it gives logs like:
***   * /home/ec2-user/airflow_ml/logs/dag_id=gem-models/run_id=manual__2023-09-28T07:51:39.714757+00:00/task_id=model-rre-value-de/attempt=1.log
2023-09-28 07:52:47,132 - matplotlib.font_manager - INFO - generated new fontManager
2023-09-28 07:57:47,503 - kedro.io.data_catalog - INFO - Saving data to df_preprocessed (PickleDataSet)...
2023-09-28 07:58:41,157 - common_methods.feature_engineering.process_categorical_features - WARNING - Missing value [''] in ordinal dict for feature energy_class
2023-09-28 08:07:19,535 - kedro.io.data_catalog - INFO - Saving data to master_table (PickleDataSet)...
2023-09-28 08:07:28,766 - root - INFO - Starting hyperparameter tuning with {'feature_fraction': (0.7, 0.8), 'bagging_fraction': (0.7, 0.8), 'min_data_in_leaf': (10, 30), 'lambda_l1': (0, 5), 'lambda_l2': (0, 5), 'min_child_weight': (0.0001, 0.99), 'num_leaves': (10, 300), 'max_depth': (3, 20)} and {'num_iterations': 1000, 'early_stopping_rounds': 5, 'bagging_freq': 1, 'learning_rate': 0.05, 'verbosity': -1, 'objective': 'regression'}
2023-09-28 09:09:00,691 - kedro.io.data_catalog - INFO - Saving data to parameters_final (YAMLDataSet)...
2023-09-28 09:09:46,000 - root - INFO - Customer report created for RREvaluation with model version: 20230928-075244
[2023-09-28, 07:51:41 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: gem-models.model-rre-value-de manual__2023-09-28T07:51:39.714757+00:00 [queued]>
[2023-09-28, 07:51:41 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: gem-models.model-rre-value-de manual__2023-09-28T07:51:39.714757+00:00 [queued]>
[2023-09-28, 07:51:41 UTC] {taskinstance.py:1359} INFO - Starting attempt 1 of 2
[2023-09-28, 07:51:41 UTC] {taskinstance.py:1380} INFO - Executing <Task(DockerOperator): model-rre-value-de> on 2023-09-28 07:51:39.714757+00:00
[2023-09-28, 07:51:41 UTC] {standard_task_runner.py:57} INFO - Started process 12206 to run task
[2023-09-28, 07:51:41 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'gem-models', 'model-rre-value-de', 'manual__2023-09-28T07:51:39.714757+00:00', '--job-id', '11', '--raw', '--subdir', 'DAGS_FOLDER/gem_models_dag.py', '--cfg-path', '/tmp/tmpageoucml']
[2023-09-28, 07:51:41 UTC] {standard_task_runner.py:85} INFO - Job 11: Subtask model-rre-value-de
[2023-09-28, 09:09:47 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 0
When the project is run in Airflow using the KedroOperator(), it gives only the lines below in the logs. The logs that come from kedro.io, kedro.runner, and root are not shown in the Airflow UI:
*** Found local files:
***   * /home/ec2-user/airflow_ml/logs/dag_id=rre-value-de/run_id=manual__2023-10-11T07:51:39.197845+00:00/task_id=create-report/attempt=11.log
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: rre-value-de.create-report manual__2023-10-11T07:51:39.197845+00:00 [queued]>
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: rre-value-de.create-report manual__2023-10-11T07:51:39.197845+00:00 [queued]>
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1359} INFO - Starting attempt 11 of 12
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1380} INFO - Executing <Task(KedroOperator): create-report> on 2023-10-11 07:51:39.197845+00:00
[2023-10-11, 10:52:26 UTC] {standard_task_runner.py:57} INFO - Started process 22175 to run task
[2023-10-11, 10:52:26 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'rre-value-de', 'create-report', 'manual__2023-10-11T07:51:39.197845+00:00', '--job-id', '195', '--raw', '--subdir', 'DAGS_FOLDER/rre_value_de_dag.py', '--cfg-path', '/tmp/tmpetucf_v2']
[2023-10-11, 10:52:26 UTC] {standard_task_runner.py:85} INFO - Job 195: Subtask create-report
[2023-10-11, 10:52:26 UTC] {task_command.py:415} INFO - Running <TaskInstance: rre-value-de.create-report manual__2023-10-11T07:51:39.197845+00:00 [running]> on host ip-172-31-14-127.eu-west-1.compute.internal
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1660} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='rre-value-de' AIRFLOW_CTX_TASK_ID='create-report' AIRFLOW_CTX_EXECUTION_DATE='2023-10-11T07:51:39.197845+00:00' AIRFLOW_CTX_TRY_NUMBER='11' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-10-11T07:51:39.197845+00:00'
[2023-10-11, 10:52:27 UTC] {store.py:32} INFO - `read()` not implemented for `BaseSessionStore`. Assuming empty store.
[2023-10-11, 10:52:40 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 0
The KedroOperator is generated by kedro-airflow v0.5.1 and looks like this:
class KedroOperator(BaseOperator):
    @apply_defaults
    def __init__(
        self,
        package_name: str,
        pipeline_name: str,
        node_name: str,
        project_path: str,
        env: str,
        *args, **kwargs
    ) -> None:
        super().__init__(*args, **kwargs)
        self.package_name = package_name
        self.pipeline_name = pipeline_name
        self.node_name = node_name
        self.project_path = project_path
        self.env = env

    def execute(self, context):
        configure_project(self.package_name)
        with KedroSession.create(self.package_name,
                                 self.project_path,
                                 env=self.env) as session:
            session.run(self.pipeline_name, node_names=[self.node_name])
It appears that the tasks executed by KedroOperator() are not generating or capturing the logs. Any help capturing the logs from KedroOperator() tasks into the Airflow UI would be appreciated. #kedro-airflow
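(Editor's aside, not from the thread: one possible angle is that when Kedro applies its logging config in-process, settings like `disable_existing_loggers: True` and `propagate: no` cut the kedro loggers off from whatever handlers Airflow attached to the root logger. A standalone sketch of a hypothetical workaround one could try inside `KedroOperator.execute()`, re-enabling propagation after Kedro configures logging; the handler and messages below are illustrative only:)

```python
import logging
import logging.config

# Stand-in for the handler Airflow attaches to the root logger for task logs.
captured = []

class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())

logging.getLogger().addHandler(ListHandler())
logging.getLogger().setLevel(logging.INFO)

# A Kedro-style config that cuts kedro.io off from the root handlers.
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": True,
    "loggers": {"kedro.io": {"level": "INFO", "propagate": False}},
})

# Hypothetical workaround one could add to KedroOperator.execute():
# re-enable the kedro loggers and let their records propagate to root,
# where Airflow's task-log handler would pick them up.
for name in ("kedro", "kedro.io"):
    logger = logging.getLogger(name)
    logger.disabled = False
    logger.propagate = True

logging.getLogger("kedro.io").info("Saving data to df_preprocessed")
print(captured)  # → ['Saving data to df_preprocessed']
```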
Hi, any fix on this? Does the `execute` method of the `KedroOperator` class lack the functionality to emit the logs? Using kedro version 0.18.1, and the `kedro airflow create` command to create the DAG.
hi @meharji arumilli, there have been some changes in the logging - are you able to upgrade to Kedro 0.18.14?
@meharji arumilli hi, also what is your kedro logging configuration? https://docs.kedro.org/en/stable/logging/logging.html#
@Dmitry Sorokin here is my logging config:
version: 1
disable_existing_loggers: True
formatters:
    simple:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    json_formatter:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        class: pythonjsonlogger.jsonlogger.JsonFormatter

handlers:
    console:
        class: logging.StreamHandler
        level: INFO
        formatter: simple
        stream: ext://sys.stdout

    info_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: logs/info.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True

    error_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: ERROR
        formatter: simple
        filename: logs/errors.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True

loggers:
    anyconfig:
        level: WARNING
        handlers: [console, info_file_handler, error_file_handler]
        propagate: no

    kedro.io:
        level: INFO
        handlers: [console, info_file_handler, error_file_handler]
        propagate: no

    kedro.pipeline:
        level: INFO
        handlers: [console, info_file_handler, error_file_handler]
        propagate: no

    kedro.journal:
        level: INFO
        propagate: no

root:
    level: INFO
    handlers: [console, info_file_handler, error_file_handler]
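(Editor's aside, an assumption rather than a confirmed diagnosis: `disable_existing_loggers: True` in the config above is worth scrutinizing, because `logging.config.dictConfig` silently disables any logger that already exists and is not named in the config, which would include loggers Airflow created before the Kedro session starts. A minimal demonstration of that behaviour; the logger name merely mimics Airflow's:)

```python
import logging
import logging.config

# Simulate a logger that Airflow created before Kedro configures logging.
airflow_task = logging.getLogger("airflow.task")

# Apply a config with the same `disable_existing_loggers: True` as the
# logging.yml above; any pre-existing logger not named in it gets disabled.
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": True,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "root": {"level": "INFO", "handlers": ["console"]},
})

print(airflow_task.disabled)  # → True
```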
@Juan Luis yes, I gave 0.18.14 a try; still the same behaviour: KedroOperator() does not capture the logs that come from kedro.io, kedro.runner, and root in the Airflow UI.
To add more information: when the kedro project is dockerized and run as a Docker image in Airflow using DockerOperator(), it fetches the complete logs from the base.py and docker.py file sources, as shown in the screenshot.
When it is packaged as a .whl and run using KedroOperator(), it does not give the kedro task/project-specific logs and emits logs only from the file sources shown in the screenshot. It certainly looks like KedroOperator() is not configured to capture the logs, as it does not show base.py among the file sources. Any help from the #kedro engineering or developer community would be highly appreciated, as having the logs visible in the Airflow UI is crucial to using the #kedro-airflow and #kedro plugins to their full capability.
Could adding KedroOperator() to the Airflow providers solve this logging issue? https://github.com/kedro-org/kedro/issues/1717 Has it been in the pipeline to be included? @Nok Lam Chan @datajoely @Merel
@Dmitry Sorokin does the logging.yml look fine?
Can you try using the default logging that Kedro generates for you? It looks like you removed the project logging, so you can't see it when the project is packaged as a wheel.
And I see `kedro.journal`, which signals that your logging file is quite old.
Can you try a new kedro project with airflow to see if logging shows up?
Re: https://github.com/kedro-org/kedro/issues/1717 There isn't any progress on this ticket yet, and it isn't related to the problem: putting the KedroOperator into the `airflow` codebase would only help discoverability; it wouldn't change any functionality.
@Nok Lam Chan I created a new logging file with the `kedro new` command, which looks like:
version: 1
disable_existing_loggers: False
formatters:
    simple:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    json_formatter:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        class: pythonjsonlogger.jsonlogger.JsonFormatter

handlers:
    console:
        class: logging.StreamHandler
        level: INFO
        formatter: simple
        stream: ext://sys.stdout

    info_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: logs/info.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True

    error_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: ERROR
        formatter: simple
        filename: logs/errors.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True

loggers:
    anyconfig:
        level: WARNING
        handlers: [console, info_file_handler, error_file_handler]
        propagate: no

    kedro.io:
        level: INFO
        handlers: [console, info_file_handler, error_file_handler]
        propagate: no

    kedro.pipeline:
        level: INFO
        handlers: [console, info_file_handler, error_file_handler]
        propagate: no

root:
    level: INFO
    handlers: [console, info_file_handler, error_file_handler]
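(Editor's aside, an assumption rather than a confirmed diagnosis: this newer config sets `disable_existing_loggers: False`, but the `propagate: no` entries remain, and that setting alone is enough to keep records from ever reaching handlers attached to the root logger, which is where Airflow's task-log handler would sit. A minimal demonstration; the capturing handler below merely stands in for Airflow's:)

```python
import logging

# Stand-in for Airflow's task-log handler, attached to the root logger.
captured = []

class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())

root = logging.getLogger()
root.addHandler(ListHandler())
root.setLevel(logging.INFO)

# `propagate: no` on kedro.io (as in the YAML above) stops its records
# from reaching any handler attached to the root logger.
kedro_io = logging.getLogger("kedro.io")
kedro_io.setLevel(logging.INFO)
kedro_io.propagate = False
kedro_io.addHandler(logging.NullHandler())

kedro_io.info("Saving data to master_table")
print(captured)  # → []
```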
And then `kedro package` is run to package the project. However, logging.yml lives in conf/, so it is not included in the packaged wheel file. The conf/ directory is copied to the instance running the DAG, and its path is passed as conf_source:
def execute(self, context):
    self.log.info("Starting execution of KedroOperator")
    configure_project(self.package_name)
    with KedroSession.create(self.package_name,
                             self.project_path,
                             env=self.env, extra_params=self.extra_params,
                             conf_source="/home/ec2-user/airflow_ml/conf/models") as session:
        session.run(self.pipeline_name, node_names=[self.node_name])
    self.log.info("Execution of KedroOperator completed")
Now the log gives the error
*** Found local files:
***   * /home/ec2-user/airflow_ml/logs/dag_id=rre-value-de/run_id=manual__2023-10-27T08:21:46.405139+00:00/task_id=define-project-parameters/attempt=2.log
[2023-10-27, 08:26:57 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: rre-value-de.define-project-parameters manual__2023-10-27T08:21:46.405139+00:00 [queued]>
[2023-10-27, 08:26:57 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: rre-value-de.define-project-parameters manual__2023-10-27T08:21:46.405139+00:00 [queued]>
[2023-10-27, 08:26:57 UTC] {taskinstance.py:1359} INFO - Starting attempt 2 of 2
[2023-10-27, 08:26:57 UTC] {taskinstance.py:1380} INFO - Executing <Task(KedroOperator): define-project-parameters> on 2023-10-27 08:21:46.405139+00:00
[2023-10-27, 08:26:57 UTC] {standard_task_runner.py:57} INFO - Started process 12975 to run task
[2023-10-27, 08:26:57 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'rre-value-de', 'define-project-parameters', 'manual__2023-10-27T08:21:46.405139+00:00', '--job-id', '259', '--raw', '--subdir', 'DAGS_FOLDER/rre_value_de_dag.py', '--cfg-path', '/tmp/tmp8w8z_1aw']
[2023-10-27, 08:26:57 UTC] {standard_task_runner.py:85} INFO - Job 259: Subtask define-project-parameters
[2023-10-27, 08:26:57 UTC] {task_command.py:415} INFO - Running <TaskInstance: rre-value-de.define-project-parameters manual__2023-10-27T08:21:46.405139+00:00 [running]> on host ip-172-31-14-127.eu-west-1.compute.internal
[2023-10-27, 08:26:58 UTC] {taskinstance.py:1660} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='rre-value-de' AIRFLOW_CTX_TASK_ID='define-project-parameters' AIRFLOW_CTX_EXECUTION_DATE='2023-10-27T08:21:46.405139+00:00' AIRFLOW_CTX_TRY_NUMBER='2' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-10-27T08:21:46.405139+00:00'
[2023-10-27, 08:26:58 UTC] {rre_value_de_dag.py:36} INFO - Starting execution of KedroOperator
[2023-10-27, 08:26:58 UTC] {logging_mixin.py:151} INFO - rre-value
[2023-10-27, 08:26:58 UTC] {logging_mixin.py:151} INFO - Model version 20231027-112147
[2023-10-27, 08:26:58 UTC] {logging_mixin.py:151} INFO - Project rre-value
[2023-10-27, 08:26:58 UTC] {logging_mixin.py:151} INFO - rre-value
[2023-10-27, 08:26:58 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code -6
I wonder what error code -6 means here.
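(Editor's aside, an interpretation rather than anything confirmed in the thread: Python's subprocess conventions report a child killed by a signal as a negative return code, so -6 would mean the task process died from SIGABRT, signal 6, on POSIX. A quick check:)

```python
import signal
import subprocess
import sys

# On POSIX, Python reports a child terminated by a signal as a negative
# return code: -6 corresponds to SIGABRT (signal 6).
proc = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGABRT)"]
)
print(proc.returncode)  # → -6
```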
@Nok Lam Chan could you kindly suggest anything else to try, if we are on the same page?
@meharji arumilli Which version of Kedro are you using?
kedro, version 0.18.14
The logging file you show is quite old.
I wonder what error code -6 means here.
I have no idea; this seems to be something coming from Airflow rather than Kedro.
Now the log gives the error
*** Found local files:
It’s all INFO logging, can you point me to the error?