meharji arumilli
10/11/2023, 8:44 AM
*** * /home/ec2-user/airflow_ml/logs/dag_id=gem-models/run_id=manual__2023-09-28T07:51:39.714757+00:00/task_id=model-rre-value-de/attempt=1.log
2023-09-28 07:52:47,132 - matplotlib.font_manager - INFO - generated new fontManager
2023-09-28 07:57:47,503 - kedro.io.data_catalog - INFO - Saving data to df_preprocessed (PickleDataSet)...
2023-09-28 07:58:41,157 - common_methods.feature_engineering.process_categorical_features - WARNING - Missing value [''] in ordinal dict for feature energy_class
2023-09-28 08:07:19,535 - kedro.io.data_catalog - INFO - Saving data to master_table (PickleDataSet)...
2023-09-28 08:07:28,766 - root - INFO - Starting hyperparameter tuning with {'feature_fraction': (0.7, 0.8), 'bagging_fraction': (0.7, 0.8), 'min_data_in_leaf': (10, 30), 'lambda_l1': (0, 5), 'lambda_l2': (0, 5), 'min_child_weight': (0.0001, 0.99), 'num_leaves': (10, 300), 'max_depth': (3, 20)} and {'num_iterations': 1000, 'early_stopping_rounds': 5, 'bagging_freq': 1, 'learning_rate': 0.05, 'verbosity': -1, 'objective': 'regression'}
2023-09-28 09:09:00,691 - kedro.io.data_catalog - INFO - Saving data to parameters_final (YAMLDataSet)...
2023-09-28 09:09:46,000 - root - INFO - Customer report created for RREvaluation with model version: 20230928-075244
[2023-09-28, 07:51:41 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: gem-models.model-rre-value-de manual__2023-09-28T07:51:39.714757+00:00 [queued]>
[2023-09-28, 07:51:41 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: gem-models.model-rre-value-de manual__2023-09-28T07:51:39.714757+00:00 [queued]>
[2023-09-28, 07:51:41 UTC] {taskinstance.py:1359} INFO - Starting attempt 1 of 2
[2023-09-28, 07:51:41 UTC] {taskinstance.py:1380} INFO - Executing <Task(DockerOperator): model-rre-value-de> on 2023-09-28 07:51:39.714757+00:00
[2023-09-28, 07:51:41 UTC] {standard_task_runner.py:57} INFO - Started process 12206 to run task
[2023-09-28, 07:51:41 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'gem-models', 'model-rre-value-de', 'manual__2023-09-28T07:51:39.714757+00:00', '--job-id', '11', '--raw', '--subdir', 'DAGS_FOLDER/gem_models_dag.py', '--cfg-path', '/tmp/tmpageoucml']
[2023-09-28, 07:51:41 UTC] {standard_task_runner.py:85} INFO - Job 11: Subtask model-rre-value-de
[2023-09-28, 09:09:47 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 0
When the project is run in Airflow using KedroOperator(), only the lines below appear in the logs. The logs coming from kedro.io, kedro.runner, and root are not shown in the Airflow UI:
*** Found local files:
*** * /home/ec2-user/airflow_ml/logs/dag_id=rre-value-de/run_id=manual__2023-10-11T07:51:39.197845+00:00/task_id=create-report/attempt=11.log
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: rre-value-de.create-report manual__2023-10-11T07:51:39.197845+00:00 [queued]>
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: rre-value-de.create-report manual__2023-10-11T07:51:39.197845+00:00 [queued]>
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1359} INFO - Starting attempt 11 of 12
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1380} INFO - Executing <Task(KedroOperator): create-report> on 2023-10-11 07:51:39.197845+00:00
[2023-10-11, 10:52:26 UTC] {standard_task_runner.py:57} INFO - Started process 22175 to run task
[2023-10-11, 10:52:26 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'rre-value-de', 'create-report', 'manual__2023-10-11T07:51:39.197845+00:00', '--job-id', '195', '--raw', '--subdir', 'DAGS_FOLDER/rre_value_de_dag.py', '--cfg-path', '/tmp/tmpetucf_v2']
[2023-10-11, 10:52:26 UTC] {standard_task_runner.py:85} INFO - Job 195: Subtask create-report
[2023-10-11, 10:52:26 UTC] {task_command.py:415} INFO - Running <TaskInstance: rre-value-de.create-report manual__2023-10-11T07:51:39.197845+00:00 [running]> on host ip-172-31-14-127.eu-west-1.compute.internal
[2023-10-11, 10:52:26 UTC] {taskinstance.py:1660} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='rre-value-de' AIRFLOW_CTX_TASK_ID='create-report' AIRFLOW_CTX_EXECUTION_DATE='2023-10-11T07:51:39.197845+00:00' AIRFLOW_CTX_TRY_NUMBER='11' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-10-11T07:51:39.197845+00:00'
[2023-10-11, 10:52:27 UTC] {store.py:32} INFO - read() not implemented for BaseSessionStore. Assuming empty store.
[2023-10-11, 10:52:40 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 0
The KedroOperator is generated by kedro-airflow v0.5.1 and looks like this:
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults
from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession


class KedroOperator(BaseOperator):
    @apply_defaults
    def __init__(
        self,
        package_name: str,
        pipeline_name: str,
        node_name: str,
        project_path: str,
        env: str,
        *args, **kwargs
    ) -> None:
        super().__init__(*args, **kwargs)
        self.package_name = package_name
        self.pipeline_name = pipeline_name
        self.node_name = node_name
        self.project_path = project_path
        self.env = env

    def execute(self, context):
        configure_project(self.package_name)
        with KedroSession.create(self.package_name,
                                 self.project_path,
                                 env=self.env) as session:
            session.run(self.pipeline_name, node_names=[self.node_name])
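(A hedged sketch of the likely mechanism, not a confirmed fix from this thread: Airflow shows a task's output via handlers attached at or near the root of the logging tree, so Kedro records only reach the UI if they propagate upward. A project logging config with `propagate: no` on the kedro.* loggers stops them before they get there, which matches the symptom above.)

```python
import io
import logging

# Stand-in for the handler Airflow attaches to capture task logs.
buffer = io.StringIO()
root_handler = logging.StreamHandler(buffer)
logging.getLogger().addHandler(root_handler)
logging.getLogger().setLevel(logging.INFO)

kedro_io = logging.getLogger("kedro.io")
kedro_io.setLevel(logging.INFO)

kedro_io.propagate = False              # what "propagate: no" does
kedro_io.info("saving data...")         # never reaches the root handler

kedro_io.propagate = True               # let records bubble up again
kedro_io.info("saving data again...")   # now captured

print(buffer.getvalue())                # only the second message appears
```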
It appears that the tasks executed by KedroOperator() are not generating or capturing logs. Any help with getting the logs from KedroOperator() tasks into the Airflow UI? #kedro-airflow
Does KedroOperator lack the functionality to emit the logs? Using kedro version 0.18.1, and the kedro airflow create command to create the DAG.
Juan Luis
10/23/2023, 9:43 AM
Dmitry Sorokin
10/23/2023, 9:48 AM
meharji arumilli
10/25/2023, 4:46 AM
version: 1
disable_existing_loggers: True
formatters:
  simple:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  json_formatter:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    class: pythonjsonlogger.jsonlogger.JsonFormatter
handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout
  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: logs/info.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8
    delay: True
  error_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: ERROR
    formatter: simple
    filename: logs/errors.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8
    delay: True
loggers:
  anyconfig:
    level: WARNING
    handlers: [console, info_file_handler, error_file_handler]
    propagate: no
  kedro.io:
    level: INFO
    handlers: [console, info_file_handler, error_file_handler]
    propagate: no
  kedro.pipeline:
    level: INFO
    handlers: [console, info_file_handler, error_file_handler]
    propagate: no
  kedro.journal:
    level: INFO
    propagate: no
root:
  level: INFO
  handlers: [console, info_file_handler, error_file_handler]
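(An illustration of why `disable_existing_loggers: True` in a config like the one above can hide logs in Airflow; this is standard `logging.config.dictConfig()` behavior, not something confirmed in the thread: every logger that already exists and is not named in the new config gets silenced, which can include loggers Airflow has already created by the time Kedro loads logging.yml.)

```python
import logging
import logging.config

# A logger that exists before dictConfig() runs, as Airflow's loggers would.
pre_existing = logging.getLogger("airflow.task")

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": True,  # same setting as the config above
    "handlers": {"console": {"class": "logging.StreamHandler", "level": "INFO"}},
    "root": {"level": "INFO", "handlers": ["console"]},
})

print(pre_existing.disabled)  # True: the earlier logger is now silenced
```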
Nok Lam Chan
10/25/2023, 12:12 PM
You have kedro.journal in there, which signals that your logging file is quite old. As for the airflow codebase, it doesn't change any functionalities.
meharji arumilli
10/27/2023, 9:06 AM
version: 1
disable_existing_loggers: False
formatters:
  simple:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  json_formatter:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    class: pythonjsonlogger.jsonlogger.JsonFormatter
handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout
  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: logs/info.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8
    delay: True
  error_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: ERROR
    formatter: simple
    filename: logs/errors.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8
    delay: True
loggers:
  anyconfig:
    level: WARNING
    handlers: [console, info_file_handler, error_file_handler]
    propagate: no
  kedro.io:
    level: INFO
    handlers: [console, info_file_handler, error_file_handler]
    propagate: no
  kedro.pipeline:
    level: INFO
    handlers: [console, info_file_handler, error_file_handler]
    propagate: no
root:
  level: INFO
  handlers: [console, info_file_handler, error_file_handler]
And then kedro package is run to package the project. However, logging.yml lives in conf/, so it is not included in the packaged wheel file. The conf/ directory is copied to the instance running the DAG, and its path is passed as conf_source:
def execute(self, context):
    self.log.info("Starting execution of KedroOperator")
    configure_project(self.package_name)
    with KedroSession.create(self.package_name,
                             self.project_path,
                             env=self.env, extra_params=self.extra_params,
                             conf_source="/home/ec2-user/airflow_ml/conf/models") as session:
        session.run(self.pipeline_name, node_names=[self.node_name])
    self.log.info("Execution of KedroOperator completed")
Now the log gives the error:
*** Found local files:
*** * /home/ec2-user/airflow_ml/logs/dag_id=rre-value-de/run_id=manual__2023-10-27T08:21:46.405139+00:00/task_id=define-project-parameters/attempt=2.log
[2023-10-27, 08:26:57 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: rre-value-de.define-project-parameters manual__2023-10-27T08:21:46.405139+00:00 [queued]>
[2023-10-27, 08:26:57 UTC] {taskinstance.py:1157} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: rre-value-de.define-project-parameters manual__2023-10-27T08:21:46.405139+00:00 [queued]>
[2023-10-27, 08:26:57 UTC] {taskinstance.py:1359} INFO - Starting attempt 2 of 2
[2023-10-27, 08:26:57 UTC] {taskinstance.py:1380} INFO - Executing <Task(KedroOperator): define-project-parameters> on 2023-10-27 08:21:46.405139+00:00
[2023-10-27, 08:26:57 UTC] {standard_task_runner.py:57} INFO - Started process 12975 to run task
[2023-10-27, 08:26:57 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'rre-value-de', 'define-project-parameters', 'manual__2023-10-27T08:21:46.405139+00:00', '--job-id', '259', '--raw', '--subdir', 'DAGS_FOLDER/rre_value_de_dag.py', '--cfg-path', '/tmp/tmp8w8z_1aw']
[2023-10-27, 08:26:57 UTC] {standard_task_runner.py:85} INFO - Job 259: Subtask define-project-parameters
[2023-10-27, 08:26:57 UTC] {task_command.py:415} INFO - Running <TaskInstance: rre-value-de.define-project-parameters manual__2023-10-27T08:21:46.405139+00:00 [running]> on host ip-172-31-14-127.eu-west-1.compute.internal
[2023-10-27, 08:26:58 UTC] {taskinstance.py:1660} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='rre-value-de' AIRFLOW_CTX_TASK_ID='define-project-parameters' AIRFLOW_CTX_EXECUTION_DATE='2023-10-27T08:21:46.405139+00:00' AIRFLOW_CTX_TRY_NUMBER='2' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-10-27T08:21:46.405139+00:00'
[2023-10-27, 08:26:58 UTC] {rre_value_de_dag.py:36} INFO - Starting execution of KedroOperator
[2023-10-27, 08:26:58 UTC] {logging_mixin.py:151} INFO - rre-value
[2023-10-27, 08:26:58 UTC] {logging_mixin.py:151} INFO - Model version 20231027-112147
[2023-10-27, 08:26:58 UTC] {logging_mixin.py:151} INFO - Project rre-value
[2023-10-27, 08:26:58 UTC] {logging_mixin.py:151} INFO - rre-value
[2023-10-27, 08:26:58 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code -6
I wonder what error code -6 means here.
Nok Lam Chan
10/30/2023, 11:40 AM
meharji arumilli
10/30/2023, 11:41 AM
Nok Lam Chan
10/30/2023, 11:41 AM
Re: "I wonder what error code -6 means here." I have no idea; this seems to be something coming from Airflow instead of Kedro.
Re: "Now the log gives the error *** Found local files:" It's all INFO logging, can you point me to the error?
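(Editor's note on the -6 that puzzled the thread: it is not an Airflow-specific code. Like Python's subprocess module, Airflow reports a negative return code -N when the child process was terminated by signal N, so -6 means SIGABRT, typically raised when native code calls abort(). A quick sketch:)

```python
import signal
import subprocess
import sys

# On POSIX, a child killed by signal N is reported with return code -N.
# Here the child sends itself SIGABRT, reproducing the -6 seen in the thread.
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGABRT)"]
)
print(proc.returncode)                        # -6
print(signal.Signals(-proc.returncode).name)  # SIGABRT
```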