#questions

Melvin Kok

04/04/2023, 10:58 AM
Hi Kedro team/users! I found two unusual behaviours with Kedro and would like to ask if anyone else is facing the same issues:
1. The `after_catalog_created` hook is triggered before `after_context_created`. However, this is fixed when `kedro-telemetry` is uninstalled (I have raised an issue here).
2. `kedro-telemetry` is still sending information about the data catalog, the default pipeline, etc. to heapanalytics.com even if consent is set to false. Under `KedroTelemetryProjectHooks`, it is calling `_send_heap_event` without checking for consent.
🙏 1

datajoely

04/04/2023, 11:02 AM
Hello, can you please post the start of your logs? This doesn’t sound right.
How do you know telemetry has kicked in?

Nok Lam Chan

04/04/2023, 11:03 AM
https://github.com/kedro-org/kedro/issues/2492 Posting the original Github Issue here

Melvin Kok

04/04/2023, 11:11 AM
@datajoely Regarding telemetry kicking in: my team was getting this warning:
WARNING  Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised
even though we set consent to false. I started a debugger, which eventually led me to `KedroTelemetryProjectHooks` calling `_send_heap_event`.
Regarding logs: let me start a run with debug-level logs and get back to you.
👍🏼 1

Nok Lam Chan

04/04/2023, 11:15 AM
When I removed `kedro-telemetry`, `after_context_created` was triggered first. When I reinstalled `kedro-telemetry`, `after_catalog_created` was triggered first.
Some logs would help to confirm this; it’s pretty unlikely.
Thanks! Can you also share your telemetry version? I could debug it in parallel.

Melvin Kok

04/04/2023, 11:16 AM
0.2.3

Nok Lam Chan

04/04/2023, 11:19 AM
One more question: how do you invoke the project run, via `kedro run` or the Python API?

Melvin Kok

04/04/2023, 11:19 AM
`kedro run`, with `--pipeline pipeline_name` if that matters
2023-04-04 18:18:06,023 - kedro.framework.session.session - INFO - Kedro project <project_name>
2023-04-04 18:18:06,031 - kedro.config.common - INFO - Config from path '<project_folder>\conf\local' will override the following existing top-level config keys: base_path, workspace
2023-04-04 18:18:06,228 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\rpc\__init__.py:20: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.rpc')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
  pkg_resources.declare_namespace(__name__)

2023-04-04 18:18:06,257 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\pkg_resources\__init__.py:2349: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
  declare_namespace(parent)

2023-04-04 18:18:08,689 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\auth\_default.py:78: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: <https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds>. 
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)

2023-04-04 18:18:21,007 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\seaborn\rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(mpl.__version__) >= "3.0":

2023-04-04 18:18:21,021 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\setuptools\_distutils\version.py:345: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)

2023-04-04 18:19:44,457 - kedro_telemetry.plugin - WARNING - Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised.
2023-04-04 18:19:45,165 - kedro.io.data_catalog - INFO - Loading data from '<dataset_name>' (ParquetDataSet)...
...
Oops, looks like I didn’t include the debug logs. Let me try again.

datajoely

04/04/2023, 11:29 AM
are you running this on Google Dataproc?
pkg_resources.declare_namespace('google.rpc')

Melvin Kok

04/04/2023, 11:29 AM
Just locally, but our datasets are in GCS

datajoely

04/04/2023, 11:29 AM
this is super weird
the easiest solution is to remove `kedro-telemetry` from your dependencies
can you log the following please to ensure we’re talking about the right envs
import sys

print(sys.version)
print(sys.executable)

Melvin Kok

04/04/2023, 11:31 AM
2023-04-04 18:24:27,500 - kedro.framework.session.session - INFO - Kedro project Py_FuelEfficiencyPOC_svc
2023-04-04 18:24:27,505 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\globals.yml'
2023-04-04 18:24:27,509 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\globals.yml'
2023-04-04 18:24:27,512 - kedro.config.common - INFO - Config from path '<project_folder>\conf\local' will override the following existing top-level config keys: base_path, workspace
2023-04-04 18:24:27,523 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l01_raw.yml'
2023-04-04 18:24:27,534 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l02_intermediate.yml'
2023-04-04 18:24:27,540 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l03_primary.yml'
2023-04-04 18:24:27,546 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l04_feature.yml'
2023-04-04 18:24:27,552 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l05_model_input.yml'
2023-04-04 18:24:27,557 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l06_models.yml'
2023-04-04 18:24:27,562 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l07_model_output.yml'
2023-04-04 18:24:27,566 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l08_reporting.yml'
2023-04-04 18:24:27,585 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\credentials.yml'
2023-04-04 18:24:27,700 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\rpc\__init__.py:20: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.rpc')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
  pkg_resources.declare_namespace(__name__)

2023-04-04 18:24:27,724 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\pkg_resources\__init__.py:2349: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
  declare_namespace(parent)

2023-04-04 18:24:30,173 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\auth\_default.py:78: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: <https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds>. 
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)

2023-04-04 18:24:36,907 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l01_raw.yml'
2023-04-04 18:24:36,915 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l02_intermediate.yml'
2023-04-04 18:24:36,928 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l03_primary.yml'
2023-04-04 18:24:36,936 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l04_feature.yml'
2023-04-04 18:24:36,943 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l05_model_input.yml'
2023-04-04 18:24:36,948 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l06_models.yml'
2023-04-04 18:24:36,954 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l07_model_output.yml'
2023-04-04 18:24:36,959 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l08_reporting.yml'
2023-04-04 18:24:36,968 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\parameters.yml'
2023-04-04 18:24:42,138 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\seaborn\rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(mpl.__version__) >= "3.0":

2023-04-04 18:24:42,152 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\setuptools\_distutils\version.py:345: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)

2023-04-04 18:26:05,540 - kedro_telemetry.plugin - WARNING - Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised.
2023-04-04 18:26:05,557 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l01_raw.yml'
2023-04-04 18:26:05,569 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l02_intermediate.yml'
2023-04-04 18:26:05,577 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l03_primary.yml'
2023-04-04 18:26:05,584 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l04_feature.yml'
2023-04-04 18:26:05,590 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l05_model_input.yml'
2023-04-04 18:26:05,597 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l06_models.yml'
2023-04-04 18:26:05,604 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l07_model_output.yml'
2023-04-04 18:26:05,610 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l08_reporting.yml'
2023-04-04 18:26:05,631 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\credentials.yml'
2023-04-04 18:26:06,354 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l01_raw.yml'
2023-04-04 18:26:06,362 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l02_intermediate.yml'
2023-04-04 18:26:06,374 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l03_primary.yml'
2023-04-04 18:26:06,381 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l04_feature.yml'
2023-04-04 18:26:06,388 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l05_model_input.yml'
2023-04-04 18:26:06,394 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l06_models.yml'
2023-04-04 18:26:06,401 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l07_model_output.yml'
2023-04-04 18:26:06,407 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l08_reporting.yml'
2023-04-04 18:26:06,414 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\parameters.yml'
2023-04-04 18:26:06,454 - kedro.io.data_catalog - INFO - Loading data from '<dataset_name>' (ParquetDataSet)...
...
2023-04-04 18:26:08,022 - kedro.runner.sequential_runner - INFO - Completed 3 out of 3 tasks
2023-04-04 18:26:08,025 - kedro.runner.sequential_runner - INFO - Pipeline execution completed successfully.
2023-04-04 18:26:08,028 - kedro.framework.session.store - DEBUG - 'save()' not implemented for 'BaseSessionStore'. Skipping the step.
Debug logs if it helps
>>> print(sys.version)
3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
>>> print(sys.executable)
<project_folder>\venv\Scripts\python.exe
>>>

datajoely

04/04/2023, 11:32 AM
is that what you expect?
you only have one python environment

Melvin Kok

04/04/2023, 11:33 AM
Yup it’s expected

datajoely

04/04/2023, 11:34 AM
can you open the `.telemetry` file in the project root?

Melvin Kok

04/04/2023, 11:34 AM
consent: false
certainly hope it’s not because of a typo here 😂

datajoely

04/04/2023, 11:37 AM
`consent: false` is correct

Melvin Kok

04/04/2023, 11:37 AM
FWIW I tried debugging telemetry and found that the `before_command_run` hook in `KedroTelemetryCLIHooks` is catching my `.telemetry` properly; it’s just the `after_context_created` hook in `KedroTelemetryProjectHooks` that doesn’t check for consent.
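The missing guard described here can be sketched in plain Python. This is only an illustration under stated assumptions, not the plugin's actual code: `has_consent`, `GuardedProjectHooks` and the `send_event` callable are hypothetical names, and the line-based parsing of `.telemetry` is a stand-in for whatever the plugin really does.

```python
from pathlib import Path
import tempfile


def has_consent(project_root: Path) -> bool:
    """Return True only when `.telemetry` contains `consent: true`.

    Hypothetical helper; the plugin's real parsing may differ.
    """
    telemetry_file = project_root / ".telemetry"
    if not telemetry_file.is_file():
        return False  # this sketch treats a missing file as "no consent"
    for line in telemetry_file.read_text().splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "consent":
            return value.strip().lower() == "true"
    return False


class GuardedProjectHooks:
    """Stand-in for a project hook that checks consent before sending."""

    def __init__(self, project_root: Path, send_event):
        self._project_root = project_root
        self._send_event = send_event  # hypothetical sender callable

    def after_context_created(self, context) -> None:
        if not has_consent(self._project_root):
            return  # consent missing or false: send nothing
        self._send_event("project statistics")


# Usage: with `consent: false` nothing is sent; with `consent: true` one event is.
sent = []
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".telemetry").write_text("consent: false")
    GuardedProjectHooks(root, sent.append).after_context_created(context=None)
    refused = list(sent)  # snapshot of what was sent without consent
    (root / ".telemetry").write_text("consent: true")
    GuardedProjectHooks(root, sent.append).after_context_created(context=None)
```

The point of the sketch is only that the consent check sits inside the project hook itself, so it cannot be skipped when the CLI hook is not the one doing the sending.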

datajoely

04/04/2023, 11:37 AM
at the moment, I’m not convinced the hook execution order will affect things
import pathlib; pathlib.Path('.telemetry').read_text()
can you please add this to your logging?
at the moment I’m not sure how this can possibly return true

Melvin Kok

04/04/2023, 11:41 AM
via python console:
>>> import pathlib; pathlib.Path('.telemetry').read_text()
'consent: false'

datajoely

04/04/2023, 11:41 AM
not in your repl
can you do it as part of your kedro run
same as this
print(sys.version)
print(sys.executable)
you can put it in your hook
or use the logging module
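One way to capture this from inside the run rather than the REPL is a small helper built on the standard `logging` module. This is a minimal sketch: `log_environment` is a hypothetical name, and it assumes the process's working directory is the project root unless another path is passed in.

```python
import logging
import sys
from pathlib import Path

logger = logging.getLogger(__name__)


def log_environment(project_root: Path = Path(".")) -> dict:
    """Log the interpreter in use and the raw `.telemetry` contents."""
    telemetry_file = project_root / ".telemetry"
    info = {
        "version": sys.version,
        "executable": sys.executable,
        # None when the file is absent, otherwise its text as-is
        "telemetry": telemetry_file.read_text() if telemetry_file.is_file() else None,
    }
    for key, value in info.items():
        logger.info("%s: %r", key, value)
    return info
```

Calling `log_environment()` from a hook method or from a node puts the values in the same log stream as the rest of the run, which shows which environment and which `.telemetry` file the run actually sees.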

Melvin Kok

04/04/2023, 11:46 AM
Ok running it
It doesn’t call `_check_for_telemetry_consent` at all

datajoely

04/04/2023, 12:00 PM
the consent check is in the plug-in, not kedro
so if you remove the plugin it won’t run, full stop

Nok Lam Chan

04/04/2023, 12:11 PM
I’m trying to create a project to check

datajoely

04/04/2023, 12:24 PM
please uninstall kedro-telemetry for the time being
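If uninstalling is not practical, a possible alternative is to turn off the plugin's auto-registered hooks in the project's `settings.py`. This is a sketch that assumes the installed Kedro version supports the `DISABLE_HOOKS_FOR_PLUGINS` setting; it makes no claim about the plugin's CLI hooks, which may be registered separately.

```python
# settings.py (sketch)
# Assumption: this Kedro version supports DISABLE_HOOKS_FOR_PLUGINS.
# The entry is the name of the installed plugin distribution.
DISABLE_HOOKS_FOR_PLUGINS = ("kedro-telemetry",)
```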

Melvin Kok

04/04/2023, 1:59 PM
Yup, I understand that the consent check is in `kedro-telemetry`. Just pointing out that `KedroTelemetryProjectHooks.after_context_created` is missing the telemetry consent check (for reference, `KedroTelemetryCLIHooks.before_command_run` contains the consent check); perhaps that’s where a fix is needed 😀
Also here are the debug logs, I have added the logging statements as you requested @datajoely
Interestingly, the `after_catalog_created` on my custom hook for MLflow is being called twice: once before `after_context_created` and once after.

Nok Lam Chan

04/04/2023, 2:12 PM
@Melvin Kok Is this happening only with telemetry installed/enabled?

Melvin Kok

04/04/2023, 2:12 PM
Yup, let me provide you with the logs for the same run but with telemetry uninstalled

Nok Lam Chan

04/04/2023, 2:14 PM
I have a theory here, I think the order is correct

Melvin Kok

04/04/2023, 2:15 PM
@Nok Lam Chan Same pipeline etc, but telemetry uninstalled

Nok Lam Chan

04/04/2023, 2:19 PM
What happens is this: the catalog is a read-only object; every time `context.catalog` gets called, it gets created and triggers the `after_catalog_created` hook. The telemetry hook’s `after_context_created` creates the catalog, so it triggers `after_catalog_created` before your MLflow hook’s `after_context_created`.
Is this causing any problem for your workflow? You can control the order of hooks by adding them in `settings.py` explicitly; I guess in this case the telemetry hook is triggered first.
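The ordering effect can be reproduced with a toy model that does not import Kedro at all. Everything below is a stand-in: `Context`, `TelemetryLikeHooks` and `MLflowLikeHooks` are hypothetical classes, and the assumption that `catalog` behaves like a property which rebuilds the catalog on every access is taken from the explanation above.

```python
calls = []  # records the order in which hook methods fire


class Context:
    """Toy context whose `catalog` property rebuilds the catalog on every access."""

    def __init__(self, hooks):
        self._hooks = hooks

    @property
    def catalog(self):
        catalog = {}  # stand-in for a freshly built data catalog
        for hook in self._hooks:
            hook.after_catalog_created(catalog)
        return catalog


class TelemetryLikeHooks:
    def after_context_created(self, context):
        calls.append("telemetry.after_context_created")
        context.catalog  # reading the property fires after_catalog_created

    def after_catalog_created(self, catalog):
        pass  # this stand-in does nothing when the catalog is created


class MLflowLikeHooks:
    def after_context_created(self, context):
        calls.append("mlflow.after_context_created")

    def after_catalog_created(self, catalog):
        calls.append("mlflow.after_catalog_created")


hooks = [TelemetryLikeHooks(), MLflowLikeHooks()]
context = Context(hooks)
for hook in hooks:  # the telemetry-like hook happens to run first
    hook.after_context_created(context)
context.catalog  # the run itself builds the catalog again
print(calls)
```

In this toy run the MLflow-like hook sees `after_catalog_created` once before its own `after_context_created` and once after, which matches the double call reported earlier in the thread.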

Melvin Kok

04/04/2023, 2:32 PM
Now that I know the root cause I can work around it, all is well. Thank you so much @Nok Lam Chan and @datajoely!

datajoely

04/04/2023, 2:33 PM
Thank you for raising it @Melvin Kok

Nok Lam Chan

04/04/2023, 2:41 PM
Thanks @Melvin Kok. For what it’s worth, I opened a separate issue about the hooks. It’s a somewhat known problem that hooks can interfere with each other, but no one has ever complained. https://github.com/kedro-org/kedro/issues/2493
👍 1

Yetunde

04/04/2023, 2:45 PM
@Melvin Kok Thank you so much for raising the GitHub issue and also providing so much context so we can fix this. I want to comment on `kedro-telemetry`. We are going to do the following:
• Ship and release an immediate fix for the plugin, which means that the hook which collected anonymised information about the size of the project (number of datasets, pipelines and nodes) will observe your consent
• And then we're deleting data collected from `kedro-telemetry` 0.2.2 and 0.2.3, which are the affected versions
• We'll also do a team retrospective to come up with additional actions to make sure that we don't miss things like this again
• And we'll roll out communication to all of our users which will cover all of the above
K 3
👍 4