# questions
m
Hi Kedro team/users! I found two unusual behaviours with kedro and would like to ask if anyone else is facing the same issues:
1. after_catalog_created hook is triggered before after_context_created. However this is fixed when kedro-telemetry is uninstalled (I have raised an issue here)
2. kedro-telemetry is still sending information about the data catalog, the default pipeline etc to heapanalytics.com even if consent is set to false. Under KedroTelemetryProjectHooks, it is calling _send_heap_event without checking for consent.
🙏 1
d
hello, can you please post the start of your logs? this doesn’t sound right
how do you know telemetry is kicked in?
n
https://github.com/kedro-org/kedro/issues/2492 Posting the original Github Issue here
m
@datajoely Regarding telemetry kicking in: my team was getting this warning:
WARNING  Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised
even though we set consent to false. I started a debugger, which eventually led me to KedroTelemetryProjectHooks calling _send_heap_event.
Regarding logs: let me start a run with debug-level logs and get back
👍🏼 1
n
When I removed kedro-telemetry, after_context_created was triggered first. When I reinstalled kedro-telemetry, after_catalog_created was triggered first.
Some logs would help to confirm this; it’s pretty unlikely.
Thanks! Can you also share your telemetry version? I could debug it in parallel
m
0.2.3
n
One more question: how do you invoke the project run? Via kedro run or the Python API?
m
kedro run, with --pipeline pipeline_name if that matters
2023-04-04 18:18:06,023 - kedro.framework.session.session - INFO - Kedro project <project_name>
2023-04-04 18:18:06,031 - kedro.config.common - INFO - Config from path '<project_folder>\conf\local' will override the following existing top-level config keys: base_path, workspace
2023-04-04 18:18:06,228 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\rpc\__init__.py:20: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.rpc')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
  pkg_resources.declare_namespace(__name__)

2023-04-04 18:18:06,257 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\pkg_resources\__init__.py:2349: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
  declare_namespace(parent)

2023-04-04 18:18:08,689 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\auth\_default.py:78: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: <https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds>. 
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)

2023-04-04 18:18:21,007 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\seaborn\rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(mpl.__version__) >= "3.0":

2023-04-04 18:18:21,021 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\setuptools\_distutils\version.py:345: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)

2023-04-04 18:19:44,457 - kedro_telemetry.plugin - WARNING - Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised.
2023-04-04 18:19:45,165 - kedro.io.data_catalog - INFO - Loading data from '<dataset_name>' (ParquetDataSet)...
...
Oops looks like I didn’t include the debug logs. let me try again
d
are you running this on Google Dataproc?
pkg_resources.declare_namespace('google.rpc')
m
Just locally, but our datasets are in GCS
d
this is super weird
the easiest solution is to remove kedro-telemetry from your dependencies
can you log the following please to ensure we’re talking about the right envs
import sys

print(sys.version)
print(sys.executable)
m
2023-04-04 18:24:27,500 - kedro.framework.session.session - INFO - Kedro project Py_FuelEfficiencyPOC_svc
2023-04-04 18:24:27,505 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\globals.yml'
2023-04-04 18:24:27,509 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\globals.yml'
2023-04-04 18:24:27,512 - kedro.config.common - INFO - Config from path '<project_folder>\conf\local' will override the following existing top-level config keys: base_path, workspace
2023-04-04 18:24:27,523 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l01_raw.yml'
2023-04-04 18:24:27,534 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l02_intermediate.yml'
2023-04-04 18:24:27,540 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l03_primary.yml'
2023-04-04 18:24:27,546 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l04_feature.yml'
2023-04-04 18:24:27,552 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l05_model_input.yml'
2023-04-04 18:24:27,557 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l06_models.yml'
2023-04-04 18:24:27,562 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l07_model_output.yml'
2023-04-04 18:24:27,566 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l08_reporting.yml'
2023-04-04 18:24:27,585 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\credentials.yml'
2023-04-04 18:24:27,700 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\rpc\__init__.py:20: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.rpc')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
  pkg_resources.declare_namespace(__name__)

2023-04-04 18:24:27,724 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\pkg_resources\__init__.py:2349: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
  declare_namespace(parent)

2023-04-04 18:24:30,173 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\auth\_default.py:78: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: <https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds>. 
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)

2023-04-04 18:24:36,907 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l01_raw.yml'
2023-04-04 18:24:36,915 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l02_intermediate.yml'
2023-04-04 18:24:36,928 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l03_primary.yml'
2023-04-04 18:24:36,936 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l04_feature.yml'
2023-04-04 18:24:36,943 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l05_model_input.yml'
2023-04-04 18:24:36,948 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l06_models.yml'
2023-04-04 18:24:36,954 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l07_model_output.yml'
2023-04-04 18:24:36,959 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l08_reporting.yml'
2023-04-04 18:24:36,968 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\parameters.yml'
2023-04-04 18:24:42,138 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\seaborn\rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(mpl.__version__) >= "3.0":

2023-04-04 18:24:42,152 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\setuptools\_distutils\version.py:345: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)

2023-04-04 18:26:05,540 - kedro_telemetry.plugin - WARNING - Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised.
2023-04-04 18:26:05,557 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l01_raw.yml'
2023-04-04 18:26:05,569 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l02_intermediate.yml'
2023-04-04 18:26:05,577 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l03_primary.yml'
2023-04-04 18:26:05,584 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l04_feature.yml'
2023-04-04 18:26:05,590 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l05_model_input.yml'
2023-04-04 18:26:05,597 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l06_models.yml'
2023-04-04 18:26:05,604 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l07_model_output.yml'
2023-04-04 18:26:05,610 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l08_reporting.yml'
2023-04-04 18:26:05,631 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\credentials.yml'
2023-04-04 18:26:06,354 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l01_raw.yml'
2023-04-04 18:26:06,362 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l02_intermediate.yml'
2023-04-04 18:26:06,374 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l03_primary.yml'
2023-04-04 18:26:06,381 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l04_feature.yml'
2023-04-04 18:26:06,388 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l05_model_input.yml'
2023-04-04 18:26:06,394 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l06_models.yml'
2023-04-04 18:26:06,401 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l07_model_output.yml'
2023-04-04 18:26:06,407 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l08_reporting.yml'
2023-04-04 18:26:06,414 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\parameters.yml'
2023-04-04 18:26:06,454 - kedro.io.data_catalog - INFO - Loading data from '<dataset_name>' (ParquetDataSet)...
...
2023-04-04 18:26:08,022 - kedro.runner.sequential_runner - INFO - Completed 3 out of 3 tasks
2023-04-04 18:26:08,025 - kedro.runner.sequential_runner - INFO - Pipeline execution completed successfully.
2023-04-04 18:26:08,028 - kedro.framework.session.store - DEBUG - 'save()' not implemented for 'BaseSessionStore'. Skipping the step.
Debug logs if it helps
>>> print(sys.version)
3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
>>> print(sys.executable)
<project_folder>\venv\Scripts\python.exe
>>>
d
is that what you expect?
you only have one python environment
m
Yup it’s expected
d
can you open the .telemetry file in the project root?
m
consent: false
certainly hope it’s not because of a typo here 😂
d
consent: false is correct
m
FWIW I tried debugging telemetry and found that the before_command_run hook in KedroTelemetryCLIHooks is catching my .telemetry properly, it’s just the after_context_created hook in KedroTelemetryProjectHooks that doesn’t check for consent
d
at the moment, I’m not convinced the hook execution order will affect things
import pathlib; pathlib.Path('.telemetry').read_text()
can you please add this to your logging?
at the moment I’m not sure how this can possibly return true
m
via python console:
>>> import pathlib; pathlib.Path('.telemetry').read_text()
'consent: false'
d
not in your repl
can you do it as part of your kedro run
same as this
print(sys.version)
print(sys.executable)
you can put it in your hook
or use the logging module
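A minimal sketch of the check being requested, assuming it is called from inside the project during kedro run (for example from a custom hook). The function name log_run_environment and its project_root parameter are hypothetical and not part of Kedro or kedro-telemetry; it only logs the interpreter details and the raw content of the .telemetry file.

```python
import logging
import pathlib
import sys

logger = logging.getLogger(__name__)


def log_run_environment(project_root="."):
    """Log interpreter details and the raw .telemetry content.

    Hypothetical helper: call it from code that runs as part of the
    Kedro run (not from a separate REPL) so the logged values describe
    the environment the run actually uses.
    """
    telemetry_file = pathlib.Path(project_root) / ".telemetry"
    # Report None rather than raising if the file is missing.
    content = telemetry_file.read_text() if telemetry_file.exists() else None
    logger.info("Python version: %s", sys.version)
    logger.info("Python executable: %s", sys.executable)
    logger.info(".telemetry content: %r", content)
    return content
```

Returning the content as well as logging it makes the helper easy to check on its own before wiring it into a run.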
m
Ok running it
It doesn’t call _check_for_telemetry_consent at all
d
the consent check is in the plug-in not kedro
so if you remove the plugin it won’t run, full stop
n
I’m trying to create a project to check
d
please uninstall kedro-telemetry for the time being
m
yup I understand that the consent is in kedro-telemetry. just pointing out that KedroTelemetryProjectHooks.after_context_created is missing the telemetry consent check (for reference, KedroTelemetryCLIHooks.before_command_run contains the consent check), perhaps that’s where a fix is needed 😀
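To illustrate the kind of guard being described, here is a rough sketch of a project-level hook that checks consent before sending anything. It is not the kedro-telemetry implementation: has_telemetry_consent, ConsentAwareProjectHooks, the injected send_event callable and the project_path argument are all hypothetical, and the .telemetry file is parsed with a simple key lookup instead of a YAML parser.

```python
import pathlib


def has_telemetry_consent(project_root):
    """Return True only if .telemetry explicitly sets consent to true."""
    telemetry_file = pathlib.Path(project_root) / ".telemetry"
    if not telemetry_file.exists():
        return False
    for line in telemetry_file.read_text().splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "consent":
            return value.strip().lower() == "true"
    return False


class ConsentAwareProjectHooks:
    """Hypothetical stand-in for a project-level telemetry hook."""

    def __init__(self, send_event):
        # The sender is injected so the sketch needs no network access.
        self._send_event = send_event

    def after_context_created(self, project_path):
        # Guard first: without recorded consent, send nothing at all.
        if not has_telemetry_consent(project_path):
            return
        self._send_event("project statistics")
```

The point of the sketch is only the ordering: the consent check comes before any call that could send data, mirroring what the CLI-level hook is reported to do.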
Also here are the debug logs, I have added the logging statements as you requested @datajoely
Interestingly the after_catalog_created on my custom hook for MLFlow is being called twice: once before after_context_created and once after
n
@Melvin Kok Is this happening only with Telemetry installed/enabled?
m
Yup, let me provide you with the logs for the same run but with telemetry uninstalled
n
I have a theory here, I think the order is correct
m
@Nok Lam Chan Same pipeline etc, but telemetry uninstalled
n
What happens is this: catalog is a read-only object, and every time context.catalog is called it gets created and triggers the after_catalog_created hook. The telemetry hook’s after_context_created creates the catalog, so it triggers after_catalog_created before your MLFlowHook’s after_context_created.
Is this causing any problem for your workflow? You can control the order of hooks by adding them in settings.py explicitly; I guess in this case the telemetry hook is triggered first.
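The ordering described above can be reproduced with a small simulation that does not import Kedro at all. Every class and function here (Context, TelemetryLikeHooks, MLFlowLikeHooks, simulate) is a hypothetical stand-in; the only behaviour assumed from the discussion is that each access to context.catalog builds a catalog and fires after_catalog_created on every registered hook.

```python
class Context:
    """Stand-in context whose catalog is rebuilt on every access."""

    def __init__(self, hooks, calls):
        self._hooks = hooks
        self._calls = calls

    @property
    def catalog(self):
        catalog = object()
        # Building the catalog notifies every registered hook.
        for hook in self._hooks:
            hook.after_catalog_created(catalog, self._calls)
        return catalog


class TelemetryLikeHooks:
    def after_context_created(self, context, calls):
        calls.append("telemetry.after_context_created")
        context.catalog  # side effect: fires after_catalog_created everywhere

    def after_catalog_created(self, catalog, calls):
        pass


class MLFlowLikeHooks:
    def after_context_created(self, context, calls):
        calls.append("mlflow.after_context_created")

    def after_catalog_created(self, catalog, calls):
        calls.append("mlflow.after_catalog_created")


def simulate(hooks):
    """Run the context-created hooks, then access the catalog for the run."""
    calls = []
    context = Context(hooks, calls)
    for hook in hooks:
        hook.after_context_created(context, calls)
    context.catalog  # the run itself needs the catalog later
    return calls
```

With the telemetry-like hook registered first, simulate([TelemetryLikeHooks(), MLFlowLikeHooks()]) records the MLFlow-like after_catalog_created once before and once after its own after_context_created. With only MLFlowLikeHooks() registered, it is recorded once, after after_context_created.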
m
Now that I know the root cause I can work around it, all is well. Thank you so much @Nok Lam Chan and @datajoely!
d
Thank you for raising it @Melvin Kok
n
Thanks @Melvin Kok. For what it’s worth, I opened a separate issue about the Hooks. It’s a somewhat known problem that hooks can interfere with each other, but no one ever complained. https://github.com/kedro-org/kedro/issues/2493
👍 1
y
@Melvin Kok Thank you so much for raising the GitHub issue and also providing so much context so we can fix this. I want to comment on kedro-telemetry. We are going to do the following:
• Ship and release an immediate fix for the plugin which means that the hook which collected anonymised information about the size of the project (number of datasets, pipelines and nodes) will observe your consent
• And then we're deleting data collected from kedro-telemetry 0.2.2 and 0.2.3 which are the affected versions
• We'll also do a team retrospective to come up with additional actions to make sure that we don't miss things like this again
• And we'll roll out communication to all of our users which will cover all of the above
👍 4