Melvin Kok
04/04/2023, 10:58 AM
1. The after_catalog_created hook is triggered before after_context_created. However, this is fixed when kedro-telemetry is uninstalled (I have raised an issue here)
2. kedro-telemetry is still sending information about the data catalog, the default pipeline, etc. to heapanalytics.com even if consent is set to false. Under KedroTelemetryProjectHooks, it is calling _send_heap_event without checking for consent.
datajoely
04/04/2023, 11:02 AM
Nok Lam Chan
04/04/2023, 11:03 AM
Melvin Kok
04/04/2023, 11:11 AM
WARNING Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised
even though we set consent to false. Started a debugger, which eventually led me to KedroTelemetryProjectHooks calling _send_heap_event
Nok Lam Chan
04/04/2023, 11:15 AM
"When I removed kedro-telemetry, after_context_created was triggered first. When I reinstall kedro-telemetry, after_catalog_created was triggered first."
Some logs would help to confirm this - it’s pretty unlikely.
Melvin Kok
04/04/2023, 11:16 AM
Nok Lam Chan
04/04/2023, 11:19 AM
kedro run or Python API?
Melvin Kok
04/04/2023, 11:19 AM
kedro run, with --pipeline pipeline_name if that matters
2023-04-04 18:18:06,023 - kedro.framework.session.session - INFO - Kedro project <project_name>
2023-04-04 18:18:06,031 - kedro.config.common - INFO - Config from path '<project_folder>\conf\local' will override the following existing top-level config keys: base_path, workspace
2023-04-04 18:18:06,228 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\rpc\__init__.py:20: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.rpc')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
pkg_resources.declare_namespace(__name__)
2023-04-04 18:18:06,257 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\pkg_resources\__init__.py:2349: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
declare_namespace(parent)
2023-04-04 18:18:08,689 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\auth\_default.py:78: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: <https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds>.
warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
2023-04-04 18:18:21,007 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\seaborn\rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(mpl.__version__) >= "3.0":
2023-04-04 18:18:21,021 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\setuptools\_distutils\version.py:345: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
2023-04-04 18:19:44,457 - kedro_telemetry.plugin - WARNING - Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised.
2023-04-04 18:19:45,165 - kedro.io.data_catalog - INFO - Loading data from '<dataset_name>' (ParquetDataSet)...
...
datajoely
04/04/2023, 11:29 AM
pkg_resources.declare_namespace('google.rpc')
Melvin Kok
04/04/2023, 11:29 AM
datajoely
04/04/2023, 11:29 AM
kedro-telemetry from your dependencies
import sys
print(sys.version)
print(sys.executable)
Melvin Kok
04/04/2023, 11:31 AM
2023-04-04 18:24:27,500 - kedro.framework.session.session - INFO - Kedro project Py_FuelEfficiencyPOC_svc
2023-04-04 18:24:27,505 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\globals.yml'
2023-04-04 18:24:27,509 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\globals.yml'
2023-04-04 18:24:27,512 - kedro.config.common - INFO - Config from path '<project_folder>\conf\local' will override the following existing top-level config keys: base_path, workspace
2023-04-04 18:24:27,523 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l01_raw.yml'
2023-04-04 18:24:27,534 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l02_intermediate.yml'
2023-04-04 18:24:27,540 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l03_primary.yml'
2023-04-04 18:24:27,546 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l04_feature.yml'
2023-04-04 18:24:27,552 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l05_model_input.yml'
2023-04-04 18:24:27,557 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l06_models.yml'
2023-04-04 18:24:27,562 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l07_model_output.yml'
2023-04-04 18:24:27,566 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l08_reporting.yml'
2023-04-04 18:24:27,585 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\credentials.yml'
2023-04-04 18:24:27,700 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\rpc\__init__.py:20: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.rpc')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
pkg_resources.declare_namespace(__name__)
2023-04-04 18:24:27,724 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\pkg_resources\__init__.py:2349: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See <https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages>
declare_namespace(parent)
2023-04-04 18:24:30,173 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\google\auth\_default.py:78: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: <https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds>.
warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
2023-04-04 18:24:36,907 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l01_raw.yml'
2023-04-04 18:24:36,915 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l02_intermediate.yml'
2023-04-04 18:24:36,928 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l03_primary.yml'
2023-04-04 18:24:36,936 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l04_feature.yml'
2023-04-04 18:24:36,943 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l05_model_input.yml'
2023-04-04 18:24:36,948 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l06_models.yml'
2023-04-04 18:24:36,954 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l07_model_output.yml'
2023-04-04 18:24:36,959 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l08_reporting.yml'
2023-04-04 18:24:36,968 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\parameters.yml'
2023-04-04 18:24:42,138 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\seaborn\rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(mpl.__version__) >= "3.0":
2023-04-04 18:24:42,152 - py.warnings - WARNING - <project_folder>\venv\lib\site-packages\setuptools\_distutils\version.py:345: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
2023-04-04 18:26:05,540 - kedro_telemetry.plugin - WARNING - Failed to send data to Heap. Exception of type 'ConnectTimeout' was raised.
2023-04-04 18:26:05,557 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l01_raw.yml'
2023-04-04 18:26:05,569 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l02_intermediate.yml'
2023-04-04 18:26:05,577 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l03_primary.yml'
2023-04-04 18:26:05,584 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l04_feature.yml'
2023-04-04 18:26:05,590 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l05_model_input.yml'
2023-04-04 18:26:05,597 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l06_models.yml'
2023-04-04 18:26:05,604 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l07_model_output.yml'
2023-04-04 18:26:05,610 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\catalog\l08_reporting.yml'
2023-04-04 18:26:05,631 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\credentials.yml'
2023-04-04 18:26:06,354 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l01_raw.yml'
2023-04-04 18:26:06,362 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l02_intermediate.yml'
2023-04-04 18:26:06,374 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l03_primary.yml'
2023-04-04 18:26:06,381 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l04_feature.yml'
2023-04-04 18:26:06,388 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l05_model_input.yml'
2023-04-04 18:26:06,394 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l06_models.yml'
2023-04-04 18:26:06,401 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l07_model_output.yml'
2023-04-04 18:26:06,407 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\base\parameters\l08_reporting.yml'
2023-04-04 18:26:06,414 - kedro.config.common - DEBUG - Loading config file: '<project_folder>\conf\local\parameters.yml'
2023-04-04 18:26:06,454 - kedro.io.data_catalog - INFO - Loading data from '<dataset_name>' (ParquetDataSet)...
...
2023-04-04 18:26:08,022 - kedro.runner.sequential_runner - INFO - Completed 3 out of 3 tasks
2023-04-04 18:26:08,025 - kedro.runner.sequential_runner - INFO - Pipeline execution completed successfully.
2023-04-04 18:26:08,028 - kedro.framework.session.store - DEBUG - 'save()' not implemented for 'BaseSessionStore'. Skipping the step.
Debug logs if it helps
>>> print(sys.version)
3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
>>> print(sys.executable)
<project_folder>\venv\Scripts\python.exe
>>>
datajoely
04/04/2023, 11:32 AM
Melvin Kok
04/04/2023, 11:33 AM
datajoely
04/04/2023, 11:34 AM
.telemetry file in the project root?
Melvin Kok
04/04/2023, 11:34 AM
consent: false
datajoely
04/04/2023, 11:37 AM
consent: false
Melvin Kok
04/04/2023, 11:37 AM
before_command_run hook in KedroTelemetryCLIHooks is catching my .telemetry properly, it’s just the after_context_created hook in KedroTelemetryProjectHooks that doesn’t check for consent
datajoely
04/04/2023, 11:37 AM
import pathlib; pathlib.Path('.telemetry').read_text()
can you please add this to your logging?
Melvin Kok
04/04/2023, 11:41 AM
>>> import pathlib; pathlib.Path('.telemetry').read_text()
'consent: false'
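For readers following along, here is a rough sketch of the kind of guard being discussed: read the .telemetry file and only send an event when consent is explicitly true. This is not the kedro-telemetry implementation. The names has_consent and send_event_if_allowed are hypothetical, the line-based parsing is a simplified stand-in for a real YAML loader, and treating a missing file as "no consent" is an assumption made for the sketch.

```python
from pathlib import Path
from typing import Callable


def has_consent(project_root: Path) -> bool:
    """Return True only if .telemetry contains an explicit `consent: true`.

    Hypothetical helper: a missing file or missing key counts as no consent.
    """
    telemetry_file = project_root / ".telemetry"
    if not telemetry_file.exists():
        return False
    for line in telemetry_file.read_text().splitlines():
        # Simplified `key: value` parsing, standing in for a YAML loader.
        key, _, value = line.partition(":")
        if key.strip() == "consent":
            return value.strip().lower() == "true"
    return False


def send_event_if_allowed(project_root: Path, send: Callable[[], None]) -> bool:
    """Call `send` only when consent was given; report whether it was called."""
    if not has_consent(project_root):
        return False
    send()
    return True
```

With a .telemetry file containing consent: false, as in the output above, send_event_if_allowed returns False and the send callable is never invoked.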
datajoely
04/04/2023, 11:41 AM
print(sys.version)
print(sys.executable)
Melvin Kok
04/04/2023, 11:46 AM
_check_for_telemetry_consent at all
datajoely
04/04/2023, 12:00 PM
Nok Lam Chan
04/04/2023, 12:11 PM
datajoely
04/04/2023, 12:24 PM
Melvin Kok
04/04/2023, 1:59 PM
kedro-telemetry. Just pointing out that KedroTelemetryProjectHooks.after_context_created is missing the telemetry consent check (for reference, KedroTelemetryCLIHooks.before_command_run contains the consent check), perhaps that’s where a fix is needed 😀
after_catalog_created on my custom hook for MLFlow is being called twice - once before after_context_created and once after
Nok Lam Chan
04/04/2023, 2:12 PM
Melvin Kok
04/04/2023, 2:12 PM
Nok Lam Chan
04/04/2023, 2:14 PM
Melvin Kok
04/04/2023, 2:15 PM
Nok Lam Chan
04/04/2023, 2:19 PM
When context.catalog gets called, the catalog gets created and triggers the after_catalog_created hook.
In the telemetry hook's after_context_created, it creates the catalog, so it triggers after_catalog_created before your MLFlowHook’s after_context_created.
settings.py explicitly, I guess in this case the telemetry hook is triggered first.
Melvin Kok
04/04/2023, 2:32 PM
datajoely
04/04/2023, 2:33 PM
Nok Lam Chan
04/04/2023, 2:41 PM
Yetunde
04/04/2023, 2:45 PM
kedro-telemetry.
We are going to do the following:
• Ship and release an immediate fix for the plugin which means that the hook which collected anonymised information about the size of the project (number of datasets, pipelines and nodes) will observe your consent
• And then we're deleting data collected from kedro-telemetry 0.2.2 and 0.2.3 which are the affected versions
• We'll also do a team retrospective to come up with additional actions to make sure that we don't miss things like this again
• And we'll roll out communication to all of our users which will cover all of the above