Allen Ma (11/01/2022, 3:17 AM)
Julian Waton (11/01/2022, 2:52 PM)
(to save time over a full pipeline run), I often discover that some data does not exist in my environment, but it takes a while to discover this, as the code needs to run first.
• Then "checking whether the data exists" is a bit complex:
  ◦ Either check whether it is an intermediate output of the provided pipeline
  ◦ Or check whether it can be read from the catalog using the method of the abstract dataset class
Is this something that someone has already built, and is it a common use case?
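The first of the two checks can be done without touching storage at all. A plain-Python sketch of the idea (no kedro imports; node names and the `classify_datasets` helper are made up for illustration): datasets that appear only as inputs must already exist, while datasets that are also produced by some node are intermediates the pipeline will create itself.

```python
def classify_datasets(node_io):
    """Split datasets into 'must already exist' vs 'produced by the pipeline'.

    node_io: list of (inputs, outputs) set pairs, one per node.
    """
    all_inputs = set().union(*(ins for ins, _ in node_io))
    all_outputs = set().union(*(outs for _, outs in node_io))
    free_inputs = all_inputs - all_outputs     # must be readable up front
    intermediates = all_inputs & all_outputs   # created by an earlier node
    return free_inputs, intermediates

# Two hypothetical nodes: raw -> clean -> features
nodes = [({"raw"}, {"clean"}), ({"clean"}, {"features"})]
must_exist, produced = classify_datasets(nodes)
print(must_exist)  # {'raw'}
print(produced)    # {'clean'}
```

Only the datasets in the first set would then need an existence check against the catalog before running.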
Zirui Xu (11/01/2022, 3:55 PM)
Jose Alejandro Montaña Cortes (11/01/2022, 4:11 PM)
Lucie Gattepaille (11/02/2022, 1:01 PM)
Vladimir Filimonov (11/02/2022, 1:56 PM)
on kedro source code. 6 tests are failing, and it seems like all of them are related to the parallel task runner. Has anyone had a similar issue? My current hypothesis is that it all fails due to
. But I'm wondering what might have caused the issue. Here is the list of failed tests:
ModuleNotFoundError: No module named 'tests.runner'
And here is the failure of a test in
FAILED tests/framework/cli/test_cli.py::TestRunCommand::test_run_successfully_parallel - assert not 1
FAILED tests/framework/session/test_session_extension_hooks.py::TestNodeHooks::test_on_node_error_hook_parallel_runner - assert 0 == 2
FAILED tests/framework/session/test_session_extension_hooks.py::TestNodeHooks::test_before_and_after_node_run_hooks_parallel_runner - assert 0 == 2
FAILED tests/framework/session/test_session_extension_hooks.py::TestDataSetHooks::test_before_and_after_dataset_loaded_hooks_parallel_runner - as...
FAILED tests/framework/session/test_session_extension_hooks.py::TestDataSetHooks::test_before_and_after_dataset_saved_hooks_parallel_runner - ass...
FAILED tests/framework/session/test_session_extension_hooks.py::TestBeforeNodeRunHookWithInputUpdates::test_correct_input_update_parallel - asser...
FAILED tests/framework/session/test_session_extension_hooks.py::TestBeforeNodeRunHookWithInputUpdates::test_broken_input_update_parallel - Failed...
Traceback (most recent call last):
  <string>:1 in <module>
  /Users/Vladimir_Filimonov/opt/anaconda3/envs/kedro-environment/lib/python3.8/multiprocessing/spawn.py:116 in spawn_main
    exitcode = _main(fd, parent_sentinel)
  /Users/Vladimir_Filimonov/opt/anaconda3/envs/kedro-environment/lib/python3.8/multiprocessing/spawn.py:126 in _main
    self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'tests.runner'
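For context on why this surfaces only in the parallel-runner tests: worker processes started with the spawn method begin from a fresh interpreter and unpickle their work by module path, so anything defined in a module the child cannot import (here `tests.runner`) fails exactly like this. A minimal sketch of the mechanism, using a throwaway module name (`fake_runner` is made up):

```python
import pickle
import sys
import types

# Stand-in for 'tests.runner': a module that exists in this process
# but cannot be imported by name from a fresh (spawned) process.
mod = types.ModuleType("fake_runner")
exec("def identity(x):\n    return x", mod.__dict__)
sys.modules["fake_runner"] = mod

payload = pickle.dumps(mod.identity)  # pickled by reference: "fake_runner.identity"

# A spawned worker starts with clean sys.modules; simulate that here.
del sys.modules["fake_runner"]
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    print(exc)  # No module named 'fake_runner'
```

So the usual suspects are the test package not being importable in the child (missing `__init__.py`, wrong working directory, or the package not installed in the environment the workers spawn from).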
Zirui Xu (11/02/2022, 4:28 PM)
Earl Hammond (11/02/2022, 6:31 PM)
we see this set of warnings; the pipeline runs fine. Just wondering what Kedro is doing at this time:
WARNING: Something went wrong with getting the username to send to the Heap. Exception: [Errno 6] No such device or address
WARNING: Failed to send data to Heap. Exception of type 'ConnectionError' was raised.
Thanks in advance!
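For context (an assumption about the setup, not something stated above): these messages look like they come from the kedro-telemetry plugin trying to send anonymous usage data to Heap, which fails harmlessly in an environment without the right network access. If that is the plugin in play, consent can be withdrawn with a `.telemetry` file in the project root, a sketch:

```yaml
# .telemetry, in the project root (read by the kedro-telemetry plugin)
consent: false
```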
Allen Ma (11/03/2022, 5:59 AM)
metadata = bootstrap_project(Path.cwd())
with KedroSession.create(metadata.package_name) as session:
can succeed but the second way can’t
Debanjan Banerjee (11/03/2022, 9:37 AM)
, it failed with the error
. This has never happened to me before. Any ideas what might be causing this? Did we change the way viz is supposed to be called?
Error: No such command 'viz'
Debanjan Banerjee (11/03/2022, 10:12 AM)
Debanjan Banerjee (11/03/2022, 10:12 AM)
viveca (11/03/2022, 3:32 PM)
and I’d like to do
. A similar question was asked previously https://linen-discord.kedro.org/t/2203662/Hi-all-I-have-a-beginner-question-on-Kedro-0-18-2-I-have-a-T with writing a custom TemplatedConfigLoader as the solution: https://github.com/noklam/kedro_gallery/blob/master/template_config_loader_demo/src/template_config_loader_demo/settings.py Is this the recommended approach, or is there a way of achieving what I want without writing a custom TemplatedConfigLoader that accesses private variables? Is there no other way to add all runtime parameters to the globals dict? I'd really like to avoid that if possible, in case a future kedro update changes things.
kedro run --params configurable_filepath:/path/to/file
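Not an official kedro mechanism, just a plain-Python sketch of the idea behind the linked workaround: merge the runtime parameters over the static globals dict and substitute the result into `${...}` placeholders (the function name and all values here are made up):

```python
from string import Template

def render_catalog_value(value, globals_dict, runtime_params):
    # Runtime parameters take precedence over static globals.
    return Template(value).substitute({**globals_dict, **runtime_params})

rendered = render_catalog_value(
    "${configurable_filepath}",
    {"configurable_filepath": "data/default.csv"},  # e.g. from globals.yml
    {"configurable_filepath": "/path/to/file"},     # from kedro run --params
)
print(rendered)  # /path/to/file
```

`string.Template` happens to use the same `${name}` syntax as TemplatedConfigLoader entries, which is why the sketch reads naturally; the real loader does the substitution internally.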
Filip Panovski (11/03/2022, 3:35 PM)
I run this pipeline from my local workstation for testing purposes. My Dask Cluster is then deployed on AWS EC2 (Scheduler+Workers) and they communicate privately. I noticed that on the last node, the
dask.ParquetDataSet from s3 -> MemoryDataSet -> dask.ParquetDataSet to s3
causes the data to be transferred to my local machine, where the Kedro pipeline is being run, and then transferred back to s3. Needless to say, this introduces cost and lag and is not what I intended. Can I tell the workers to write this data directly to s3? If not, what is the intended way to do this? I read through the documentation, and there is some very good information on getting the pipeline to run as either Step Functions or on AWS Batch, but this is not quite the deployment flow I had in mind. Is the pipeline intended to be run on the same infrastructure where the workers are deployed?
MemoryDataSet -> dask.ParquetDataSet to s3
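One pattern that avoids routing the intermediate through the driver machine is to persist it as a `dask.ParquetDataSet` on s3 instead of a `MemoryDataSet`, so the Dask workers read and write s3 directly and only the task graph travels to the local workstation. A sketch of a catalog entry (the dataset name and bucket are made up):

```yaml
intermediate_table:
  type: dask.ParquetDataSet
  filepath: s3://my-bucket/intermediate_table
```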
Seth (11/03/2022, 3:56 PM)
Earl Hammond11/03/2022, 6:16 PM
viveca (11/04/2022, 8:03 AM)
of the catalog entry as a parameter to
but according to my other discussion with @datajoely this is not allowed in kedro by design. Has anyone else used kedro this way, or should I just skip kedro for inference and similar types of pipelines with varying inputs?
Allen Ma (11/04/2022, 1:55 PM)
I got the following error message:
but I use
22/11/04 21:25:59 ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Failed to bind to /0.0.0.0:9016: Service 'SparkUI' failed after 16 retries (starting from 9000)! Consider explicitly setting the appropriate port for the service 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
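The usual fix for this class of error (a Spark setting, not anything kedro-specific) is to do what the message suggests: pin the UI to a known free port, or raise the retry budget. In the standard kedro PySpark setup that configuration lives in a `conf/base/spark.yml` loaded by the project hooks; a sketch with example values:

```yaml
# conf/base/spark.yml (path assumed from the kedro PySpark docs)
spark.ui.port: 4050
spark.port.maxRetries: 32
```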
Eduardo Lopez (11/04/2022, 4:23 PM)
Jonathan Javier Velásquez Quesquén (11/06/2022, 6:09 PM)
Jonathan Javier Velásquez Quesquén (11/06/2022, 6:16 PM)
Sean Westgate (11/07/2022, 3:08 PM)
effectively? The Spaceflights tutorial is pretty minimalist and works in that you can see the pipelines and nodes, but how is this used to document parameters or inputs and outputs in greater detail? Thank you!
user (11/07/2022, 5:48 PM)
user (11/08/2022, 7:58 AM)
Safouane Chergui (11/08/2022, 9:38 AM)
Jordan (11/08/2022, 12:26 PM)
John Melendowski (11/09/2022, 1:51 AM)
versions which I'm assuming is for the project management features kedro supplies...or
which needs to be downgraded from the latest anaconda release
Yuchu Liu (11/09/2022, 12:25 PM)
. When I try to load it from the terminal, in a virtual environment I set up for kedro, it tries to load the extension
kedro jupyter notebook
from the wrong version of Python. As a result, I don't have any kedro-specific commands in the Jupyter notebook. Here is the warning I get when loading a notebook.
I have tried to load kedro from ipython in the terminal using, and it works perfectly fine.
[I 13:20:49.212 NotebookApp] Kernel started: 21bb83e7-2e5f-4463-a43e-23744ec3ed02, name: kedro_nfr_transactions
[IPKernelApp] WARNING | Error in loading extension: kedro.ipython
Check your config files in /Users/yuchu_liu/.ipython/profile_default
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/config.py", line 544, in configure
    formatters[name] = self.configure_formatter(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/config.py", line 676, in configure_formatter
    c = _resolve(cname)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/config.py", line 90, in _resolve
    found = __import__(used)
ModuleNotFoundError: No module named 'pythonjsonlogger'
Does anyone know how to debug this issue? Thank you!
%load_ext kedro.extras.extensions.ipython
%reload_kedro .
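A quick way to confirm the "wrong version of Python" hypothesis is to compare interpreters directly: run the snippet below both in the terminal ipython session that works and in the notebook kernel that doesn't (plain stdlib, nothing kedro-specific). If the two paths differ, the notebook kernel is not using the virtual environment.

```python
import sys

# Which interpreter (and therefore which site-packages) is this
# session actually running on?
print(sys.executable)
print(tuple(sys.version_info[:3]))
```

If the kernel points at the system Python, reinstalling the kernel spec from inside the virtual environment is the usual remedy.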
Luis Gustavo Souza (11/09/2022, 12:26 PM)
Rosh (11/09/2022, 2:41 PM)
with Spark on GCP? We want to understand how Kedro would work with Spark on GCP Composer, and whether there is any integration for this already available. We checked this GitHub issue but couldn't find anything further: https://github.com/quantumblacklabs/kedro-airflow/issues/65