Allen Ma
11/01/2022, 3:17 AM
Julian Waton
11/01/2022, 2:52 PM
When I run only part of a pipeline with --from-nodes and --to-nodes (to save time over a full pipeline run), I often discover that some data does not exist in my environment - but it takes a while to discover this, as the code needs to run first.
• Then "checking whether the data exists" is a bit complex:
◦ Either check whether it is an intermediate output of the provided pipeline
◦ Or check whether it can be read from the catalog using the _exists method of the abstract dataset class
Is this something that someone has already built, and is it a common use case?
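For concreteness, a rough sketch of the check I have in mind, assuming Kedro 0.18 APIs (Pipeline.from_nodes/to_nodes/inputs/all_outputs, DataCatalog.list/exists) and placeholder node names:

from pathlib import Path

from kedro.framework.project import pipelines
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

metadata = bootstrap_project(Path.cwd())
with KedroSession.create(metadata.package_name) as session:
    catalog = session.load_context().catalog
    full_pipeline = pipelines["__default__"]

    # Apply roughly the same slicing as `kedro run --from-nodes ... --to-nodes ...`
    # ("first_node" and "last_node" are placeholder node names).
    sliced = full_pipeline.from_nodes("first_node").to_nodes("last_node")

    # Free inputs of the slice that the full pipeline would normally have produced
    # must already exist in storage, otherwise the sliced run fails partway through.
    required = sliced.inputs() & full_pipeline.all_outputs()
    missing = sorted(
        name
        for name in required
        # datasets not registered in the catalog are in-memory and cannot pre-exist
        if name not in catalog.list() or not catalog.exists(name)
    )
    if missing:
        print(f"Missing intermediate data: {missing}")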
Zirui Xu
11/01/2022, 3:55 PM
Jose Alejandro Montaña Cortes
11/01/2022, 4:11 PM
Lucie Gattepaille
11/02/2022, 1:01 PM
Vladimir Filimonov
11/02/2022, 1:56 PM
I ran make test-no-spark on the kedro source code. 6 tests are failing, and all of them seem to be related to the parallel task runner.
Has anyone had a similar issue? My current hypothesis is that they all fail due to ModuleNotFoundError: No module named 'tests.runner', but I'm wondering what might have caused it.
Here is the list of failed tests:
FAILED tests/framework/cli/test_cli.py::TestRunCommand::test_run_successfully_parallel - assert not 1
FAILED tests/framework/session/test_session_extension_hooks.py::TestNodeHooks::test_on_node_error_hook_parallel_runner - assert 0 == 2
FAILED tests/framework/session/test_session_extension_hooks.py::TestNodeHooks::test_before_and_after_node_run_hooks_parallel_runner - assert 0 == 2
FAILED tests/framework/session/test_session_extension_hooks.py::TestDataSetHooks::test_before_and_after_dataset_loaded_hooks_parallel_runner - as...
FAILED tests/framework/session/test_session_extension_hooks.py::TestDataSetHooks::test_before_and_after_dataset_saved_hooks_parallel_runner - ass...
FAILED tests/framework/session/test_session_extension_hooks.py::TestBeforeNodeRunHookWithInputUpdates::test_correct_input_update_parallel - asser...
FAILED tests/framework/session/test_session_extension_hooks.py::TestBeforeNodeRunHookWithInputUpdates::test_broken_input_update_parallel - Failed...
And here is the failure of the test in test_cli.py reporting the ModuleNotFoundError:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ <string>:1 in <module> │
│ │
│ /Users/Vladimir_Filimonov/opt/anaconda3/envs/kedro-environment/lib/python3.8 │
│ /multiprocessing/spawn.py:116 in spawn_main │
│ │
│ 113 │ │ resource_tracker._resource_tracker._fd = tracker_fd │
│ 114 │ │ fd = pipe_handle │
│ 115 │ │ parent_sentinel = os.dup(pipe_handle) │
│ ❱ 116 │ exitcode = _main(fd, parent_sentinel) │
│ 117 │ sys.exit(exitcode) │
│ 118 │
│ 119 │
│ │
│ /Users/Vladimir_Filimonov/opt/anaconda3/envs/kedro-environment/lib/python3.8 │
│ /multiprocessing/spawn.py:126 in _main │
│ │
│ 123 │ │ try: │
│ 124 │ │ │ preparation_data = reduction.pickle.load(from_parent) │
│ 125 │ │ │ prepare(preparation_data) │
│ ❱ 126 │ │ │ self = reduction.pickle.load(from_parent) │
│ 127 │ │ finally: │
│ 128 │ │ │ del process.current_process()._inheriting │
│ 129 │ return self._bootstrap(parent_sentinel) │
╰──────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: No module named 'tests.runner'
Zirui Xu
11/02/2022, 4:28 PM
Earl Hammond
11/02/2022, 6:31 PM
When we do a kedro run we see this set of warnings; the pipeline runs fine. Just wondering what Kedro is doing at this time:
WARNING: Something went wrong with getting the username to send to the Heap. Exception: [Errno 6] No such device or address
WARNING: Failed to send data to Heap. Exception of type 'ConnectionError was raised.
Thanks in advance!
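In case it helps: these warnings appear to come from the kedro-telemetry plugin, which tries to send anonymous usage analytics to Heap when the CLI runs; they don't affect the run itself. Assuming that plugin is what's installed, a minimal way to silence them is to opt out with a .telemetry file in the project root containing:
consent: false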
Allen Ma
11/03/2022, 5:59 AM
I run the pipeline in two ways: with kedro run, and programmatically with
from pathlib import Path
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

metadata = bootstrap_project(Path.cwd())
with KedroSession.create(metadata.package_name) as session:
    session.run()
kedro run can succeed but the second way can't.
Debanjan Banerjee
11/03/2022, 9:37 AM
When I ran kedro viz, it failed with the error Error: No such command 'viz'. This has never happened to me before - any ideas what might be causing it?
Did we change the way viz is supposed to be called?
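Likely cause, in case it helps: the viz command is provided by the separate kedro-viz package, so Error: No such command 'viz' usually means kedro-viz is not installed in the currently active environment. A quick check/fix, assuming a pip-based setup:
pip install kedro-viz
kedro viz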
Debanjan Banerjee
11/03/2022, 10:12 AM
Debanjan Banerjee
11/03/2022, 10:12 AM
viveca
11/03/2022, 3:32 PM
My catalog.yml has filepath: "${configurable_filepath}" and I'd like to do kedro run --params configurable_filepath:/path/to/file. A similar question was asked previously https://linen-discord.kedro.org/t/2203662/Hi-all-I-have-a-beginner-question-on-Kedro-0-18-2-I-have-a-T with writing a custom TemplatedConfigLoader as the solution: https://github.com/noklam/kedro_gallery/blob/master/template_config_loader_demo/src/template_config_loader_demo/settings.py
Is this the recommended approach, or is there a way of achieving what I want without writing a custom TemplatedConfigLoader that accesses private variables? Is there really no other way to add all runtime parameters to the globals dict? I'd really like to avoid the private-variable approach if possible, in case a future kedro update changes things.
Filip Panovski
11/03/2022, 3:35 PM
My pipeline looks roughly like dask.ParquetDataSet from s3 -> MemoryDataSet -> dask.ParquetDataSet to s3.
I run this pipeline from my local workstation for testing purposes. My Dask cluster is deployed on AWS EC2 (scheduler + workers) and they communicate privately. I noticed that the last step, MemoryDataSet -> dask.ParquetDataSet to s3, causes the data to be transferred to my local machine where the Kedro pipeline is being run, and then transferred back to s3. Needless to say, this introduces cost and lag and is not what I intended.
Can I tell the workers to write this data directly to s3? If not, what is the intended way to do this? I read through the documentation, and there is some very good information on getting the pipeline to run as either Step Functions or on AWS Batch, but this is not quite the deployment flow I had in mind. Is the pipeline intended to be run on the same infrastructure where the workers are deployed?
Seth
11/03/2022, 3:56 PM
Earl Hammond
11/03/2022, 6:16 PM
ds.ds1 and ds.ds2
viveca
11/04/2022, 8:03 AM
I wanted to pass the filepath of the catalog entry as a parameter to kedro run, but according to my other discussion with @datajoely this is not allowed in kedro by design. Has anyone else used kedro this way, or should I just skip kedro for inference or similar types of pipelines with varying input?
Allen Ma
11/04/2022, 1:55 PM
When I call session.run(), I get the following error message:
22/11/04 21:25:59 ERROR SparkUI: Failed to bind SparkUI
java.net.BindException: Failed to bind to /0.0.0.0:9016: Service 'SparkUI' failed after 16 retries (starting from 9000)! Consider explicitly setting the appropriate port for the service 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
but when I use kedro run, it works.
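In case it helps, the error itself says ports 9000-9015 were all taken after 16 retries, which typically happens when several Spark contexts (and their UIs) are alive at the same time - for example an interactive session plus the programmatic session.run(). A minimal sketch of the workarounds the message suggests, applied wherever the project builds its SparkSession (the app name and values are placeholders):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("my-kedro-project")  # placeholder app name
    .config("spark.port.maxRetries", "64")  # allow more retries past port 9000
    .config("spark.ui.port", "4050")        # or pin the UI to a known-free port
    .getOrCreate()
)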
Eduardo Lopez
11/04/2022, 4:23 PM
Jonathan Javier Velásquez Quesquén
11/06/2022, 6:09 PM
Jonathan Javier Velásquez Quesquén
11/06/2022, 6:16 PM
Sean Westgate
11/07/2022, 3:08 PM
Does anyone have tips on using kedro build-docs effectively? The Spaceflights tutorial is pretty minimalist and works in that you can see the pipelines and nodes, but how is this used to document parameters or inputs and outputs in greater detail? Thank you!
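One thing that tends to help, assuming the default Sphinx docs scaffolding in the project template: kedro build-docs renders the docstrings of your package's functions into the generated API docs, so parameters, inputs and outputs can be described there. A hypothetical spaceflights-style node with a Google-style docstring as an example:

import pandas as pd

def preprocess_companies(companies: pd.DataFrame) -> pd.DataFrame:
    """Preprocess the raw companies table.

    Args:
        companies: Raw companies data as read from ``companies.csv``.

    Returns:
        The companies table with ``iata_approved`` cast to boolean and
        ``company_rating`` converted to a float fraction.
    """
    ...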
user
11/07/2022, 5:48 PM
user
11/08/2022, 7:58 AM
Safouane Chergui
11/08/2022, 9:38 AM
Jordan
11/08/2022, 12:26 PM
John Melendowski
11/09/2022, 1:51 AM
git versions, which I'm assuming is for the project management features kedro supplies... or cookiecutter, which needs to be downgraded from the latest anaconda release.
Yuchu Liu
11/09/2022, 12:25 PM
I'm having trouble with kedro jupyter notebook. When I try to launch it from the terminal, in a virtual environment I set up for kedro, it tries to load the kedro.ipython extension from the wrong version of Python. As a result, I don't have any kedro-specific commands in the Jupyter notebook. Here is the warning I get when loading a notebook:
[I 13:20:49.212 NotebookApp] Kernel started: 21bb83e7-2e5f-4463-a43e-23744ec3ed02, name: kedro_nfr_transactions
[IPKernelApp] WARNING | Error in loading extension: kedro.ipython
Check your config files in /Users/yuchu_liu/.ipython/profile_default
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/config.py", line 544, in configure
formatters[name] = self.configure_formatter(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/config.py", line 676, in configure_formatter
c = _resolve(cname)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/config.py", line 90, in _resolve
found = __import__(used)
ModuleNotFoundError: No module named 'pythonjsonlogger'
I have tried to load kedro from ipython in the terminal using the following, and it works perfectly fine:
%load_ext kedro.extras.extensions.ipython
%reload_kedro .
Does anyone know how to debug this issue? Thank you!
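In case it helps: the traceback points at the system Python 3.10 under /Library/Frameworks, which suggests the notebook kernel is not the kedro virtual environment - that would explain both the missing pythonjsonlogger module and the kedro.ipython extension failing to load. A sketch of the usual fix, run inside the activated kedro environment (the kernel name is a placeholder):
pip install ipykernel
python -m ipykernel install --user --name kedro-env --display-name "Python (kedro-env)"
Then select that kernel in the notebook; alternatively, installing python-json-logger into whichever environment the kernel actually uses should clear that specific ModuleNotFoundError.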
Luis Gustavo Souza
11/09/2022, 12:26 PM
Rosh
11/09/2022, 2:41 PM
Has anyone used kedro-airflow with Spark on GCP? We want to understand how Kedro would work with Spark on GCP Composer and whether there is any integration for this that's already available. We checked this GitHub issue but couldn't find anything further: https://github.com/quantumblacklabs/kedro-airflow/issues/65