Elior Cohen
11/09/2022, 3:01 PM
%> kedro ipython
>>> catalog.load('companies')
%load_ext kedro.ipython
%reload_kedro path/to/my/project
Executing this results in the attached stacktrace.
What am I doing wrong?

Hervé Lauwerier
11/09/2022, 3:25 PM

AVallarino
11/09/2022, 6:27 PM

Zihao Xu
11/09/2022, 9:48 PM
I'm following the kedro viz experiment tracking tutorial: https://kedro.readthedocs.io/en/stable/tutorial/set_up_experiment_tracking.html#set-up-your-nodes-and-pipelines-to-log-metrics.
But I keep getting the error “You don’t have any experiments” within the kedro viz view.
Here are a few observations:
1. Within the spaceflights starter, I do not see the file src/settings.py, but had to create it myself and paste in the specified content for SQLiteStore (see the sketch after this list).
2. The tutorial mentions “proceed to set up the tracking datasets or set up your nodes and pipelines to log metrics; these two activities are interchangeable.” But I had to implement both steps to make it work.
3. After a few kedro run executions, I do not see session_store.db appearing within the 09_tracking folder, which could be the reason for my error?
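For reference, a minimal sketch of the SQLiteStore settings.py described in the tutorial (the exact storage path passed to the store is an assumption):

# settings.py -- sketch of the SQLiteStore setup from the experiment tracking
# tutorial; the path argument below is an assumption, not a confirmed value.
from pathlib import Path

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore

SESSION_STORE_CLASS = SQLiteStore
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}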
Any insights would be greatly appreciated! Thanks team!

Amala
11/09/2022, 10:29 PM

Cyril Verluise
11/10/2022, 1:49 PM
I'm trying to save a dict with the following catalog entry:
optimisation_programme:
  type: yaml.YAMLDataSet
  filepath: data/test/05_model_input/optimisation_programme.yaml
  layer: model_input
However, it fails with the following error:
DataSetError: Failed while saving data to data set YAMLDataSet(filepath=/Users/cyril_verluise/Documents/GitHub/ClimaTeX/dist/apps/alhambra/rendered/alhambra/data/test/05_model_input/optimisation_programme.yaml, protocol=file,
save_args={'default_flow_style': False}).
'str' object has no attribute '__name__'
Expected behaviour
From the doc, I understood that the save function is just a wrapper around yaml.dump, which should work with my `optimisation_programme` (dict).
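A minimal sketch of that expectation, assuming optimisation_programme is a plain dict:

# Sketch of what I understand the dataset's save to boil down to.
import yaml

optimisation_programme = {"some": "value"}  # stand-in for the real dict
with open("data/test/05_model_input/optimisation_programme.yaml", "w") as f:
    yaml.dump(optimisation_programme, f, default_flow_style=False)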
kedro version
0.18.3
Any idea?

Zihao Xu
11/10/2022, 2:32 PM
A question about kedro viz. When I try to re-open kedro viz for the second time, after using “^z” to terminate the first run, I often receive an error message like the following. Is there anything I should do differently to prevent this from happening (e.g., a different way to terminate)?
[Errno 48] error while attempting to bind on address ('127.0.0.1', 4141): server.py:156
address already in use
Hervé Lauwerier
11/10/2022, 4:51 PM
not found in the DataCatalog
user
11/11/2022, 9:18 AM

Ian Whalen
11/11/2022, 3:19 PM
ParallelRunner: is there a way to supply the number of processes to use from the command line? Couldn’t find anything here.

Andrew Stewart
11/11/2022, 4:47 PM

Alicja Fras
11/14/2022, 11:05 AM

Sasha Collin
11/15/2022, 6:35 PM

Zemeio
11/16/2022, 5:11 AM

Safouane Chergui
11/16/2022, 10:07 AM

Sean Westgate
11/16/2022, 10:24 AM
In the kedro-viz/readme it suggests using a standalone React component with the pipeline.json as a prop. I tried it, but couldn't get it to work - I am not familiar with React. Do you have a simple example using a static website, or do you need a proper React environment?
Many thanks

Mate Scharnitzky
11/16/2022, 10:33 AM
We have a kedro project locally, and the team we’re supporting has an Amazon EMR Spark cluster environment. We would like to be able to run this kedro project on EMR, but we’re struggling to create a virtual environment, install requirements into that env, etc. given the security protocols. Do you have any recommendations/pointers on what steps we need to take to be able to run this kedro project on EMR? I didn’t find a supporting document in the official documentation.

Safouane Chergui
11/16/2022, 3:37 PM
I'd like to do something like this with TemplatedConfigLoader:
TemplatedConfigLoader(
    conf_paths,
    globals_pattern="(globals*)|(another_parameters_file*)",
    globals_dict={"param1": "pandas.CSVDataSet"}
)
How can I accomplish this?

Filip Panovski
11/17/2022, 10:03 AM
Is the yaml Loader part of the ConfigLoader somewhat configurable in any meaningful way? Or does kedro implement its own yaml parsing mechanism? We're trying to use some custom filtering that gets passed to the kedro.extras.datasets.dask.ParquetDataSet load_args. Specifically, we want to be able to do something like:
# catalog.yml
raw_data:
  type: dask.ParquetDataSet
  filepath: 's3://...'
  load_args:
    filters:
      - !!python/tuple ['year', '=', '2022']
      - !!python/tuple ['day', '=', '3']
      - !!python/tuple ['id', '=', 'someVal']
dask (via filters, see docs) supports row-filtering on loaded data this way, and yaml (via tuple support in .yml files) supports the above definition. However, yaml unfortunately only supports this using either the non-default FullLoader or the UnsafeLoader (for controlled environments, see here). Is it possible to configure the ConfigLoader to use either of these?
An example use case for this would be to filter only the rows belonging to all day = 3 partitions of any month in year = 2022.
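For context, a sketch of the same filtering expressed directly against dask (the s3 path is a placeholder):

# The filters list below is exactly what the YAML tuples above would map to.
import dask.dataframe as dd

df = dd.read_parquet(
    "s3://...",  # placeholder path
    filters=[("year", "=", "2022"), ("day", "=", "3"), ("id", "=", "someVal")],
)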
I could alternatively write a DataSet that parses this logic from plain string lists, but I was wondering if there's any existing support for something like this.

Safouane Chergui
11/17/2022, 10:19 AM
Is there a way to use a parameter passed to kedro run as an input to a node?
Here is a quick example:
• Kedro run command: kedro run --pipeline my_pipeline --params first_param:first_value
• I’d like to use first_param as an input to a node without having to put it in parameters.yml just to use it as an input to my node. If not, is there a way to use it directly in code?
Pipeline([
    node(
        do_something,
        inputs="first_param",
        outputs="some_output"
    )
])
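A sketch of the direction I had in mind, assuming (not confirmed) that runtime --params land in the same params: namespace as entries from parameters.yml:

from kedro.pipeline import Pipeline, node

def do_something(first_param):
    return first_param

# Assumption: "params:first_param" resolves to the value passed via --params.
Pipeline([
    node(
        do_something,
        inputs="params:first_param",
        outputs="some_output",
    )
])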
Thanks 👍

Debanjan Banerjee
11/17/2022, 11:12 AM
I have written a CustomDataSet and, strangely enough, it works when I invoke kedro run --xxxxx from the terminal, but when I try to do catalog.load(xxxx) in ipython or kedro jupyter, it fails and raises the famous DataSet error: Dataset is not installed
Here is my catalog definition:
ft_spine_prd:
  type: project_name.extras.datasets.dataset_filename.DataSetClass
  dataset_args:
    arg1
    ....
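One sanity check worth sketching, assuming the "not installed" error comes from the class import failing inside the interactive session:

# Run inside the same ipython / kedro jupyter session; if this import fails,
# the catalog entry above would raise the same DataSetError when loading.
from project_name.extras.datasets.dataset_filename import DataSetClass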
Debanjan Banerjee
11/17/2022, 11:13 AM

Debanjan Banerjee
11/17/2022, 1:32 PM

Panos P
11/17/2022, 11:17 PM
I have a parameter like this:
pipelines:
  - test1
  - test2
I want to return 2 pipelines like this:
pipes = []
for pipeline in pipelines:
    pipes.append(Pipeline([
        node(do_something, [f"params:{pipeline}"], [f"output_{pipeline}"], tags=pipeline)
    ]))
In the older versions of Kedro I was able to get the params before the creation of pipelines and then work from there.
Like this:
import os
from typing import Any, Dict

from kedro.config import ConfigLoader
from kedro.framework.session import get_current_session


def get_kedro_env() -> str:
    """Get the kedro --env parameter or local
    Returns:
        The kedro --env parameter
    """
    return os.getenv("KEDRO_ENV", "local")


def _get_config() -> ConfigLoader:
    """Get the kedro configuration context
    Returns:
        The kedro configuration context
    """
    try:
        return get_current_session().load_context().config_loader
    except Exception:  # NOQA
        env = get_kedro_env()
        return ConfigLoader(["./conf/base", f"./conf/{env}"])


def get_params() -> Dict[str, Any]:
    """Get all the parameter values from the parameters.yml as a dictionary.
    Returns:
        The parameter values
    """
    return _get_config().get("parameters*", "parameters*/**", "**/parameters*")
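For comparison, a rough 0.18-style sketch of the same helper, assuming the 0.18 ConfigLoader signature (conf_source plus env) instead of a list of paths; untested:

import os
from typing import Any, Dict

from kedro.config import ConfigLoader


def get_params() -> Dict[str, Any]:
    # Assumption: ConfigLoader(conf_source=..., env=...) is the 0.18 signature.
    env = os.getenv("KEDRO_ENV", "local")
    config_loader = ConfigLoader(conf_source="./conf", env=env)
    return config_loader.get("parameters*", "parameters*/**", "**/parameters*")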
It seems like in kedro 0.18.3 there is no more load_context().
Any thoughts?

Luis Cano
11/18/2022, 4:53 AM
I want to save pickles to a structure like this:
s3_path/train_outputs/
├── 202201.pkl
├── 202202.pkl
├── 202203.pkl
├── 202204.pkl
...
and so on.
Is there a way of saving pickles this way? Or maybe what would be a better way of doing this?
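A hedged sketch of what this could look like with a PartitionedDataSet (whether that is the right tool here is an assumption; the bucket path is a placeholder):

# A node returning a dict like {"202201": obj1, "202202": obj2} saved through
# this dataset would be written as 202201.pkl, 202202.pkl, ... under the path.
from kedro.extras.datasets.pickle import PickleDataSet
from kedro.io import PartitionedDataSet

train_outputs = PartitionedDataSet(
    path="s3://my-bucket/train_outputs/",  # placeholder
    dataset=PickleDataSet,
    filename_suffix=".pkl",
)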
Any thoughts?

Jo Stichbury
11/18/2022, 10:12 AM

Vaibhav
11/18/2022, 12:05 PM
project_name refers to the ‘name of the project’ where kedro is being used, whereas project_version refers to the ‘version of kedro’ being used, instead of the version of the project defined in project_name. kedro validates the version match as part of its checks.
Is it by design that the two attributes refer to different project entities?

MarioFeynman
11/18/2022, 1:28 PM

Debanjan Banerjee
11/18/2022, 1:49 PM
A question about credentials.yml (from /local) and the --env argument that we can specify to dictate the directionality of the pipeline.
We know that if we specify our --env, kedro "ignores" /base and /local and overrides to whatever we pass to --env.
Shouldn't Kedro do a search for **credentials.yml separately (not with parameters.yml/globals.yml)? Every user will have their own credentials, so shouldn't we make kedro check for credentials.yml separately? Say only in /local and not in --env?

Doug Warner
11/20/2022, 12:45 PM