Bernardo Branco
05/15/2023, 7:36 PM
Juan Luis
05/16/2023, 7:53 AM
${name:value} strings are passed to the dataset. I can elaborate more if needed.
Dotun O
05/16/2023, 8:18 PM
Afaque Ahmad
05/17/2023, 3:41 AM
We're on Kedro 0.16.6 and using load_context to get params, "credentials*", "credentials*/**". We're upgrading Kedro to 0.18.8 and it seems load_context is no longer accessible. How can I replicate the same functionality in Kedro 0.18.8?
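For what it's worth, a minimal sketch of the 0.18.x equivalent, assuming it is run from the project root with the default ConfigLoader (the session API replaces the old load_context helper):
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())
with KedroSession.create(project_path=Path.cwd()) as session:
    context = session.load_context()
    params = context.params  # same parameters dictionary as before
    # same glob patterns as on 0.16.x, now via the context's config loader
    credentials = context.config_loader.get("credentials*", "credentials*/**")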
Afaque Ahmad
05/18/2023, 11:35 AM
Using spark.SparkDataSet to load and save datasets to a delta lake, I need to save incremental numeric versions of the data, e.g. 1, 2, ..., as opposed to the current timestamp. Is there a way to do that in the current implementation and to specify the version number while loading?
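Not a confirmed answer, but since Delta itself keeps integer table versions, one possible sketch is to lean on Delta time travel through load_args rather than Kedro's timestamp-based versioning (the filepath below is hypothetical):
from kedro.extras.datasets.spark import SparkDataSet  # 0.18.x location of the dataset

delta_table = SparkDataSet(
    filepath="s3://my-bucket/my-delta-table",  # hypothetical location
    file_format="delta",
    load_args={"versionAsOf": 2},  # Delta time travel: read numeric version 2
    save_args={"mode": "overwrite"},
)
df = delta_table.load()
The same load_args can go in the catalog entry; whether that covers the save side depends on how the increments are produced, so treat this as a starting point only.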
Andrej Zachar
05/18/2023, 3:59 PM
node(
predict,
inputs=["classifier_flaml:<version_ideally_from_params>", "X_src"],
outputs=["y_src_pred", "y_src_pred_proba"],
),
Can you provide guidance on how to accomplish this task?
Thank you.
Jose Nuñez
05/18/2023, 11:31 PM
I have a function f that takes a dataframe as input, does some stuff and outputs a python dict. Is there any way to save that dict in the data catalog? As a workaround I was saving it as a pandas CSV and later transforming it back to a dict, but I'm tired of doing that.
Thanks in advance 😄
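A minimal sketch of one way to avoid the CSV round-trip, assuming Kedro 0.18.x where the bundled datasets live under kedro.extras.datasets (entry name and path below are made up):
from kedro.extras.datasets.pickle import PickleDataSet

# equivalent catalog.yml entry:
#   my_dict:
#     type: pickle.PickleDataSet
#     filepath: data/02_intermediate/my_dict.pkl
my_dict = PickleDataSet(filepath="data/02_intermediate/my_dict.pkl")
my_dict.save({"a": 1, "b": 2})
print(my_dict.load())
With such an entry in place the node can just return the dict and declare my_dict as its output; json.JSONDataSet is an alternative if the values are JSON-serialisable.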
Muhammad Ghazalli
05/19/2023, 2:33 AM
Matthias Roels
05/19/2023, 6:59 AM
Luis Cano
05/19/2023, 3:29 PM
Sneha Kumari
05/19/2023, 6:11 PM
/opt/anaconda3/envs/frontline/lib/python3.8/site-packages/kedro_mlflow/framework/cli/cli.py:161 │
│ in ui
│ 158 │ ) as session:
│ 159 │ │
│ 160 │ │ context = session.load_context()
│ ❱ 161 │ │ host = host or context.mlflow.ui.host
│ 162 │ │ port = port or context.mlflow.ui.port
│ 163 │ │
│ 164 │ │ if context.mlflow.server.mlflow_tracking_uri.startswith("http"): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'KedroContext' object has no attribute 'mlflow'
Python version: 3.8.16
Kedro: 0.18.6
kedro-mlflow: 0.11.8
noam
05/21/2023, 2:02 PM
It seems that if I edit a parameters file (conf/local/parameters.yml) during a run, the results of the run may be affected.
For example, let's say I set hyper_tune: False in parameters.yml and run kedro run in the terminal. If I change parameters.yml to hyper_tune: True (for example, if I am setting up the parameters for my next experiment) before the "training" node begins executing, it appears that Kedro will then read hyper_tune: True. In this example, that would mean that Kedro would execute hyperparameter tuning (despite being instructed not to do so at the beginning of the run).
Am I missing something? Is the answer as simple as passing all parameters to the pipeline one time as a whole (i.e. using a before_pipeline_run hook) rather than to each node?
fmfreeze
05/22/2023, 12:51 PM
def create_pipeline(**kwargs) -> Pipeline:
return pipeline([
node(func=do_stuff, inputs=[], outputs='MyMemDS'),
node(func=do_more_stuff, inputs=['MyMemDS'], outputs='SecondMemDS')
])
I thought my conf/base/catalog.yml
needs the entries:
MyMemDS:
type: MemoryDataSet
SecondMemDS:
type: MemoryDataSet
But when I run the pipeline - which works, also with kedro-viz - it does not use the catalog.yml entries at all.
The output of my first node is an empty {} dictionary, and if I rename or delete the entries in catalog.yml it "works" like before and the first node still returns an empty dictionary.
Do I need to register the catalog anywhere? I simply want to access the object which is returned by my do_stuff() function.
What am I missing?
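Not a diagnosis of the empty {}, but on the catalog side: MemoryDataSet outputs don't need catalog.yml entries at all, and purely in-memory results can be grabbed from a programmatic run. A sketch, assuming do_stuff and do_more_stuff are importable from the project:
from kedro.io import DataCatalog
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner

pipe = pipeline([
    node(func=do_stuff, inputs=None, outputs="MyMemDS"),
    node(func=do_more_stuff, inputs="MyMemDS", outputs="SecondMemDS"),
])
# outputs that are not registered in the catalog are returned by the runner
outputs = SequentialRunner().run(pipe, DataCatalog())
print(outputs["SecondMemDS"])  # the object returned by do_more_stuff()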
Juan Luis
05/23/2023, 6:00 AM
question about ConfigLoader and OmegaConfigLoader. while following the standalone-datacatalog starter, I notice that ConfigLoader("conf").get("catalog.yml") works, but OmegaConfigLoader("conf").get("catalog.yml") returns None. on the other hand, OmegaConfigLoader("conf").get("catalog") seems to work (notice no .yml extension), and OmegaConfigLoader("conf")["catalog"] works consistently for both config loaders.
is this intentional? compare for example https://github.com/kedro-org/kedro/blob/41f03d9/tests/config/test_config.py#L116 with https://github.com/kedro-org/kedro/blob/41f03d9/tests/config/test_omegaconf_config.py#L149
Richard Bownes
05/23/2023, 8:05 AM
Afaque Ahmad
05/23/2023, 9:12 AM
I have two hook classes, PipelineHooks and MLFlowHooks, and both implement before_pipeline_run. I need the before_pipeline_run defined in PipelineHooks to run before the one in MLFlowHooks. I've specified this order below in settings.py, but it doesn't work:
HOOKS = (
    PipelineHooks(),
    MLFlowHooks(),
)
Is there any way to keep an order of execution?
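One workaround (a sketch, not the only option): wrap both hook objects in a single class whose before_pipeline_run calls them in an explicit order, so the result no longer depends on registration order. This assumes both implementations accept the full (run_params, pipeline, catalog) signature and are importable from the project:
from kedro.framework.hooks import hook_impl

class OrderedHooks:
    """Delegates to PipelineHooks first, then MLFlowHooks."""

    def __init__(self):
        self._pipeline_hooks = PipelineHooks()  # assumed to exist in the project
        self._mlflow_hooks = MLFlowHooks()      # assumed to exist in the project

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        self._pipeline_hooks.before_pipeline_run(run_params, pipeline, catalog)
        self._mlflow_hooks.before_pipeline_run(run_params, pipeline, catalog)

# settings.py would then register only the wrapper:
# HOOKS = (OrderedHooks(),)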
Debanjan Banerjee
05/23/2023, 2:43 PM
Guilherme Parreira
05/24/2023, 12:15 PM
I'm trying to run Kedro in a jupyter notebook but it gives me the following error:
%load_ext kedro.ipython
RuntimeError: Missing required keys ['project_version'] from 'pyproject.toml'.
Kedro was working fine for the last 2 weeks. I didn't do any update on kedro.
In requirements I have kedro~=0.18.6
In Pipfile.lock I have kedro==0.18.4
In pyproject.toml I have:
[tool.kedro]
package_name = "cashflow_ml"
project_name = "cashflow-ml"
kedro_init_version = "0.18.6"
[tool.isort]
profile = "black"
[tool.pytest.ini_options]
addopts = """
--cov-report term-missing \
--cov src/cashflow_ml -ra"""
[tool.coverage.report]
fail_under = 0
show_missing = true
exclude_lines = ["pragma: no cover", "raise NotImplementedError"]
I tried to change the kedro_init_version to 0.18.4 but I still got the same error.
Does someone have a clue about it?
Guilherme Parreira
05/24/2023, 1:00 PM
I installed the prophet package last night, but it shouldn't modify my pyproject.toml.
Thank you so much. It saved my day.
Andreas_Kokolantonakis
05/25/2023, 1:23 PM
Hugo Evers
05/25/2023, 2:10 PM
return pipeline(
[
node(
func=rename_columns,
inputs="pretraining_set",
outputs="renamed_df",
name="rename_columns",
),
node(
func=truncate_description,
inputs="renamed_df",
outputs="truncated_df",
name="truncate_description",
),
node(
func=drop_duplicates,
inputs="truncated_df",
outputs="deduped_df",
name="drop_duplicates",
),
node(
func=pad_zeros,
inputs="deduped_df",
outputs="padded_df",
name="pad_zeros",
),
node(
func=filter_0000,
inputs="padded_df",
outputs="filtered_df",
name="filter_0000",
),
node(
func=clean_description,
inputs="filtered_df",
outputs="cleaned_df",
name="clean_description",
),
node(
func=concat_title_description,
inputs="cleaned_df",
outputs="concatenated_df",
name="concat_title_description",
),
]
)
However, on AWS Batch these would be run in separate containers. I now use the cloudpickle dataset to facilitate this, but it is actually not necessary when I use something like dask.
I could also instead run this pipeline like this:
return (
df.pipe(rename_columns)
.pipe(truncate_description)
.pipe(drop_duplicates)
.pipe(pad_zeros)
.pipe(filter_0000)
.pipe(clean_description)
.pipe(concat_title_description)
)
The aforementioned pipeline has tags and filtering in a modular pipeline depending on pre-training, tuning, which language, etc.
The flattened pipeline would be nice to use in the case of kedro run runner=… concat_pipeline=true, or something like that.
Is this idea worth exploring? It is really not essential, I can work around it, but the ability to have pipelines that can “fold” like this is quite appealing.
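For what it's worth, a rough sketch of the “fold” idea, assuming each step is a single-input/single-output function (the names are the ones from the pipeline above):
from functools import reduce

from kedro.pipeline import node, pipeline

STEPS = [
    rename_columns,
    truncate_description,
    drop_duplicates,
    pad_zeros,
    filter_0000,
    clean_description,
    concat_title_description,
]

def preprocess_all(df):
    # equivalent to chaining df.pipe(...) over every step
    return reduce(lambda acc, step: step(acc), STEPS, df)

folded = pipeline([
    node(func=preprocess_all, inputs="pretraining_set", outputs="concatenated_df", name="preprocess_all"),
])
Whether to build the folded or the node-per-step variant could then be an argument of create_pipeline, at the cost of the per-step tags and visibility mentioned above.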
Hadeel Mustafa
05/25/2023, 4:25 PM
Has anyone used redshift-spark in kedro before? I'd appreciate the help if someone can show me an example of how this can be done, specifically the driver used for redshift.
Thanks in advance!
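Not a tested setup, but a sketch of the general shape with SparkJDBCDataSet; the URL, table, credentials and driver class below are illustrative assumptions, and the Redshift JDBC driver jar still has to be on the Spark classpath:
from kedro.extras.datasets.spark import SparkJDBCDataSet

redshift_table = SparkJDBCDataSet(
    table="public.my_table",                                 # hypothetical table
    url="jdbc:redshift://my-cluster.example.com:5439/mydb",  # hypothetical URL
    credentials={"user": "my_user", "password": "my_password"},
    load_args={
        # JDBC connection properties, including the Redshift driver class (assumption)
        "properties": {"driver": "com.amazon.redshift.jdbc42.Driver"},
    },
)
df = redshift_table.load()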
Higor Carmanini
05/25/2023, 10:31 PM
I'm using a PartitionedDataSet to read many CSVs into Spark DataFrames. I just found an issue where, apparently, Spark automatically appends the column position to the column name (as read from the header) to create the actual final name. See example in image.
As this is sometimes done for deduplication, I investigated whether this was something close, and sure enough there is another dataset in this same PartitionedDataSet that reads another column of the same name. This could "explain" this funky behavior of Spark thinking it is a duplicate. Of course, though, these are two separate DataFrames.
Has anyone stumbled upon this issue before? I can't find any references online. Thank you!
EDIT: Solved! It was due to Spark's default setting of case insensitivity.
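For anyone hitting the same thing, the relevant knob appears to be Spark's spark.sql.caseSensitive setting (false by default); whether flipping it is appropriate depends on the rest of the pipeline:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.caseSensitive", "true")  # treat "Name" and "name" as distinct columns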
Sidharth Jahagirdar
05/25/2023, 11:06 PM
Rebecca Solcia
05/26/2023, 8:13 AM
fmfreeze
05/26/2023, 12:24 PM
When I use a custom AbstractDataSet, kedro-viz does not display the Dataset Type and the File Path property in the details section for that Dataset.
How can I make them show up?
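A guess rather than a confirmed fix: tools can only surface what the dataset itself exposes, so making the custom dataset keep a filepath and report it from _describe() gives them something to show. A bare-bones sketch of such a dataset:
from pathlib import Path
from typing import Any, Dict

from kedro.io import AbstractDataSet

class MyTextDataSet(AbstractDataSet):
    """Minimal custom dataset that exposes its filepath for introspection."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _describe(self) -> Dict[str, Any]:
        # what Kedro (and tools built on it) see when they introspect the dataset
        return {"filepath": str(self._filepath)}

    def _load(self) -> str:
        return self._filepath.read_text()

    def _save(self, data: str) -> None:
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_text(data)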
fmfreeze
05/26/2023, 12:47 PM
Is it possible that a node has a (dynamic) Parameter as output?
E.g. I have multiple "normal" parameters defined which serve as input into a process_params node.
That node should - depending on the normal parameter inputs - output a single parameter which might serve as input to other nodes.
Currently - by simply outputting that "parameter" value - this is by default a MemoryDataSet.
Artur Dobrogowski
05/30/2023, 8:15 AM
ValidationError: 1 validation error for KedroMlflowConfig
tracking -> disable_tracking -> pipelines
value is not a valid list (type=type_error.list)
While the config looks like this:
tracking:
disable_tracking:
pipelines: []
Artur Dobrogowski
05/30/2023, 8:24 AM
Florian d
05/30/2023, 1:48 PM
Is there a way to access the config loader in the before_pipeline_run hook? In some cases it would be good to have access to the loaded config at that point.
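One pattern that should work on 0.18.x (a sketch): capture the config loader in the after_context_created hook and reuse it later in before_pipeline_run:
from kedro.framework.hooks import hook_impl

class ConfigAwareHooks:
    def __init__(self):
        self._config_loader = None

    @hook_impl
    def after_context_created(self, context):
        # context.config_loader is available once the KedroContext exists
        self._config_loader = context.config_loader

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        if self._config_loader is not None:
            parameters = self._config_loader["parameters"]
            print(f"run starts with {len(parameters)} top-level parameter keys")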