Bernardo Branco 05/15/2023, 7:36 PM
Juan Luis 05/16/2023, 7:53 AM
… strings are passed to the dataset. I can elaborate more if needed.
Dotun O 05/16/2023, 8:18 PM
Afaque Ahmad 05/17/2023, 3:41 AM
… and using the … We're upgrading … and it seems … is no longer accessible. How can I replicate the same functionality in Kedro 0.18.8?
Afaque Ahmad 05/18/2023, 11:35 AM
… to load and save datasets to a … lake. I need to save numeric versions of data, e.g. 1, 2, …, as opposed to the current timestamp. Is there a way to do that in the current implementation, and to specify the version number while loading?
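Not an authoritative answer, but a minimal sketch of what the Python API allows: versioned datasets accept a `kedro.io.Version`, and the load/save strings do not have to be timestamps (the filepath and version strings below are hypothetical):

```python
from kedro.io import Version
from kedro.extras.datasets.pandas import ParquetDataSet

# Kedro defaults the save version to a timestamp, but an explicit string can
# be passed instead; it becomes the subfolder under the dataset path, e.g.
# data/03_primary/features.parquet/2/features.parquet
dataset = ParquetDataSet(
    filepath="data/03_primary/features.parquet",
    version=Version(load="1", save="2"),  # read version "1", write version "2"
)
```

On the CLI side, `kedro run --load-version=<dataset_name>:<version>` should pin the version used for loading a given dataset.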
Andrej Zachar 05/18/2023, 3:59 PM
Can you provide guidance on how to accomplish this task? Thank you.
```python
node(
    predict,
    inputs=["classifier_flaml:<version_ideally_from_params>", "X_src"],
    outputs=["y_src_pred", "y_src_pred_proba"],
),
```
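If the intent is to choose the model version from parameters, one possible workaround (a sketch only; the hook, dataset name, and filepath are hypothetical, and this is not a built-in Kedro feature): node inputs are resolved statically, so the version can instead be injected into the catalog by a hook before the run starts:

```python
from kedro.framework.hooks import hook_impl
from kedro.io import Version
from kedro.extras.datasets.pickle import PickleDataSet

class VersionedModelHook:
    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # pick the desired version from the CLI --params overrides
        version = (run_params.get("extra_params") or {}).get("model_version")
        if version:
            catalog.add(
                "classifier_flaml",
                PickleDataSet(
                    filepath="data/06_models/classifier_flaml.pkl",
                    version=Version(load=version, save=None),
                ),
                replace=True,
            )
```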
Jose Nuñez 05/18/2023, 11:31 PM
I have a node that takes a dataframe as input, does some stuff, and outputs a Python dict. Is there any way to save that dict in the data catalog? As a workaround I was saving it as a pandas CSV and later transforming it back to a dict, but I'm tired of doing that. Thanks in advance 😄
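For what it's worth, a pickle or JSON entry in the catalog handles dicts directly; a minimal sketch with a hypothetical dataset name:

```yaml
# conf/base/catalog.yml
my_dict:
  type: pickle.PickleDataSet
  filepath: data/02_intermediate/my_dict.pkl
```

(`json.JSONDataSet` works the same way if a human-readable file is preferred.)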
Muhammad Ghazalli 05/19/2023, 2:33 AM
Matthias Roels 05/19/2023, 6:59 AM
Luis Cano 05/19/2023, 3:29 PM
Sneha Kumari 05/19/2023, 6:11 PM
Python version: 3.8.16
Kedro: 0.18.6
kedro-mlflow: 0.11.8
```
/opt/anaconda3/envs/frontline/lib/python3.8/site-packages/kedro_mlflow/framework/cli/cli.py:161 in ui

  158       ) as session:
  159
  160           context = session.load_context()
❱ 161           host = host or context.mlflow.ui.host
  162           port = port or context.mlflow.ui.port
  163
  164           if context.mlflow.server.mlflow_tracking_uri.startswith("http"):

AttributeError: 'KedroContext' object has no attribute 'mlflow'
```
noam 05/21/2023, 2:02 PM
… during a run, the results of the run may be affected. For example, let's say I set … in the terminal. If I change the text in … (for example, if I am setting up the parameters for my next experiment) before the "training" node begins executing, it appears that Kedro will then read hyper_tune: True. In this example, that would mean that Kedro would execute hyperparameter tuning (despite being instructed not to do so at the beginning of the run). Am I missing something? Is the answer as simple as passing all parameters to the pipeline once as a whole (i.e. using a before_pipeline_run hook) rather than to each node?
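A sketch of the hook idea from the question (the class and attribute names are made up): `before_pipeline_run` receives the resolved run parameters once, before any node executes, so a snapshot can be taken there:

```python
from kedro.framework.hooks import hook_impl

class FreezeParamsHook:
    """Sketch: capture parameters once, before the first node runs."""

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # run_params["extra_params"] holds the --params overrides passed on
        # the CLI when the session was created
        self.params_snapshot = dict(run_params.get("extra_params") or {})
```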
fmfreeze 05/22/2023, 12:51 PM
I thought my pipeline

```python
def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(func=do_stuff, inputs=None, outputs="MyMemDS"),
        node(func=do_more_stuff, inputs=["MyMemDS"], outputs="SecondMemDS"),
    ])
```

needs the entries:

```yaml
MyMemDS:
  type: MemoryDataSet
SecondMemDS:
  type: MemoryDataSet
```

But when I run the pipeline (which works, also with kedro-viz) it does not utilize the catalog.yml entries at all. The output of my first node is an empty dictionary, and if I rename or delete the entries in catalog.yml it "works" like before and the first node returns an empty dictionary. Do I need to register the catalog anywhere? I simply want to access the object which is returned by my do_stuff function. What am I missing?
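Possibly useful context (my understanding, not verified against this project): any node output that is not declared in catalog.yml becomes a MemoryDataSet automatically, so the two entries above should be redundant. At runtime they amount to roughly:

```python
from kedro.io import DataCatalog, MemoryDataSet

# what the two catalog entries boil down to: saving a node's return value
# and loading it back from memory
catalog = DataCatalog({"MyMemDS": MemoryDataSet(), "SecondMemDS": MemoryDataSet()})
catalog.save("MyMemDS", {"answer": 42})
print(catalog.load("MyMemDS"))  # {'answer': 42}
```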
Juan Luis 05/23/2023, 6:00 AM
… while following the standalone-datacatalog starter, I notice that … on the other hand, … seems to work (notice no …) … works consistently for both config loaders. is this intentional? compare for example https://github.com/kedro-org/kedro/blob/41f03d9/tests/config/test_config.py#L116 with https://github.com/kedro-org/kedro/blob/41f03d9/tests/config/test_omegaconf_config.py#L149
Richard Bownes 05/23/2023, 8:05 AM
Afaque Ahmad 05/23/2023, 9:12 AM
… and both have … I need the hook … to run before the one in … I've specified this order below in settings.py, but it doesn't work:

```python
HOOKS = (
    PipelineHooks(),
    MLFlowHooks(),
)
```

Is there any way to enforce an order of execution?
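A guess based on how pluggy (the framework behind Kedro's hooks) orders calls: implementations run last-registered-first (LIFO), so the hook that should fire first goes last in the tuple:

```python
# settings.py: pluggy calls hook implementations in LIFO order, so listing
# PipelineHooks last makes its implementations run before MLFlowHooks'
HOOKS = (
    MLFlowHooks(),
    PipelineHooks(),
)
```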
Debanjan Banerjee 05/23/2023, 2:43 PM
Guilherme Parreira 05/24/2023, 12:15 PM
… but it gives me the following error:

```
RuntimeError: Missing required keys ['project_version'] from 'pyproject.toml'.
```

Kedro was working fine for the last 2 weeks. I didn't do any update on … I have kedro … I tried to change the pyproject.toml:

```toml
[tool.kedro]
package_name = "cashflow_ml"
project_name = "cashflow-ml"
kedro_init_version = "0.18.6"

[tool.isort]
profile = "black"

[tool.pytest.ini_options]
addopts = """
--cov-report term-missing \
--cov src/cashflow_ml -ra"""

[tool.coverage.report]
fail_under = 0
show_missing = true
exclude_lines = ["pragma: no cover", "raise NotImplementedError"]
```

but I still got the same error. Does anyone have a clue about it?
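A guess at the cause (not verified): `project_version` is the key that older Kedro releases read from `[tool.kedro]` before it was renamed to `kedro_init_version`, so an older Kedro ending up in the environment would raise exactly this error. It would expect something like:

```toml
# what an older Kedro looks for in pyproject.toml; the key name is the only
# difference from the kedro_init_version used by newer project templates
[tool.kedro]
package_name = "cashflow_ml"
project_name = "cashflow-ml"
project_version = "0.18.6"  # should match the installed Kedro version
```

Comparing `kedro --version` against `kedro_init_version` would confirm a mismatch.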
Guilherme Parreira 05/24/2023, 1:00 PM
I installed the … package last night, but it shouldn't modify my … Thank you so much. It saved my day.
Andreas_Kokolantonakis 05/25/2023, 1:23 PM
Hugo Evers 05/25/2023, 2:10 PM
However, on AWS Batch these would run in separate containers. I now use the cloudpickle dataset to facilitate this, but it is actually not necessary when I use something like Dask. I could also instead run this pipeline like this:
```python
return pipeline(
    [
        node(func=rename_columns, inputs="pretraining_set", outputs="renamed_df", name="rename_columns"),
        node(func=truncate_description, inputs="renamed_df", outputs="truncated_df", name="truncate_description"),
        node(func=drop_duplicates, inputs="truncated_df", outputs="deduped_df", name="drop_duplicates"),
        node(func=pad_zeros, inputs="deduped_df", outputs="padded_df", name="pad_zeros"),
        node(func=filter_0000, inputs="padded_df", outputs="filtered_df", name="filter_0000"),
        node(func=clean_description, inputs="filtered_df", outputs="cleaned_df", name="clean_description"),
        node(func=concat_title_description, inputs="cleaned_df", outputs="concatenated_df", name="concat_title_description"),
    ]
)
```
The aforementioned pipeline has tags and filtering in a modular pipeline depending on pre-training, tuning, which language, etc. The flattened pipeline would be nice to use in the case of kedro run runner=… concat_pipeline=true, or something like that. Is this idea worth exploring? It is really not essential, I can work around it, but the ability to have pipelines that can “fold” like this is quite appealing.
```python
return (
    df.pipe(rename_columns)
    .pipe(truncate_description)
    .pipe(drop_duplicates)
    .pipe(pad_zeros)
    .pipe(filter_0000)
    .pipe(clean_description)
    .pipe(concat_title_description)
)
```
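The “fold” could be prototyped outside Kedro's API with plain function composition; a sketch (the fold helper is hypothetical, not a Kedro feature):

```python
from functools import reduce

def fold(*funcs):
    """Compose single-argument functions left to right into one callable."""
    return lambda df: reduce(lambda acc, f: f(acc), funcs, df)

# the seven nodes above collapsed into a single node for one-container runs
preprocess = fold(
    rename_columns,
    truncate_description,
    drop_duplicates,
    pad_zeros,
    filter_0000,
    clean_description,
    concat_title_description,
)
```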
Hadeel Mustafa 05/25/2023, 4:25 PM
Has anyone used … in kedro before? Appreciate the help if someone can show me an example of how this can be done, specifically the driver used for Redshift. Thanks in advance!
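In case it helps, a sketch of one common wiring for Redshift through pandas.SQLQueryDataSet (the dataset/credential names and connection string are hypothetical; assumes sqlalchemy-redshift and psycopg2 are installed):

```yaml
# conf/base/catalog.yml
redshift_query:
  type: pandas.SQLQueryDataSet
  sql: SELECT * FROM my_schema.my_table
  credentials: redshift_credentials

# conf/local/credentials.yml
redshift_credentials:
  con: redshift+psycopg2://user:password@my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/mydb
```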
Higor Carmanini 05/25/2023, 10:31 PM
I'm using … to read many CSVs into Spark DataFrames. I just found an issue where, apparently, Spark automatically appends the column position to the column name (as read from the header) to create the actual final name. See example in image. As this is sometimes done for deduplication, I investigated whether something similar was happening, and sure enough there is another dataset in this same … that reads another column of the same name. This could "explain" this funky behavior of Spark thinking it is a duplicate. Of course, though, these are two separate DataFrames. Has anyone stumbled upon this issue before? I can't find any references online. Thank you! EDIT: Solved! It was due to Spark's default case-insensitivity setting.
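For future readers, the Spark setting in question (a standard Spark SQL config; shown as a sketch assuming an existing SparkSession):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Spark compares column names case-insensitively by default, which is what
# produced the dedup-style renaming described above
spark.conf.set("spark.sql.caseSensitive", "true")
```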
Sidharth Jahagirdar 05/25/2023, 11:06 PM
Rebecca Solcia 05/26/2023, 8:13 AM
fmfreeze 05/26/2023, 12:24 PM
… does not display the Dataset Type and the File Path property in the details section for that Dataset. How can I make them show up?
fmfreeze 05/26/2023, 12:47 PM
Is it possible that a node has a (dynamic) parameter as output? E.g. I have multiple "normal" parameters defined which serve as input into a … node. That node should, depending on the normal parameter inputs, output a single parameter which might serve as input to other nodes. Currently, by simply outputting that "parameter" value, this is by default a MemoryDataSet.
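If I read the setup right, the derived "parameter" is just an ordinary node output, so downstream nodes can consume it like any other dataset; a sketch with made-up names:

```python
from kedro.pipeline import node, pipeline

def derive_threshold(base_threshold: float) -> float:
    # compute a single derived "parameter" from the normal parameters
    return base_threshold * 2

def apply_threshold(df, threshold: float):
    return df[df["score"] > threshold]

pipeline([
    node(derive_threshold, inputs="params:base_threshold", outputs="derived_threshold"),
    node(apply_threshold, inputs=["model_input", "derived_threshold"], outputs="filtered"),
])
```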
Artur Dobrogowski 05/30/2023, 8:15 AM
… while the config looks like this:

```yaml
tracking:
  disable_tracking:
    pipelines:
```

it fails with:

```
ValidationError: 1 validation error for KedroMlflowConfig
tracking -> disable_tracking -> pipelines
  value is not a valid list (type=type_error.list)
```
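Possibly the culprit (a guess from the pydantic message): an empty value parses as null in YAML rather than as an empty list, so the list validation fails. Spelling the list out explicitly should avoid it:

```yaml
tracking:
  disable_tracking:
    pipelines: []  # explicit empty list instead of a null value
```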
Artur Dobrogowski 05/30/2023, 8:24 AM
Florian d 05/30/2023, 1:48 PM
… hook? In some cases it would be good to have access to the loaded config at that point.
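One option that may fit (a sketch; assumes the after_context_created hook available in recent 0.18 releases): the KedroContext it receives carries the project's config_loader, so the merged config can be read there:

```python
from kedro.framework.hooks import hook_impl

class ConfigAccessHook:
    @hook_impl
    def after_context_created(self, context):
        # the context exposes the project's config loader; any config can be
        # read here, e.g. the merged catalog entries
        self.catalog_conf = context.config_loader.get("catalog*", "catalog*/**")
```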