Leslie Wu
06/21/2023, 5:55 PM
Is there a way to get kedro viz to work within Amazon SageMaker Studio? I am in the terminal of a Studio instance.

Jonah Blumstein
06/21/2023, 10:07 PM
Have you tried running %load_ext kedro.ipython for the first time in a notebook? Basically this, but in a notebook environment: https://github.com/kedro-org/kedro/issues/1640
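If that's the route you take, a minimal sketch of the notebook-side setup (assuming kedro-viz is installed; %run_viz is the line magic kedro-viz registers):

# In a SageMaker Studio notebook cell, run from the project directory
%load_ext kedro.ipython
%run_viz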
Mate Scharnitzky
06/22/2023, 2:05 PM
We are on kedro==0.18.3, which pins pytest~=6.2; that conflicts with pandera, a new dependency we want to introduce. kedro==0.18.5 already has pytest~=7.2, so we're not far from resolving this conflict.
On the other hand, in order to upgrade to a higher Kedro version, we would need to change our custom JinjaTemplatedConfigLoader, which inherits from AbstractConfigLoader, as both 0.18.4 and 0.18.5 introduced changes to configuration management, the latter OmegaConf specifically. Also, 0.18.6 fixes some regressions in 0.18.5.
Question: based on the above context, which Kedro version would you suggest we upgrade to?
• It seems we need to go to at least 0.18.6, but maybe we can aim all the way for 0.18.10?
• Also, do you have a migration guide for moving from a custom config loader to OmegaConf, bearing in mind that we also need to use multi-runner?
Thank you!
@Kasper Janehag, @Jaskaran Singh Sidana
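For the loader switch itself, the core change is a one-liner in settings.py — a minimal sketch, assuming kedro>=0.18.5 (any Jinja templating logic would still need porting to OmegaConf resolvers):

# src/<your_package>/settings.py
from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader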
Lucas Hattori
06/22/2023, 3:10 PM
I'm using kedro-mlflow for the first time in a project. Would this also be an appropriate theme for questions here? 😅
If so, regarding parameters: my kedro project has a lot of parameters, and many of them are not crucial to log in mlflow experiments. How can I easily select which parameters I'd like to have logged? I have an idea of how to do it if I were to build the mlflow hooks from scratch, but I'd love to leverage kedro-mlflow for simplicity.
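In case kedro-mlflow doesn't expose parameter filtering directly, a plain-MLflow fallback sketch (not kedro-mlflow's own API; the allow-list and hook class are hypothetical, and it assumes an MLflow run is already active):

import mlflow
from kedro.framework.hooks import hook_impl

class SelectiveParamLoggingHook:
    ALLOWED = {"model_options", "test_size"}  # hypothetical allow-list

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # "parameters" is the dataset name Kedro registers all params under
        for key, value in catalog.load("parameters").items():
            if key in self.ALLOWED:
                mlflow.log_param(key, value)

# register via HOOKS = (SelectiveParamLoggingHook(),) in settings.py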
Camilo López
06/23/2023, 1:56 AM
I'm using ManagedTableDataSet with Databricks Unity Catalog and I didn't find a way to store tables in an external location (ABFS, Azure). There is a way of storing an external table with pure Spark: df.write.mode(mode).option("path", table_path).saveAsTable(f"{catalog_name}.{schema_name}.{table_name}"), where table_path is the path to the external location, like abfss://container@storage_account.dfs.core.windows.net/raw.
Is there a way to pass this path to the ManagedTableDataSet when saving the data? Or should I go and create a CustomManagedTableDataSet with this capability?
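If ManagedTableDataSet can't take an explicit location, a rough sketch of a thin custom dataset wrapping that pure-Spark call (class and argument names hypothetical):

from kedro.io import AbstractDataSet
from pyspark.sql import DataFrame, SparkSession

class ExternalTableDataSet(AbstractDataSet):
    """Saves a Spark DataFrame as an external (path-backed) table."""

    def __init__(self, table: str, path: str, mode: str = "overwrite"):
        self._table = table  # e.g. "catalog.schema.table"
        self._path = path    # e.g. "abfss://container@storage_account.dfs.core.windows.net/raw"
        self._mode = mode

    def _load(self) -> DataFrame:
        return SparkSession.builder.getOrCreate().read.table(self._table)

    def _save(self, data: DataFrame) -> None:
        data.write.mode(self._mode).option("path", self._path).saveAsTable(self._table)

    def _describe(self):
        return {"table": self._table, "path": self._path, "mode": self._mode}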
Sivasubramanian.S
06/23/2023, 4:02 AM

Marc Gris
06/23/2023, 7:58 AM
My data_processing_node and my model_training_node have conflicting dependencies.
How would you handle such an (unfortunately common) situation?
I know that in MLflow it is possible to have task-specific venvs…
Does kedro offer such a possibility?
If not, how could one circumvent the issue? 🙂
Many thanks in advance,
M.

Artur Dobrogowski
06/23/2023, 1:19 PM
I know about ${oc.env:SOME_VAR}, but how do I use it to interpolate with parameters defined in params.yml?
Let's say I have a param some_var=42 and I want to fall back to it when the env var is not set. Is this correct: ${oc.env:SOME_VAR, ${some_var}}?
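For what it's worth, a minimal sketch of that fallback pattern (assuming OmegaConf resolution, e.g. via OmegaConfigLoader; note oc.env returns strings, so numeric values may need casting):

# parameters.yml (sketch)
some_var: 42
resolved_var: ${oc.env:SOME_VAR,${some_var}}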
Panos P
06/23/2023, 4:44 PM
kedro.config.config - INFO - Config from path "/conf/dev" will override the following existing top-level config keys
These messages appear for about 30 minutes before the kedro pipeline even runs.
Do you have any ideas or recommendations for speeding this up?
Hoàng Nguyễn
06/26/2023, 5:07 PM

Andreas_Kokolantonakis
06/27/2023, 8:06 AM

Zemeio
06/27/2023, 8:53 AM

Hugo Evers
06/27/2023, 2:01 PM
Is there a way to pass a literal value directly as a node input? For example, without adding sample_size to a config:
node(
    func=train_test_split,
    inputs={"df": "input", "sample_size": 50},
    ...
),
However, this doesn’t seem to work and I get an error referring to a separator. I noticed that in modular pipelines a similar syntax is allowed.
Is that on purpose?
What does work is:
node(
    func=lambda df: train_test_split(df, sample_size=50),
    inputs="input",
    ...
)
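For context, node inputs have to be dataset or parameter names rather than literal values, which is why the dict form fails. The usual pattern is to move the literal into parameters — a sketch, assuming a parameters.yml entry:

# conf/base/parameters.yml
sample_size: 50

# pipeline definition
node(
    func=train_test_split,
    inputs={"df": "input", "sample_size": "params:sample_size"},
    ...
),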
Alina Glukhonemykh
06/27/2023, 4:33 PM
I'm getting the following error:
DataSetError: Save path '.../data/08_reporting/init/metrics.json/2023-06-27T16.17.18.857Z/metrics.json' for MetricsDataSet(filepath=.../data/08_reporting/init/metrics.json, protocol=file, save_args={'indent': 2}, version=Version(load=None, save='2023-06-27T16.17.18.857Z')) must not exist if versioning is enabled.
Here is how I define the file in the catalog:
metrics:
  type: tracking.MetricsDataSet
  filepath: data/08_reporting/init/metrics.json
Marc Gris
06/28/2023, 11:13 AM
With a node like
node(pre_process,
     inputs=['dataset', 'params:pre_process'],
     outputs='pre_processed_dataset')
Kedro will pass 'params:pre_process' as a dict to pre_process, which results in a somewhat “opaque” function signature:
def pre_process(df: pd.DataFrame, params: dict): ...
Is there a “kedro way” of unpacking this dict and thereby getting a more “transparent” signature, with individual params specified?
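One pattern that gets close (a sketch; Kedro's params: syntax supports dotted access to nested keys, and the key names here are hypothetical):

node(pre_process,
     inputs=['dataset',
             'params:pre_process.threshold',
             'params:pre_process.method'],
     outputs='pre_processed_dataset')

def pre_process(df: pd.DataFrame, threshold: float, method: str): ...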
Thx
M

Sebastian Cardona Lozano
06/28/2023, 9:28 PM

Hugo Evers
06/29/2023, 8:23 AM
I made a custom dataset that took data_dir as input instead of filepath. It took me roughly an hour of debugging to figure out why loading the dataset was now dependent on the current working directory, and why it just wouldn’t load if I gave it a relative path (data/01_raw/..) instead of workspace/project_name/data/01_raw/….
Anyway, the issue was that filepath has a (buried) custom resolver in the AbstractDataSet base class.
So would it be a good idea to add to the custom-datasets docs that filepath has that behaviour? And maybe we could add an example of how to make a FolderDataset, since all the current datasets in kedro-datasets point to specific files, but I’d wager there are folks out there who would want to read an entire folder’s worth of data.
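For the thread: kedro.io.PartitionedDataSet already covers some folder-reading cases, and a bare-bones FolderDataset sketch (names hypothetical, CSV assumed) could look like:

from pathlib import Path
from typing import Dict

import pandas as pd
from kedro.io import AbstractDataSet

class CSVFolderDataSet(AbstractDataSet):
    """Loads every CSV in a folder into a dict of DataFrames."""

    def __init__(self, filepath: str):
        # keeping the argument named `filepath` so it is clear it gets
        # the special path handling described above
        self._path = Path(filepath)

    def _load(self) -> Dict[str, pd.DataFrame]:
        return {f.stem: pd.read_csv(f) for f in sorted(self._path.glob("*.csv"))}

    def _save(self, data: Dict[str, pd.DataFrame]) -> None:
        self._path.mkdir(parents=True, exist_ok=True)
        for name, df in data.items():
            df.to_csv(self._path / f"{name}.csv", index=False)

    def _describe(self):
        return {"filepath": str(self._path)}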
Balazs Konig
06/29/2023, 11:22 AM
I have a folder structure like
folder1
  projects
    project1
      conf
      data
      src
folder2
  data
And I want to save to folder2/data. When I try relative paths, Kedro seems to append them to folder1/projects/project1/<relative_path> (as in, it adds the dots to the path as well).
How can I achieve this?
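One way (a sketch): relative catalog paths resolve against the directory you run from (project1 here), so reaching outside the project generally means an absolute path, e.g.:

# conf/base/catalog.yml (hypothetical entry)
my_output:
  type: pandas.CSVDataSet
  filepath: /abs/path/to/folder2/data/output.csv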
Harry Vargas Rodríguez
06/29/2023, 1:50 PM
I'm currently loading my model with
model = pickle.load(open('models/model.pkl','rb'))
This is how my catalog looks:
best_model:
  type: pickle.PickleDataSet
  filepath: models/model.pkl
  layer: models
It works just fine after I load kedro with %load_ext kedro.ipython.
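For reference, the catalog-based equivalent once %load_ext kedro.ipython has injected the catalog object would just be (a sketch):

# in a notebook cell, after %load_ext kedro.ipython
model = catalog.load("best_model")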
Thanks in advance for your help.

Ahmed Alawami
06/30/2023, 7:49 AM
I need to pass a date_parser in the catalog. Is there a way to specify a lambda function in the YAML file?
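YAML can't hold a lambda, but load_args can often express the same parsing declaratively — a sketch (entry name and format hypothetical; date_format needs pandas>=2.0):

my_dates:
  type: pandas.CSVDataSet
  filepath: data/01_raw/my_dates.csv
  load_args:
    parse_dates: [timestamp]
    date_format: "%Y-%m-%d %H:%M:%S"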
Markus Sagen
06/30/2023, 12:32 PM

Markus Sagen
06/30/2023, 2:00 PM

Emilio Gagliardi
06/30/2023, 11:36 PM

Markus Sagen
07/01/2023, 8:12 AM
The install and test tasks listed in the docs here are deprecated. Is there a preferred place to report issues or add fixes to the docs?
https://docs.kedro.org/en/stable/development/set_up_vscode.html#setting-up-tasks

Choon Ho Loi
07/03/2023, 2:34 PM

Hugo Evers
07/03/2023, 3:46 PM

Emilio Gagliardi
07/03/2023, 6:39 PM
My pipeline registry has the default implementation:
def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
However, in the spaceflights tutorial videos I'm watching, the host doesn't use the above code. Instead, they add the following:
data_processing_pipeline = dp.create_pipeline()
return {"__default__": data_processing_pipeline,
        "dp": data_processing_pipeline}
So I'm unclear what I'm supposed to do for my own project. Do I just use sum(pipelines.values()), or do I manually add pipelines as in the second block? Thanks kindly!
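For the thread: the two are near-equivalent. find_pipelines() auto-discovers each pipeline package and registers it under its own name, so for spaceflights it returns roughly (a sketch):

# roughly what find_pipelines() produces for spaceflights
{
    "data_processing": data_processing_pipeline,
    "data_science": data_science_pipeline,
}
# after which pipelines["__default__"] = sum(pipelines.values())
# gives the same combined default as the manual dict in the video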
Hugo Evers
07/04/2023, 12:54 PM
The run_viz command almost begs for the ability to do run_viz(pipeline), where pipeline is an actual pipeline object (although it would also be nice to be able to pass the name of a pipeline to filter by, like the CLI command, w.r.t. issue 1).
This way one doesn’t need nbdev (which is a slightly controversial tool), and one can develop pipelines more easily, without any adjustment to the original project.
Since kedro viz can already filter, I can imagine such changes being possible.
Also, debugging kedro pipelines from the VSCode notebook cell debugger is actually quite nice (I would argue a lot nicer than using the debug configs).
Has anyone faced similar issues, or thought of a different solution?
Emilio Gagliardi
07/04/2023, 5:38 PM

Marc Gris
07/05/2023, 10:20 AM
In conf/base/parameters.yml I have:
model:
  init:
    k: 3
    loss: warp
    no_embeddings: 50
    learning_schedule: adagrad
    rho: 0.95
    epsilon: 1.e-6
    random_state: ${random_state}
How can I “update” a single specific field “locally”?
I first tried, in conf/local/parameters.yml:
model:
  init:
    no_embeddings: 100
But this actually completely overwrites the model section and, of course, breaks everything.
Granted, I could cp conf/base/parameters.yml conf/local/parameters.yml and then update no_embeddings.
But this ends up being very “noisy”, not really “highlighting” the specificities of the local config…
Is there a way to do such a local / “surgical” overwrite?
Thx 🙂
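One workaround (a sketch; assumes kedro 0.18's dotted --params syntax for nested keys) is to override single values at run time instead of in conf/local:

kedro run --params "model.init.no_embeddings:100"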