Caroline Lei
03/08/2023, 9:09 AM
conf/local/mlflow.yml file. I am wondering if I can pass in the run name via the kedro command line. I tried kedro run --params=tracking.run.name:"test_name" but it didn't work.

Juan Luis
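A note on the run-name question above: as far as I know, kedro run --params only overrides entries that come from parameters.yml, so it cannot reach keys inside conf/local/mlflow.yml. The run name normally lives in the kedro-mlflow plugin config instead. A sketch of where it would go (key layout per kedro-mlflow 0.11.x; check the docs for your plugin version):

```yaml
# conf/local/mlflow.yml (kedro-mlflow) -- key layout is a sketch,
# verify against your kedro-mlflow version
tracking:
  run:
    name: test_name
```

Depending on your setup, templating that value (e.g. via TemplatedConfigLoader globals or an environment variable) may let you vary it per run.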
03/08/2023, 10:42 AM
kedro pipeline list or similar

Ola Cupriak
03/08/2023, 4:28 PM

Suryansh Soni
03/08/2023, 6:33 PM

Ryan Ng
03/09/2023, 8:37 AM

Juan Luis
03/09/2023, 12:24 PM
openrepair-0_3-events-raw:
  type: polars.CSVDataSet
  filepath: data/01_raw/OpenRepairData_v0.3_aggregate_202210.csv
but if I try to load the data from a notebook in notebooks/ with this code:
conf_loader = ConfigLoader("../conf")
conf_catalog = conf_loader.get("catalog.yml")
catalog = DataCatalog.from_config(conf_catalog)
catalog.load("openrepair-0_3-events-raw")
then I get a "file not found" error. However, if I change the filepath: to ../data/..., or move the notebook one directory up, or use the kedro.ipython extension, the error goes away.
My aim is to show how to gradually move from non-Kedro to Kedro, and as an intermediate stage I'm loading the catalog manually. I suppose there's some extra magic happening under the hood that properly resolves the paths?

Juan Luis
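On the relative-path question above: filepath entries in catalog.yml are resolved against the current working directory, not against the location of conf/, which is why running from notebooks/ breaks the lookup while the kedro.ipython extension (which bootstraps the project and sets the project path) does not. A minimal sketch of one workaround, anchoring relative local paths to the project root before building the catalog (the helper name and example root are mine):

```python
from pathlib import Path

REMOTE_PREFIXES = ("gs://", "s3://", "abfs://", "http://", "https://")

def anchor_filepaths(conf_catalog: dict, project_root: Path) -> dict:
    """Make relative local filepaths absolute so the catalog loads
    correctly no matter where the notebook is started from."""
    for entry in conf_catalog.values():
        fp = entry.get("filepath")
        if fp and not fp.startswith(REMOTE_PREFIXES) and not Path(fp).is_absolute():
            entry["filepath"] = str(project_root / fp)
    return conf_catalog

conf = {
    "openrepair-0_3-events-raw": {
        "type": "polars.CSVDataSet",
        "filepath": "data/01_raw/OpenRepairData_v0.3_aggregate_202210.csv",
    }
}
fixed = anchor_filepaths(conf, Path("/home/user/project"))  # hypothetical root
print(fixed["openrepair-0_3-events-raw"]["filepath"])
```

Passing the fixed dict to DataCatalog.from_config should then load from the right place regardless of the notebook's directory.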
03/09/2023, 12:37 PM
dtypes to the upcoming polars.CSVDataSet, not sure if there's a way to specify non-primitive types in the catalog YAML? https://github.com/kedro-org/kedro-plugins/issues/124

Ana Man
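On the dtypes question above: catalog YAML can only express primitives, so one workaround is to keep string type names in load_args and resolve them to real polars dtypes in code before building the catalog. A pure-Python sketch of that resolution step (the column name and registry are hypothetical, and plain strings stand in for pl.* types so the snippet has no polars dependency):

```python
def resolve_dtypes(load_args: dict, registry: dict) -> dict:
    """Swap YAML-safe string dtype names for real dtype objects."""
    if "dtypes" in load_args:
        load_args = dict(load_args)  # don't mutate the caller's dict
        load_args["dtypes"] = {
            col: registry[name] for col, name in load_args["dtypes"].items()
        }
    return load_args

# In a real project the registry would map to polars types,
# e.g. {"Float64": pl.Float64}; a stand-in string is used here.
registry = {"Float64": "<pl.Float64>"}
args = resolve_dtypes({"dtypes": {"product_age": "Float64"}}, registry)
print(args["dtypes"]["product_age"])
```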
03/09/2023, 4:33 PM

Rebecca Solcia
03/09/2023, 4:55 PM

Slackbot
03/09/2023, 5:28 PM

Brandon Meek
03/09/2023, 5:55 PM

Andrew Stewart
03/10/2023, 12:19 AM
from mypipeline.__main__ import main
main()
...can anyone think of a reason why any custom datasets under mypipeline.extras.datasets.MyDataSet would not be installed along with the wheel?
kedro.io.core.DataSetError: Class 'mypipeline.extras.datasets.MyDataSet' not found or one of its dependencies has not been installed.
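A quick diagnostic for the wheel question above: check whether the subpackage made it into the installed wheel at all. If it is not importable, the usual culprits are missing __init__.py files under mypipeline/extras/ or a find_packages call in the build config that does not pick up subpackages. Sketch:

```python
import importlib.util

def is_importable(dotted: str) -> bool:
    """True if the dotted module path can be found in the current env."""
    try:
        return importlib.util.find_spec(dotted) is not None
    except ModuleNotFoundError:
        # a parent package is missing entirely
        return False

print(is_importable("json"))                        # stdlib sanity check
print(is_importable("mypipeline.extras.datasets"))  # the suspect package
```

Run this inside the environment where the wheel is installed; if it prints False for the dataset package, the problem is packaging rather than Kedro.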
Ana Man
03/10/2023, 1:22 PM
OmegaConfigLoader. Apart from adding CONFIG_LOADER_CLASS = OmegaConfigLoader to settings.py, what other minimum changes are needed in your project to use this loader? Having issues with running it 'out of the box' (btw, relatively new to the kedro ecosystem).

Jorge sendino
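On the OmegaConfigLoader question above: in a stock 0.18.x project the settings.py change below is usually the only required edit, but feature parity was not complete at the time (for instance, TemplatedConfigLoader-style globals/Jinja templating in your YAML files will not be understood), so check the configuration docs for your exact Kedro version. Sketch:

```python
# src/<your_package>/settings.py -- minimal sketch
from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader
```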
03/10/2023, 4:55 PM

Ricardo Araújo
03/10/2023, 7:21 PM

ed johnson
03/10/2023, 9:47 PM
kedro run --pipeline <pipeline_i> commands defined inside a shell script, but I'm wondering if there is a better way, perhaps using the run config.yml capability?

Andrew Stewart
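On the config.yml question above: kedro run --config <file> reads the run options from YAML, but each invocation still runs a single pipeline, so several pipelines either stay in a shell loop or get registered as one combined pipeline in pipeline_registry.py. A sketch of the config file (top-level run: key per the Kedro 0.18 docs; verify against your version):

```yaml
# config.yml, used as: kedro run --config config.yml
run:
  pipeline: pipeline_1  # hypothetical pipeline name
  env: base
```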
03/10/2023, 11:45 PM
AthenaDataSet dataset (or even just code reviewing)?

Sebastian Cardona Lozano
03/11/2023, 12:26 AM
VersionNotFoundError: Did not find any versions for SparkDataSet(file_format=parquet, filepath=gs://bdb-gcp-cds-pr-ac-ba-analitica-avanzada/banca-masiva/599_profundizacion/data/05_model_input/master_model_input.parquet, load_args={'header': True, 'inferSchema': True}, save_args={}, version=Version(load=None, save='2023-03-10T23.44.07.085Z'))
In the catalog.yml I have this:
master_model_input:
  type: spark.SparkDataSet
  filepath: gs://bdb-gcp-cds-pr-ac-ba-analitica-avanzada/banca-masiva/599_profundizacion/data/05_model_input/master_model_input.parquet  # gs:// URI on Cloud Storage
  file_format: parquet
  layer: model_input
  versioned: True
  load_args:
    header: True
    inferSchema: True
However, the parquet file is generated correctly in GCS (see the image attached).
Thanks for your help! 🙂

Sebastian Cardona Lozano
03/11/2023, 1:09 AM

rss
03/12/2023, 12:58 AM
https://i.stack.imgur.com/KeeZJ.png
Rebecca Solcia
03/13/2023, 12:03 PM
05_07_FocusDatasource_PKL:
  type: kedro.extras.datasets.pickle.PickleDataSet
  filepath: data/02_intermediate/05_07_FocusDatasource.pkl
But when I call catalog.load('05_07_FocusDatasource_PKL') it tells me that it is a function:
<function focus_pickle at 0x7fbff82af040>
Any suggestions on how I can load that dataset?

Shubham Agrawal
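On the pickle question above: catalog.load returns exactly what was saved, so getting <function focus_pickle at ...> back usually means the function object itself was stored, e.g. a node returned focus_pickle instead of focus_pickle(). A minimal reproduction of the two cases (the function body is invented):

```python
import pickle

def focus_pickle():
    return {"focus": [1, 2, 3]}  # hypothetical payload

wrong = focus_pickle      # missing parentheses: this "saves" the function
right = focus_pickle()    # intended: this saves the data

print(callable(wrong))    # True -> this is what '<function focus_pickle ...>' means
data = pickle.loads(pickle.dumps(right))
print(data)
```

If the node (or whatever fed the dataset) is fixed to return the call's result, the PickleDataSet entry above should load normally.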
03/13/2023, 2:31 PM

Robertqs
03/14/2023, 6:30 AM

Michal Szlupowicz
03/14/2023, 11:57 AM

Jan
03/14/2023, 1:09 PM

Ana Man
03/14/2023, 4:20 PM

Dharmesh Soni
03/14/2023, 5:25 PM
├── main_folder.zip
│   ├── folder1
│   │   └── text_file.txt
│   └── text_file.txt
Walber Moreira
03/14/2023, 8:11 PM

Tom C
03/15/2023, 6:29 AM
runs table as a string instead of nested JSON. This is causing an error when attempting to visualise the runs in viz. I've created a ticket, but want to ask here for people who don't follow the issue boards.

Jonas Kemper
03/15/2023, 12:03 PM
data_science:
  active_modelling_pipeline:
    model_options:
      test_size: 0.2...
and I load it via
conf_loader = kedro.config.ConfigLoader(".")
parameters = conf_loader['parameters']
which returns
{'data_science': {'active_modelling_pipeline': {'model_options': {'test_size': 0.2,
Then, in another place, I do
data_catalog = DataCatalog.from_config(catalog, credentials)
data_catalog.add_feed_dict(parameters)
but this won't work, because eventually it lands me at
ValueError: Pipeline input(s) {'params:data_science.candidate_modelling_pipeline.model_options.random_state', ...} not found in the DataCatalog
What's the intermediate step that I'm missing?
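The missing step in the last question: pipeline nodes consume parameters under "params:<dotted.path>" keys (plus a top-level "parameters" entry), so the raw nested dict from the ConfigLoader has to be exploded into those keys before add_feed_dict. This mirrors what KedroContext does internally when it builds the catalog; a sketch (the helper name is mine):

```python
def params_feed_dict(parameters: dict) -> dict:
    """Explode a nested parameters dict into the 'params:...' keys
    that pipeline nodes reference."""
    feed = {"parameters": parameters}

    def _add(prefix: str, value) -> None:
        feed[f"params:{prefix}"] = value
        if isinstance(value, dict):
            for key, val in value.items():
                _add(f"{prefix}.{key}", val)

    for key, val in parameters.items():
        _add(key, val)
    return feed

feed = params_feed_dict(
    {"data_science": {"active_modelling_pipeline": {"model_options": {"test_size": 0.2}}}}
)
print(sorted(feed))
```

With that, data_catalog.add_feed_dict(params_feed_dict(parameters)) should satisfy the params:... pipeline inputs instead of raising the ValueError above.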