Nikos Kaltsas
02/15/2023, 12:11 AM
dor zazon
02/15/2023, 11:01 AM
dor zazon
02/15/2023, 11:02 AM
Vassilis Kalofolias
02/15/2023, 2:42 PM
dataset.confirm()?
Documentation is not clear, also it is not implemented in any dataset except IncrementalDataSet.
Alexander Johns
02/15/2023, 6:19 PM
src/<my_project>/extras
├── __init__.py
└── datasets
    ├── __init__.py
    └── <my_custom_dataset>.py
catalog entry:
raw_custom_dataset:
  type: <my_project>.extras.datasets.<my_custom_dataset>.<MyCustomDataSet>
  filepath: 01_raw/folder/*
When I run the node, I keep getting the following error:
An exception occurred when parsing config for DataSet 'raw_custom_dataset':
Class '<my_project>.extras.datasets.<my_custom_dataset>.<MyCustomDataSet>' not found or one of its dependencies has not been installed.
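A "not found or one of its dependencies has not been installed" message can hide the real import failure. A minimal sketch, not Kedro's own code, of importing a dotted class path by hand so the underlying ImportError or AttributeError becomes visible. The helper name resolve_class is hypothetical, and a standard-library class stands in for the project's dataset class:

```python
import importlib


def resolve_class(dotted_path: str):
    """Import `package.module.ClassName` by hand and return the class.

    Hypothetical debugging helper (not part of Kedro): any ImportError or
    AttributeError raised here points at the real cause of the failure.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    # Raises ImportError if the module, or something it imports, is missing.
    module = importlib.import_module(module_path)
    # Raises AttributeError if the class name does not exist in the module.
    return getattr(module, class_name)


# Stand-in path; in a project this would be the value of `type:` in the catalog.
cls = resolve_class("collections.OrderedDict")
print(cls.__name__)
```

This assumes it is run from the same environment as the project, with the package importable (installed, or with src on the path).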
Kedro = 0.18.3
Matthias Roels
02/15/2023, 8:23 PM
pipelines/
- feat/
    __init__.py
    pipelines.py  <-- contains all subpipelines in this folder, e.g. feat_sales
- feat_sales/
    __init__.py
    nodes.py
    pipelines.py
- …
Would this be the right approach? And if not, what is the recommended way to structure this? Do we use modular pipelines or regular pipelines?
Alex Ferrero
02/16/2023, 10:48 AM
Vassilis Kalofolias
02/16/2023, 11:06 AM
kedro run --params round_occupancy:False
However, the False is read as a string. Is there a way to pass a boolean instead? Note that the original param is correctly read from the YAML file as a bool.
Keith Edmonds
02/16/2023, 10:53 PM
Sebastian Pehle
02/17/2023, 9:36 AM
Solomon Yu
02/17/2023, 3:34 PM
my_excel_file:
  type: pandas.ExcelDataSet
  filepath: some-excel-file.xlsx
  load_args:
    sheet_name: None
and I get Worksheet named 'None' not found
None which is null or ~ or .
Thanks in advance!
Chris Santiago
02/17/2023, 6:26 PM
pyproject.toml file in the root project directory and then a separate setup.py in the src directory? Trying to understand their separate roles.
I'd like to introduce kedro to my team at work. We use a custom cookiecutter to set up all of our projects so that they're pip-installable across various platforms. Our current update uses only pyproject.toml, and we've removed the last remnants of setup.py and setup.cfg.
Specifically, I'm trying to understand how I could structure a custom starter, incorporating our existing cookiecutter, that would allow for editable installs with extras, but I don't want to disturb any existing kedro functionality. How does the kedro CLI use the src/setup.py file, if at all? Same with the pyproject.toml in the root folder.
Ricardo Araújo
02/18/2023, 3:39 PM
Alexis Eutrope
02/18/2023, 8:22 PM
Dustin
02/20/2023, 3:29 AM
Juan Luis
02/20/2023, 10:36 AM
kedro new in non-interactive ways so it's compatible with Jupyter shell commands (!kedro new ...). I see two ways:
• `yes "Project Name" | kedro new --starter=xxx`: works, but it's UNIX-only (don't think this will work on Windows), assumes there is only one question, and looks a bit arcane.
• `vim kedro.yaml ... && kedro new --starter=xxx --config=kedro.yaml`: works, but I'm creating a file that I will only use once, plus it's not very easy to discover what structure the file should have (one has to navigate to the source code of the starter in question, locate the prompts.yml, and mimic those keys)
I see that this has been unchanged since basically "forever", but I'm wondering what folks' opinions are on having a way to pass these configs to the CLI? Something like kedro new --starter=xxx --project_name=yyy
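For the second option above, the one-off config file can be generated in a temporary directory instead of being edited by hand. A rough sketch under stated assumptions: the keys in answers (output_dir, project_name, repo_name, python_package) are assumed and must match what the chosen starter's prompts actually expect, write_new_project_config is a hypothetical helper, and only the --starter and --config flags quoted in the message are used. The command is printed, not executed:

```python
import shlex
import tempfile
from pathlib import Path


def write_new_project_config(answers: dict, directory: str) -> Path:
    """Write a flat `key: "value"` YAML file holding the starter's prompt answers.

    Only suitable for simple values that contain no double quotes.
    """
    path = Path(directory) / "kedro.yaml"
    lines = [f'{key}: "{value}"' for key, value in answers.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Assumed keys: they must match what the starter in question actually expects.
answers = {
    "output_dir": ".",
    "project_name": "Project Name",
    "repo_name": "project-name",
    "python_package": "project_name",
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = write_new_project_config(answers, tmp)
    command = ["kedro", "new", "--starter=xxx", f"--config={config_path}"]
    # Printed rather than executed; pass `command` to subprocess.run to run it.
    print(shlex.join(command))
    print(config_path.read_text(encoding="utf-8"))
```

This does not address discoverability: the expected keys still have to be looked up in the starter's prompts.yml.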
Juan Luis
02/20/2023, 11:53 AM
kedro jupyter notebook)" but actually this depends on the starter that got used. For example, projects created with standalone-datacatalog do not have it. Is this a docs issue (we should amend those to explain how to get that command working regardless of the starter used) or a starter issue (all starters should have kedro jupyter notebook)?
Lan Bui
02/20/2023, 2:08 PM
Massinissa Saïdi
02/20/2023, 5:54 PM
kedro run --params key:false or kedro run --params key:False return the string 'false' or 'False'. I know I can set the parameter to 0 or '' to get the false condition, but is there a better way? Thx 🙂
Laura Oñate
02/21/2023, 2:25 AM
Robertqs
02/21/2023, 4:43 AM
Jan
02/21/2023, 10:26 AM
Olivier Ho
02/21/2023, 10:48 AM
Armen Paronikyan
02/21/2023, 10:56 AM
Nicolas Oulianov
02/21/2023, 7:55 PM
datajoely
02/22/2023, 8:00 AM
Francisco Alejandro Leal Tovar
02/22/2023, 1:34 PM
Solomon Yu
02/22/2023, 2:18 PM
my_dataset:
  type: pandas.CSVDataSet
  filepath: path-to-my-file.csv
  load_args:
    parse_dates: ['col_3']
    dtype: dtypes_dict_var
So that catalog.yml won't be too many lines long.
I'd like this dtype dict to live within conf/base/parameters/my_pipeline.yml, as:
dtypes_dict_var: {
  "col_1": int,
  "col_2": str,
  "col_3": DateTime<'Y-m-d'>, # assumes YAML API syntax will be converted to datetime object
}
Another question here is how to pass a datetime object type to load_args:dtype.
I'd like this dtype dict to affect only the loading of my_dataset, and not to be used as a global var if possible. A separate case could be that I'd like to load the same dataset with different dtypes in different pipelines, which could utilise TemplatedConfigLoader.
Passing in certain parameters doesn't seem very straightforward tbh :/
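On the datetime part of the question: YAML holds plain strings, not Python type objects, and pandas.read_csv does not take datetime entries in dtype; date columns are requested through parse_dates instead. A minimal sketch, assuming the dtypes are written as type-name strings, of splitting such a spec into the two load arguments. The names build_load_args and DTYPE_NAMES and the "datetime" marker are hypothetical, and how the result reaches the dataset (for example via a custom dataset or a hook) is left open:

```python
# Names allowed in the YAML spec, mapped to strings read_csv accepts for `dtype`.
DTYPE_NAMES = {"int": "int64", "float": "float64", "str": "str", "bool": "bool"}


def build_load_args(spec: dict) -> dict:
    """Split a {column: type name} spec into read_csv-style `dtype` and `parse_dates`."""
    dtype, parse_dates = {}, []
    for column, type_name in spec.items():
        if type_name == "datetime":
            parse_dates.append(column)  # dates go through parse_dates, not dtype
        elif type_name in DTYPE_NAMES:
            dtype[column] = DTYPE_NAMES[type_name]
        else:
            raise ValueError(f"Unsupported type {type_name!r} for column {column!r}")
    return {"dtype": dtype, "parse_dates": parse_dates}


spec = {"col_1": "int", "col_2": "str", "col_3": "datetime"}
print(build_load_args(spec))
# {'dtype': {'col_1': 'int64', 'col_2': 'str'}, 'parse_dates': ['col_3']}
```

The returned dict has the same shape as the load_args block in the catalog entry above, so it could be passed as keyword arguments to pandas.read_csv.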
Thanks in advance!
Ian Whalen
02/22/2023, 2:32 PM
settings.py and looping over it in the jinja-esque style to define catalog entries.
Couldn't immediately tell from the docs, though I haven't had much time to work with the new loader. I am excited too of course 🙂
Shiv Pratap Singh
02/22/2023, 3:04 PM