Nikola Shahpazov
03/15/2023, 1:17 PM
Is there a way to make a catalog.yml entry dynamic by passing some argument/parameter?
Example:
person:
  type: pandas.SQLQueryDataSet
  sql: "SELECT * FROM public.people WHERE id = ${id};"
  credentials: db_credentials
Thanks in advance!
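For context, one way that ${id} can be filled in (a sketch, assuming Kedro 0.18's TemplatedConfigLoader and a conf/<env>/globals.yml holding the value; the file names are illustrative):

# settings.py -- enable templating so ${id} in catalog.yml is resolved from globals
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    # every *globals.yml under the active conf environments is read, and its keys
    # (e.g. "id: 42") become available as ${id} inside catalog.yml
    "globals_pattern": "*globals.yml",
}

Note that values passed at run time with kedro run --params only update parameters, not catalog templating.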
rss
03/15/2023, 1:18 PM

Nikola Shahpazov
03/15/2023, 2:43 PM

Olivier Ho
03/16/2023, 3:00 PM

Ricardo Araújo
03/16/2023, 11:31 PM
InterpolationKeyError: Interpolation key 'temp' not found).

Andrew Stewart
03/16/2023, 11:49 PM

Olivier Ho
03/17/2023, 9:16 AM

Olivier Ho
03/17/2023, 11:06 AM

Slackbot
03/17/2023, 12:08 PM

Abhishek Gupta
03/17/2023, 2:52 PM

Ricardo Araújo
03/17/2023, 6:15 PM

Andrew Stewart
03/18/2023, 2:46 AM

Andrej Zachar
03/20/2023, 1:36 AM
node(
    first_namespace_fn,
    inputs=["some_input"],
    outputs="shared_name_so_it_can_reused_somewhere_else",
    namespace="first",
),
node(
    second_namespace_fn,
    inputs=None,
    outputs="shared_name_so_it_can_reused_somewhere_else",
    namespace="second",
),
node(
    third_common_fn,
    inputs="shared_name_so_it_can_reused_somewhere_else",
    outputs="final_output",
),
Thank you!
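For context, a sketch of the related modular-pipeline behaviour (assuming Kedro 0.18's pipeline() wrapper; "intermediate" and the layout below are illustrative, the function names come from the snippet above): dataset names are only prefixed when nodes are wrapped in pipeline(..., namespace=...), and the outputs mapping can re-expose a namespaced dataset under a shared name. Note that any one dataset may only be produced by a single node in the combined pipeline.

from kedro.pipeline import node, pipeline

# first_namespace_fn / third_common_fn as defined in the snippet above.
# wrapping nodes in pipeline(..., namespace="first") prefixes their datasets,
# e.g. "intermediate" becomes "first.intermediate"; the outputs mapping then
# re-exposes it under an un-prefixed, shared name
first = pipeline(
    [node(first_namespace_fn, inputs=["some_input"], outputs="intermediate")],
    namespace="first",
    inputs={"some_input"},  # keep this dataset name un-prefixed
    outputs={"intermediate": "shared_name_so_it_can_reused_somewhere_else"},
)

downstream = pipeline(
    [
        node(
            third_common_fn,
            inputs="shared_name_so_it_can_reused_somewhere_else",
            outputs="final_output",
        )
    ]
)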
Andrej Zachar
03/20/2023, 1:40 AM

Chew Lee
03/20/2023, 8:11 AM
I can use gsutil to read and write files to the bucket. Kedro run also successfully reads/writes files from/to GCS. But when trying to load a dataset from the catalog in a Jupyter notebook, I get a 401 access denied.
I have a credentials.yml file set up with
my_gcp_credentials:
  client_id: <REDACTED>
  client_secret: <REDACTED>
  refresh_token: <REDACTED>
  type: <REDACTED>
which was obtained using
gcloud auth login
gcloud auth application-default login
and copying the contents of the resulting JSON.
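A quick way to check whether the credentials themselves are the problem (a sketch, assuming the gcsfs package that Kedro's GCS-backed datasets use via fsspec; the bucket path is a placeholder):

import gcsfs

# the same key/value pairs credentials.yml holds under my_gcp_credentials;
# for `gcloud auth application-default login` credentials the type is
# normally "authorized_user"
token = {
    "client_id": "<REDACTED>",
    "client_secret": "<REDACTED>",
    "refresh_token": "<REDACTED>",
    "type": "authorized_user",
}

fs = gcsfs.GCSFileSystem(token=token)
print(fs.ls("my-bucket/some/path"))  # placeholder path -- should list files rather than 401

If this fails in the notebook as well, the credentials dict is the issue; if it works, the catalog entry may not be referencing credentials: my_gcp_credentials.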
Armen Paronikyan
03/20/2023, 11:25 AM

AK
03/20/2023, 1:54 PM

Javier del Villar
03/20/2023, 4:01 PM
Is there an equivalent of pandas.SQLQueryDataSet in Spark? Can I get the same functionality in Spark?
I cannot make queries with spark.SparkJDBCDataSet, am I missing something?
Thanks in advance!
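For reference, plain PySpark can push an arbitrary SQL query through its JDBC source, which is the functionality being asked about (a sketch with placeholder connection details; whether spark.SparkJDBCDataSet exposes this through its load_args is not confirmed here):

# plain PySpark: the JDBC source accepts either a table name ("dbtable")
# or a full SQL query ("query", Spark 2.4+)
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://host:5432/db")  # placeholder URL
    .option("query", "SELECT * FROM public.people WHERE age > 30")  # placeholder query
    .option("user", "user")  # placeholder credentials
    .option("password", "password")
    .load()
)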
Cyril Verluise
03/20/2023, 9:53 PM
TypeError: HEAD is a detached symbolic reference as it points to 'dc15ea87ce9d917bafb09d5d7bddb2aaf44f5989'
Full error log and GH action config in thread.
What I tried
I have tried to check out with fetch depth 0, but this did not fix the issue (I had a similar issue when building docs from a GH Action, which was fixed using that trick).
Environment
kedro version: 0.18.6
OS: ubuntu latest
Any ideas?
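For context, this TypeError is the one GitPython raises when repo.active_branch is read while HEAD is detached, which is the default state of an actions/checkout clone (a sketch that only reproduces the error and shows the usual guard; it does not identify which plugin makes the call):

import git  # GitPython

repo = git.Repo(".")

# on a detached HEAD (e.g. a default actions/checkout clone) this raises
# "TypeError: HEAD is a detached symbolic reference as it points to '<sha>'"
try:
    print(repo.active_branch.name)
except TypeError:
    # the usual guard: fall back to the commit SHA
    print(repo.head.commit.hexsha)

print(repo.head.is_detached)  # True in such a CI checkout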
sujdurai
03/21/2023, 2:39 AM
I have a node that is used in two pipelines. They use the same input tables, but I expect the node in the second pipeline to run only after my first pipeline, because the input files for the node in the second pipeline will be updated as part of the first pipeline run.
Because I have registered both pipelines to run as default in the registry, the node from the second pipeline runs sooner than I expect - I don’t want that.
# Pipeline A
Input X, Y --> node1 + node2 + node3 --> Output X (i.e. Input X after update)
# Pipeline B
Input X (after update from Pipeline A), Y --> node1 + node4 + node5 --> Output Z
Order of execution (node_Pipelinename)
node1_A
node1_B
node3_A
node2_A
node4_B
node5_B
Expected order of execution
node1_A
node3_A
node2_A
node1_B
node4_B
node5_B
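For context, Kedro derives node order from dataset dependencies, not from the order pipelines are registered, so node1_B will only wait for Pipeline A if its declared inputs include the dataset that Pipeline A writes (Output X). A sketch of the registry under that assumption (create_pipeline_a / create_pipeline_b and the module paths are placeholders):

# src/<package>/pipeline_registry.py (sketch)
from typing import Dict

from kedro.pipeline import Pipeline

from my_project.pipelines.pipeline_a import create_pipeline as create_pipeline_a
from my_project.pipelines.pipeline_b import create_pipeline as create_pipeline_b


def register_pipelines() -> Dict[str, Pipeline]:
    pipeline_a = create_pipeline_a()
    pipeline_b = create_pipeline_b()
    return {
        "a": pipeline_a,
        "b": pipeline_b,
        # summing pipelines merges their nodes; execution order is then resolved
        # purely from dataset dependencies, so node1_B runs after Pipeline A only
        # if node1_B's inputs name the dataset that Pipeline A outputs
        "__default__": pipeline_a + pipeline_b,
    }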
Dotun O
03/21/2023, 1:28 PM

R P
03/21/2023, 7:09 PM
I use kedro run --env=test when I need to run a quick pipeline check. However, I have some code in my "settings.py" file that I must not run when I'm using the "conf/test" env, but I'm not managing to get this environment information in the "settings.py" code so I can write a simple if/else condition. What is the best way to do this?
Thanks for this awesome open-source tool!
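One pattern worth noting here (a sketch, assuming a recent Kedro 0.18 where the after_context_created hook exists: settings.py is imported before the run environment is resolved, but the environment is available on the context inside hooks, so environment-dependent setup can live in a hook instead; the class and run_test_excluded_setup are illustrative names):

# src/<package>/hooks.py (sketch)
from kedro.framework.hooks import hook_impl


class EnvAwareHooks:
    @hook_impl
    def after_context_created(self, context):
        # context.env is the environment passed via `kedro run --env=...`
        if context.env == "test":
            return  # skip the setup that must not run with conf/test
        run_test_excluded_setup()  # hypothetical function holding the env-specific code

settings.py would then register it with HOOKS = (EnvAwareHooks(),).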
Javier del Villar
03/21/2023, 7:35 PM

Anjali Datta
03/22/2023, 1:00 AM
from kedro.io import DataCatalog
from kedro.io import PartitionedDataSet
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.config import ConfigLoader

conf_paths = ['conf/base', 'conf/local']
conf_loader = ConfigLoader(conf_paths)
atlas_regions = conf_loader.get('atlas_regions*')  # A .yml file consisting of regions with names

catalog_dictionary = {}
for region in atlas_regions['regions']:
    name = region['name']
    # catalog_dictionary[f'{name}_data_right'] = PartitionedDataSet(path='../ClinicalDTI/R_VIM/',
    #     dataset='programmatic_datasets.io.nifti.NIfTIDataSet', filename_suffix=f'seedmasks/{name}_R_T1.nii.gz')
    catalog_dictionary[f'{name}_data_right'] = CSVDataSet(filepath="../data/01_raw/iris.csv")
    # catalog_dictionary[f'{name}_data_right_output'] = CSVDataSet(filepath="../data/01_raw/iris.csv")

io = DataCatalog(catalog_dictionary)
print(io.list())
(Kedro version 0.17.7)
Running catalog.py prints the expected list of datasets. But what do I need to do to be able to use these datasets in a pipeline?
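One way a programmatic catalog like this is usually wired into a pipeline run (a sketch, assuming the after_catalog_created hook available in Kedro 0.17; the hook class, region list and dataset names are illustrative): add the datasets to the real catalog from a hook, and have nodes refer to them by name.

# src/<package>/hooks.py (sketch)
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.framework.hooks import hook_impl


class ProgrammaticCatalogHooks:
    @hook_impl
    def after_catalog_created(self, catalog):
        regions = ["VIM", "STN"]  # placeholder; read from the atlas_regions*.yml instead
        for name in regions:
            # nodes can then declare e.g. "VIM_data_right" as a named input
            catalog.add(f"{name}_data_right", CSVDataSet(filepath="data/01_raw/iris.csv"))

The hook would be registered in settings.py (HOOKS = (ProgrammaticCatalogHooks(),)), so the datasets exist on the catalog that the runner actually uses.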
Balachandran Ponnusamy
03/22/2023, 2:51 PM

Stephane Durfort
03/22/2023, 4:22 PM
While trying out the OmegaConfigLoader to eventually replace the TemplatedConfigLoader in my pipeline, I noticed that
• variable interpolation does not seem to be applied on nested parameters (as in the model_options example mentioned in the documentation)
• using kedro run --params only updates parameters but does not propagate to references of these parameters in the configuration?
Am I doing something wrong?
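For reference, the switch being described looks like this (a sketch, assuming Kedro 0.18.5+ where OmegaConfigLoader is available; whether nested interpolation and --params propagation then behave as expected is exactly the open question above):

# settings.py -- switch the project to the OmegaConf-based config loader
from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader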
Priyanka Patil
03/22/2023, 5:43 PM
raw_dataset:
  type: spark.SparkDataSet
  filepath: "/data/01_raw/data.csv"
  file_format: csv
  load_args:
    header: True
    inferSchema: True
    index: False
    columns: ["a", "b", "c"]
Valentin Martinez Gama
03/22/2023, 9:42 PM
I have a custom class defined as class CustomClass(BaseEstimator, TransformerMixin).
I have created an object of that class and saved it to my Kedro catalog as a pickle object. Now the problem is when I try using catalog.load() in a pipeline to load that object I get the following error:
DataSetError: Failed while loading data from data set PickleDataSet(backend=pickle,
filepath=……./data/06_models/custom_model_V1.pkl,
load_args={}, protocol=file, save_args={}).
Can’t get attribute ‘CustomClass’ on <module ‘__main__’ from ‘……venv/bin/kedro’>
I was able to make it work in a notebook by first importing the class from the .py file where it was defined:
from custom_classes import CustomClass
But when running a kedro pipeline that uses this object as an input loaded from the catalog, adding the import at the top of the pipeline file did not fix it. Any suggestions on how to fix this?
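For context, pickle records the defining module of a class, so an object pickled while CustomClass lived in __main__ (e.g. a notebook or a script run directly) can only be unpickled where __main__ has that attribute. The usual fix is to define the class in an importable module inside the project package and re-save the pickle (a sketch; the module path, dataset name and method bodies are illustrative):

# src/my_project/custom_classes.py (hypothetical module path)
from sklearn.base import BaseEstimator, TransformerMixin


class CustomClass(BaseEstimator, TransformerMixin):
    """Defining the class here, not in __main__, is what matters for pickling."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

Re-saving the object after importing the class from that module (for example, catalog.save("custom_model_V1", CustomClass()) with from my_project.custom_classes import CustomClass, dataset name assumed) makes the pickle record my_project.custom_classes.CustomClass instead of __main__.CustomClass, so the pipeline can load it.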
Kenny B
03/23/2023, 10:27 PM

Maxime Steinmetz
03/23/2023, 11:20 PM