user
08/05/2022, 6:28 PM

user
08/06/2022, 7:38 AM

Tom Taylor-Vigrass
08/11/2022, 1:09 PM
AttributeError: 'TranscodedDataNode' object has no attribute 'original_version'

user
08/16/2022, 8:08 AM

user
08/20/2022, 10:58 PM

user
08/23/2022, 2:28 PM

Mavis Tian
08/25/2022, 4:29 PM

Mavis Tian
08/25/2022, 4:30 PM

Andrew Stewart
08/31/2022, 6:34 PM
`project_version` in pyproject.toml seems to correspond to the version of Kedro, not the actual project at hand. Is the package version in src/setup.py the right place, or is that being controlled by some higher-level process?
Faisal Malik
09/07/2022, 9:13 AM
We're on kedro `0.17.4`, but we want to convert our kedro pipeline into a Prefect flow using this approach. However, I notice this approach is only available starting from kedro `0.18.0`; it's not present on kedro 0.17. I tried to install Prefect in the same environment as my kedro `0.17.4`, but it looks like it causes a dependency issue. Should I upgrade my kedro? And if so, how hard will it be to upgrade from kedro `0.17.4` to kedro `0.18.2`?
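A minimal sketch of what the deployment glue can look like after such an upgrade, assuming kedro >= 0.18 and the Prefect 1.x API; the flow name is illustrative and the script is assumed to run from the project root:

```python
from pathlib import Path

from prefect import Flow, task

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project


@task
def run_kedro_pipeline():
    # Bootstrap the Kedro project and execute its default pipeline.
    bootstrap_project(Path.cwd())
    with KedroSession.create(project_path=Path.cwd()) as session:
        session.run()


with Flow("kedro-run") as flow:
    run_kedro_pipeline()

if __name__ == "__main__":
    flow.run()
```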
Toni
09/09/2022, 1:21 PM
Is there a way that, given `set_of_targets = ['a', 'b', 'c']`, we can loop the same pipeline for each value of that list without "copying" that pipeline? We may have a different length and names for that `set_of_targets`, and thus we want to avoid manual work... Also, we need the outputs to have "dynamic" names in the catalog in order to save all the outputs (`score_{target}`: `score_a`, `score_b`, `score_c`)... I think this could be done with jinja, but I have no idea where to start...
Thank you very much!
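One way to do this without jinja is kedro's modular pipelines, which re-instantiate a template pipeline under a namespace with remapped dataset names. A rough sketch, assuming kedro 0.18; `train_and_score` and the dataset names are illustrative:

```python
from kedro.pipeline import Pipeline, node, pipeline


def train_and_score(master_table):
    """Stand-in for the real modelling function."""
    ...


set_of_targets = ["a", "b", "c"]

# Template pipeline, written once against generic dataset names.
base = Pipeline([node(train_and_score, inputs="master_table", outputs="score")])

# One copy per target: the shared input keeps its name, while the output is
# remapped so each copy writes to its own catalog entry (score_a, score_b, ...).
scoring = sum(
    pipeline(
        pipe=base,
        inputs={"master_table": "master_table"},
        outputs={"score": f"score_{target}"},
        namespace=target,
    )
    for target in set_of_targets
)
```

Outputs without catalog entries fall back to `MemoryDataSet`, so `score_a`/`score_b`/`score_c` still need to be declared in the catalog to be persisted; if you want that YAML generated from the same list rather than written by hand, the `TemplatedConfigLoader`'s Jinja2 support is the usual starting point.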
user
09/14/2022, 5:48 AM

Yetunde
09/14/2022, 8:43 AM

Toni
09/14/2022, 9:38 AM
If an entry in the data catalog uses `versioned: True`, when I use `catalog.load(...)` in a notebook, does it always load the last version of that entry? How can I indicate the version to load?
Thank you!
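For reference, `catalog.load` returns the latest version by default and accepts an optional `version` argument; a short sketch with illustrative names:

```python
# Default: loads the most recent version of the dataset.
df = catalog.load("my_dataset")

# Specific version: pass the timestamped folder name that was created on save.
df_old = catalog.load("my_dataset", version="2022-09-01T10.00.00.000Z")
```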
Riley Brady
09/14/2022, 8:16 PM
(0.18.1) It seems that `kedro run --tag some_tag1,some_tag2` will run any nodes with `some_tag1` OR `some_tag2`. Is there any functionality to use AND instead of OR? My workaround right now is to create a custom tag of `some_tag1-some_tag2` and then call that directly.
It would be nice if I could list out a few tags and only run nodes that have all of them. But I understand why OR is the default.
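As far as I know there is no built-in AND mode for `--tag`; a lighter-weight workaround than composite tags is a derived pipeline in `pipeline_registry.py` that keeps only the nodes carrying every tag. A sketch, with illustrative names:

```python
from kedro.pipeline import Pipeline


def only_nodes_with_all_tags(pipe: Pipeline, *tags: str) -> Pipeline:
    """Keep only the nodes whose tag set contains every requested tag."""
    wanted = set(tags)
    return Pipeline([n for n in pipe.nodes if wanted <= n.tags])


# In register_pipelines(), expose the filtered pipeline under its own name:
#   "both_tags": only_nodes_with_all_tags(default, "some_tag1", "some_tag2")
# and run it with `kedro run --pipeline both_tags`.
```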
Kasper Janehag
09/15/2022, 9:42 AM
(0.17.7) Hi! I have some problems running Kedro with a self-hosted Hadoop cluster. As part of a pipeline, I have a transcoded dataset registered as `table@pandas` and `table@spark`, with the following settings:

```yaml
..._table@pandas:
  type: "${datasets.parquet}"
  filepath: "${base_path_spark}/…/master_table"

..._table@spark:
  <<: *pq
  filepath: "${base_path_spark}/…/master_table"
```

The `base_path_spark` is an HDFS location. These are then used in a pipeline in the following manner:

```python
spark_to_pandas = pipeline(
    pipe=Pipeline(
        [
            node(
                func=spark_utils.to_pandas,
                …
                outputs="..._table@spark",
            )
        ]
    )
)

data_cleaning = pipeline(
    pipe=Pipeline(
        [
            node(
                func=enforce_schema_using_dict,
                inputs={
                    "data": "..._table@pandas",
                },
                …
            )
        ]
    )
)
```

The `data_cleaning` node is supposed to pick up the output from the `spark_to_pandas` node, using the transcoded dataset. However, a `DataSetError` is raised with the following message:

```
Exception has occurred: DataSetError
[Errno 2] No such file or directory: 'hadoop': 'hadoop'
Failed to instantiate Dataset 'telco_churn.master_table@pandas' of type 'kedro.extras.datasets.pandas.parquet_dataset.ParquetDataSet'.
```

If we remove the transcoding in the DataCatalog and register the datasets as individual entries, the error disappears.
Does anyone know how to proceed from this kind of error? Could it be related to the client-specific Hadoop environment? How can we proceed with troubleshooting?
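The `[Errno 2] No such file or directory: 'hadoop'` message usually means some layer tried to shell out to the `hadoop` CLI and could not find it on the PATH of the Python process: the pandas `ParquetDataSet` resolves `hdfs://` filepaths through fsspec/pyarrow, which builds its HDFS classpath by running the `hadoop` binary, whereas the spark dataset goes through the JVM and does not need it. A quick check outside Kedro, assuming that is the path being taken:

```python
import shutil

# If this prints None, the 'hadoop' executable is not visible to this process;
# pyarrow runs `hadoop classpath --glob` when connecting to HDFS.
print(shutil.which("hadoop"))

import fsspec

# Should reproduce the same error if the Hadoop client setup
# (HADOOP_HOME, PATH, CLASSPATH) is the culprit.
fs = fsspec.filesystem("hdfs")
print(fs.ls("/"))
```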
Toni
09/16/2022, 9:24 AM
Is it possible to save a `np.array` with the `catalog`? Is there a way to save this `np.array` as CSV "easily"? I cannot use the `pandas.CSVDataSet` because it is not a dataframe.
I think that this can be done with transcoding datasets, but I do not know if there is a dataset for `np.arrays` in kedro.
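There is no numpy dataset in `kedro.extras.datasets` as far as I know; the usual options are converting to a DataFrame inside the node, or a small custom dataset built on `np.savetxt`/`np.loadtxt`. A minimal sketch of the latter (class name and catalog entry are illustrative, local filepaths only):

```python
import numpy as np

from kedro.io import AbstractDataSet


class NumpyCSVDataSet(AbstractDataSet):
    """Load/save a numpy array as a plain CSV file."""

    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> np.ndarray:
        return np.loadtxt(self._filepath, delimiter=",")

    def _save(self, data: np.ndarray) -> None:
        np.savetxt(self._filepath, data, delimiter=",")

    def _describe(self) -> dict:
        return {"filepath": self._filepath}


# catalog.yml entry (illustrative):
#   scores:
#     type: my_project.extras.datasets.NumpyCSVDataSet
#     filepath: data/07_model_output/scores.csv
```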
user
09/17/2022, 10:58 AM

Olivia Lihn
09/20/2022, 11:11 PM
Hi! I'm trying to run a kedro project through `spark-submit` with `--master yarn` and `--deploy-mode cluster`, not locally or in client mode. Has anyone tried this? If so, what are the extra files/code you added to make `spark-submit` work?
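Not something I can vouch for end to end, but the usual shape is: package the project, ship the wheel and the conf directory alongside the job, and hand `spark-submit` a small entry script that starts a `KedroSession` on the driver. A sketch, assuming kedro >= 0.18, with every name illustrative:

```python
# entrypoint.py - the main file handed to spark-submit, e.g.:
#   spark-submit --master yarn --deploy-mode cluster \
#     --py-files my_project-0.1-py3-none-any.whl entrypoint.py
# In cluster mode the driver runs inside a YARN container, so conf/ must be
# shipped as well (e.g. via --archives) and unpacked next to this script.
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

if __name__ == "__main__":
    bootstrap_project(Path.cwd())
    with KedroSession.create(project_path=Path.cwd()) as session:
        session.run()
```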
Jonas Kemper
09/27/2022, 10:30 AM
Has anyone tried triggering kedro runs through a lightweight HTTP API? I'm thinking one POST request to start a run and then a GET request to poll the run status, etc. Is there any reference material that you could point me to?
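I am not aware of official reference material for this; a common pattern is a thin web wrapper that starts the run in a background thread and exposes its status. A rough sketch with Flask (route names and the in-memory state are illustrative; single process, one run at a time is safest):

```python
import threading
import uuid
from pathlib import Path

from flask import Flask, jsonify

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

app = Flask(__name__)
runs = {}  # run_id -> "running" | "done" | "failed"


def _run_pipeline(run_id: str) -> None:
    try:
        bootstrap_project(Path.cwd())
        with KedroSession.create(project_path=Path.cwd()) as session:
            session.run()
        runs[run_id] = "done"
    except Exception:
        runs[run_id] = "failed"


@app.route("/runs", methods=["POST"])
def start_run():
    run_id = str(uuid.uuid4())
    runs[run_id] = "running"
    threading.Thread(target=_run_pipeline, args=(run_id,), daemon=True).start()
    return jsonify({"run_id": run_id}), 202


@app.route("/runs/<run_id>", methods=["GET"])
def run_status(run_id):
    return jsonify({"status": runs.get(run_id, "unknown")})
```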
user
10/03/2022, 1:48 PM

user
10/07/2022, 7:58 AM

user
10/07/2022, 2:18 PM

user
10/07/2022, 4:58 PM

user
10/08/2022, 6:38 PM
https://i.stack.imgur.com/wkDIJ.jpg

user
10/13/2022, 8:18 AM

user
10/13/2022, 2:38 PM

user
10/14/2022, 8:58 AM

Maren Eckhoff
10/14/2022, 6:17 PM

```python
node(
    my_fun,
    inputs={"input_data": "my_data", "input_params": "params:my_params", "constant": 4},
    outputs="output_data",
)
```
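If the question is whether the `constant: 4` input works: node inputs must be dataset or parameter names (strings), so a literal value is not accepted there. Two common routes, with illustrative names (`my_fun` as in the snippet above):

```python
from kedro.pipeline import node

# Option 1: declare the constant in conf/base/parameters.yml (my_constant: 4)
# and reference it like any other parameter.
node(
    my_fun,
    inputs={"input_data": "my_data",
            "input_params": "params:my_params",
            "constant": "params:my_constant"},
    outputs="output_data",
)


# Option 2: bind the constant into a small wrapper function.
def my_fun_with_constant(input_data, input_params):
    return my_fun(input_data, input_params, constant=4)


node(
    my_fun_with_constant,
    inputs={"input_data": "my_data", "input_params": "params:my_params"},
    outputs="output_data",
)
```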
user
10/17/2022, 3:18 PM