Mohammed Samir
01/29/2023, 11:06 AM
When I run kedro run --env env_name, the pipelines' nodes are interleaved in running order, meaning that it runs as below:
pipeline 1 --> Node 1
pipeline 2 --> Node 1
pipeline 2 --> Node 2
pipeline 3 --> Node 1
pipeline 1 --> Node 2
pipeline 3 --> Node 2
(Note: the node order within each pipeline is correct, but Kedro runs a node from each pipeline in turn.)
However, I want them to run in the order below:
pipeline 1 --> Node 1
pipeline 1 --> Node 2
pipeline 2 --> Node 1
pipeline 2 --> Node 2
pipeline 3 --> Node 1
pipeline 3 --> Node 2
I have the following config in pipeline_registry:
return {"__default__": pipeline1 + pipeline2 + pipeline3 + pipeline4 + pipeline5}
Rob
01/29/2023, 6:21 PM
Having a spark.yml file in the configuration folder, to run the code from a Databricks cluster (using a workflow job, so my run.py is in DBFS), is it required to specify the Spark master URL? Or is there an alternative that omits the spark.yml, to let Databricks manage my configuration? (I mean, to omit the manual setting of the master URL.)
Thanks in advance!
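Not an authoritative answer, but a sketch of a common pattern: build the SparkSession with getOrCreate() and no .master() call, so that on Databricks the cluster's existing session and master settings are picked up. The hook below assumes a spark.yml whose keys are plain Spark options.

from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession

class SparkHooks:
    @hook_impl
    def after_context_created(self, context):
        # Load spark.yml from the active environment; no master URL needed,
        # since getOrCreate() attaches to the session Databricks already manages.
        parameters = context.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())
        SparkSession.builder.config(conf=spark_conf).getOrCreate()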
Sergei Benkovich
01/29/2023, 8:01 PM
Antoine Bon
01/30/2023, 9:00 AM
I am trying to use the load_version functionality with a catalog that is built programmatically with a hook, but I fail to do so. From my understanding of the code this is not possible, and so I raised the following ticket: https://github.com/kedro-org/kedro/issues/2233
Unless someone knows of a way to do so?
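For context, a minimal sketch of the setup in question, with hypothetical names: a dataset added to the catalog inside a hook. As the ticket describes, pinning the version by hand appears to be the only option, because --load-version from the CLI does not reach hook-added datasets.

from kedro.extras.datasets.pandas import CSVDataSet
from kedro.framework.hooks import hook_impl
from kedro.io import Version

class DynamicCatalogHooks:
    @hook_impl
    def after_catalog_created(self, catalog):
        # "my_dataset" and the timestamp are illustrative; the load version
        # has to be hard-coded here, which is exactly the limitation reported.
        catalog.add(
            "my_dataset",
            CSVDataSet(
                filepath="data/01_raw/my_dataset.csv",
                version=Version("2023-01-30T00.00.00.000Z", None),
            ),
        )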
Massinissa Saïdi
01/30/2023, 4:17 PM
Massinissa Saïdi
01/30/2023, 5:34 PM
Is it possible to get the values passed with --params in code, with a KedroSession?
I have something like this:
def get_session() -> Optional[MyKedroSession]:
    bootstrap_project(Path.cwd())
    try:
        session = MyKedroSession.create()
    except RuntimeError as exc:
        _log.info(f"Session doesn't exist, creating a new one. Raise: {exc}")
        package_name = str(Path(__file__).resolve().parent.name)
        session = MyKedroSession.create(package_name)
    return session

def get_parameters():
    context = get_session().load_context()
    return context.params
But get_parameters gives the parameters set in the YAML files and not the ones updated with --params? thx!
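A sketch of one way to make this work, assuming the overrides can be forwarded by hand: KedroSession.create accepts an extra_params dict, which is the same mechanism kedro run --params uses, and load_context() then exposes the merged values.

from pathlib import Path
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())
# "my_param" is illustrative; extra_params overrides parameters.yml,
# just like --params on the CLI.
with KedroSession.create(extra_params={"my_param": 42}) as session:
    params = session.load_context().params
    print(params["my_param"])  # 42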
Andrew Stewart
01/30/2023, 9:59 PM
## from https://kedro.readthedocs.io/en/stable/kedro_project_setup/session.html
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from pathlib import Path

bootstrap_project(Path.cwd())
with KedroSession.create() as session:
    session.run()
vs
## from https://kedro.readthedocs.io/en/stable/tutorial/package_a_project.html
from kedro_tutorial.__main__ import main

main(["--pipeline", "__default__"])  # or simply main() if you don't want to provide any arguments
Alexandra Lorenzo
01/31/2023, 4:48 PM
I get "create_client() got multiple values for keyword argument 'aws_access_key_id'."
credentials.yml:
dev_s3:
  client_kwargs:
    aws_access_key_id: AWS_ACCESS_KEY_ID
    aws_secret_access_key: AWS_SECRET_ACCESS_KEY
catalog.yml:
raw_images:
  type: PartitionedDataSet
  dataset:
    type: flair_one.extras.datasets.satellite_image.SatelliteImageDataSet
  credentials: dev_s3
  path: s3://ignchallenge/train
  filename_suffix: .tif
  layer: raw
kedro = 0.17.7
s3fs = 0.4.2
Anyone has an idea? Thanks in advance!
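A guess rather than a confirmed fix: with s3fs-backed datasets, credentials are usually given as s3fs's own key/secret arguments, not as botocore's aws_access_key_id inside client_kwargs; the latter can be passed to create_client a second time by s3fs itself, which would produce exactly this error. A sketch of the alternative credentials.yml:

dev_s3:
  key: AWS_ACCESS_KEY_ID
  secret: AWS_SECRET_ACCESS_KEY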
João Areias
01/31/2023, 5:01 PM
Is kedro jupyter convert being deprecated? And is there going to be an easy way of turning notebooks into nodes and pipelines following this decision, in Kedro 0.19?
Elias
01/31/2023, 5:54 PM
Olivia Lihn
01/31/2023, 7:28 PM
Andrew Stewart
02/01/2023, 1:35 AM
Sebastian Cardona Lozano
02/01/2023, 4:43 AM
We are considering the kedro-mlflow plugin to achieve what we want. Here are the questions:
1. Once you have the mlflow artifact, can we still use the kedro-docker plugin to create the image, or do we have to create the Docker image from scratch? On the other hand, can we still use the other plugins to export the pipeline to Airflow or Vertex Pipelines?
2. On that basis, we are starting to question whether it is better to use mlflow for tracking and model registry, taking advantage of the Kedro plugins, than the Vertex AI APIs. I would like to know your opinion about this, or recommendations on how to combine both worlds.
Thanks in advance.
#questions #plugins-integrations
Anirudh Dahiya
02/01/2023, 1:14 PM
Exception: Java gateway process exited before sending its port number
Has anyone faced this error before?
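A frequent cause, offered as a guess: PySpark cannot locate a Java runtime, so the JVM gateway dies before it can report its port. A quick check from the driver environment:

import os
import shutil

# PySpark needs a JVM on the driver; if JAVA_HOME is unset and no `java`
# binary is on the PATH, this exception is the usual symptom.
print(os.environ.get("JAVA_HOME"))
print(shutil.which("java"))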
Massinissa Saïdi
02/02/2023, 9:59 AM
Is it possible to set the tag in code? (kedro run --tag NAME)
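A sketch under the assumption that the goal is a programmatic tagged run: KedroSession.run takes a tags argument, which is what the CLI option forwards.

from pathlib import Path
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())
with KedroSession.create() as session:
    # Equivalent to `kedro run --tag NAME`: only nodes tagged "NAME" run.
    session.run(tags=["NAME"])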
Larissa Siqueira
02/02/2023, 2:28 PM
Artur Dobrogowski
02/02/2023, 3:58 PM
datajoely
02/02/2023, 3:58 PM
Filip Panovski
02/02/2023, 5:01 PM
I have a dask.yml in my conf/base which contains the following (real config is much larger, but this gets the point across):
dask_cloudprovider:
  region: eu-central-1
  instance_type: t3.xlarge
  n_workers: 36
And a dask.yml in another environment, e.g. conf/low, with the following:
dask_cloudprovider:
  instance_type: t3.small
  n_workers: 8
Which I activate using kedro run --env=low.
Now, I would have expected the config_loader (TemplatedConfigLoader) to contain something like {'dask_cloudprovider': {'region': 'eu-central-1', 'instance_type': 't3.small', 'n_workers': 8}}.
However, it overrides the entire entry, resulting in the config_loader containing: {'dask_cloudprovider': {'instance_type': 't3.small', 'n_workers': 8}}.
Is there any way to get what I was expecting out of the box? I don't really want to copy my entire configuration N times for each environment, especially since only a few of the keys change. Is the intended use case for environments different to what I'm trying to use it for (say, only for top-level entries)?
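As far as I can tell this is the expected behaviour: an environment's file replaces matching top-level keys from base rather than deep-merging them. One workaround sketch, with no claim that it is the supported route: subclass the config loader and recursively merge the environment dict into the base dict. The helper below only demonstrates the merge semantics on the two files above.

from copy import deepcopy

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"dask_cloudprovider": {"region": "eu-central-1", "instance_type": "t3.xlarge", "n_workers": 36}}
low = {"dask_cloudprovider": {"instance_type": "t3.small", "n_workers": 8}}
print(deep_merge(base, low))
# {'dask_cloudprovider': {'region': 'eu-central-1', 'instance_type': 't3.small', 'n_workers': 8}}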
WEN XIN (Jessie 文馨)
02/03/2023, 4:47 AM
How can I submit a spark job to EMR through livy for a kedro project?
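A rough sketch using Livy's batch REST API, under several assumptions: the project has been packaged (kedro package) and uploaded to S3, a thin run.py entry point calls the packaged project's main(), and Livy listens on the EMR master node's default port 8998. All hosts and paths below are placeholders.

import json
import requests

payload = {
    "file": "s3://my-bucket/run.py",  # entry point executed by spark-submit
    "pyFiles": ["s3://my-bucket/my_project-0.1-py3-none-any.whl"],
    "conf": {"spark.submit.deployMode": "cluster"},
}
resp = requests.post(
    "http://emr-master:8998/batches",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # returns a batch id; poll GET /batches/<id> for state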
Evžen Šírek
02/03/2023, 10:01 AM
Has anyone used the fastparquet engine with the ParquetDataSet?
There is a possibility to specify the engine in the catalog entry:
dataset:
  type: pandas.ParquetDataSet
  filepath: data/dataset.parquet
  load_args:
    engine: fastparquet
  save_args:
    engine: fastparquet
However, when I do that, I get a DataSetError with "I/O operation on closed file" when Kedro tries to save the dataset.
When I manually save the data with pandas and engine=fastparquet (which is what Kedro should do according to the docs), it works well.
Is this expected? Thanks! :))
Environment:
python==3.10.4, pandas==1.5.1, kedro==0.18.4, fastparquet==2023.1.0
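A workaround sketch rather than a fix: bypass the buffered save path entirely with a small custom dataset that defers to pandas. This assumes a local filepath; the class and module names are illustrative.

import pandas as pd
from kedro.io import AbstractDataSet

class FastParquetDataSet(AbstractDataSet):
    """Minimal parquet dataset that loads/saves via pandas + fastparquet."""

    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> pd.DataFrame:
        return pd.read_parquet(self._filepath, engine="fastparquet")

    def _save(self, data: pd.DataFrame) -> None:
        data.to_parquet(self._filepath, engine="fastparquet")

    def _describe(self) -> dict:
        return {"filepath": self._filepath}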
Massinissa Saïdi
02/03/2023, 10:45 AM
Veenu Yadav
02/03/2023, 1:18 PM
I get "Given configuration path either does not exist or is not a valid directory: /usr/local/airflow/conf/base" while deploying a Kedro pipeline on Apache Airflow with Astronomer. Any clues?
Veenu Yadav
02/03/2023, 1:20 PM
/usr/local/airflow/conf/base is not even present in the webserver container.
Sergei Benkovich
02/03/2023, 3:29 PM
Rafał Nowak
02/05/2023, 6:54 PM
Is there a json.JSONDataSet supporting gzip compression, so that the filepath could be *.json.gz? I haven't found such a backend in kedro.datasets. Has anyone already implemented such a dataset?
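An untested sketch that may avoid a custom dataset entirely: JSONDataSet forwards its fs_args open arguments to fsspec's open(), and fsspec understands a compression argument. Whether this round-trips cleanly is an assumption worth verifying.

my_json:
  type: json.JSONDataSet
  filepath: data/01_raw/data.json.gz
  fs_args:
    open_args_load:
      compression: gzip
    open_args_save:
      compression: gzip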
Sergei Benkovich
02/05/2023, 8:05 PM
I am getting ModuleNotFoundError: No module named 'pipelines'. Any suggestions on how to handle it?
Ankar Yadav
02/06/2023, 12:19 PM
When I specify sep in save_args, it gives me an error:
prm_customer:
  type: pandas.CSVDataSet
  filepath: ${base_path}/${folders.prm}/
  save_args:
    index: False
    sep: "|"
Any idea how to fix this?
I am using kedro 0.18.1
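One guess, not a confirmed diagnosis: the filepath ends in a directory with no file name, which pandas.CSVDataSet cannot write to, and that failure can surface once save_args are first exercised. A sketch with a purely illustrative file name added:

prm_customer:
  type: pandas.CSVDataSet
  filepath: ${base_path}/${folders.prm}/prm_customer.csv  # file name is illustrative
  save_args:
    index: False
    sep: "|"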
Yanni
02/06/2023, 1:59 PM
Debanjan Banerjee
02/06/2023, 2:03 PM