Artur Dobrogowski
04/04/2024, 2:27 PM
<class 'str'>: (<class 'pyspark.errors.exceptions.captured.IllegalArgumentException'>, IllegalArgumentException())
Nok Lam Chan
04/04/2024, 2:36 PM
Artur Dobrogowski
04/04/2024, 2:36 PM
Nok Lam Chan
04/04/2024, 2:36 PM
Artur Dobrogowski
04/04/2024, 2:36 PM
Nok Lam Chan
04/04/2024, 2:36 PM
Artur Dobrogowski
04/04/2024, 2:36 PM
Artur Dobrogowski
04/04/2024, 2:37 PM
Nok Lam Chan
04/04/2024, 2:38 PM
Artur Dobrogowski
04/04/2024, 2:38 PM
Artur Dobrogowski
04/04/2024, 2:39 PM
kedro==0.19.3
kedro-datasets==2.0.0
kedro-viz==7.1.0
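A version report like this one is easier to keep accurate if it is generated rather than typed. A minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_versions` is hypothetical, and pyspark is added to the list because the exception itself comes from Spark:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for bug reports; not part of Kedro.
    """
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # pyspark is listed because the IllegalArgumentException is raised by Spark.
    report = installed_versions(["kedro", "kedro-datasets", "kedro-viz", "pyspark"])
    for name, ver in report.items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running it in both the old and the new environment gives two reports in the same format, which makes a changed Spark version easy to spot.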
Artur Dobrogowski
04/04/2024, 2:39 PM
Artur Dobrogowski
04/04/2024, 2:40 PM
Nok Lam Chan
04/04/2024, 2:40 PM
kedro ipython
context.config_loader["spark"]
Can you check if the output of this looks normal?
Artur Dobrogowski
04/04/2024, 2:41 PM
Artur Dobrogowski
04/04/2024, 2:42 PM
Nok Lam Chan
04/04/2024, 2:42 PM
Artur Dobrogowski
04/04/2024, 2:42 PM
Artur Dobrogowski
04/04/2024, 2:42 PM
Artur Dobrogowski
04/04/2024, 2:43 PM
Nok Lam Chan
04/04/2024, 2:43 PM
Artur Dobrogowski
04/04/2024, 2:43 PM
Artur Dobrogowski
04/04/2024, 2:44 PM
Artur Dobrogowski
04/04/2024, 2:48 PM
Artur Dobrogowski
04/04/2024, 2:49 PM
Nok Lam Chan
04/04/2024, 2:50 PM
Artur Dobrogowski
04/04/2024, 2:50 PM
Artur Dobrogowski
04/04/2024, 2:50 PM
Artur Dobrogowski
04/04/2024, 2:51 PM
Artur Dobrogowski
04/04/2024, 2:51 PM
Nok Lam Chan
04/04/2024, 2:53 PM
Nok Lam Chan
04/04/2024, 2:56 PM
Artur Dobrogowski
04/04/2024, 3:05 PM
Artur Dobrogowski
04/04/2024, 3:07 PM
Nok Lam Chan
04/04/2024, 4:35 PM
"this info might be worth including in https://docs.kedro.org/en/stable/resources/migration.html"
What should be added there? Is this a Spark issue or a Kedro one?
Nok Lam Chan
04/04/2024, 4:36 PM
Artur Dobrogowski
04/05/2024, 10:54 AM
Nok Lam Chan
04/05/2024, 11:08 AM
pip freeze would help before/after every time you update a library.
◦ You may add additional constraints during the upgrade to make sure it doesn't change your Spark version: pip install kedro==0.19 pyspark==3.4.2 (assuming your pyspark was 3.4.2)
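The before/after pip freeze comparison and the pinned upgrade command can both be scripted. A minimal sketch, assuming each snapshot was saved as text (for example `pip freeze > before.txt` before the upgrade and `pip freeze > after.txt` after it); the helper names `parse_freeze`, `diff_freeze` and `pinned_install` are hypothetical, and package names are normalised with the PEP 503 rule (lowercase, runs of `-`, `_` and `.` collapsed to `-`):

```python
import re
from typing import Dict, Iterable, Optional, Tuple


def _normalise(name: str) -> str:
    # PEP 503 normalisation: case-insensitive, "-", "_" and "." treated alike.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(text: str) -> Dict[str, str]:
    """Parse `pip freeze` output into {normalised_name: version}."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and entries without an exact "==" pin
        # (editable installs, direct URL references, ...).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, ver = line.split("==", 1)
        pins[_normalise(name.strip())] = ver.strip()
    return pins


def diff_freeze(before: str, after: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old_version, new_version)} for every package that changed."""
    old, new = parse_freeze(before), parse_freeze(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


def pinned_install(before: str, upgrade: str, keep: Iterable[str]) -> str:
    """Build a pip command that installs `upgrade` while pinning `keep` to their old versions."""
    old = parse_freeze(before)
    pins = [f"{name}=={old[_normalise(name)]}" for name in keep if _normalise(name) in old]
    return " ".join(["pip install", upgrade, *pins])
```

With two freeze snapshots, `diff_freeze` lists every package whose version moved (added or removed packages show `None` on one side), and `pinned_install(before, "kedro==0.19", ["pyspark"])` rebuilds the command suggested above with pyspark pinned to whatever version the "before" snapshot recorded.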
My intuition here is that either the configuration is wrong, or something is off with the Spark version/datasets.
Artur Dobrogowski
04/05/2024, 11:11 AM
Artur Dobrogowski
04/05/2024, 11:12 AM
Nok Lam Chan
04/05/2024, 11:15 AM
Nok Lam Chan
04/05/2024, 11:17 AM
config_loader["spark"]? Kedro doesn't do anything with Spark, so it's weird that upgrading Kedro breaks Spark.
Artur Dobrogowski
04/05/2024, 11:25 AM
raw_data_import:
  type: spark.SparkDataset
  file_format: parquet
  filepath: "${globals: raw_folder}/foo/data.parquet"
  load_args:
    header: True
    inferSchema: True
  save_args:
    mode: overwrite
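Since one suspicion is the configuration, a cheap check is what `filepath` the `${globals: raw_folder}` reference actually resolves to. Kedro resolves it through OmegaConf; the sketch below is a hypothetical standard-library stand-in for debugging only (it handles plain and dotted keys and nothing else), assuming the globals have already been loaded into a plain dict:

```python
import re
from collections.abc import Mapping

# Hypothetical stand-in for Kedro's "globals" resolver, for debugging only.
# Matches ${globals:key} and ${globals: key}, including dotted keys like a.b.
_GLOBALS_REF = re.compile(r"\$\{globals:\s*([A-Za-z0-9_.]+)\s*\}")


def resolve_globals(template: str, globals_: Mapping) -> str:
    """Replace ${globals: key} references in `template`; raise KeyError if a key is missing."""

    def _lookup(match: re.Match) -> str:
        key = match.group(1)
        value = globals_
        # Walk nested mappings for dotted keys such as ${globals: paths.raw}.
        for part in key.split("."):
            if not isinstance(value, Mapping) or part not in value:
                raise KeyError(f"globals key not found: {key!r}")
            value = value[part]
        return str(value)

    return _GLOBALS_REF.sub(_lookup, template)


if __name__ == "__main__":
    # Hypothetical globals.yml contents, loaded as a dict.
    example_globals = {"raw_folder": "data/01_raw"}
    print(resolve_globals("${globals: raw_folder}/foo/data.parquet", example_globals))
```

If the printed path is not the one expected on disk (or a key is missing), the problem is in the config rather than in Spark.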
Artur Dobrogowski
04/05/2024, 11:26 AM
Artur Dobrogowski
04/05/2024, 11:33 AM
Artur Dobrogowski
04/05/2024, 11:38 AM
Nok Lam Chan
04/05/2024, 11:38 AM
kedro library. So if you choose PySpark as a tool, there will be a default SparkHook created for you.
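The output of `context.config_loader["spark"]` asked about earlier is what such a hook feeds into Spark. Assuming the hook follows the PySpark starter's template, which passes the loaded dict's items to `SparkConf().setAll(...)`, that output should be a flat mapping of Spark property names to scalar values. A hypothetical standard-library check for that shape (the heuristics are assumptions, not rules Spark enforces):

```python
from collections.abc import Mapping
from typing import List


def check_spark_conf(conf: Mapping) -> List[str]:
    """Return human-readable warnings about a loaded `spark` config dict.

    Hypothetical helper: flags an empty config, nested values (which would not
    be applied as separate Spark properties) and keys without the usual
    "spark." prefix.
    """
    problems: List[str] = []
    if not conf:
        problems.append("spark config is empty")
    for key, value in conf.items():
        if isinstance(value, (Mapping, list, tuple)):
            problems.append(f"{key}: nested value {value!r}, expected a scalar")
        if not str(key).startswith("spark."):
            problems.append(f"{key}: key does not start with 'spark.'")
    return problems


if __name__ == "__main__":
    # Made-up example: a nested YAML layout instead of flat "spark.x.y: value" keys.
    print(check_spark_conf({"spark": {"driver": {"maxResultSize": "3g"}}}))
```

An empty list means the config at least has the expected shape; it says nothing about whether the values themselves are valid for the installed Spark version.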
"I've seen that kedro and kedro-datasets were split in version 0.19 - is that correct?"
It was split earlier, in 0.18.x. kedro.extras.datasets is in a frozen state, so you can use the datasets from either kedro or kedro-datasets (kedro-datasets takes priority if detected).
Artur Dobrogowski
04/05/2024, 11:39 AM
Nok Lam Chan
04/05/2024, 11:39 AM
Dataset rename
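Assuming this refers to the change from the `DataSet` to the `Dataset` spelling in dataset class names (for example `spark.SparkDataSet` becoming `spark.SparkDataset`, the spelling the catalog entry above already uses), old spellings left in catalog files can be found with a quick scan. A hypothetical sketch; it assumes only the trailing spelling changes and does not handle quoted values or trailing comments:

```python
import re
from typing import List, Tuple

# Hypothetical helper: finds catalog `type:` values still using the old "DataSet"
# spelling, e.g. "type: spark.SparkDataSet".
_OLD_TYPE = re.compile(r"^\s*type:\s*([\w.]*DataSet)\s*$")


def find_old_dataset_types(catalog_text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, old_type, suggested_type) for each old-style type."""
    hits: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(catalog_text.splitlines(), start=1):
        match = _OLD_TYPE.match(line)
        if match:
            old = match.group(1)
            # Only the trailing "DataSet" is rewritten; the module path is kept as-is.
            hits.append((lineno, old, old[: -len("DataSet")] + "Dataset"))
    return hits
```

Running it over each `catalog*.yml` file's text lists the line numbers to update before moving to the newer kedro-datasets releases.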