Hi team, after yesterday's release of `kedro-datasets==1.5.0` - Kedro #questions

Elena Mironova

08/01/2023, 1:24 PM

Hi team, after yesterday's release of `kedro-datasets==1.5.0`, our CI started failing during system tests which do a `kedro run` for a pipeline with Spark (see the screenshot). As far as I can see, `SparkDataSet` is still defined with the same name as before. When we used `kedro-datasets==1.4.2`, the same tests ran smoothly. I also couldn't find anything specific in the release notes. Do we have to update our code (maybe some import statements, or how it is specified within the requirements)?

👀 1

Deepyaman Datta

08/01/2023, 1:47 PM

You're not pointing to `kedro_datasets` in the apparent broken import

Deepyaman Datta

08/01/2023, 1:48 PM

Can you share the catalog entry?

Elena Mironova

08/01/2023, 3:38 PM

This is what it looks like in the catalog:

_csv: &csv
  type: spark.SparkDataSet
  file_format: csv
  load_args:
    sep: ","
    header: True
    inferSchema: True
  save_args:
    header: True
    mode: overwrite

prm_observation_time_frame:
  <<: *csv
  filepath: data/03_primary/prm_observation_time_frame.csv
  layer: primary

What confused me the most was that it worked in 1.4.2.

Deepyaman Datta

08/01/2023, 4:18 PM

Hmm...

Deepyaman Datta

08/01/2023, 4:19 PM

I wonder if the lazy loading affects the path discovery in some way. I'm not totally sure why it's not looking in `kedro_datasets`.
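
The name resolution being discussed can be illustrated with a minimal sketch. It is not Kedro's actual implementation: the function name `resolve_dataset_type` and the prefix list passed to it are assumptions made for illustration. The idea is that a short catalog `type` such as `spark.SparkDataSet` is tried against several package prefixes in order, and the first candidate that imports wins, so a failed import under one prefix can be hidden behind a later attempt under another.

```python
import importlib


def resolve_dataset_type(type_name, prefixes):
    """Return the first class found by prepending each prefix to type_name.

    Hypothetical helper for illustration only; the real lookup in Kedro
    may use different prefixes and different error handling.
    """
    errors = []
    for prefix in prefixes:
        full_name = prefix + type_name
        # Split "package.module.ClassName" into module path and class name.
        module_path, _, class_name = full_name.rpartition(".")
        try:
            module = importlib.import_module(module_path)
            return getattr(module, class_name)
        except (ImportError, AttributeError, ValueError) as exc:
            # Remember why this candidate failed and try the next prefix.
            errors.append(f"{full_name}: {exc!r}")
    raise LookupError(
        f"Could not resolve {type_name!r}; tried:\n" + "\n".join(errors)
    )
```

Using only the standard library, `resolve_dataset_type("decoder.JSONDecoder", ["not_a_real_pkg.", "json."])` skips the first prefix and returns the `JSONDecoder` class from `json.decoder`. Collecting every failure in the final error makes it easier to see which candidate was expected to succeed.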

Erwin

08/01/2023, 4:23 PM

Hi!

Erwin

08/01/2023, 4:23 PM

I fixed it like this in requirements:

kedro-datasets[spark-sparkdataset]~=1.5

👍 1

👀 1

Erwin

08/01/2023, 4:23 PM

Happened to me yesterday, all my pipelines were broken haha

Deepyaman Datta

08/01/2023, 4:25 PM

How was it working in 1.4.2 without the `spark-sparkdataset` extra, I wonder?

Deepyaman Datta

08/01/2023, 4:26 PM

@Elena Mironova can you make sure you have the appropriate extras installed? If not, I can try to find some time to investigate whether `__all__` is getting populated in the discovery here (I thought I did check it when implementing, but I'm not sure if something isn't working as expected); otherwise, nothing seems like it shouldn't work in my cursory pass through...

Erwin

08/01/2023, 4:28 PM

👍 1

Deepyaman Datta

08/01/2023, 4:29 PM

Ah, yes, the `spark.SparkDataSet` extra on `kedro-datasets` will do nothing. 🙂

👍 1

Nok Lam Chan

08/01/2023, 4:45 PM

Flagging https://github.com/kedro-org/kedro-plugins/pull/263. I don't recall that we changed this behavior, so it's likely a bug.

Nok Lam Chan

08/01/2023, 4:46 PM

I don't think `kedro-datasets[spark-sparkdataset]~=1.5` is what we intended; it could be a temporary fix.

Elena Mironova

08/01/2023, 4:57 PM

Yeah, I ended up pinning kedro-datasets to an older version that worked as well; I was just hoping for a more robust fix 🤓 @Deepyaman Datta, we have all the relevant extras listed in `setup.cfg` of the starter, exactly the same as before, so I'd assume the correct extras are installed (however, I can't confirm 100%, because our CI commands only list full packages through `pip freeze`).
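
Since `pip freeze` lists distributions but not the extras they were installed with, one way to see what an extra would pull in is to read the distribution metadata. The following is a minimal sketch, assuming the extras are declared as `extra == "..."` markers in the package's requirement metadata. The helper names are hypothetical. Extra names are compared after PEP 685 style normalization (lowercased, with runs of `-`, `_` and `.` collapsed to `-`), under which `spark.SparkDataSet` and `spark-sparkdataset` compare equal; whether the installer and metadata for these releases apply that normalization is not confirmed here.

```python
import re
from importlib import metadata

# Matches markers such as: extra == "spark.SparkDataSet"
_EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def normalize_extra(name):
    """Normalize an extra name in the PEP 685 style."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirements_for_extra(requirement_lines, extra):
    """Return the requirement specs whose marker names the given extra."""
    wanted = normalize_extra(extra)
    selected = []
    for line in requirement_lines:
        # Requirement strings look like "name>=1.0; extra == 'something'".
        spec, _, marker = line.partition(";")
        found = {normalize_extra(e) for e in _EXTRA_RE.findall(marker)}
        if wanted in found:
            selected.append(spec.strip())
    return selected


def installed_extra_requirements(dist_name, extra):
    """Look up an installed distribution and filter its requirements."""
    try:
        lines = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return requirements_for_extra(lines, extra)
```

For example, `installed_extra_requirements("kedro-datasets", "spark.SparkDataSet")` would return the requirement strings tied to that extra in the installed metadata, or an empty list if the distribution is not installed or declares no such extra. Each returned requirement still has to be checked separately to confirm it is actually installed.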

Nok Lam Chan

08/01/2023, 5:00 PM

@Elena Mironova Can you open an issue on https://github.com/kedro-org/kedro-plugins/issues/new/choose? Thanks for flagging this, ideally if we are gonna fix it we should fix it in the 1.5.x series

Elena Mironova

08/02/2023, 7:13 AM

Sure! done: https://github.com/kedro-org/kedro-plugins/issues/290

🙌🏼 1

Nok Lam Chan

08/15/2023, 12:02 PM

@Elena Mironova 1.5.2 is out and should have fixed this issue.

Nok Lam Chan

08/15/2023, 12:03 PM

`kedro-datasets[spark-sparkdataset]` will not be supported since it was added unintentionally.

Elena Mironova

08/15/2023, 12:24 PM

Thank you!! So should we just have `kedro-datasets` in requirements, without optional extras?

Nok Lam Chan

08/15/2023, 1:46 PM

If you need spark then

pip install kedro-datasets[spark.SparkDataSet]

👍 1
