#questions

Elena Mironova

08/01/2023, 1:24 PM
Hi team, after yesterday's release of `kedro-datasets==1.5.0`, our CI started failing during system tests which do a `kedro run` for a pipeline with Spark (see the screenshot). As far as I can see, `SparkDataSet` is still defined with the same name as before. When we used `kedro-datasets==1.4.2`, the same tests were running smoothly. I also couldn't find anything specific in the release notes - do we have to update our code (maybe some import statements, or how it is specified within the requirements)?
👀 1

Deepyaman Datta

08/01/2023, 1:47 PM
You're not pointing to `kedro_datasets` in the apparently broken import.
Can you share the catalog entry?

Elena Mironova

08/01/2023, 3:38 PM
This is how it looks in the catalog:
```yaml
_csv: &csv
  type: spark.SparkDataSet
  file_format: csv
  load_args:
    sep: ","
    header: True
    inferSchema: True
  save_args:
    header: True
    mode: overwrite

prm_observation_time_frame:
  <<: *csv
  filepath: data/03_primary/prm_observation_time_frame.csv
  layer: primary
```
What confused me the most was that it worked in 1.4.2.

Deepyaman Datta

08/01/2023, 4:18 PM
Hmm...
I wonder if the lazy loading affects the path discovery in some way. I'm not totally sure why it's not looking in `kedro_datasets`.
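For context on the path discovery mentioned here, a minimal sketch of prefix-based resolution of a catalog `type` string. The prefix list and the `resolve_type` helper are assumptions made for illustration, not Kedro's actual code; the only point is that a short name such as `spark.SparkDataSet` is tried under several package prefixes in order.

```python
import importlib

# Assumed search order, for illustration only (not copied from Kedro).
ASSUMED_PREFIXES = ["kedro.io.", "kedro_datasets.", "kedro.extras.datasets.", ""]


def resolve_type(type_name, prefixes=ASSUMED_PREFIXES):
    """Return the first object found by trying each prefix in order."""
    tried = []
    for prefix in prefixes:
        module_path, _, class_name = (prefix + type_name).rpartition(".")
        if not module_path:
            # A bare class name with an empty prefix has no module to import.
            continue
        tried.append(f"{module_path}.{class_name}")
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Module is missing (or one of its own imports failed): try the next prefix.
            continue
        if hasattr(module, class_name):
            return getattr(module, class_name)
    raise LookupError(f"{type_name!r} not found; tried: {tried}")


# Standard-library demo: "json.OrderedDict" does not exist, "collections.OrderedDict" does.
found = resolve_type("OrderedDict", prefixes=["json.", "collections."])
print(found.__module__, found.__name__)
```

With a lookup of this shape, a failure under one prefix is silent and the search simply moves on, so the final error message may not mention the package you expected.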

Erwin

08/01/2023, 4:23 PM
Hi!
I fixed it like this in requirements:
kedro-datasets[spark-sparkdataset]~=1.5
👍 1
Happened to me yesterday, all my pipelines were broken haha

Deepyaman Datta

08/01/2023, 4:25 PM
How was it working in 1.4.2 without the `spark-sparkdataset` extra, I wonder?
@Elena Mironova can you make sure you have the appropriate extras installed? If not, I can try to find some time to investigate whether `__all__` is getting populated in the discovery here (I thought I did check it when implementing, but I'm not sure if something isn't working as expected); otherwise, nothing seems like it shouldn't work in my cursory pass through...
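To illustrate what "lazy loading" plus `__all__` means here, a minimal sketch of a module that imports its attributes on first access via a module-level `__getattr__` (PEP 562). This is not how kedro-datasets implements it; `make_lazy_module` and the demo names are hypothetical, and standard-library modules stand in for dataset modules.

```python
import importlib
import types


def make_lazy_module(name, attr_to_module):
    """Build a module whose attributes are imported only when first accessed."""
    module = types.ModuleType(name)
    # __all__ advertises the names up front, without importing anything yet.
    module.__all__ = sorted(attr_to_module)

    def __getattr__(attr):
        if attr not in attr_to_module:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(attr_to_module[attr]), attr)
        setattr(module, attr, value)  # cache, so __getattr__ is not called again
        return value

    # A function named __getattr__ in the module namespace is used as the fallback.
    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", {"OrderedDict": "collections", "dumps": "json"})
print(lazy.__all__)                   # names are known before any import happens
print("dumps" in vars(lazy))          # not loaded yet
print(lazy.dumps({"a": 1}))           # first access triggers the real import
print("dumps" in vars(lazy))          # now cached on the module
```

Any discovery code that checks `__all__` or uses `hasattr` on such a module sees different things before and after the first access, which is the kind of interaction being wondered about above.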

Erwin

08/01/2023, 4:28 PM
image.png
πŸ‘ 1
d

Deepyaman Datta

08/01/2023, 4:29 PM
Ah, yes, `spark.SparkDataSet` extra on `kedro-datasets` will do nothing. 🙂
👍 1

Nok Lam Chan

08/01/2023, 4:45 PM
Flagging https://github.com/kedro-org/kedro-plugins/pull/263. I don't recall that we changed this behavior, so it's likely a bug.
I don't think `kedro-datasets[spark-sparkdataset]~=1.5` is what we intended; it could be a temporary fix.

Elena Mironova

08/01/2023, 4:57 PM
Yeah, I ended up pinning kedro-datasets to an older version that worked as well, I was just hoping for a more robust fix 🤓 @Deepyaman Datta, we have all the relevant extras mentioned in `setup.cfg` of the starter, exactly the same as before, so I'd assume that the correct extras are installed (however, I can't confirm 100%, because our CI commands only list full packages through `pip freeze`).
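Since `pip freeze` lists installed distributions but not which extras were requested, one way to check is to read the package metadata and see whether the requirements gated behind an extra are present. A minimal sketch using only the standard library; the helper names are hypothetical, the marker parsing is a simple regex rather than a full parser, and the extra-name normalization follows the rule described in PEP 685 (lowercase, runs of `-`, `_`, `.` collapsed to `-`).

```python
import re
from importlib import metadata

_EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def normalize_extra(name):
    """Normalize an extra name: lowercase, runs of - _ . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirements_for_extra(requires, extra):
    """Return the requirement specifiers that are gated behind `extra`."""
    wanted = normalize_extra(extra)
    selected = []
    for line in requires or []:
        spec, _, marker = line.partition(";")
        extras = {normalize_extra(m) for m in _EXTRA_MARKER.findall(marker)}
        if wanted in extras:
            selected.append(spec.strip())
    return selected


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    try:
        declared = metadata.requires("kedro-datasets")
    except metadata.PackageNotFoundError:
        declared = None  # kedro-datasets itself is not installed
    for spec in requirements_for_extra(declared, "spark.SparkDataSet"):
        dist = re.split(r"[\s<>=!~\[(;]", spec, maxsplit=1)[0]
        print(f"{spec}: installed version = {installed_version(dist)}")
```

Running this in the CI environment prints each requirement behind the extra next to the version actually installed, with `None` marking anything missing.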

Nok Lam Chan

08/01/2023, 5:00 PM
@Elena Mironova Can you open an issue on https://github.com/kedro-org/kedro-plugins/issues/new/choose? Thanks for flagging this. Ideally, if we are going to fix it, we should fix it in the 1.5.x series.

Elena Mironova

08/02/2023, 7:13 AM

Nok Lam Chan

08/15/2023, 12:02 PM
@Elena Mironova 1.5.2 is out and should have fixed this issue.
`kedro-datasets[spark-sparkdataset]` will not be supported since it was added unintentionally.

Elena Mironova

08/15/2023, 12:24 PM
Thank you!! So should we just have `kedro-datasets` in requirements, without optional extras?

Nok Lam Chan

08/15/2023, 1:46 PM
If you need Spark, then `pip install kedro-datasets[spark.SparkDataSet]`
👍 1