Hello there, I hope this finds you well. Potential bug with partitioned dataset lazy saving: I'm working with a partitioned dataset, specifically using lazy saving. I have a list of items (dicts). Each item has both a refname and a bbox. To implement lazy saving, following the docs, my node returns the following:
return {item["refname"]: lambda: get_image(item["bbox"], parameters) for item in items}
However, when I do that, the refnames and bboxes get mixed up: images (bboxes) are saved under the wrong refnames (the refname of another image-bbox pair). Quick fix: if I don't use lazy saving, everything works as expected (each image-bbox is saved under its own refname):
return {item["refname"]: get_image(item["bbox"], parameters) for item in items}
That said, I still need lazy saving. Setup: kedro version 0.18.3, OS: mac. Questions: Can you confirm that my implementation should be correct? If yes, do we have any experience with this bug? Is there any known fix? Should I raise an issue?
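A Kedro-free sketch that reproduces the symptom. `get_image` here is a hypothetical stand-in that just returns the bbox it receives, and `items` is made-up data; the point is only that the lazy callables all read `item` after the loop has finished.

```python
# Hypothetical stand-in for the real get_image: it just returns the bbox it receives.
def get_image(bbox, parameters):
    return bbox


# Made-up items, for illustration only.
items = [
    {"refname": "a", "bbox": (0, 0, 10, 10)},
    {"refname": "b", "bbox": (5, 5, 20, 20)},
    {"refname": "c", "bbox": (1, 2, 3, 4)},
]
parameters = {}

# Lazy version from the question: each lambda refers to the comprehension variable
# `item`, which is looked up when the lambda is called, not when it is created.
lazy = {item["refname"]: lambda: get_image(item["bbox"], parameters) for item in items}

# Eager version: get_image runs inside the loop, so each value uses the right item.
eager = {item["refname"]: get_image(item["bbox"], parameters) for item in items}

# Calling the callables later (as a saver would) happens after the loop has finished,
# when `item` is the last element, so every key gets the last bbox.
lazy_results = {key: func() for key, func in lazy.items()}

print(lazy_results)  # {'a': (1, 2, 3, 4), 'b': (1, 2, 3, 4), 'c': (1, 2, 3, 4)}
print(eager)         # {'a': (0, 0, 10, 10), 'b': (5, 5, 20, 20), 'c': (1, 2, 3, 4)}
```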
cc @Roberto P. Palomares
Note: waiting for a simplified example
I think I had the same very strange issue a while ago. I don't remember the details of Python's internals, but if I remember correctly this is due to
not being properly redefined. I'll try to reproduce tomorrow and get back to you.
Hi, you may need to give your lambda a parameter list (so that it captures the actual value of what you pass to get_image). Not tested, but could an example like this help you?
return {item["refname"]: lambda bbox=item["bbox"]: get_image(bbox, parameters) for item in items}
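A quick check of the default-argument version on the same kind of toy data, as a sketch: `get_image` is again a hypothetical stand-in and `items` is made up.

```python
# Hypothetical stand-in for get_image: returns the bbox it receives.
def get_image(bbox, parameters):
    return bbox


# Made-up items, for illustration only.
items = [
    {"refname": "a", "bbox": (0, 0, 10, 10)},
    {"refname": "b", "bbox": (5, 5, 20, 20)},
]
parameters = {}

# The default value `bbox=item["bbox"]` is evaluated once, when each lambda is
# created, so every callable keeps its own bbox.
lazy = {
    item["refname"]: lambda bbox=item["bbox"]: get_image(bbox, parameters)
    for item in items
}

results = {key: func() for key, func in lazy.items()}
print(results)  # {'a': (0, 0, 10, 10), 'b': (5, 5, 20, 20)}
```

`parameters` is still looked up at call time, which is harmless here because it does not change inside the loop. This also assumes the callables are invoked with no arguments, since passing a `bbox` would override the frozen default.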
The template I usually use is the following (with an explicit parameter list):
return {
    partition_key: (
        lambda partition_load_func=partition_load_func, partition_key=partition_key: _my_function(
            ...  # arguments not shown in the original message
        )
    )
    for partition_key, partition_load_func in loaded.items()
}
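A runnable sketch of that template. Everything here is an assumption for illustration: `loaded` stands in for the dict of partition id to load function that a partitioned dataset hands to a node, `_fake_load` fakes the loading, and `_my_function` is a hypothetical processing step whose arguments were not shown above.

```python
import functools


# Hypothetical loader: returns the text it was bound to.
def _fake_load(text):
    return text


# Hypothetical `loaded` input: partition id -> zero-argument load function.
# functools.partial binds `text` immediately, so each loader is independent.
loaded = {
    key: functools.partial(_fake_load, text)
    for key, text in {"p1": "alpha", "p2": "beta", "p3": "gamma"}.items()
}


def _my_function(partition_load_func, partition_key):
    # Hypothetical processing step: load the partition and tag it with its key.
    return f"{partition_key}:{partition_load_func().upper()}"


def process_partitions(loaded):
    # Default arguments freeze the current partition_key / partition_load_func
    # for each lambda, so nothing is shared across iterations.
    return {
        partition_key: (
            lambda partition_load_func=partition_load_func, partition_key=partition_key: _my_function(
                partition_load_func, partition_key
            )
        )
        for partition_key, partition_load_func in loaded.items()
    }


to_save = process_partitions(loaded)
# Simulate the saver calling each callable later.
saved = {key: func() for key, func in to_save.items()}
print(saved)  # {'p1': 'p1:ALPHA', 'p2': 'p2:BETA', 'p3': 'p3:GAMMA'}
```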
Hi @Cyril Verluise, sorry for the delay! I think I remember what happened. There is a scoping issue with lambdas created in a loop: they look up the loop variable when they are called, not when they are defined, so they all end up using its last value. I think the fix went like this (I know it looks stupid, but I am pretty sure it was my fix back then):
def _create_lambda(bbox, parameters):
    return lambda: get_image(bbox, parameters)
and then:
return {item["refname"]: _create_lambda(item["bbox"], parameters) for item in items}
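A sketch of the factory-function fix on toy data (`get_image` is a hypothetical stand-in, `items` is made up). It works because each call to `_create_lambda` creates a fresh scope holding that call's `bbox`.

```python
# Hypothetical stand-in for get_image: returns the bbox it receives.
def get_image(bbox, parameters):
    return bbox


def _create_lambda(bbox, parameters):
    # Each call gets its own local scope, so the returned lambda closes over
    # this call's `bbox`, not over the comprehension variable.
    return lambda: get_image(bbox, parameters)


# Made-up items, for illustration only.
items = [
    {"refname": "a", "bbox": (0, 0, 10, 10)},
    {"refname": "b", "bbox": (5, 5, 20, 20)},
    {"refname": "c", "bbox": (1, 2, 3, 4)},
]
parameters = {}

lazy = {item["refname"]: _create_lambda(item["bbox"], parameters) for item in items}
results = {key: func() for key, func in lazy.items()}
print(results)  # {'a': (0, 0, 10, 10), 'b': (5, 5, 20, 20), 'c': (1, 2, 3, 4)}
```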
I couldn't find the blog post about how Python resolves variable scope, which was an interesting read, but I didn't look very carefully; if someone finds the reference, please tell me!
By the way, @Nok Lam Chan, I'll try to create a reproducible example if Cyril confirms this is the right solution, because this is very hard to debug and may be worth documenting.
thanks a lot! let me try that tomorrow
Great to have such a wonderful community. Let's check that and create a proper issue/request when it's done!
This approach works fine!
return {
    partition_key: (
        lambda partition_load_func=partition_load_func, partition_key=partition_key: _my_function(
            ...  # arguments not shown in the original message
        )
    )
    for partition_key, partition_load_func in loaded.items()
}
Same for Yolan's proposal!
btw, aren't the two approaches equivalent?
@Nok Lam Chan, what's the next step? Raising an issue directly on GH summarizing the above and asking for a doc clarification? Lmk, happy to do it!
Yes, actually both approaches force the resolution by evaluating the variables in a higher scope (either with default args or a "proper" function), so they are essentially equivalent.
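A sketch comparing the two approaches on toy data (`get_image` is a hypothetical stand-in, `items` is made up). It also adds a third option that was not discussed in the thread, `functools.partial` from the standard library, which likewise binds its arguments when it is created.

```python
import functools


def get_image(bbox, parameters):
    # Hypothetical stand-in: returns the bbox it receives.
    return bbox


def _create_lambda(bbox, parameters):
    return lambda: get_image(bbox, parameters)


# Made-up items, for illustration only.
items = [
    {"refname": "a", "bbox": (0, 0, 10, 10)},
    {"refname": "b", "bbox": (5, 5, 20, 20)},
]
parameters = {}

# Approach 1 (default arguments): values are frozen when each lambda is created.
with_defaults = {
    item["refname"]: lambda bbox=item["bbox"]: get_image(bbox, parameters)
    for item in items
}
# Approach 2 (factory function): each call creates a new enclosing scope.
with_factory = {item["refname"]: _create_lambda(item["bbox"], parameters) for item in items}
# Not from the thread: functools.partial also binds its arguments immediately.
with_partial = {
    item["refname"]: functools.partial(get_image, item["bbox"], parameters)
    for item in items
}


def evaluate(mapping):
    # Call every zero-argument callable, the way a lazy saver would.
    return {key: func() for key, func in mapping.items()}


print(evaluate(with_defaults) == evaluate(with_factory) == evaluate(with_partial))  # True
```

One practical difference: the default-argument lambda would silently accept a `bbox` argument that overrides the frozen value, whereas the factory lambda and the `partial` object raise a `TypeError` if called with an extra argument.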
Issue raised here: https://github.com/kedro-org/kedro/issues/3052#issue-1904569322 Feel free to subscribe to receive related news and/or to comment! Thanks a lot for your help!
thank you both
So I finally had time to play around with it. From my understanding, this is not a Kedro problem; it's how variable scoping works for lambdas. See this example:
In [7]: iterable = [lambda: print(x) for x in range(4)]
   ...: for i in iterable:
   ...:     i()
   ...: print("Assign the variable to lambda scope")
   ...: iterable = [lambda x=x : print(x) for x in range(4)]
   ...: for i in iterable:
   ...:     i()
Assign the variable to lambda scope
This StackOverflow thread explains it better: <https://stackoverflow.com/questions/938429/scope-of-lambda-functions-and-their-parameters>
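The same experiment as a plain script, collecting the values instead of printing them one by one so the difference is visible at a glance:

```python
# Late binding: each lambda looks up `x` only when it is called, after the
# comprehension has finished, so all of them see the final value.
late = [lambda: x for x in range(4)]
late_values = [f() for f in late]

# Default argument: `x=x` is evaluated when each lambda is created.
bound = [lambda x=x: x for x in range(4)]
bound_values = [f() for f in bound]

print(late_values)   # [3, 3, 3, 3]
print(bound_values)  # [0, 1, 2, 3]
```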
Yes, exactly, this is a Python problem, not a Kedro one. That said, I think this is a common (and silent) error with PartitionedDataSet, so it may be useful to warn about it.
That’s fair, I think we can add a section to warn about this. I just want to confirm this is not a bug that Kedro introduced. Actually, is there a lint tool that can pick this up? My guess is that one should exist already.