#questions

Marc Gris

11/06/2023, 9:27 AM
Hi everyone, Anyone familiar with:
```
DatasetError: <class 'pandera.api.pandas.container.DataFrameSchema'> was not serialised due to: Can't pickle <function custom_check_is_valid_barcode at 0x11a461310>:
it's not the same object as data_validation.pipelines.validate.nodes.custom_check_is_valid_barcode
```
Thx in advance, Regards M

Ben Horsburgh

11/06/2023, 9:33 AM
It looks like your `DataFrameSchema` class might contain a `lambda` function, or something similar. If so, such functions are not pickle-able.
Check the `DataFrameSchema`: if you see any `lambda` functions, refactor them into concrete functions and see if it helps.
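A minimal sketch of the point about lambdas, using only the standard library. The names `is_positive`, `is_positive_lambda` and `can_pickle` are hypothetical and not taken from the thread; the sketch assumes it is run as a normal script or module.

```python
import pickle


def is_positive(value):
    """Module-level function: pickle stores only a reference to its importable name."""
    return value > 0


# A lambda has the qualified name "<lambda>", which pickle cannot look up by name.
is_positive_lambda = lambda value: value > 0


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(is_positive))         # True
print(can_pickle(is_positive_lambda))  # False
```

Functions are pickled by reference (module plus qualified name), not by value, which is why turning a lambda into a named module-level function usually helps.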

Marc Gris

11/06/2023, 9:34 AM
oh… Thx @Ben Horsburgh I “tricked myself”. I properly registered a custom check, but in the body of the check there is indeed a lambda hanging around 😅 Thx !!! 🙂
@Ben Horsburgh Strangely enough, I’ve created a non-anonymous function in the body of my custom check, but I still get the same error. Can you identify anything that might cause the issue? Thx
```python
import barcodenumber
import pandas as pd
import pandera as pa

# `identify_barcode` is a helper defined elsewhere in the module (not shown in the thread).


@pa.extensions.register_check_method()
def custom_check_is_valid_barcode(barcodes: "pd.Series[str]",
                                  check: bool) -> "pd.Series[bool]":

    def check_barcode_validity(row):
        return barcodenumber.check_code(row.BARCODE_TYPE, row.BARCODE)

    if not isinstance(check, bool):
        raise ValueError(f"`check` should be `bool` not {type(check)}")

    barcodes = barcodes.to_frame()
    barcodes.columns = ['BARCODE']
    barcodes['BARCODE_TYPE'] = barcodes.fillna('').map(identify_barcode)

    if check:
        barcodes['IS_VALID'] = barcodes.apply(check_barcode_validity, axis=1)
    else:
        barcodes['IS_VALID'] = True

    return barcodes['IS_VALID']
```

Ben Horsburgh

11/06/2023, 9:46 AM
Try moving `check_barcode_validity` outside of the decorated function, so that it is a module-level function.

Marc Gris

11/06/2023, 9:47 AM
oh.. I get it ! Let me try that right away. Thx.

Ben Horsburgh

11/06/2023, 9:49 AM
I'm not totally sure if that will be it though - since this should just be considered code inside the check. Might be something else
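That doubt can be checked with the standard library alone. The sketch below uses hypothetical names (`outer_check`, `make_helper`) and plain `pickle`, not pandera or Kedro. It suggests that a helper or lambda defined inside a function body does not stop the outer function from being pickled. Trouble only appears when the nested object itself is what gets pickled.

```python
import pickle


def outer_check(values):
    """Module-level function with a nested helper and a lambda in its body."""

    def helper(value):
        return value > 0

    double = lambda value: value * 2
    return [helper(double(v)) for v in values]


def make_helper():
    """Return a nested function, so the nested object itself gets pickled below."""

    def helper(value):
        return value > 0

    return helper


# Pickling the outer function stores its name only; the body is never inspected.
payload = pickle.dumps(outer_check)
restored = pickle.loads(payload)

# Pickling a nested function directly fails, because it cannot be found by name.
try:
    pickle.dumps(make_helper())
    nested_ok = True
except (pickle.PicklingError, AttributeError):
    nested_ok = False

print(restored is outer_check)  # True
print(restored([1, -1]))        # [True, False]
print(nested_ok)                # False
```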

Marc Gris

11/06/2023, 9:51 AM
unfortunately the problem remains…
For now, I’ve simply dodged the problem by “non-persisting” the schema, but if anyone has a solution / explanation, please let me know 🙂 🙏
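For anyone hitting the same message, here is one general way plain `pickle` produces "it's not the same object as ...", sketched with the standard library only. The `register` decorator and `REGISTRY` are hypothetical stand-ins. Whether `register_check_method` in pandera rebinds the decorated name in a comparable way is an assumption that nothing in this thread confirms.

```python
import pickle

# Hypothetical registry, standing in for wherever a library might keep checks.
REGISTRY = {}


def register(fn):
    """Hypothetical decorator: stores the original function, returns another object."""
    REGISTRY[fn.__name__] = fn

    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


@register
def is_valid(value):
    return value is not None


# The module-level name `is_valid` is now bound to `wrapper`, not to the original.
original = REGISTRY["is_valid"]

try:
    pickle.dumps(original)
    error_message = None
except pickle.PicklingError as exc:
    error_message = str(exc)

print(error_message)
```

When pickling a function, `pickle` looks up its module and qualified name and requires that lookup to return the very same object. If a decorator leaves a different object under that name, the original function can no longer be pickled by reference, wherever its helper functions live.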
@Yolan Honoré-Rougé I hope that you won’t mind this out-of-the-blue “ping”, but given your work on `kedro-pandera` I thought that you might have some insight into the above. Many thanks in advance, Regards Marc

Yolan Honoré-Rougé

11/06/2023, 12:47 PM
Sorry, I don't see anything obvious from the code. Maybe you can avoid deep copying by persisting the schema with another pickle backend to see if it fixes the issue?

Marc Gris

11/06/2023, 12:50 PM
Thanks Yolan for your message and suggestion. I’ll give it a try and let you know. Thanks a lot 🙂 🙏🏼 Marc