Hi everyone, Anyone familiar with: ```DatasetErr...
# questions
m
Hi everyone, Anyone familiar with:
```
DatasetError: <class 'pandera.api.pandas.container.DataFrameSchema'> was not serialised due to: Can't pickle <function custom_check_is_valid_barcode at 0x11a461310>:
it's not the same object as data_validation.pipelines.validate.nodes.custom_check_is_valid_barcode
```
Thx in advance, Regards M
b
It looks like your `DataFrameSchema` class might contain a `lambda` function? Or similar. If so, such functions are not pickle-able.
Check the `DataFrameSchema` - if you see any `lambda` functions then refactor them into concrete functions and see if it helps
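For reference, a minimal sketch of why a `lambda` trips up the standard `pickle` module while a named module-level function does not. The names `is_positive`, `is_positive_lambda` and `try_pickle` are made up for illustration and have nothing to do with pandera itself:

```python
import pickle


# Named module-level function: pickle only stores a reference
# (module name + qualified name) and re-imports it on load.
def is_positive(x):
    return x > 0


# Anonymous function: its qualified name is "<lambda>", which pickle
# cannot look up again on the module, so pickling by reference fails.
is_positive_lambda = lambda x: x > 0


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except (pickle.PicklingError, AttributeError) as exc:
        return str(exc)


named_error = try_pickle(is_positive)
lambda_error = try_pickle(is_positive_lambda)

print("named function:", named_error)
print("lambda:", lambda_error)
```

When run as a script, the named function should pickle without complaint and the lambda should report an error. Note that the error for a lambda is usually an attribute lookup failure, which is worded differently from the "not the same object" message above.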
m
oh… Thx @Ben Horsburgh I “tricked myself”. I properly registered a custom check, but in the body of the check there is indeed a lambda hanging around 😅 Thx !!! 🙂
🎉 1
@Ben Horsburgh Strangely enough, I’ve created a non-anonymous function in the body of my custom check, but still get the same error.. Can you identify anything that might cause the issue ? thx
```
@pa.extensions.register_check_method()
def custom_check_is_valid_barcode(barcodes: "pd.Series[str]",
                                  check: bool) -> "pd.Series[bool]":

    def check_barcode_validity(row):
        return barcodenumber.check_code(row.BARCODE_TYPE, row.BARCODE)

    if not isinstance(check, bool):
        raise ValueError(f"`check` should be `bool` not {type(check)}")

    barcodes = barcodes.to_frame()
    barcodes.columns = ['BARCODE']
    barcodes['BARCODE_TYPE'] = barcodes.fillna('').map(identify_barcode)

    if check:
        barcodes['IS_VALID'] = barcodes.apply(check_barcode_validity, axis=1)
    else:
        barcodes['IS_VALID'] = True

    return barcodes['IS_VALID']
```
b
Try moving `check_barcode_validity` outside of the decorated function, so that it is a module-level function
m
oh.. I get it ! Let me try that right away. Thx.
b
I'm not totally sure if that will be it though - since this should just be considered code inside the check. Might be something else
m
unfortunately the problem remains…
For now, I’ve simply dodged the problem by “non-persisting” the schema, but if anyone has a solution / explanation, please let me know 🙂 🙏
@Yolan Honoré-Rougé I hope that you won’t mind this _out of the blue “ping”_. But given your work on `kedro-pandera` I thought that you might have some insight into the above. Many thanks in advance, Regards Marc
y
Sorry, I don't see anything obvious from the code. Maybe you can avoid deep copying by persisting the schema with another pickle backend to see if it fixes the issue?
👍🏼 1
m
Thanks Yolan for your message and suggestion. I’ll give it a try and let you know. Merci beaucoup 🙂 🙏🏼 Marc
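A note on the wording of the error itself: "it's not the same object as ..." is what the standard `pickle` module reports when it looks a function up by its module and qualified name and finds a different object under that name. One common way to get there is a decorator that keeps a reference to the original function but leaves something else bound to the module-level name. The sketch below reproduces that with the standard library only. The `register` decorator and `_registry` dict are hypothetical, and whether pandera's `register_check_method` behaves this way is an assumption, not something confirmed in this thread:

```python
import functools
import pickle

# Hypothetical registry standing in for "the library keeps the original function".
_registry = {}


def register(fn):
    """Hypothetical decorator: stores the original, returns a wrapper."""
    _registry[fn.__name__] = fn

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    # The module-level name will now point at `wrapper`, not at `fn`.
    return wrapper


@register
def is_valid(value):
    return bool(value)


# The object held by the registry is the undecorated function.
original = _registry["is_valid"]

try:
    pickle.dumps(original)
    error = None
except pickle.PicklingError as exc:
    # pickle finds `is_valid` on the module, but it is the wrapper,
    # so the identity check against `original` fails.
    error = str(exc)

print(error)
```

If something like this is the cause, moving the inner helper to module level would not change anything, which would match what was observed above. A pickle backend that can serialise functions by value rather than by reference, as suggested, would be one way to sidestep the lookup.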