# questions
m
Hi everyone, I’m getting this warning, which I find odd, in the logs while running `kedro run -p data_processing --tags=users`:
```
[08/21/23 16:39:45] WARNING  /home/kedro_docker/.local/lib/python3.9/site-packages/kedro/framework/project/__init__.py:359: UserWarning:   warnings.py:109
An error occurred while importing the 'dodo_kedro.pipelines.filtering' module. Nothing defined therein will be returned by 'find_pipelines'.

Traceback (most recent call last):
  File "/home/kedro_docker/.local/lib/python3.9/site-packages/kedro/framework/project/__init__.py", line 357, in find_pipelines
    pipeline_module = importlib.import_module(pipeline_module_name)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/workspaces/dodo-kedro/src/dodo_kedro/pipelines/filtering/__init__.py", line 6, in <module>
    from .pipeline import create_pipeline
  File "/workspaces/dodo-kedro/src/dodo_kedro/pipelines/filtering/pipeline.py", line 7, in <module>
    from . import nodes
  File "/workspaces/dodo-kedro/src/dodo_kedro/pipelines/filtering/nodes.py", line 41, in <module>
    for _filter in catalog.load("params:filters")]
  File "/home/kedro_docker/.local/lib/python3.9/site-packages/kedro/io/data_catalog.py", line 473, in load
    dataset = self._get_dataset(name, version=load_version)
  File "/home/kedro_docker/.local/lib/python3.9/site-packages/kedro/io/data_catalog.py", line 406, in _get_dataset
    raise DatasetNotFoundError(error_msg)
kedro.io.core.DatasetNotFoundError: Dataset 'params:filters' not found in the catalog - did you mean one of these instead: parameters, params:models, params:columns.users

  warnings.warn(
```
I’m wondering why it is trying to load the `filters` params for the filtering pipeline while I’m running only the data_processing pipeline. Those `filters` are indeed not defined, since I had “commented them out” in order to debug something… Granted, this is not a “drama”, since it only raises a warning and still lets the data_processing pipeline run. Yet, I must say it’s a little confusing to see a warning for a pipeline that was not called. This seems to me to be related to the lazy loading of datasets, for which I have opened a feature request on GitHub. Looking forward to reading your thoughts / comments. Regards M
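The traceback shows where the warning comes from: discovery imports every package under `pipelines/`, and importing `filtering` executes `nodes.py` line 41, which calls `catalog.load("params:filters")` at module level. Module-level code therefore runs at discovery time, whatever `-p` asks for. Below is a minimal, Kedro-free sketch of that mechanism, assuming a simplified discovery loop; the package `demo_project`, the `discover` helper and the deliberately failing module are hypothetical stand-ins, not Kedro's actual implementation.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Write a toy project with two pipeline packages (hypothetical names)."""
    pipelines = root / "demo_project" / "pipelines"
    pipelines.mkdir(parents=True)
    (root / "demo_project" / "__init__.py").write_text("")
    (pipelines / "__init__.py").write_text("")
    # A healthy pipeline package.
    (pipelines / "data_processing").mkdir()
    (pipelines / "data_processing" / "__init__.py").write_text(
        "def create_pipeline():\n    return 'data_processing pipeline'\n"
    )
    # A pipeline package whose module-level code fails on import,
    # standing in for nodes.py calling catalog.load(...) at import time.
    (pipelines / "filtering").mkdir()
    (pipelines / "filtering" / "__init__.py").write_text(textwrap.dedent("""
        params = {}
        filters = params["filters"]  # executed on import -> KeyError

        def create_pipeline():
            return 'filtering pipeline'
        """))


def discover(package: str, names: list) -> dict:
    """Import every pipeline module first; warn on import errors and skip them."""
    found = {}
    for name in names:
        module_name = f"{package}.{name}"
        try:
            module = importlib.import_module(module_name)
        except Exception as exc:  # "forgiving": warn instead of failing the run
            warnings.warn(f"Error while importing {module_name!r}: {exc!r}")
            continue
        found[name] = module.create_pipeline()
    return found


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            registry = discover("demo_project.pipelines", ["data_processing", "filtering"])
    finally:
        sys.path.remove(tmp)

# Selection (the equivalent of "-p data_processing") only happens afterwards,
# so the filtering import error has already been reported.
selected = registry["data_processing"]
```

Running it leaves `registry` with only `data_processing` and records a warning that mentions `filtering`, which mirrors the log: the run proceeds, but the unrelated module was still imported.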
m
@Aleksander Jaworski sounds familiar?
a
yes it does
and I have asked about lazy loading on this slack before
I actually had an error when credentials for one of the datasets in the catalog were not defined
even though the dataset was not necessary for the pipeline I wanted to run
m
Hi @marrrcin & @Aleksander Jaworski, I have actually opened a feature request on GitHub for this exact issue: https://github.com/kedro-org/kedro/issues/2829 (your comments on Slack were actually posted there by Nok)
@Juan Luis & @datajoely Sorry for the attention-grabbing tag 😉 Do you have any opinion / comment on the above? Should I also post this in the lazy loading feature request on GitHub? Thx M
j
hey @Marc Gris, if I understand correctly, Kedro is being forgiving in this case right? a warning is emitted and the run proceeds. so, I don't think it's necessary to add it to the "lazy loading" issue, which is more concerned with use cases that block the run. although as a side effect, warnings like these would probably disappear. let me know if this makes sense
m
Hi @Juan Luis Thanks for your answer 🙂 It is indeed a nice thing that Kedro is “forgiving”. Being able to run pipeline A even if pipeline B is broken is indeed precious. Sorry for “candidly questioning the way things are done”, but I don’t really understand why the current approach is: “first load all pipelines, then filter out those that are not needed”. I’m certain that there are many good reasons that I cannot perceive, but it seems to me to come at a certain cost, speed being a trivial example… Could filtering be done at the very start of `find_pipelines()`? Thanks for your comment 🙏🏼 M
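One way to picture the “filter first” idea is a mapping from pipeline names to loaders, where only the requested pipelines are ever imported. This is a hypothetical sketch of that design, not an existing Kedro API; `lazy_pipeline_registry`, `resolve` and the stand-in modules under `demo_pkg` are made-up names for illustration.

```python
import importlib
import sys
import types
from typing import Callable, Dict, Iterable


def lazy_pipeline_registry(package: str, names: Iterable[str]) -> Dict[str, Callable[[], object]]:
    """Map each pipeline name to a zero-argument loader; nothing is imported yet."""

    def make_loader(module_name: str) -> Callable[[], object]:
        def load() -> object:
            # The import, and any module-level side effects, happen only here.
            return importlib.import_module(module_name).create_pipeline()

        return load

    return {name: make_loader(f"{package}.{name}") for name in names}


def resolve(registry: Dict[str, Callable[[], object]], selected: Iterable[str]) -> Dict[str, object]:
    """Build only the requested pipelines (the equivalent of `-p`)."""
    return {name: registry[name]() for name in selected}


# Hypothetical stand-in modules: "data_processing" is importable,
# "filtering" is not importable at all, yet it is never touched below.
sys.modules["demo_pkg"] = types.ModuleType("demo_pkg")
healthy = types.ModuleType("demo_pkg.data_processing")
healthy.create_pipeline = lambda: "data_processing pipeline"
sys.modules["demo_pkg.data_processing"] = healthy

registry = lazy_pipeline_registry("demo_pkg", ["data_processing", "filtering"])
pipelines = resolve(registry, ["data_processing"])
```

The trade-off of this design is that a broken pipeline only fails when it is actually requested, which is the behaviour asked for here, but it can also stay unnoticed for longer.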
d
Whilst I think there are some design questions to think through here, you can also bypass `find_pipelines` and register them yourself for now
m
Sure. Thx @datajoely. But doesn’t that seem connected to the lazy loading subject? My feature request was indeed originally for datasets, but couldn’t, or even shouldn’t, it be broadened to pipelines as well? The spirit being: why load something that is not needed / called for? 🙂 Thx in advance for your comments 🙏🏼 M.
@Juan Luis @datajoely @Deepyaman Datta @Nok Lam Chan Hi everyone, sorry once again for the shameless use of attention-grabbing tags. Here is, it seems to me, another lazy-loading-related issue. When running `kedro run -p filtering` I end up with:
`TypeError: Inputs of 'recommend' function expected ['scores', 'k'], but got ['filtered_scores', 'params:k', 'params:user_id_col', 'params:item_id_col', 'params:score_col']`
This is a node in another pipeline… I’m therefore re-posting what I posted previously: my initial feature request was indeed originally for datasets, but couldn’t, or even shouldn’t, it be broadened to pipelines as well? The spirit being: why load something that is not needed / called for? And why should one be blocked from working on pipeline A because things are broken in pipeline B? Many thanks in advance for your comments / suggestions. Cheers M.
P.S.: Is there a specific reason for pipeline filtering to be done post hoc, or is it just happenstance? If the latter, could the issue above be fixed simply by filtering before instantiating the pipelines? (Sorry if this question is stupid or naïve… We’re in such a rush at work that I haven’t had the time to spend inspecting (admiring?) how Kedro works under the hood.)
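The `TypeError` above is a signature check: the inputs declared for the `recommend` node do not match the function's parameters. A check like this fails as soon as the node object is built, i.e. whenever that other pipeline's `create_pipeline()` is called during discovery, which is consistent with `kedro run -p filtering` failing on a node it never runs. A minimal, Kedro-free approximation using `inspect.signature(...).bind`; `check_inputs` and the body of `recommend` are hypothetical, and Kedro's own validation may differ in detail.

```python
import inspect
from typing import Callable, List


def recommend(scores, k):
    """Hypothetical node function with the two parameters named in the error."""
    return sorted(scores, reverse=True)[:k]


def check_inputs(func: Callable, inputs: List[str]) -> None:
    """Raise TypeError if `inputs` cannot be bound positionally to `func`'s parameters."""
    signature = inspect.signature(func)
    try:
        signature.bind(*inputs)
    except TypeError as exc:
        raise TypeError(
            f"Inputs of '{func.__name__}' function expected "
            f"{list(signature.parameters)}, but got {inputs}"
        ) from exc


# Declaring five inputs for a two-parameter function fails at construction time,
# before any data is loaded or any node is run.
declared = ["filtered_scores", "params:k", "params:user_id_col", "params:item_id_col", "params:score_col"]
try:
    check_inputs(recommend, declared)
except TypeError as err:
    message = str(err)
```

Binding two names such as `["scores", "k"]` passes silently; binding the five declared names raises with the same wording as the log line above.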
j
hi @Marc Gris, sorry you're having a hard time with this. about what @datajoely suggested earlier, did you manage to sidestep this issue by tweaking the `pipeline_registry:find_pipelines` function? about the historical context on why things work this way, we lack that insight at the moment, but we haven't received similar reports in the past. so maybe users were finding ways of working around this without telling us.
m
Hi @Juan Luis Thanks for your message, and please do not feel sorry 😉 Tweaking `find_pipelines()` does the job, of course 🙏🏼 My posts are not so much about “complaining” as they are about “giving feedback” that hopefully / humbly could be helpful to you guys 🙂 What I described above seems to me to be a reasonable expectation from a first-time Kedro user… Hence my sharing this here. Thanks again & have a nice day, M.
j
yep, I totally agree! thanks @Marc Gris as always, your input is very valuable to us 🙏🏼
m
🙏🏼 🙂