# questions
e
Still trying to wrap my head around custom datasets and how the pipeline works. I created a custom dataset whose _save() method saves the data to a MongoDB database. In the pipeline, I define the node so that its input is the data and its output is the custom dataset. The part I don't understand is this: if the class handles the actual save process, what do I put in the node function? The function doesn't do anything, so I'm not sure what to do with it.
pipeline([
    node(
        func=extract_rss_feed,
        inputs='rss_feed_extract',
        outputs='rss_feed_for_transforming',
        name="extract_rss_feed",
    ),
    node(
        func=transform_rss_feed,
        inputs=['rss_feed_for_transforming', 'params:rss_1'],
        outputs='rss_feed_for_loading',
        name="transform_rss_feed",
    ),
    node(
        func=load_rss_feed,
        inputs='rss_feed_for_loading',  # <- incoming data (in memory)
        outputs='rss_feed_load',  # <- calls the _save() of the class
        name="load_rss_feed",
    ),
])
If all the save logic is in the class, then there's nothing for the function to do. What am I missing here? What typically goes in the function whose output is a dataset? Here is nodes.py:
def load_rss_feed(preprocessed_rss_feed):
    pass
When I try to run the pipeline, I get the following error:
DatasetError: Saving 'None' to a 'Dataset' is not allowed
thanks for your thoughts!
d
The return value of the function is passed to the `_save` method, which you've annotated (`<- calls the _save() of the class`). So your `load_rss_feed` should do something like `return preprocessed_rss_feed`, not just `pass`.
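To make that hand-off concrete, here is a rough sketch in plain Python. It does not use Kedro itself: `InMemoryStandInDataset` and `run_node` are hypothetical stand-ins that only imitate the idea (call the function, then pass its return value to the dataset's save logic), and the `None` check is a simplified imitation of the error shown in the question.

```python
class InMemoryStandInDataset:
    """Hypothetical stand-in for a custom dataset; keeps documents in a list instead of MongoDB."""

    def __init__(self):
        self.saved = []

    def _save(self, data):
        # In the real custom dataset, the MongoDB insert would happen here.
        self.saved.extend(data)

    def save(self, data):
        # Simplified imitation of the check behind the error in the question.
        if data is None:
            raise ValueError("Saving 'None' to a dataset is not allowed")
        self._save(data)


def run_node(func, node_input, output_dataset):
    """Simplified picture of a runner: call the function, then save what it returned."""
    result = func(node_input)
    output_dataset.save(result)


def load_rss_feed_broken(preprocessed_rss_feed):
    pass  # implicitly returns None, so there is nothing to save


def load_rss_feed(preprocessed_rss_feed):
    return preprocessed_rss_feed  # the returned value is what reaches _save()


feed = [{"title": "Example entry"}]
dataset = InMemoryStandInDataset()
run_node(load_rss_feed, feed, dataset)
print(dataset.saved)

try:
    run_node(load_rss_feed_broken, feed, InMemoryStandInDataset())
except ValueError as exc:
    print(exc)
```

The first call stores the returned list, while the `pass` version fails because the function hands `None` to the save step.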
e
@Deepyaman Datta thank you, I was missing that. So it's normal to have node functions that have no content?
d
So it's normal to have node functions that have no content?
No, it's not. It's typical to do some sort of transformation in your node (and generally advisable not to have a function that's essentially a no-op).
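As an illustration of a final node that does real work, here is a minimal sketch of `load_rss_feed` shaping entries into documents before they are returned for saving. The field names (`link`, `title`, `published`) and the choice of `_id` are assumptions for the example, not something taken from the thread.

```python
def load_rss_feed(preprocessed_rss_feed):
    """Shape transformed entries into documents for the output dataset to save."""
    documents = []
    for entry in preprocessed_rss_feed:
        documents.append(
            {
                "_id": entry["link"],  # assumption: the link is unique per entry
                "title": entry["title"].strip(),
                "published": entry.get("published"),  # None if the field is missing
            }
        )
    return documents


docs = load_rss_feed([{"link": "https://example.com/a", "title": "  Hello  "}])
print(docs)
```

If there is genuinely nothing left to do at that stage, another option might be to drop the extra node and point the outputs of `transform_rss_feed` straight at `rss_feed_load`.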