# questions
s
Hello everyone. Let's say I created a reporting pipeline in a notebook (pull data, compute columns, export Excel/CSV). I then packaged everything into a Kedro project and everything is fine. Then the customer wants some alterations to the reports, new columns or something like that. How would I proceed to "develop" inside Kedro? Transferring dirty notebook code into clean nodes is one thing, but how would I proceed to develop once everything is a node in a pipeline? In Jupyter notebooks or regular .py files I can run the code until some point and then alter my dataframes as I wish. How would I approach this in the Kedro framework? I hope this makes sense ;)
d
I think I understand your question, and it's something practitioners commonly do, so you're not alone! Whether you run the pipeline from a notebook or somewhere else, if you've persisted the outputs in a physical catalog entry, you can construct the catalog, load the dataset, and make ad-hoc modifications in a notebook (or explore the data). You can run up until a particular point using options like `to_nodes` or `to_outputs` on the run. If you want to run a pipeline without persisting the outputs, a pipeline run in a notebook using the Python API returns a dictionary of "free" outputs (i.e. the outputs that didn't get consumed by other nodes or written to a catalog entry). You can load the data directly from there and write downstream code. Let me know if I've misunderstood your ask.
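To make this concrete, here is a minimal sketch of both workflows inside a notebook. It assumes the Kedro Jupyter extension is loaded and uses hypothetical dataset, node, and pipeline names ("reporting_table", "add_customer_columns", "reporting"); adapt them to your project.

```python
# %load_ext kedro.ipython   # or open the notebook via `kedro jupyter notebook`;
# Kedro then injects `catalog`, `session`, `context`, and `pipelines`.

# --- Option 1: partial run, then load a persisted dataset ----------------
# Stop the run once the (hypothetical) node/dataset below has been produced.
session.run(to_nodes=["add_customer_columns"])   # or: to_outputs=["reporting_table"]

# If "reporting_table" is persisted via a catalog entry, load it back and
# iterate on it interactively, just like a dataframe in a plain notebook.
df = catalog.load("reporting_table")
df["margin"] = df["revenue"] - df["cost"]        # ad-hoc exploration

# --- Option 2: run in memory and use the returned "free" outputs ---------
# Outputs that are neither consumed by downstream nodes nor written to a
# catalog entry come back in the dict returned by the run.
# (A session runs only once; call %reload_kedro before running again.)
outputs = session.run(pipeline_name="reporting")  # hypothetical pipeline name
preview = outputs["report_preview"]               # hypothetical free output
```

The same partial runs work from the CLI too, e.g. `kedro run --to-nodes=add_customer_columns` or `kedro run --to-outputs=reporting_table`.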
s
Thanks for the reply. I will play around with this to grasp the workflow. However, on second thought, I think that running a debugger with specified breakpoints would be sufficient for these purposes as well?
d
Is that what you want? A debugger-based workflow is very much a dev/debugging experience; on the other hand, you can combine the flexibility of running pipelines with using the results in a notebook, for example in an interactive stakeholder demo, and it still looks "clean".
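If you do end up preferring the debugger route, a minimal sketch (plain Python, nothing Kedro-specific, with a hypothetical node function) is to drop a breakpoint() into the node and run the pipeline from a terminal or your IDE; execution pauses inside the node with its input dataframes in scope:

```python
# nodes.py (hypothetical node function)
import pandas as pd

def add_customer_columns(df: pd.DataFrame) -> pd.DataFrame:
    breakpoint()  # during `kedro run`, drops you into pdb right here,
                  # with `df` available for inspection and experimentation
    df["margin"] = df["revenue"] - df["cost"]
    return df
```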