Kedro #plugins-integrations


Iñigo Hidalgo

03/01/2024, 9:35 AM

Hi @Nok Lam Chan, a coworker just asked me about https://github.com/noklam/kedro-softfail-runner. Do you ever use it in your daily work, or was it more of a proof of concept?

Nok Lam Chan

03/01/2024, 9:40 AM

👋 Hey, I originally created this for a client, so it has been used in at least one project. A few people were interested in this or in extending the runner, so I made it pip-installable for people to try out.

Nok Lam Chan

03/01/2024, 9:42 AM

One caveat is that, unlike the other runners, this one won't return any free datasets in memory. I just didn't have time to finish it, since it wasn't important for the project.

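For context, Kedro's built-in runners hand back "free outputs" from `run()`: pipeline outputs that have no entry in the `DataCatalog` and therefore exist only in memory. A toy sketch of that bookkeeping, using plain sets and dicts instead of Kedro's `Pipeline` and `DataCatalog` (the helpers `free_outputs` and `collect_run_output` and all dataset names are hypothetical):

```python
from __future__ import annotations

from typing import Any


def free_outputs(pipeline_outputs: set[str], catalog_entries: set[str]) -> set[str]:
    """Outputs that no catalog entry would persist (hypothetical helper)."""
    return pipeline_outputs - catalog_entries


def collect_run_output(
    pipeline_outputs: set[str], catalog_entries: set[str], memory: dict[str, Any]
) -> dict[str, Any]:
    """Mimic a runner returning the in-memory values of free outputs."""
    names = sorted(free_outputs(pipeline_outputs, catalog_entries))
    return {name: memory[name] for name in names}


outputs = {"trades", "market_state_report", "debug_table"}
catalog = {"trades", "market_state_report"}  # persisted datasets
memory = {"trades": [1, 2], "market_state_report": "ok", "debug_table": {"rows": 3}}
print(collect_run_output(outputs, catalog, memory))  # {'debug_table': {'rows': 3}}
```

Per the caveat above, kedro-softfail-runner skips this last step, so a value like `debug_table` would not come back from the run.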

Iñigo Hidalgo

03/01/2024, 9:42 AM

I saw that part in the README. It isn't super important for my coworker either, so I'll pass that on. Thank you 🙂

Iñigo Hidalgo

03/01/2024, 9:43 AM

I'll let you know if we do actually start using it


Iñigo Hidalgo

03/01/2024, 9:43 AM

The use case for us is supporting some code that probably shouldn't be in Kedro in the first place 😅

Nok Lam Chan

03/01/2024, 9:44 AM

https://kedro.org/blog/build-a-custom-kedro-runner

Nok Lam Chan

03/01/2024, 9:44 AM

It's sometimes handy for debugging too.

Iñigo Hidalgo

03/01/2024, 9:44 AM

https://kedro-org.slack.com/archives/C03RKP2LW64/p1708296676033049?thread_ts=1708120844.771949&cid=C03RKP2LW64 I guess I hadn't read your message in detail 😆

Iñigo Hidalgo

03/01/2024, 9:46 AM

On "debugging": yeah, I can see that. Personally, when I'm building or debugging, I try to slice the pipeline down to the smallest possible subset to run, but I can definitely see the appeal of getting as much of the full pipeline to run as possible.
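Slicing to a minimal subset usually means keeping a target node plus everything upstream of it, which Kedro exposes through `Pipeline.to_nodes()` and `kedro run --to-nodes`. A toy, Kedro-free sketch of that ancestor walk (the node and dataset names are hypothetical):

```python
from __future__ import annotations

# Hypothetical toy pipeline: node name -> (inputs, outputs).
NODES = {
    "load_prices": ([], ["prices"]),
    "build_state": (["prices"], ["market_state"]),
    "strategy_a": (["market_state"], ["trades_a"]),
    "strategy_b": (["market_state"], ["trades_b"]),
    "combine": (["trades_a", "trades_b"], ["trades"]),
}


def to_nodes(nodes: dict, targets: set[str]) -> set[str]:
    """Keep the target nodes and every node upstream of them."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    keep, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        # Walk back to the nodes that produce this node's inputs.
        for dataset in nodes[name][0]:
            if dataset in producer:
                stack.append(producer[dataset])
    return keep


print(sorted(to_nodes(NODES, {"strategy_a"})))
# ['build_state', 'load_prices', 'strategy_a']
```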

Nok Lam Chan

03/01/2024, 9:48 AM

Let me know if it works for you. We'd like to revisit the runner at some point; I think there's more to it once we start thinking about how to re-run a pipeline from a failure (adding memory to a Kedro run). So far it's just an idea, without much implemented.
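One way "adding memory to a Kedro run" could look, as a minimal sketch assuming the steps already arrive in execution order: record each completed step in a small JSON state file and skip recorded steps on the next attempt. This only illustrates the idea; it is not Kedro code, and `run_with_memory`, the step names and the state file are hypothetical.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable


def run_with_memory(steps: list[tuple[str, Callable[[], None]]], state_file: Path) -> list[str]:
    """Run steps in order, skipping any recorded as done by an earlier attempt.

    Returns the names executed in this attempt. If a step raises, the progress
    so far is already saved, so the next call resumes at the failed step.
    """
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    executed = []
    for name, func in steps:
        if name in done:
            continue
        func()  # may raise; earlier successes are already on disk
        done.add(name)
        executed.append(name)
        state_file.write_text(json.dumps(sorted(done)))
    return executed


# Demo: the second step fails on the first attempt only.
state = Path(tempfile.mkdtemp()) / "run_state.json"
attempts = {"flaky": 0}


def flaky() -> None:
    attempts["flaky"] += 1
    if attempts["flaky"] == 1:
        raise RuntimeError("bad data on first attempt")


steps = [("prepare", lambda: None), ("flaky", flaky), ("report", lambda: None)]
try:
    run_with_memory(steps, state)
except RuntimeError:
    pass  # first attempt stops at "flaky"; "prepare" is already recorded
print(run_with_memory(steps, state))  # ['flaky', 'report']
```

Writing the state after every step means a crash loses at most the step in progress; a fuller version would also need to detect when inputs change between attempts.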

Nok Lam Chan

03/01/2024, 9:52 AM

One of the use cases was a pipeline made up of many almost-parallel pipelines fed by low-quality data, so errors always happened in one or two nodes, but the rest should not have been affected. Arguably this isn't ideal, but it helped at the time.
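The soft-fail idea described above can be sketched without Kedro: run nodes in dependency order, and when one raises, mark it failed and skip only the nodes that need its outputs. This is a toy illustration, not kedro-softfail-runner's implementation; `soft_fail_run` and the node tuples are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable


def soft_fail_run(
    nodes: list[tuple[str, list[str], str, Callable[..., Any]]],
    data: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, str]]:
    """Run nodes in order; a failure only skips nodes that depend on it.

    `nodes` holds (name, inputs, output, func) tuples already in dependency
    order. Returns (datasets, status), where status maps each node name to
    "ok", "failed" or "skipped".
    """
    data = dict(data)
    status = {}
    for name, inputs, output, func in nodes:
        if any(i not in data for i in inputs):
            status[name] = "skipped"  # an upstream node failed or was skipped
            continue
        try:
            data[output] = func(*(data[i] for i in inputs))
            status[name] = "ok"
        except Exception:
            status[name] = "failed"  # keep going with the independent branches
    return data, status


def bad_feed(x):
    raise ValueError("low-quality data")


nodes = [
    ("clean_a", ["raw_a"], "a", lambda x: x * 2),
    ("clean_b", ["raw_b"], "b", bad_feed),
    ("report_a", ["a"], "report_a", lambda a: f"A={a}"),
    ("report_b", ["b"], "report_b", lambda b: f"B={b}"),
]
datasets, status = soft_fail_run(nodes, {"raw_a": 1, "raw_b": 2})
print(status)
# {'clean_a': 'ok', 'clean_b': 'failed', 'report_a': 'ok', 'report_b': 'skipped'}
```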

Iñigo Hidalgo

03/01/2024, 10:00 AM

For us, we want to run multiple "strategies" in parallel on the given input data. The outputs are independent of each other but the inputs overlap, so we built it all as "one big pipeline". It's the same thing I said in the thread I linked. We probably should have structured this code more modularly, with each strategy as its own pipeline, so that one pipeline failing wouldn't affect the rest, but 🤷

Nok Lam Chan

03/01/2024, 10:25 AM

Interesting. In this case the soft-fail runner would work, but it probably does more than you want. You would like a strategy to stop as soon as an error happens. That's easy to do if you write it procedurally, but Kedro resolves the dependencies itself.
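Kedro derives the execution order from each node's declared inputs and outputs rather than from the order the code is written in, which is the contrast with procedural code drawn above. A toy version of that ordering step using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); it is not Kedro's internal code, and the node names are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical nodes declared only by inputs and outputs, in arbitrary order.
NODES = {
    "combine": (["trades_a", "trades_b"], ["trades"]),
    "strategy_b": (["market_state"], ["trades_b"]),
    "build_state": (["prices"], ["market_state"]),
    "strategy_a": (["market_state"], ["trades_a"]),
}


def execution_order(nodes):
    """Derive a valid run order from dataset dependencies."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # Map each node to the set of upstream nodes that produce its inputs.
    graph = {
        name: {producer[i] for i in inputs if i in producer}
        for name, (inputs, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(NODES)
print(order)  # build_state first, combine last; strategy order may vary
```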

Nok Lam Chan

03/01/2024, 10:25 AM

Is there a reason you cannot just do kedro run --pipeline a few times?

Iñigo Hidalgo

03/01/2024, 10:28 AM

Each "strategy" is a single node. We have a subset of the pipeline that calculates the "current market state", then we run multiple nodes in parallel that produce different trades, and then we combine those trades into a single file that goes to market. It's mostly a design problem: the data preparation steps are quite coupled with the strategy steps. Ideally we would have one pipeline that computes the current market state, then shoot off multiple different pipelines that each do their own thing independently, and then have a final step that aggregates the outputs.
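The target design described above (compute the market state once, run each strategy in isolation, then aggregate) can be sketched procedurally in plain Python. The strategies, prices and function names here are hypothetical toys; the point is that each strategy runs inside its own `try`, so one failure is logged and the remaining trades still reach the aggregation step.

```python
from __future__ import annotations

import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def compute_market_state(prices: dict[str, float]) -> dict[str, float]:
    """Hypothetical shared preparation step, computed once for all strategies."""
    mean = sum(prices.values()) / len(prices)
    return {sym: p - mean for sym, p in prices.items()}


def momentum(state):
    return [(sym, "buy") for sym, dev in state.items() if dev > 0]


def mean_reversion(state):
    return [(sym, "sell") for sym, dev in state.items() if dev > 0]


def broken_strategy(state):
    raise KeyError("missing column")


def run_strategies(state, strategies: dict[str, Callable[[Any], list]]):
    """Run each strategy independently; one failure does not block the others."""
    results, failures = {}, {}
    for name, strategy in strategies.items():
        try:
            results[name] = strategy(state)
        except Exception as exc:
            logger.warning("strategy %s failed: %s", name, exc)
            failures[name] = repr(exc)
    return results, failures


def aggregate(results):
    """Combine trades from the strategies that succeeded into one list."""
    return [trade for name in sorted(results) for trade in results[name]]


state = compute_market_state({"AAA": 10.0, "BBB": 12.0})
results, failures = run_strategies(
    state, {"momentum": momentum, "mean_reversion": mean_reversion, "broken": broken_strategy}
)
print(aggregate(results))  # [('BBB', 'sell'), ('BBB', 'buy')]
print(sorted(failures))  # ['broken']
```

Whether a partial trade file should go to market when a strategy fails is a business decision; this sketch ships what succeeded and reports the failures separately.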

Iñigo Hidalgo

03/01/2024, 10:30 AM

Kedro is our hammer in this famous analogy


Juan Luis

03/01/2024, 12:23 PM

lol
