# questions
d
Hi Team, can we make the dataset loads for a Kedro node run in parallel? I have 19 datasets that I need to read in a single function, but the logs show each one taking about 2 minutes, i.e. ~40 minutes of read time. Is there any way we can make the reads parallel? I don’t think they conflict in any way, so I don’t see why they need to be sequential.
d
Please don’t tag us directly; someone will answer you. What kind of dataset is it?
d
This is a custom dataset we created for pulling in heavy datasets.
d
Well then, Kedro isn’t mandating sequential reads, is it?
There are two types of parallelism: you can do multiprocessing inside the dataset itself if you want to do some sort of chunking,
or you can use the `ParallelRunner` or `ThreadRunner` to run non-dependent nodes at the same time.
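For the first option (parallelism inside the dataset), here is a minimal sketch of chunked loading, assuming the data splits into independent chunks. `ChunkedDataset` and `slow_read_chunk` are hypothetical names, not Kedro APIs; in a real custom dataset this logic would sit inside its load method. The thread mentions multiprocessing, but this sketch swaps in a thread pool, which is usually enough when the reads are I/O-bound; for CPU-heavy parsing, `ProcessPoolExecutor` is the analogue (the chunk function must then be picklable).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class ChunkedDataset:
    """Hypothetical dataset that reads its data as independent chunks.

    `read_chunk` is an assumed placeholder for whatever fetches one chunk
    (a file, a partition, a page of a query); it is not a Kedro API.
    """

    def __init__(self, chunk_ids: List[str], read_chunk: Callable[[str], list],
                 max_workers: int = 8) -> None:
        self._chunk_ids = chunk_ids
        self._read_chunk = read_chunk
        self._max_workers = max_workers

    def load(self) -> list:
        # Fetch every chunk concurrently. pool.map yields results in the
        # order of chunk_ids, so the combined output is deterministic.
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            chunks = list(pool.map(self._read_chunk, self._chunk_ids))
        # Flatten the per-chunk lists into one result.
        return [row for chunk in chunks for row in chunk]


def slow_read_chunk(chunk_id: str) -> list:
    """Stand-in for an I/O-bound read that takes ~0.1 s per chunk."""
    time.sleep(0.1)
    return [f"{chunk_id}-0", f"{chunk_id}-1"]


start = time.perf_counter()
rows = ChunkedDataset(["a", "b", "c", "d"], slow_read_chunk).load()
elapsed = time.perf_counter() - start  # roughly 0.1 s rather than 0.4 s
```

The chunks overlap in time, so wall time tracks the slowest chunk rather than the sum of all of them.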
d
This is a dataset-agnostic question. I have 19 catalog entries; how can I make sure they are loaded at the same time? I’m assuming that by default it loads them one by one?
d
Oh, as in 19 dataset loads into the same node?
d
indeed
d
That is a simple loop, and if you want to implement your own runner, you can.
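Loading a node's inputs in a simple loop means the total read time is the sum of the individual loads, which is how 19 loads at ~2 minutes each add up to ~40 minutes. A minimal sketch of that loop, assuming a hypothetical mapping from dataset name to a zero-argument loader function as a stand-in for catalog lookups; it models the behaviour and is not Kedro's runner code.

```python
import time
from typing import Any, Callable, Dict


def load_inputs_sequentially(loaders: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Load each named input one after another, like a plain for-loop.

    `loaders` is a hypothetical name -> load-function mapping, not Kedro's API.
    """
    results = {}
    for name, load in loaders.items():
        results[name] = load()  # each load must finish before the next starts
    return results


def make_loader(name: str, seconds: float) -> Callable[[], str]:
    """Build a stand-in loader that simulates a slow read."""
    def load() -> str:
        time.sleep(seconds)
        return f"data from {name}"
    return load


loaders = {f"dataset_{i}": make_loader(f"dataset_{i}", 0.05) for i in range(5)}
start = time.perf_counter()
inputs = load_inputs_sequentially(loaders)
elapsed = time.perf_counter() - start  # about 5 x 0.05 s: the load times add up
```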
d
It loads like this:
00:02 INFO Loading data from xxxxx
00:04 INFO Loading data from yyyyy
d
Oh, sorry,
we have this:
`kedro run --async`
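`--async` makes Kedro load and save a node's datasets asynchronously with threads instead of one after another. A minimal sketch of that idea using Python's `concurrent.futures` thread pool; the loader mapping and function names are hypothetical, and this illustrates the concept rather than reproducing Kedro's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def load_inputs_concurrently(loaders: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Submit every input load to a thread pool, then collect all results.

    `loaders` is a hypothetical name -> load-function mapping standing in
    for catalog lookups.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(load) for name, load in loaders.items()}
        # .result() blocks until that load finishes and re-raises its error, if any.
        return {name: future.result() for name, future in futures.items()}


def make_loader(name: str, seconds: float) -> Callable[[], str]:
    """Build a stand-in loader that simulates a slow, I/O-bound read."""
    def load() -> str:
        time.sleep(seconds)
        return f"data from {name}"
    return load


loaders = {f"dataset_{i}": make_loader(f"dataset_{i}", 0.1) for i in range(5)}
start = time.perf_counter()
inputs = load_inputs_concurrently(loaders)
elapsed = time.perf_counter() - start  # close to one 0.1 s load, not 5 x 0.1 s
```

When the reads are I/O-bound, threads can wait on them simultaneously, so wall time approaches the slowest single load rather than the sum; in this thread that cut the reported load time from ~40 to ~10 minutes.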
d
aaah lemme check this
OMG, it reduced the load time to 10 minutes hahahahaha, incredible, thanks!!!
j
Nice! I was going to ask @datajoely whether this is documented, but it is, and I’d completely forgotten about it 😆 I think a blog post about optimising Kedro usage, with this kind of tip, could be a good one :)
d
Fully agree! I would encourage separate documentation for this too; this is much broader than just "running a pipeline".
d
A question for @Merel: is there an argument to make this the default behaviour?
m
Not that I’m aware of!
d
As in, it’s been ruled out?
m
Oh haha I thought you meant argument as in argument to the CLI 😂
No, I guess we just haven’t discussed it.
d
what’s the best way to do so? An issue?
d
Calls for a user feature vote? 😛
m
An issue sounds good! You can mark it as “technical design” so we’ll discuss it with the team.
j
LOL, if you make it the default, I won't have the makings of an infomercial blog post any more 😆
d
@datajoely, do you want me to create the issue for this? I have screenshots et al. and can put it up.
d
I’m doing it
as in 90% written
d
Awesome, the only pushback I expect is with the `ThreadRunner`, as async calls don’t work with it, but let’s see.
d
I’m proposing a fallback,
but I’m not sure if it’s possible.