Guillaume Latour

04/03/2023, 1:28 PM
Hello everyone, I am currently trying to launch a pipeline with the Dask runner in a sort of distributed fashion (it's all on the same machine, but there are several workers). I am facing a not so enjoyable issue: my Dask workers are killed by the OS (likely OOM) AND I can't retrieve any logging information. The logs are "displayed" on the workers' side (is it possible to collect them from the workers and output/store/centralize them?), and the only message that I get is "KilledWorker: Attempted to run task <id task> on 3 different workers, but all those workers died while running it. The last worker was <ip:port>." But since the worker has been killed (and a new one took its place), there's no way of reading those valuable pieces of information. How do you guys usually debug your app when using a Dask runner? I've found a Dask deployment page and a debugging page in the Kedro documentation, but not a page that combines the two. Have I missed it? I must admit that I am kind of new to Dask deployment, so if it's a trivial subject, could you point me to the relevant documentation? Thank you all!

Damian Fiłonowicz

04/03/2023, 3:00 PM
Hi, I am inexperienced with Dask, but what you can do is collect the logs from the Dask nodes and send them to a centralized monitoring system, as you said, to preserve the history and analyze them. If the problem is the limited output or the severity level of your logs, you can try using a logging configuration file, or simply pass the Dask logging configuration directly (depending on how you submit your runs).
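
To make the "logging configuration" idea a bit more concrete, here is a minimal sketch using only Python's standard `logging.config.dictConfig`. It sends everything under the `distributed` logger to a file, so the messages survive even if the process that wrote them is killed. The function names, the log path, and the format string are hypothetical choices for illustration, not something from the Kedro or Dask docs. As far as I know, Dask accepts a dictionary of this shape (with `version: 1`) under the `logging` key of its configuration file, but please check that against the Dask debugging documentation for your version.

```python
import logging
import logging.config
import os
import tempfile


def build_logging_config(log_path, level="INFO"):
    """Return a dictConfig-style dict that writes 'distributed' logs to a file.

    log_path and level are illustrative parameters (assumptions), not Dask options.
    """
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            # The process id helps tell workers apart when several share one file.
            "plain": {
                "format": "%(asctime)s pid=%(process)d %(name)s %(levelname)s %(message)s"
            },
        },
        "handlers": {
            "file": {
                "class": "logging.FileHandler",
                "filename": log_path,
                "formatter": "plain",
                "level": level,
            },
        },
        "loggers": {
            # 'distributed' is the parent logger of 'distributed.worker',
            # 'distributed.nanny', and so on, so child records propagate here.
            "distributed": {"handlers": ["file"], "level": level},
        },
    }


def demo():
    """Apply the config locally and emit one record, to show where it ends up."""
    log_path = os.path.join(tempfile.mkdtemp(), "dask-worker.log")
    logging.config.dictConfig(build_logging_config(log_path))
    logging.getLogger("distributed.worker").warning("memory usage is high")
    with open(log_path) as f:
        return f.read()


if __name__ == "__main__":
    print(demo())
```

The file handler appends and flushes on every record, so whatever a worker logged before being killed should still be on disk afterwards. For an actual OOM kill, the interesting lines are more likely to come from the process supervising the worker than from the worker itself, so it is worth keeping the whole `distributed` hierarchy rather than only `distributed.worker`.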