# questions
Hello everyone, I am currently trying to launch a pipeline with the Dask runner in a sort of distributed fashion (it's all on the same machine, but there are several workers). I'm facing a rather unpleasant issue: my Dask workers are killed by the OS (likely OOM) AND I can't retrieve any logging information. The logs are "displayed" on the workers' side (is it possible to collect them from the workers and output/store/centralize them?), and the only message I get is "KilledWorker: Attempted to run task <id task> on 3 different workers, but all those workers died while running it. The last worker was <ip:port>." But since the worker has been killed (and a new one took its place), there's no way of reading those valuable pieces of information.

How do you usually debug your app when using the Dask runner? I've found a Dask deployment page and a debugging page in the Kedro documentation, but not a page that merges those two; have I missed it? I must admit I'm fairly new to Dask deployment, so if this is a trivial subject, could you point me towards the relevant documentation? Thank you all!
Hi, I'm not very experienced with Dask, but one thing you can do is collect logs from the Dask nodes with, for example, https://www.fluentd.org/ and send them to a centralized monitoring system, as you said, to preserve the history and analyze them: https://www.fluentd.org/dataoutputs

If the problem is the limited output or severity of your logs, you can try a configuration file like the one described here: https://docs.kedro.org/en/stable/deployment/dask.html#set-up-dask-and-related-configuration, or simply pass the Dask logging configuration as explained here: https://docs.dask.org/en/stable/how-to/debug.html#logs (depending on how you submit your runs).
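To illustrate that last point: a minimal sketch of a logging config that each worker process could apply (e.g. via a Dask worker preload script) so that logs are written to a file that survives the worker being OOM-killed, instead of only going to the worker's stdout. The file name, logger names, and levels here are assumptions for illustration, not Kedro or Dask defaults; the sketch itself uses only the Python standard library.

```python
import logging
import logging.config

# Hypothetical logging config for a Dask worker process. Writing to a
# per-worker file means the log history persists even after the OS kills
# the worker, so you can read back what happened before the OOM.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "detailed": {
            "format": "%(asctime)s pid=%(process)d %(name)s %(levelname)s %(message)s"
        }
    },
    "handlers": {
        "worker_file": {
            "class": "logging.FileHandler",
            "filename": "worker.log",  # assumed path; use one file per worker
            "formatter": "detailed",
        }
    },
    "loggers": {
        # Capture both Dask's own logs and the pipeline's logs.
        "distributed": {"level": "INFO", "handlers": ["worker_file"]},
        "kedro": {"level": "DEBUG", "handlers": ["worker_file"]},
    },
}


def setup_worker_logging() -> None:
    """Apply the config; call this once as the worker process starts."""
    logging.config.dictConfig(LOGGING_CONFIG)


setup_worker_logging()
logging.getLogger("kedro").debug("worker logging configured")
```

A log shipper like Fluentd can then tail those per-worker files and forward them to whatever centralized system you choose.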