# questions
m
Hi everyone, I hope you’ve all had a nice weekend 🙂 I’m sure this is doable, but I haven’t found many leads in the documentation. How could I have a “*per-run log file*” that would be saved as an artifact of the pipeline in the data dir? Thx in advance. Regards, Marc
m
That’s a nice question!
I would like to see the answer 🤔
m
Oh… I must confess that I’m flattered that you find the question interesting 🙂 Let’s see if someone has a straightforward suggestion / solution!
d
I think this is easiest with an `after_pipeline_run` hook: you have access to the `session_id`
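A minimal sketch of that suggestion, for the record. The source log filename (`info.log`) and the destination folder (`data/08_reporting`) are assumptions to adapt to your project’s `logging.yml`; the hook only needs the `session_id` from `run_params`:

```python
# Hedged sketch: copy the run's log file into the data directory after each
# run, named by session_id. "info.log" and "data/08_reporting" are assumptions.
import shutil
from pathlib import Path

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # let the sketch run without Kedro installed
    def hook_impl(func):
        return func

class PerRunLogHook:
    @hook_impl
    def after_pipeline_run(self, run_params):
        session_id = run_params["session_id"]  # a datetime-like string
        src = Path("info.log")  # wherever your file handler writes (assumed)
        dst = Path("data/08_reporting") / f"run_{session_id}.log"
        if src.exists():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst)
```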
m
Thx @datajoely for your answer and suggestion. But I remain a little bit confused. If I remember correctly, `session_id` is a datetime-like `str`, correct? If so… what am I supposed to do with it to access the run’s log? Thx in advance for your time, advice and patience 🙏🏼 🙂
d
oh we’re talking about the logging config 🤔
m
@datajoely can it be done via logging config? 🙂 I’ve looked into the docs but haven’t found anything. And I confess that I didn’t have time to dive into Rich’s docs… I was hoping that someone more advanced would have a “quick & easy” solution 😜
d
So I was wondering about this, then got sucked into something else.
I wonder if you can set up the logging formatter to do this.
But from a lifecycle point of view, I think we initialise the logging config before the session is initialised.
I really don’t know here.
It’s a thin team today; I’ll see if others have any ideas tomorrow.
m
🙏🏼 🙂
Hi @datajoely 🙂 I was going through my old conversations / threads and saw that this one was still “pending”. Any updates / insights? Thx in advance, M
d
Good question - let me raise this
j
this is technically similar to log rotation, right? https://docs.python.org/3/library/logging.handlers.html#rotatingfilehandler but not quite, because we'd want to have 1 file per run, whereas rollover happens when the size of the log goes over a certain threshold...
one can force a manual rollover (https://stackoverflow.com/a/44636428/554319), but I don't think this is possible just by tweaking the logging configuration 🤔
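For reference, the manual-rollover trick from that answer looks roughly like this; the filename and `backupCount` here are arbitrary choices, not anything Kedro-specific:

```python
# Hedged sketch: rotating on startup means each invocation begins a fresh log
# file, with previous runs kept as numbered backups ("app.log" is arbitrary).
import logging
import logging.handlers

handler = logging.handlers.RotatingFileHandler("app.log", backupCount=5)
handler.doRollover()  # force rotation now: app.log -> app.log.1, then reopen

logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.warning("this run starts with an empty app.log")
```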
d
@Juan Luis are `OmegaConf` resolvers available in `logging.yml`? Could you do something like
```yaml
info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: info_${oc.env:KEDRO_SESSION_ID}.log
```
j
this is a clever trick - but then if the filename uses the session id, there's no need to do manual rotation, right? and also, `$KEDRO_SESSION_ID` wouldn't normally be set, or am I missing something?
d
yeah, you’d have to set it in a `before_pipeline_run` hook
and I’m not sure if that would even work tbh
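For illustration, such a hook could look like this; whether `logging.yml` would pick the variable up in time is exactly the open question above, since logging is configured before the session starts:

```python
# Hedged sketch: export the session id as an environment variable so that
# logging.yml could read it via ${oc.env:KEDRO_SESSION_ID}. Caveat from the
# thread applies: this may run too late relative to logging setup.
import os

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # let the sketch run without Kedro installed
    def hook_impl(func):
        return func

class ExportSessionIdHook:
    @hook_impl
    def before_pipeline_run(self, run_params):
        os.environ["KEDRO_SESSION_ID"] = run_params["session_id"]
```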
j
continuing with the resolver idea, could there be a `${kedro:session_id}` resolver that got it from the session?
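A purely hypothetical sketch of what registering such a resolver in `settings.py` could look like. `${kedro:session_id}` does not exist in Kedro, and the stand-in `_session_info` dict papers over the open question in this thread of how to reach the session from there:

```python
# Hypothetical only: a "kedro" OmegaConf resolver. The _session_info dict is a
# stand-in for whatever mechanism would actually expose the running session.
_session_info = {"session_id": "2024-01-01T00.00.00.000Z"}  # stand-in value

def kedro_resolver(key: str) -> str:
    """Look up a session attribute, e.g. ${kedro:session_id}."""
    return _session_info[key]

try:
    from omegaconf import OmegaConf
    OmegaConf.register_new_resolver("kedro", kedro_resolver)
except ImportError:
    pass  # OmegaConf not installed; the resolver function alone is shown
```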
d
this would be very very useful
especially if we also make the session_id configurable
man I’m so excited by what OmegaConf is able to do
m
🍿 Thanks guys for considering this question / request 🙂
j
how would such a resolver be defined though? resolvers are added to OmegaConf in `settings.py`; it's unclear to me how to fetch the global Kedro session or context 🤔
d
OmegaConf could fetch an env var that doesn’t exist yet
n
Not sure if I missed anything: the standard library `logging` should support this, and you just need to define the filepath to save in `data` instead of the root directory.
d
how do you define the filepath dynamically for each run?
n
So you also need to match the `session_id`? Otherwise there are easy ways to rotate the files (pretty sure it was asked a few months ago). Using `OmegaConfigLoader` may not be ideal because it is decoupled from the config loader in 0.19; it will be just a simple `yaml.load`
i
@Marc Gris could you please share why you want to do this? And what log do you want to keep, the entire log from the logger or certain logs? What is the use case you are solving?
m
Hi @Ivan Danov Thanks for your message. Actually, I had posted the question 2 months ago. Therefore, I hope you won’t mind if I can’t remember the exact use case… 😅 However, I can remember the “spirit”: striving to apply MLOps best practices, it seemed to me relevant to keep track of as many things as possible regarding my experiments etc… The log of a run being information-rich, it naturally seemed to me worth persisting & versioning “individually”… Thanks again & have a nice weekend, M.
d
if we added the session id to the actual log lines and switched to JSON logging, perhaps that's a good way of doing this? It would work nicely in ELK, for example
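A rough sketch of that idea: a `logging.Formatter` that emits JSON with a `session_id` field stamped on every record. The field names are made up, not an established Kedro format:

```python
# Hedged sketch: stamp every record with the session id so log lines from one
# run can be grouped in ELK or similar. Field names are assumptions.
import json
import logging

class SessionJsonFormatter(logging.Formatter):
    def __init__(self, session_id: str):
        super().__init__()
        self.session_id = session_id

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "session_id": self.session_id,
        })
```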
j
yeah, I think it's pretty reasonable to want 1 log file per run 🤔 and so far it seems like the best solution in Kedro is `kedro run > log_$(date).txt`
i
https://stackoverflow.com/questions/44635896/how-to-create-a-new-log-file-every-time-the-application-runs One can achieve that by doing what is suggested in this Stack Overflow post ☝️. The best option is creating a custom logging handler, extending the built-in `RotatingFileHandler`, which will always create a new file. The path and the name of that file can be configured through `logging.yaml`. Making that file have the same name as the session id is not that easy indeed, but doable with a few hacks. I think the work on this will make it much easier once it's done: https://github.com/kedro-org/kedro/issues/1731