# questions
j
Hi everyone! I wanted first to thank all the participants of the recent Kedro documentation survey :thankyou: and second, to apologise 🙇‍♀️. We are still waiting for our rebranded Kedro merch, so we have yet to send anything out to those who responded, but I am following up on that. 👕 :kedroid:
Can I ask a follow-up question to everyone on this channel? It's still about docs, specifically the Spaceflights tutorial. We received feedback that you'd like more examples of how to extend Spaceflights for other, more advanced scenarios, e.g. to add S3 as a filestore, deployment options, etc. We wouldn't extend the starter, but we would add extra example code and how-to sections in the docs. The question is: what do you think we should add to extend Spaceflights for common tasks and scenarios? Please leave me a comment in the 🧵
And, finally, if you're interested in the outcome of the documentation user research, there's now a milestone on GitHub with some of the major activities we have planned. It's all work in progress, but I'm sharing here for transparency. We always appreciate feedback and suggestions!
🥳 4
Thread for ideas about spaceflights extension how-tos...
j
how to extend to use .env files for aws
👀 1
it would also be cool to have an example notebook that loads the data from the catalog
thankyou 1
n
@J. Camilo V. Tieck Did you hit any particular issue using `.env`? I am not sure if there is anything specific about Kedro here.
j
you need to edit settings.py to load the .env. this is what I do:

```python
import os

from dotenv import load_dotenv
from kedro.config import TemplatedConfigLoader

load_dotenv()  # read variables from .env into os.environ

CONFIG_LOADER_CLASS = TemplatedConfigLoader

# Keyword arguments to pass to the `CONFIG_LOADER_CLASS` constructor.
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",  # read the globals dictionary from project config
    "globals_dict": os.environ,  # read environment variables -> set by .env
}
```
and then I template the credentials file:

```yaml
estate_valuation_dev_s3bucket:
  client_kwargs:
    aws_access_key_id: ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY}
```
something like that, it took me a while to figure out
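For context, the `.env` file that `load_dotenv()` picks up is just plain `KEY=VALUE` lines kept out of version control (the values below are placeholders, not real keys):

```
# .env -- add this file to .gitignore!
AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
```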
j
Thanks @J. Camilo V. Tieck for the suggestion. Just for my benefit, why would you want to do this? It's not a trick question btw, I don't work with Python often enough to know what you'd achieve with this and how to frame it to others as something they'd need.
n
Injecting environment variables into `globals_dict` is a common pattern. Even if you are not using a `.env` file, it can still be useful to inject variables. For example:
```python
import os

# Only inject variables with a project-specific prefix,
# rather than the whole environment.
injected_variables = {}
for key, value in os.environ.items():
    if key.startswith("PROJECT_SPECIFIC_CONFIG"):
        injected_variables[key] = value

CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",  # read the globals dictionary from project config
    "globals_dict": injected_variables,  # inject selected environment variables
}
```
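As a quick, self-contained illustration of that prefix filter (the variable name here is made up):

```python
import os

# Pretend the deployment environment set a project-specific variable.
os.environ["PROJECT_SPECIFIC_CONFIG_BUCKET"] = "my-data-bucket"

# Same filtering logic as above, written as a dict comprehension.
injected_variables = {
    key: value
    for key, value in os.environ.items()
    if key.startswith("PROJECT_SPECIFIC_CONFIG")
}

print(injected_variables["PROJECT_SPECIFIC_CONFIG_BUCKET"])  # -> my-data-bucket
```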
@J. Camilo V. Tieck I guess the unclear bit here is where exactly you should put this in:

```python
import os
from dotenv import load_dotenv

load_dotenv()
```
Do we have this documented somewhere? Cc @datajoely
thankyou 1
d
it's definitely in an issue/discussion, but when do we say that the OmegaConf solution is the main one?
n
@J. Camilo V. Tieck FYI, with OmegaConfigLoader this will be a built-in solution. The usage is exactly like how you template currently:

```yaml
dev_s3:
  client_kwargs:
    aws_access_key_id: ${oc.env:AWS_ACCESS_KEY_ID}
    aws_secret_access_key: ${oc.env:AWS_SECRET_ACCESS_KEY}
```
https://docs.kedro.org/en/latest/configuration/advanced_configuration.html#how-to-load-credentials-through-environment-variables
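If it helps, switching the project over is (as I understand it) essentially a one-line change in `settings.py`. The `oc.env:` resolver reads `os.environ` directly, so no `load_dotenv()` call is needed for it, unless you also want variables sourced from a `.env` file:

```python
# src/<package_name>/settings.py -- sketch, assuming a recent Kedro version
from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader
```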
c
Idea on extensions/how-tos: incrementally convert an existing project to Kedro. I'm currently converting a non-Kedro project. To keep the existing project functional, I'm not replacing everything at once.
• I'm starting by building the data catalog for the `final_data_set` and working backwards from there. My first pipeline will be to concat `data_set1`, `data_set2`, ..., to create `final_data_set`.
• Next I'll wrap my entire existing pipeline in a single `node` before gradually refactoring the existing code to use more `nodes` and `pipelines`.
While I would love to rewrite this project from scratch using Kedro, I'm trying to find the best way to adopt Kedro while retaining existing functionality. How exactly this should be done likely depends on how the original project is structured, but maybe there are some areas that would be common across all projects.
thankyou 1
šŸ‘ 1
šŸ‘šŸ¼ 2
n
@Chris Schopp +1 on the single-node approach, I have done this many times already. The first step is to wrap the entire script in a pipeline, then split out more nodes where needed. E.g. maybe you need to persist a particular file, so it makes sense to first convert that bit of code to a node and let Kedro handle the I/O. From there you can slowly convert more bits to Kedro.
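That refactoring path can be sketched without any Kedro machinery at all (the function names below are made up); the point is that each carved-out step keeps the overall result identical, so the migration can stop at any point with a working project:

```python
# Step 1: the whole legacy script lives in one function -- the "single node".
def run_everything(raw_rows):
    cleaned = [row.strip().lower() for row in raw_rows]
    return sorted(set(cleaned))

# Step 2: carve out the piece whose intermediate output should be
# persisted separately, so the framework can take over its I/O later.
def clean(raw_rows):
    return [row.strip().lower() for row in raw_rows]

def dedupe_and_sort(cleaned):
    return sorted(set(cleaned))

rows = [" B ", "a", "b", "A "]
# Behaviour is preserved at every step of the refactor.
assert run_everything(rows) == dedupe_and_sort(clean(rows))
print(run_everything(rows))  # -> ['a', 'b']
```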
thankyou 1
j
@Nok Lam Chan the way I'm doing this is to add this code, with the imports and everything, in the src/my_package/settings.py file. the settings.py is usually all commented out.
```python
import os

from dotenv import load_dotenv
from kedro.config import TemplatedConfigLoader

load_dotenv()  # read variables from .env into os.environ

CONFIG_LOADER_CLASS = TemplatedConfigLoader

# Keyword arguments to pass to the `CONFIG_LOADER_CLASS` constructor.
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",  # read the globals dictionary from project config
    "globals_dict": os.environ,  # read environment variables -> set by .env
}
```
@Nok Lam Chan I tried using the OmegaConfigLoader, but couldn't figure out where to put the load_dotenv() part, as you mentioned. I don't know what the 'best' practice is here. I guess you could add that part to settings.py, and have the templating working for credentials.yml as you showed.
šŸ‘šŸ¼ 1
@Jo Stichbury good question. as @Nok Lam Chan mentioned, there are different use cases for this. for us, the main one is to follow AWS best practices with regard to keys and credentials. we always use .env files to store those, not only for Kedro projects; that way we always know we are dealing with an AWS deployment. this way, we can manage those credentials 100% independently of the code, e.g. our devops team can set up the environment and deployment totally independently of the code.
n
Thanks! I think that makes a lot of sense. I think there is some overloading of the word "environment" here, because Kedro has its own `environment` concept, and that's where we usually suggest keeping the local credentials.
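For anyone following along: the Kedro "environment" referred to here is just a subfolder under `conf/`. The default project layout (as I understand the standard template) looks roughly like this, with credentials living in the git-ignored `local` environment:

```
conf/
├── base/            # shared configuration, committed to version control
│   ├── catalog.yml
│   └── parameters.yml
└── local/           # machine-specific overrides, git-ignored
    └── credentials.yml
```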