Pascal Brokmeier
08/02/2024, 1:18 PM
Context: in our base environment we are setting
spark.hadoop.google.cloud.auth.service.account.enable: true
spark.hadoop.google.cloud.auth.service.account.json.keyfile: conf/local/service-account.json
to allow reading the real data in GCS from local. But our prod env relies on machine authentication, which is the default unless service.account.enable is set. We seem to be unable to force GCS to use spark.hadoop.google.cloud.auth.type: COMPUTE_ENGINE, and now our prod env keeps looking for the service-account.json, which obviously doesn't exist there.
If possible, we want to avoid writing ugly if/else statements in our Spark hook depending on the environment, so welcoming any ideas 🤞
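(A minimal sketch of the per-environment override being asked about, assuming Kedro's standard conf/&lt;env&gt; layout — the conf/prod/spark.yml path is an assumption, and whether the GCS connector actually honours these keys is exactly the open question in this thread:)

conf/prod/spark.yml — hypothetical override pinning machine auth:
spark.hadoop.google.cloud.auth.service.account.enable: false
spark.hadoop.google.cloud.auth.type: COMPUTE_ENGINE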
datajoely
08/02/2024, 1:52 PM
Maybe an OmegaConf resolver, but I don't think an if/else in the hook is a terrible solution
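(A minimal sketch of that resolver idea, assuming Kedro's OmegaConfigLoader and its custom_resolvers argument; the gcs_keyfile_auth name and the KEDRO_ENV lookup are illustrative, not anything established in the thread:)

# settings.py — hypothetical resolver that flips keyfile auth off for a given env
import os

from kedro.config import OmegaConfigLoader

def _gcs_keyfile_auth(disable_env: str) -> bool:
    # Enable keyfile auth unless we are running in `disable_env`.
    # Assumes the run environment is exported as KEDRO_ENV (this won't
    # see an environment passed only via `kedro run --env=...`).
    return os.environ.get("KEDRO_ENV", "local") != disable_env

CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {"gcs_keyfile_auth": _gcs_keyfile_auth},
}

# conf/base/spark.yml would then reference the resolver:
#   spark.hadoop.google.cloud.auth.service.account.enable: ${gcs_keyfile_auth:prod}

(That keeps spark.yml declarative and the hook free of env checks.)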
Pascal Brokmeier
08/02/2024, 1:54 PM
# DEBT ugly fix, ideally we overwrite this in the spark.yml config file but currently no
# known way of doing so
# if prod environment, remove all config keys that start with spark.hadoop.google.cloud.auth.service
if context.env == "prod":
    parameters = {
        k: v
        for k, v in parameters.items()
        if not k.startswith("spark.hadoop.google.cloud.auth.service")
    }
it works but that doesn't make it good 😄 my OCD can't handle the hack'iness
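(For reference, a sketch of where that filter sits inside a full hook — the class follows the SparkHooks pattern from Kedro's Spark docs, while the config_patterns assumption and the session options are guesses about this particular project:)

# hooks.py — sketch only
from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession

class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        # Assumes "spark" is registered in the config loader's config_patterns.
        parameters = context.config_loader["spark"]

        # DEBT: in prod, drop the keyfile-auth keys so the GCS connector
        # falls back to machine (Compute Engine) credentials.
        if context.env == "prod":
            parameters = {
                k: v
                for k, v in parameters.items()
                if not k.startswith("spark.hadoop.google.cloud.auth.service")
            }

        spark_conf = SparkConf().setAll(parameters.items())
        SparkSession.builder.appName(context.project_path.name).config(
            conf=spark_conf
        ).getOrCreate()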
datajoely
08/02/2024, 1:56 PM

datajoely
08/02/2024, 1:57 PM