# questions
b
Hi Kedro Team... Getting the attached error when we submit a job on a Dataproc cluster to run a Data Engineering pipeline; we have a data file in ".txt.gz" format. The same pipeline works fine with .master(local[*]), but fails when we submit with spark.master: yarn and spark.submit.deployMode: client. Any idea where it is going wrong?
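(For context, a minimal sketch of the kind of submission being described; the cluster, region, and script names below are hypothetical placeholders, not details from this thread.)

```shell
# Hedged sketch of a Dataproc PySpark submission in yarn/client mode.
# --cluster, --region, and the entry-point script are placeholder names.
gcloud dataproc jobs submit pyspark \
  --cluster=my-cluster \
  --region=us-central1 \
  --properties=spark.master=yarn,spark.submit.deployMode=client \
  run_kedro_pipeline.py
```

Note the Spark property is spelled `spark.submit.deployMode`; a misspelled property is silently ignored, so it is worth double-checking the exact key in the failing configuration.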
d
it’s a bit hard to work out from this
where is the data being persisted?
b
data is in GCS bucket
d
and it works okay when run from a single node, but not when distributed?
and if you exclude this txt.gz file, it works correctly?
b
yes, it works on a single node. I need to load this file for later pipeline runs, so I cannot exclude it
d
I’m unsure on how to deal with this - are you using the ThreadRunner?
b
no. Is there a way we can connect so I can show you what is happening?
d
We’re really outside of my area of expertise here unfortunately
our current view of best practice is here
o
hi @Balachandran Ponnusamy looks like the distributed cluster might be missing the security permissions needed to access the data. Have you checked on that?
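(One hedged way to check this, assuming the data lives in a GCS bucket as stated earlier; the bucket and project names below are placeholders.)

```shell
# Inspect the IAM policy on the bucket holding the .txt.gz file
# to confirm the cluster's service account has read access.
gsutil iam get gs://my-data-bucket

# List the roles granted to service accounts in the project;
# the Dataproc workers read GCS using the cluster's service account,
# which typically needs at least roles/storage.objectViewer.
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount" \
  --format="table(bindings.members, bindings.role)"
```

This would also explain why local[*] works but yarn does not: a local run uses the submitting user's credentials, while distributed executors authenticate as the cluster's service account.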