# plugins-integrations
o
I'm having issues: when I import GBQQueryDataSet with the query and project, everything runs fine, taking the credentials from the env generated with
gcloud auth login
But when I run
kedro run
and its slicing variations, or
kedro ipython
it raises permission problems.
d
can you post a stack trace?
and are you importing directly in Python or using the YAML api?
o
I already deleted everything (project and venv), because it was taking me too much time to solve.
I was following the documentation at https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.pandas.GBQQueryDataSet.html without passing the credentials in the .yaml, having previously logged in to GCP via the CLI. So, when I ran
kedro ipython
a permission issue arose. And the same when I ran
kedro run
. But if I run
from kedro.extras.datasets.pandas import GBQQueryDataSet

sql = "SELECT * FROM dataset_1.table_a"

data_set = GBQQueryDataSet(sql, project='my-project')

sql_data = data_set.load()
changing the query to my project.dataset.table, the CLI credentials work fine.
What I was trying to do is connect to Google BigQuery with the CLI credentials, without pasting the credentials into gcp-conections\conf\local\credentials.yml
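For context, a minimal sketch of the YAML-API approach being avoided here, assuming a hypothetical catalog entry name (`gbq_data`) and credentials key (`gcp_credentials`); values are placeholders, not real secrets:

```yaml
# conf/base/catalog.yml -- hypothetical entry name
gbq_data:
  type: pandas.GBQQueryDataSet
  sql: "SELECT * FROM dataset_1.table_a"
  project: my-project
  credentials: gcp_credentials

# conf/local/credentials.yml -- the part the poster wants to avoid committing
gcp_credentials:
  token: placeholder-access-token
  refresh_token: placeholder-refresh-token
  client_id: placeholder-client-id
  client_secret: placeholder-client-secret
```

The `credentials: gcp_credentials` line in the catalog references the key of the same name in `credentials.yml`, which Kedro keeps in `conf/local` precisely so secrets stay out of version control.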
d
you’re not providing credentials to the
GBQQueryDataSet
class
it accepts a credentials argument
o
Ok. So, it is mandatory to do it that way: passing the credentials from the .yaml. I'll give it a try again.
d
no you can use the python api
but you do need to provide them
data_set = GBQQueryDataSet(sql, project='my-project')
this example only works if you’ve authenticated on an env level
credentials (Union[Dict[str, Any], Credentials, None]) – Credentials for accessing Google APIs. Either a google.auth.credentials.Credentials object or a dictionary with the parameters required to instantiate google.oauth2.credentials.Credentials. Here you can find all the arguments: https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.credentials.html
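To illustrate the dictionary form, here's a minimal sketch: the keys mirror the `google.oauth2.credentials.Credentials` constructor parameters, and all values are placeholders, not real secrets. The dataset call at the end is commented out since it assumes Kedro is installed and a reachable BigQuery project.

```python
# Hypothetical sketch: a credentials dict whose keys mirror the
# google.oauth2.credentials.Credentials constructor parameters.
# All values below are placeholders, not real secrets.
credentials = {
    "token": "ya29.placeholder-access-token",
    "refresh_token": "placeholder-refresh-token",
    "token_uri": "https://oauth2.googleapis.com/token",
    "client_id": "placeholder-client-id",
    "client_secret": "placeholder-client-secret",
}

# Passed straight through to the dataset (assumes kedro is installed):
# from kedro.extras.datasets.pandas import GBQQueryDataSet
# data_set = GBQQueryDataSet(
#     sql="SELECT * FROM dataset_1.table_a",
#     project="my-project",
#     credentials=credentials,
# )
```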
o
If I add the call to
google.auth.credentials.Credentials
in the node code in Python, the dataset class will find the credentials in the env. I'd prefer something like that, which doesn't imply putting the credentials in the .yaml
d
You can also bypass authentication in Kedro and do it on an environment level https://stackoverflow.com/questions/35159967/setting-google-application-credentials-for-bigquery-python-cli
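The environment-level route from that link can be sketched in Python; the key-file path below is a placeholder. Google client libraries read `GOOGLE_APPLICATION_CREDENTIALS` when no explicit credentials are passed, so a dataset created afterwards with no `credentials` argument picks these up via Application Default Credentials.

```python
import os

# Point Google client libraries at a service-account key file.
# The path is a placeholder; google-auth reads this variable when
# no explicit credentials are passed to the client.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```

In practice you'd export the variable in your shell profile rather than set it in code, so it applies to `kedro run` and `kedro ipython` alike.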
❤️ 1
o
Thank you so much. It was solved with
gcloud auth application-default login
So, there's no need to store the credentials in the .yaml or do anything else. The data is now ingested in
kedro ipython
and when I run the pipeline with
kedro run
👍 1
d
💪
the credentials file is designed to give you a consistent way of doing things if you don’t have an environmental version of doing this
so this is the preferred pattern
o
I have no doubt about the consistency Kedro gives us. Thank you so much. That's why I'm porting everything to the Kedro framework. All this workaround is just because of IT department restrictions, due to security concerns.
d
awesome, shout if you have any other questions!
👍 1