Winston Ong
04/08/2025, 3:43 PM
Running kedro run --pipeline data_processing --env=production from the spaceflights-pandas starter fails with:

DatasetError: Failed while loading data from dataset CSVDataset(filepath=bucket-name/companies.csv, load_args={}, protocol=s3, save_args={'index': False}).
Forbidden

conf/production/catalog.yml:

companies:
  type: pandas.CSVDataset
  filepath: s3://bucket-name/companies.csv
  credentials: prod_s3

reviews:
  type: pandas.CSVDataset
  filepath: s3://bucket-name/reviews.csv
  credentials: prod_s3

shuttles:
  type: pandas.ExcelDataset
  filepath: s3://bucket-name/shuttles.xlsx
  load_args:
    engine: openpyxl
  credentials: prod_s3

conf/production/credentials.yml:

prod_s3:
  client_kwargs:
    aws_access_key_id: <<access_key>>
    aws_secret_access_key: <<secret_access_key>>
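
One way to confirm what Kedro actually resolves for --env=production is to load the config directly. A minimal sketch assuming Kedro's default OmegaConfigLoader (the default loader in recent Kedro versions), the standard conf/ layout, and that it is run from the project root; the path and env names are assumptions:

from kedro.config import OmegaConfigLoader

# Assumes the standard conf/ directory at the project root.
loader = OmegaConfigLoader(
    conf_source="conf",
    base_env="base",
    default_run_env="local",
    env="production",
)

# Show the merged credentials and catalog entries Kedro would use for
# --env=production; the keys match the snippets above.
print(loader["credentials"]["prod_s3"])
print(loader["catalog"]["companies"])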
I'm quite sure my credentials are correct and bucket access is okay, because I ran the following script and I am able to retrieve the file.

import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id='<<access_key>>',
    aws_secret_access_key='<<secret_access_key>>'
)
response = s3.get_object(Bucket='bucket-name', Key='companies.csv')
print(response['Body'].read().decode())
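
Note that boto3 talks to S3 directly, whereas Kedro's pandas.CSVDataset goes through fsspec/s3fs, so the two paths can behave differently. A minimal sketch of the equivalent check through s3fs, reusing the same credential shape as the prod_s3 entry (values are placeholders, not real keys):

import s3fs

# Same client_kwargs shape as conf/production/credentials.yml.
fs = s3fs.S3FileSystem(
    client_kwargs={
        "aws_access_key_id": "<<access_key>>",
        "aws_secret_access_key": "<<secret_access_key>>",
    }
)

# If this also raises Forbidden, the problem is on the fsspec/s3fs side
# rather than in Kedro itself.
print(fs.ls("bucket-name"))
print(fs.cat("bucket-name/companies.csv")[:200])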

datajoely
04/08/2025, 4:16 PM
Try opening kedro jupyter notebook and doing from kedro_datasets.pandas import CSVDataset. If you can get that working with .load(), you've basically done the same thing as Kedro does behind the scenes. If you can get it working with boto it has to be a slight config issue.
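
A minimal sketch of that check, assuming the bucket path and credential shape from the catalog above (all values are placeholders):

from kedro_datasets.pandas import CSVDataset

# Mirrors the companies entry in conf/production/catalog.yml.
dataset = CSVDataset(
    filepath="s3://bucket-name/companies.csv",
    credentials={
        "client_kwargs": {
            "aws_access_key_id": "<<access_key>>",
            "aws_secret_access_key": "<<secret_access_key>>",
        }
    },
)

# This exercises the same fsspec/s3fs path that kedro run uses.
df = dataset.load()
print(df.head())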

Winston Ong
04/09/2025, 12:14 AM