Lasith Adhikari
08/15/2024, 4:24 PM
spark.SparkDataSet from an S3 bucket without issues for up to 1 hour in Kedro. However, when I run a node that requires more than 1 hour to process, my Kedro job is aborted after 1 hour and throws the following error. I am using Kedro 0.18.7. Please let me know if you have any clues regarding this issue. Is there any timeout-related setting in the AWS SDK used by Kedro? Thank you!
24/08/06 14:42:45 ERROR Utils: Aborting task
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on <s3a://processed-data/projects/input_data/vitals.parquet/_temporary/0/_temporary/attempt_20240806144231705394881749042866_0023_m_000025_1754/hospitalID=10/unitAdmitYear=2018/part-00025-b0bdae35-5932-4179-a258-75ce64d1d156.c000.snappy.parquet>: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: HTNMAAWDQTDPKXBJ; S3 Extended Request ID: 5Il7bsSCCQBmf/sr84F5/S3tAlPtcINtVxTMCVmRAtU23j39Nu9Q0VGcMryPOMR8Gku7ueGrEMY=; Proxy: null), S3 Extended Request ID: 5Il7bsSCCQBmf/sr84F5/S3tAlPtcINtVxTMCVmRAtU23j39Nu9Q0VGcMryPOMR8Gku7ueGrEMY=:400 Bad Request: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: HTNMAAWDQTDPKXBJ; S3 Extended Request ID: 5Il7bsSCCQBmf/sr84F5/S3tAlPtcINtVxTMCVmRAtU23j39Nu9Q0VGcMryPOMR8Gku7ueGrEMY=; Proxy: null)…
Nok Lam Chan
08/15/2024, 6:42 PM
Lasith Adhikari
08/16/2024, 2:10 PM
credentials: dev_s3) I am getting the same 1-hour timeout issue regardless of whether I put the credentials in the data catalog or simply export them in the terminal.
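For readers following along, the catalog-based setup mentioned here might look roughly like the sketch below. It is an illustration under assumptions, not the poster's actual configuration: the entry name `vitals` and the placeholder values are hypothetical, the file path is taken from the error log above, and the `key`/`secret` names follow the usual Kedro credentials convention for S3.

```yaml
# conf/base/catalog.yml (hypothetical entry name "vitals")
vitals:
  type: spark.SparkDataSet
  filepath: s3a://processed-data/projects/input_data/vitals.parquet
  file_format: parquet
  credentials: dev_s3   # looked up in credentials.yml

# conf/local/credentials.yml (placeholder values, assumed key names)
dev_s3:
  key: <aws-access-key-id>
  secret: <aws-secret-access-key>
```

The alternative described in the message is to leave `credentials` out of the catalog entry and export `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in the shell before running the pipeline.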