# questions
l
Hey team, I have a problem converting a Spark-written Parquet file into a pandas one. Reading the file with @spark works, but when reading the catalog entry with @pandas I get the error “No such file or directory: '/dbfs/…'”. Any hints on how to resolve this?
image.png
@Shubham Agrawal
d
If your path starts with `/dbfs`, you are relying on Databricks' DBFS mount to handle the reading. Are you able to access the file with pandas directly (e.g. `pd.read_parquet("/dbfs/whatever/kvi_metrics/item/merged_metrics/merged_metrics.parquet")`)? I'm guessing not, so you need to sort out how to access that path without going through Databricks. Also, we need some more info: are you running this from your local machine using some sort of remote execution? Pandas code runs locally and will not execute on the cluster in that case (on the off chance you're doing that). (Not really related, but are you writing a single partition? The path looks odd, since I'd expect Spark to create a folder structure under `.../merged_metrics/merged_metrics.parquet/`, but I could be wrong; I haven't used Spark in a while.)
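To illustrate the distinction, here is a minimal sketch using the path from the thread; it assumes the code actually executes on a Databricks driver where DBFS is FUSE-mounted at `/dbfs` and where a `spark` session is already available:

```python
import pandas as pd

# Spark addresses DBFS through the dbfs:/ scheme and runs on the cluster.
spark_df = spark.read.parquet(
    "dbfs:/whatever/kvi_metrics/item/merged_metrics/merged_metrics.parquet"
)

# pandas is plain local Python: it can only see the file through the local
# /dbfs FUSE mount. If this code is not actually running on the cluster
# (e.g. local IDE with remote execution), the mount does not exist and you
# get "No such file or directory".
pandas_df = pd.read_parquet(
    "/dbfs/whatever/kvi_metrics/item/merged_metrics/merged_metrics.parquet"
)
```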
l
Thanks! So yeah, the same conclusion is mentioned here: https://community.databricks.com/s/question/0D53f00001HKHuwCAH/read-file-from-dbfs-with-pdreadcsv-using-databricksconnect. It is not possible because pandas DataFrames are executed locally and therefore cannot be read from or stored on DBFS. One can point them at an S3 bucket or ADLS storage instead, which takes a few more steps.
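A minimal sketch of that workaround, assuming the Parquet data lives in an S3 bucket reachable from wherever pandas runs; the bucket name and credentials here are hypothetical, and the `s3fs` package must be installed for pandas to resolve `s3://` URLs:

```python
import pandas as pd

# Read the Spark-written Parquet directly from object storage instead of DBFS.
df = pd.read_parquet(
    "s3://my-bucket/kvi_metrics/item/merged_metrics/merged_metrics.parquet",
    storage_options={"key": "...", "secret": "..."},  # or rely on default AWS credentials
)

# Writing back works the same way; pandas never touches dbfs:/ in this setup.
df.to_parquet(
    "s3://my-bucket/kvi_metrics/item/merged_metrics/merged_metrics_pandas.parquet"
)
```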
👍 1