# questions
l
Hey team, I have a problem converting a Spark-written Parquet file into a pandas one. Reading the file with @spark works, but when reading the catalog entry with @pandas I get the error “No such file or directory: '/dbfs/…'”. Any hints on how to resolve this?
image.png
@Shubham Agrawal
d
If your path starts with `/dbfs`, you are relying on Databricks' DBFS mount to handle the reading. Are you able to access the file with pandas directly (e.g. `pd.read_parquet("/dbfs/whatever/kvi_metrics/item/merged_metrics/merged_metrics.parquet")`)? I'm guessing not, so you need to sort out how to access that path without going through Databricks. Also, we need some more info: are you running this from your local machine using some sort of remote execution? Pandas code runs locally and will not execute on the cluster in that case (on the off chance you're doing that). (Not really related, but are you writing a single partition? The path looks odd, since I'd expect Spark to create a folder structure under `.../merged_metrics/merged_metrics.parquet/`, but I could be wrong; I haven't used Spark in a while.)
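To illustrate the distinction, here is a minimal sketch using the path from the thread; it assumes the code actually executes on a Databricks driver where DBFS is FUSE-mounted at `/dbfs` and where a `spark` session is already available:

```python
import pandas as pd

# Spark addresses DBFS through the dbfs:/ scheme and runs on the cluster.
spark_df = spark.read.parquet(
    "dbfs:/whatever/kvi_metrics/item/merged_metrics/merged_metrics.parquet"
)

# pandas is plain local Python: it can only see the file through the local
# /dbfs FUSE mount. If this code is not actually running on the cluster
# (e.g. local IDE with remote execution), the mount does not exist and you
# get "No such file or directory".
pandas_df = pd.read_parquet(
    "/dbfs/whatever/kvi_metrics/item/merged_metrics/merged_metrics.parquet"
)
```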
l
Thanks! So yeah, the same conclusion is mentioned here: https://community.databricks.com/s/question/0D53f00001HKHuwCAH/read-file-from-dbfs-with-pdreadcsv-using-databricksconnect. It is not possible because pandas DataFrames are executed locally and therefore cannot be read from or stored on DBFS. One can point them at an S3 bucket or ADLS storage instead, which takes a few more steps.
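A minimal sketch of that workaround, assuming the Parquet data lives in an S3 bucket reachable from wherever pandas runs; the bucket name and credentials here are hypothetical, and the `s3fs` package must be installed for pandas to resolve `s3://` URLs:

```python
import pandas as pd

# Read the Spark-written Parquet directly from object storage instead of DBFS.
df = pd.read_parquet(
    "s3://my-bucket/kvi_metrics/item/merged_metrics/merged_metrics.parquet",
    storage_options={"key": "...", "secret": "..."},  # or rely on default AWS credentials
)

# Writing back works the same way; pandas never touches dbfs:/ in this setup.
df.to_parquet(
    "s3://my-bucket/kvi_metrics/item/merged_metrics/merged_metrics_pandas.parquet"
)
```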
👍 1