Raghunath Nair
02/05/2024, 1:12 PM
Juan Luis
02/05/2024, 1:18 PM
/dbfs/mnt/container? or does it only fail with Kedro?
Juan Luis
02/05/2024, 1:18 PM
kedro-datasets versions you're using
Raghunath Nair
02/05/2024, 1:19 PM
spark.SparkDataset works fine, no issue: with Spark we are able to save. With pandas.ParquetDataset we cannot save; we get:
Error from kedro: Failed while saving data to data set ParquetDataset(filepath=/dbfs/mnt/container, load_args={}, protocol=file, save_args={}).
[Errno 5] Input/output error: '/dbfs/mnt/container'
Raghunath Nair
02/05/2024, 1:19 PM
0.18.14
Python is 3.10.11
on Databricks Runtime 13.3 LTS ML (includes Apache Spark 3.4.1, Scala 2.12)
Ankita Katiyar
02/05/2024, 1:22 PM
Raghunath Nair
02/05/2024, 1:22 PM
Shubham Agrawal
02/05/2024, 1:26 PM
Raghunath Nair
02/05/2024, 1:27 PM
azure storage account
As I mentioned in my issue, saving to dbfs works but does not work on dbfs/mnt/storage.
Shubham Agrawal
02/05/2024, 1:27 PM
Raghunath Nair
02/05/2024, 1:28 PM
azure storage account
As I mentioned in my issue, saving to dbfs works but does not work on dbfs/mnt/storage. dbfs is not meant for storing data, so we want to push into a mounted storage account, and for our security reasons we can't use it without mounting.
Shubham Agrawal
02/05/2024, 1:29 PM
Raghunath Nair
02/05/2024, 1:29 PM
spark.SparkDataset works fine. pandas.ParquetDataset: none of them works
Raghunath Nair
02/05/2024, 1:30 PM
Shubham Agrawal
02/05/2024, 1:33 PM
Raghunath Nair
02/05/2024, 1:38 PM
dbfs/mnt to dbfs, right?
Juan Luis
02/05/2024, 1:47 PM
import pandas as pd
df = pd.DataFrame(...)  # fill it with some data
df.to_parquet("/dbfs/mnt/container/test.pq")
If it fails, it will hopefully give you more information (and also rule out that it's a Kedro problem). If it works, we'd need to keep investigating.
Nok Lam Chan
02/05/2024, 2:21 PM
Nok Lam Chan
02/05/2024, 2:25 PM
Raghunath Nair
02/05/2024, 2:35 PM
[Errno 5] Input/output error: '/dbfs/mnt/teamdata/test.pq'
@Nok Lam Chan @Juan Luis
Raghunath Nair
02/05/2024, 2:38 PM
Juan Luis
02/05/2024, 2:43 PM
Nok Lam Chan
02/05/2024, 2:57 PM
open
Right now you are saying it's not working with pandas. I tend to think it's the opposite: it's only working with Spark and nothing else.
https://forums.linuxmint.com/viewtopic.php?t=396045 The error itself suggests it's a common problem with mounted drives.
Nok Lam Chan
02/05/2024, 2:58 PM
Raghunath Nair
02/05/2024, 3:01 PM
Raghunath Nair
02/05/2024, 3:03 PM
pandas.ParquetDataset
FYI - @Juan Luis spark.SparkDataset works perfectly fine on mounts, and all pandas saves fail with the same error:
Error from kedro: Failed while saving data to data set ParquetDataset(filepath=/dbfs/mnt/container, load_args={}, protocol=file, save_args={}).
[Errno 5] Input/output error: '/dbfs/mnt/container'
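The plain-pandas check suggested earlier in the thread could be sketched roughly as below. It saves a tiny DataFrame without going through Kedro and reports whatever error comes back. The helper name `try_parquet_save` and the sample data are made up for illustration, and `to_parquet` assumes a Parquet engine such as pyarrow is installed; the mount path is the one quoted in the thread and would need adjusting.

```python
import os
import tempfile

import pandas as pd


def try_parquet_save(path: str) -> tuple[bool, str]:
    """Save a tiny DataFrame with plain pandas, bypassing Kedro entirely.

    Hypothetical helper: returns (True, path) on success or
    (False, "<ExceptionType>: <message>") on failure.
    """
    df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
    try:
        df.to_parquet(path)
    except Exception as exc:  # report whatever pandas or the filesystem raises
        return False, f"{type(exc).__name__}: {exc}"
    return True, path


if __name__ == "__main__":
    # A local temporary directory serves as the known-good baseline.
    with tempfile.TemporaryDirectory() as local_dir:
        print("local:", try_parquet_save(os.path.join(local_dir, "test.pq")))
    # Path taken from the thread; adjust to the real mount point.
    print("mount:", try_parquet_save("/dbfs/mnt/container/test.pq"))
```

If the local save succeeds and the mount save fails with the same `[Errno 5]`, that points away from `kedro_datasets` and toward pandas or the mount itself.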
Raghunath Nair
02/05/2024, 3:14 PM
ParquetDataset
hope it's failing due to the same cause?
Juan Luis
02/05/2024, 3:22 PM
import pandas as pd
df = pd.DataFrame(...)  # fill it with some data
df.to_parquet("/dbfs/mnt/container/test.pq")
fails and spark.SparkDataset works, then again it's not a problem of kedro_datasets.pandas.ParquetDataset, but of pandas itself. Am I missing something?
Raghunath Nair
02/05/2024, 3:24 PM
Nok Lam Chan
02/05/2024, 3:24 PM
Raghunath Nair [3:01 PM]
@Nok Lam Chan open also gave the same error
I think this is sufficient to say that it's a mount drive issue rather than a problem with a specific library?
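The `open` check discussed above could look something like the sketch below, which uses only the standard library, so neither pandas nor Kedro is involved. The helper name `probe_write` and the probe file name are hypothetical; the mount path is the one quoted in the thread.

```python
import errno
import os
import tempfile


def probe_write(directory: str, filename: str = "probe.txt") -> tuple[bool, str]:
    """Try to write a small file with the built-in open() and report the outcome.

    Hypothetical helper: returns (True, target_path) on success or
    (False, "<ERRNO_NAME>: <message>") on failure.
    """
    target = os.path.join(directory, filename)
    try:
        with open(target, "w") as handle:
            handle.write("probe")
    except OSError as exc:
        # errno.EIO corresponds to the "[Errno 5] Input/output error" in the thread.
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return False, f"{name}: {exc}"
    return True, target


if __name__ == "__main__":
    # A local temporary directory serves as the known-good baseline.
    with tempfile.TemporaryDirectory() as local_dir:
        print("local:", probe_write(local_dir))
    # Path taken from the thread; adjust to the real mount point.
    print("mount:", probe_write("/dbfs/mnt/container"))
```

A failure here with `EIO` while the local baseline succeeds would support the reading that the mount, rather than any particular library, is at fault.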
Nok Lam Chan
02/05/2024, 3:24 PM
Nok Lam Chan
02/05/2024, 3:25 PM
Raghunath Nair
02/05/2024, 3:25 PM
Nok Lam Chan
02/05/2024, 3:28 PM
Nok Lam Chan
02/05/2024, 3:29 PM
Raghunath Nair
02/05/2024, 3:30 PM
Nok Lam Chan
02/05/2024, 3:34 PM
dbfs/ and dbfs/mnt are two different things, which may explain why you can save in dbfs/ but not in dbfs/mnt
Nok Lam Chan
02/05/2024, 3:37 PM
Azure Databricks enables users to mount cloud object storage to the Databricks File System (DBFS) to simplify data access patterns for users that are unfamiliar with cloud concepts. Mounted data does not work with Unity Catalog, and Databricks recommends migrating away from using mounts and managing data governance with Unity Catalog.
https://learn.microsoft.com/en-us/azure/databricks/dbfs/mounts, you should also be aware that Databricks advises migrating away from mount storage.
Raghunath Nair
02/05/2024, 3:48 PM