Eduardo Romero López
09/17/2023, 10:32 AM
dfs.write.format("iceberg").mode("append").save("default.IRIS_EDU_V3")
dfs2 = spark.table("spark_catalog.default.IRIS_EDU_V3")
dfs2.show(5)
+------------+-----------+------------+-----------+-------+------+
|sepal_length|sepal_width|petal_length|petal_width|variety|target|
+------------+-----------+------------+-----------+-------+------+
|         5.1|        3.5|         1.4|        0.2| setosa|     0|
|         4.9|        3.0|         1.4|        0.2| setosa|     0|
|         4.7|        3.2|         1.3|        0.2| setosa|     0|
|         4.6|        3.1|         1.5|        0.2| setosa|     0|
|         5.1|        3.8|         1.5|        0.3| setosa|     0|
+------------+-----------+------------+-----------+-------+------+
data_raw_minio:
  type: spark.SparkDataSet
  filepath: s3://warehouse/default/IRIS_EDU_V3
  load_args:
    schema:
      sepal_length: double
      sepal_width: double
      petal_length: double
      petal_width: double
      variety: string
      target: int
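The entry above does not set `file_format`. As far as I know, `spark.SparkDataSet` falls back to `parquet` in that case, so the table format has to be declared explicitly. The following is a minimal sketch of the two variants as plain Python dictionaries. The helper name `spark_catalog_entry` is hypothetical and only mirrors the catalog keys. Whether the Iceberg reader accepts this storage path, rather than a table identifier, depends on how the Spark catalog is configured, which is an assumption here.

```python
import json


def spark_catalog_entry(filepath: str, file_format: str = "parquet") -> dict:
    """Build a catalog entry for spark.SparkDataSet (hypothetical helper).

    The default of "parquet" mirrors what the dataset is assumed to use
    when file_format is left out of the catalog.
    """
    return {
        "type": "spark.SparkDataSet",
        "filepath": filepath,
        "file_format": file_format,
    }


# Entry without an explicit format: treated as parquet.
default_entry = spark_catalog_entry("s3://warehouse/default/IRIS_EDU_V3")

# Entry that declares the Iceberg format explicitly.
iceberg_entry = spark_catalog_entry(
    "s3://warehouse/default/IRIS_EDU_V3", file_format="iceberg"
)

# Print the explicit variant in a form that maps one-to-one onto catalog.yml.
print(json.dumps({"data_raw_minio": iceberg_entry}, indent=2))
```

The only difference between the two entries is the `file_format` key.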
Deepyaman Datta
09/18/2023, 4:22 AM
file_format: iceberg
?
Juan Luis
09/18/2023, 6:28 AM
Eduardo Romero López
09/18/2023, 7:34 AM
data_raw_minio:
  type: pandas.CSVDataSet
  filepath: s3://prueba.concepto.eduardo/data/QUERES_DATCOM_2306071810.csv
  load_args:
    sep: '|'
  save_args:
    index: False
    encoding: "utf-8"
  credentials: dev_minio
  layer: raw
Juan Luis
09/18/2023, 7:42 AM
Eduardo Romero López
09/18/2023, 11:09 AM
kedro jupyter lab
and run this code spark.table("spark_catalog.default.IRIS_EDU_V3").show(5)
and I get the data:
but I'm not sure how to define the catalog entry when the table is stored in Iceberg format in MinIO.
Deepyaman Datta
09/18/2023, 1:16 PM
Try specifying
file_format: iceberg
?
Did you try this, which I mentioned above? I don't see it in your SparkDataset catalog entry, and Spark's not going to magically figure out it's Iceberg.
Eduardo Romero López
09/18/2023, 1:52 PM
spark.sql("CREATE TABLE default.IRIS_EDU_V5 (sepal_length double, sepal_width double, petal_length double, petal_width double, variety string, target int) USING iceberg OPTIONS ('write.object-storage.enabled'='true', 'write.data-path'='s3://path_to_folder/', 'write.format'='parquet');")
This creates in MinIO the file structure shown in the picture.
2. Then defining the catalog entry is simple:
my_iceberg_table:
  type: spark.SparkDataSet
  filepath: s3a://path_to_folder/IRIS_EDU_V5/data/*
  file_format: parquet
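Note that this entry reads the Parquet data files directly and does not go through the Iceberg table metadata, so snapshots, row-level deletes and schema changes tracked by Iceberg are presumably not applied on load. As a small sketch of how the `filepath` relates to the table location, the helper below (the name `iceberg_data_glob` is hypothetical) builds the glob from the table folder, assuming the data files sit under a `data/` subfolder as in the entry above.

```python
def iceberg_data_glob(table_location: str) -> str:
    """Return a glob over the data files under a table folder (hypothetical helper).

    Assumes the layout from the catalog entry above: <table folder>/data/<files>.
    """
    # Drop any trailing slash so the result never contains "//data".
    return table_location.rstrip("/") + "/data/*"


# Table folder taken from the catalog entry above.
filepath = iceberg_data_glob("s3a://path_to_folder/IRIS_EDU_V5/")
print(filepath)
```

The result can be pasted into the `filepath` key of the catalog entry.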
Juan Luis
09/18/2023, 3:20 PM