Hi all! I am trying to load an Iris table as an...
# questions
e
Hi all! I am trying to load an Iris table as an example. I created it with the following code:
dfs.write.format("iceberg").mode("append").save("default.IRIS_EDU_V3")

dfs2 = spark.table("spark_catalog.default.IRIS_EDU_V3")
dfs2.show(5)

+------------+-----------+------------+-----------+-------+------+
|sepal_length|sepal_width|petal_length|petal_width|variety|target|
+------------+-----------+------------+-----------+-------+------+
|         5.1|        3.5|         1.4|        0.2| setosa|     0|
|         4.9|        3.0|         1.4|        0.2| setosa|     0|
|         4.7|        3.2|         1.3|        0.2| setosa|     0|
|         4.6|        3.1|         1.5|        0.2| setosa|     0|
|         5.1|        3.8|         1.5|        0.3| setosa|     0|
+------------+-----------+------------+-----------+-------+------+
I would like to load it from the catalog, but I can't. I have tried different approaches and none of them work. Is it possible to load data from a table that was created in Iceberg format in MinIO? Thank you in advance.
data_raw_minio:
  type: spark.SparkDataSet
  filepath: s3://warehouse/default/IRIS_EDU_V3
  load_args:
    schema:
      sepal_length: double
      sepal_width: double
      petal_length: double
      petal_width: double
      variety: string
      target: int
d
Try specifying file_format: iceberg?
j
Is the problem the Iceberg format or the MinIO access? I have used MinIO with Kedro before, but I had to configure the credentials properly.
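To rule out the access side, here is a minimal sketch of the Spark settings that usually point the S3A connector at a MinIO server. The property names are standard Hadoop S3A options; the helper name `minio_s3a_conf`, the endpoint, and the credentials are placeholders and do not come from this thread.

```python
def minio_s3a_conf(endpoint, access_key, secret_key, use_ssl=False):
    """Return Spark settings that point the S3A connector at a MinIO server."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed as host/bucket rather than bucket.host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


# Placeholder endpoint and credentials (assumptions, not values from the thread).
conf = minio_s3a_conf("http://localhost:9000", "minio-user", "minio-password")

# With PySpark installed, each pair could be passed to
# SparkSession.builder.config(key, value) before calling getOrCreate().
for key in sorted(conf):
    print(key)
```

If a plain Parquet or CSV file in the same bucket loads through Spark with these settings, the remaining problem is more likely the Iceberg format than the MinIO access.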
e
I think the problem is the Iceberg format, because this catalog entry, which reads a CSV from MinIO, works:
data_raw_minio:
  type: pandas.CSVDataSet
  filepath: s3://prueba.concepto.eduardo/data/QUERES_DATCOM_2306071810.csv
  load_args:
    sep: '|'
  save_args:
    index: False
    encoding: "utf-8"
  credentials: dev_minio
  layer: raw
But I am not sure how to define the catalog entry when the table is stored in Iceberg format in MinIO.
j
hmmm okay, that helps. does iceberg work locally, without Minio? (Not familiar with the format, unsure if this makes sense)
e
I can't solve it through the catalog, but can I add it to the catalog in another way?
I run kedro jupyter lab and then run this code:
spark.table("spark_catalog.default.IRIS_EDU_V3").show(5)
and I get the data:
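Since spark.table(...) works in the notebook, one option is to wrap that same call in a small loader. The sketch below is only an illustration: the class name `SparkTableLoader` and its `session_factory` argument are hypothetical, and to reference it from catalog.yml it would still need to subclass Kedro's abstract dataset base class and expose `_load`, `_save`, and `_describe`, which is not shown here.

```python
class SparkTableLoader:
    """Load a table by name through a Spark session (hypothetical helper)."""

    def __init__(self, table, session_factory):
        self._table = table
        # Callable returning a SparkSession; with PySpark this could be
        # SparkSession.builder.getOrCreate.
        self._session_factory = session_factory

    def load(self):
        # Same call that worked in the notebook: spark.table("<name>").
        return self._session_factory().table(self._table)

    def describe(self):
        # Short summary of what this loader points at.
        return {"table": self._table}
```

Passing the session in as a callable keeps the class independent of how the Spark session is created, so the Iceberg catalog configuration stays in the Spark setup rather than in the loader.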
d
Try specifying file_format: iceberg? Did you try this, as I mentioned above? I don't see it in your SparkDataset catalog entry, and Spark's not going to magically figure out it's Iceberg.
e
@Deepyaman Datta yes, I tried it and it didn't work. Thanks, and sorry for not answering you sooner.
I have solved it. The steps are below. 1. I recreated the table and specified Parquet as the write format:
spark.sql("CREATE TABLE default.IRIS_EDU_V5 (sepal_length double, sepal_width double, petal_length double, petal_width double, variety string, target int) USING iceberg OPTIONS ('write.object-storage.enabled'='true', 'write.data-path'='s3://path_to_folder/', 'write.format'='parquet');")
This creates in MinIO the file structure shown in the picture. 2. Then defining the catalog entry is simple:
my_iceberg_table:
  type: spark.SparkDataSet
  filepath: s3a://path_to_folder/IRIS_EDU_V5/data/*
  file_format: parquet
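The filepath in this entry follows from the data path and table name used when the table was created, with the scheme switched from s3:// to s3a://. A small sketch of that mapping, assuming the data files sit under <data path>/<table name>/data/ as in this example; the helper name `iceberg_data_glob` is hypothetical.

```python
def iceberg_data_glob(data_path, table_name):
    """Build an s3a:// glob over the Parquet files written under data_path."""
    # Spark's Hadoop connector uses the s3a:// scheme, so swap a leading s3://.
    if data_path.startswith("s3://"):
        data_path = "s3a://" + data_path[len("s3://"):]
    return f"{data_path.rstrip('/')}/{table_name}/data/*"


print(iceberg_data_glob("s3://path_to_folder/", "IRIS_EDU_V5"))
# prints s3a://path_to_folder/IRIS_EDU_V5/data/*
```

Note that reading the data files directly as Parquet bypasses the Iceberg metadata, so snapshots, row deletes, and schema changes are not applied. That is fine for a table that is only appended to, but it is worth keeping in mind if the table is later updated or compacted.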
thank you so much @Juan Luis @Deepyaman Datta
j
glad you solved it @Eduardo Romero López!