Kedro #questions

Eduardo Romero López

09/17/2023, 10:32 AM

Hi all! I am trying to load an iris table as an example. I created it with the following code:

dfs.write.format("iceberg").mode("append").save("default.IRIS_EDU_V3")

dfs2 = spark.table("spark_catalog.default.IRIS_EDU_V3")
dfs2.show(5)

+------------+-----------+------------+-----------+-------+------+
|sepal_length|sepal_width|petal_length|petal_width|variety|target|
+------------+-----------+------------+-----------+-------+------+
|         5.1|        3.5|         1.4|        0.2| setosa|     0|
|         4.9|        3.0|         1.4|        0.2| setosa|     0|
|         4.7|        3.2|         1.3|        0.2| setosa|     0|
|         4.6|        3.1|         1.5|        0.2| setosa|     0|
|         5.1|        3.8|         1.5|        0.3| setosa|     0|
+------------+-----------+------------+-----------+-------+------+

Eduardo Romero López

09/17/2023, 10:37 AM

I would like to load it from the catalog, but I can't. I have tried different approaches and none of them work. Is it possible to load data from a table that was created in Iceberg format in MinIO? Thank you in advance.

Eduardo Romero López

09/17/2023, 10:39 AM

data_raw_minio:
  type: spark.SparkDataSet
  filepath: s3://warehouse/default/IRIS_EDU_V3
  load_args:
    schema:
      sepal_length: double
      sepal_width: double
      petal_length: double
      petal_width: double
      variety: string
      target: int

Deepyaman Datta

09/18/2023, 4:22 AM

Try specifying

file_format: iceberg

Juan Luis

09/18/2023, 6:28 AM

is the problem the Iceberg format or the Minio access? I have used Minio with Kedro before but had to properly configure the credentials

Juan Luis

09/18/2023, 6:29 AM

for example https://github.com/kedro-org/kedro-viz/pull/1286#issuecomment-1560915585
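On the Spark side, the MinIO access mentioned above usually means pointing the Hadoop S3A connector at the MinIO endpoint. A minimal sketch, assuming the settings are passed as Spark configuration (for example from a Kedro project's Spark config file); the endpoint, credentials, and the helper name `minio_s3a_conf` are placeholders and not taken from the thread:

```python
def minio_s3a_conf(endpoint, access_key, secret_key, use_ssl=False):
    """Build Hadoop S3A settings for an S3-compatible store such as MinIO."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is normally addressed as endpoint/bucket rather than bucket.endpoint.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


# Placeholder values; real credentials should come from a secrets store.
conf = minio_s3a_conf("http://localhost:9000", "minio-user", "minio-password")
for key in conf:
    print(key)
```

Each pair can be applied with `SparkSession.builder.config(key, value)` before the session is created.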

Eduardo Romero López

09/18/2023, 7:34 AM

I think it's a problem with the Iceberg format, because I have this catalog entry that reads a CSV from MinIO and it works:

data_raw_minio:
  type: pandas.CSVDataSet
  filepath: s3://prueba.concepto.eduardo/data/QUERES_DATCOM_2306071810.csv
  load_args:
    sep: '|'
  save_args:
    index: False
    encoding: "utf-8"
  credentials: dev_minio
  layer: raw

Eduardo Romero López

09/18/2023, 7:35 AM

but I am not sure how to define the catalog entry when I have a table in Iceberg format stored in MinIO.

Juan Luis

09/18/2023, 7:42 AM

hmmm okay, that helps. does iceberg work locally, without Minio? (Not familiar with the format, unsure if this makes sense)

Eduardo Romero López

09/18/2023, 11:09 AM

I can't solve it through the catalog entry, but can I add the table to the catalog in another way?

Eduardo Romero López

09/18/2023, 11:11 AM

I run

kedro jupyter lab

and run this code

spark.table("spark_catalog.default.IRIS_EDU_V3").show(5)

and I get the data.
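One possible way to use the table without a catalog entry is to read it by name inside a node. This is only a sketch, assuming the Spark session is already configured with the Iceberg catalog, as it is in the notebook session above. The function name `load_iceberg_table` is hypothetical; `spark.table` is the same call used above.

```python
def load_iceberg_table(table_name, spark=None):
    """Return the named table as a Spark DataFrame."""
    if spark is None:
        # Imported lazily so this module can be imported without PySpark installed.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
    return spark.table(table_name)


# Usage inside a node or notebook with a configured session:
# load_iceberg_table("spark_catalog.default.IRIS_EDU_V3").show(5)
```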

Deepyaman Datta

09/18/2023, 1:16 PM

"but I am not sure how to define the catalog entry when I have a table in Iceberg format stored in MinIO." (edited)

Try specifying file_format: iceberg? Did you try this, which I mentioned above? I don't see it in your SparkDataset catalog entry, and Spark's not going to magically figure out it's Iceberg.
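To make the role of file_format concrete: a Spark dataset entry presumably ends up as a call to Spark's generic reader, with the format falling back to Parquet when none is given. The following is a rough sketch of that mapping under that assumption, not Kedro's actual code; `read_catalog_entry` is a hypothetical helper and `spark` is an existing SparkSession.

```python
def read_catalog_entry(spark, filepath, file_format="parquet", **load_args):
    """Read `filepath` with Spark's generic reader, using the given format."""
    # DataFrameReader.load accepts the path, the format name, and reader options.
    return spark.read.load(filepath, format=file_format, **load_args)


# Usage with a real session (paths are the ones from the thread):
# read_catalog_entry(spark, "s3://warehouse/default/IRIS_EDU_V3", file_format="iceberg")
```

Under this assumption, an entry without file_format would make Spark try to read the table folder as plain Parquet, which would explain why the format has to be stated explicitly.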

Eduardo Romero López

09/18/2023, 1:52 PM

@Deepyaman Datta yes, I tried it and it didn't work. Thanks, and sorry for not answering you sooner.

Eduardo Romero López

09/18/2023, 3:05 PM

I have solved it; I describe the steps below.

1. I created the table again and specified that it should be written as Parquet:

spark.sql("""
    CREATE TABLE default.IRIS_EDU_V5 (
        sepal_length double, sepal_width double,
        petal_length double, petal_width double,
        variety string, target int
    )
    USING iceberg
    OPTIONS (
        'write.object-storage.enabled'='true',
        'write.data-path'='s3://path_to_folder/',
        'write.format'='parquet'
    );
""")

This creates the file structure in MinIO shown in the picture.

2. After that, defining the catalog entry is simple:

my_iceberg_table:
  type: spark.SparkDataSet
  filepath: s3a://path_to_folder/IRIS_EDU_V5/data/*
  file_format: parquet

👍 1
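In plain PySpark, the catalog entry above corresponds roughly to reading the Parquet files under the table's data/ folder. A minimal sketch, assuming the folder layout described in the message (a table folder containing data/); both function names are hypothetical. Reading the data files directly goes around Iceberg's metadata, so snapshots and row-level deletes are not taken into account.

```python
def iceberg_data_glob(base_path, table_name):
    """Glob matching the data files of a table stored under base_path."""
    return f"{base_path.rstrip('/')}/{table_name}/data/*"


def read_iceberg_data_files(spark, base_path, table_name):
    """Read the table's Parquet data files directly, skipping Iceberg metadata."""
    return spark.read.format("parquet").load(iceberg_data_glob(base_path, table_name))


print(iceberg_data_glob("s3a://path_to_folder/", "IRIS_EDU_V5"))
```

For a table that is only ever appended to, as in this example, the result should match what spark.table returns; after overwrites or deletes the two can differ.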

Eduardo Romero López

09/18/2023, 3:06 PM

thank you so much @Juan Luis @Deepyaman Datta

Juan Luis

09/18/2023, 3:20 PM

glad you solved it @Eduardo Romero López!
