Hello Time goes by and I have more and more warnings when ru Kedro #questions

Francis Duval

01/24/2024, 3:28 PM

Hello! Time goes by and I have more and more warnings when running `kedro run` and `kedro viz run`:

~~Starting Kedro Viz ...~~
~~Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.~~
~~Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.~~
~~Setting default log level to "WARN".~~
~~To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).~~
~~24/01/24 09:38:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable~~
~~24/01/24 09:38:29 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.~~

The more warnings I have, the longer it takes to run the commands. It now takes more than one minute for kedro viz to render. 🫠 I was thus wondering if any of you had already encountered one of these and, if yes, how you proceeded to fix them.

Edit: I updated Hadoop to 3.3.6 (from 3.3.0), and I have only one warning left:

Starting Kedro Viz ...

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

24/01/24 11:39:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Nok Lam Chan

01/24/2024, 4:36 PM

Spark shouldn't even be initialised when you start viz

Francis Duval

01/24/2024, 4:52 PM

Do you think that an old version of Java (1.8.0_101) could slow down the rendering of the commands?

Nok Lam Chan

01/24/2024, 4:52 PM

I don't think Spark/Java should even be initialised

Nok Lam Chan

01/24/2024, 4:53 PM

As you are just starting viz, there shouldn't be any data load from Spark?

Francis Duval

01/24/2024, 4:55 PM

I don't even use spark in my code (Spark is not installed in my conda environment).

Francis Duval

01/24/2024, 4:55 PM

So I would say no

Nok Lam Chan

01/24/2024, 4:57 PM

Do you have any idea where that log is coming from? It's not Kedro.

Francis Duval

01/24/2024, 5:09 PM

It looks like it comes from Spark 🤔

Nok Lam Chan

01/24/2024, 5:11 PM

But you mentioned you are not using Spark at all? Can you check your `catalog.yml` to see if there are any Spark things or external packages that might be pulling it in?
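
One way to do the check suggested above is to scan the project's configuration folder for any mention of Spark. The following is a minimal sketch, not a Kedro feature: the helper name `find_spark_mentions` is hypothetical, and it assumes the configuration lives in a `conf` folder and uses the `.yml` extension (adjust the pattern if the project uses `.yaml`).

```python
from pathlib import Path


def find_spark_mentions(conf_dir):
    """Return (file name, line number, line) for each YAML line mentioning 'spark'."""
    hits = []
    # Walk every .yml file under the configuration folder, in a stable order.
    for path in sorted(Path(conf_dir).rglob("*.yml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            # Case-insensitive match so "Spark" and "spark" are both reported.
            if "spark" in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits


if __name__ == "__main__":
    # "conf" is assumed to be the project's configuration folder.
    for name, number, line in find_spark_mentions("conf"):
        print(f"{name}:{number}: {line}")
```

Run from the project root, this would list files such as `spark.yml` along with any catalog entries that reference Spark.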

Francis Duval

01/24/2024, 5:12 PM

Oh yes I have this in spark.yml:

# You can define spark specific configuration here.

spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

# <https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner>
spark.scheduler.mode: FAIR

I think it's because I installed kedro-mlflow and in the end didn't use it.

Nok Lam Chan

01/24/2024, 5:16 PM

Can you try `kedro viz run --ignore-plugins`? I think this option has only been available recently; I am using the latest myself, 7.1.0.

Francis Duval

01/24/2024, 5:19 PM

Yes it works! And I don't have the Spark warning anymore. So I guess I need to uninstall kedro-mlflow. Now it takes about 30 seconds for kedro viz to render.

Nok Lam Chan

01/24/2024, 5:26 PM

Yep, if you don't need it, just uninstall it. Optionally, you can disable it in `settings.py` if you are sharing the same conda env across different projects.
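
For reference, disabling a plugin in `settings.py` might look like the sketch below. It assumes the plugin is registered under the distribution name `kedro-mlflow`, and it relies on Kedro's `DISABLE_HOOKS_FOR_PLUGINS` setting, which turns off a plugin's auto-registered hooks for this project only; it is not expected to remove the plugin's CLI commands.

```python
# src/<package_name>/settings.py

# Plugins listed here keep their package installed in the shared environment,
# but their auto-registered hooks are not loaded for this project.
# The name "kedro-mlflow" is assumed to match the installed distribution name.
DISABLE_HOOKS_FOR_PLUGINS = ("kedro-mlflow",)
```

Other projects sharing the same conda environment are unaffected, since the setting is read per project.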

Francis Duval

01/24/2024, 5:28 PM

Perfect, thanks a lot!

Francis Duval

01/24/2024, 7:15 PM

I also found this in src/project_name/hooks.py:

from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        """Initialises a SparkSession using the config
        defined in project's conf folder.
        """

        # Load the spark configuration in spark.yaml using the config loader
        parameters = context.config_loader["spark"]
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
            SparkSession.builder.appName(context.project_path.name)
            .enableHiveSupport()
            .config(conf=spark_conf)
        )
        _spark_session = spark_session_conf.getOrCreate()
        _spark_session.sparkContext.setLogLevel("WARN")

I've never defined any hook, so I guess this was created by some command I ran.

Francis Duval

01/24/2024, 7:28 PM

So I just commented out these lines in settings.py, and now kedro viz works just fine (but still takes ~35 seconds to render):

# Instantiated project hooks.
from ibc_codes.hooks import SparkHooks  # comment that out

# Hooks are executed in a Last-In-First-Out (LIFO) order.
HOOKS = (SparkHooks(),) # comment that out
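
An alternative to commenting the lines out is to register the Spark hook only when it is explicitly requested. This is a sketch under stated assumptions rather than anything Kedro provides: the environment variable `ENABLE_SPARK_HOOKS` and the helper `build_hooks` are hypothetical names, and the import is deferred so that `pyspark` is not loaded unless the hook is enabled.

```python
import os


def build_hooks(environ=os.environ) -> tuple:
    """Return the project hooks, including SparkHooks only when opted in."""
    # ENABLE_SPARK_HOOKS is a hypothetical opt-in switch, not a Kedro setting.
    if environ.get("ENABLE_SPARK_HOOKS", "0") != "1":
        return ()
    # Imported lazily so pyspark is only loaded when the hook is wanted.
    from ibc_codes.hooks import SparkHooks

    return (SparkHooks(),)


# Hooks are executed in a Last-In-First-Out (LIFO) order.
HOOKS = build_hooks()
```

With this in place, `kedro viz run` would start without a SparkSession by default, while setting `ENABLE_SPARK_HOOKS=1` before `kedro run` would restore the original behaviour.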

Nok Lam Chan

01/25/2024, 11:21 AM

When you create a new project, the prompt asks which extra tools you need. The default is none, so you probably chose `all` or spark?

Nok Lam Chan

01/25/2024, 11:25 AM

I tested it on my end, and it takes about 15-20 seconds to start rendering. It's not exactly fast, but reasonable to me. Do you need to keep restarting the process for some reason? It would be great if you could elaborate on that. Cc @Nero Okwa

Francis Duval

01/26/2024, 5:48 PM

Yes, I chose `all`! I'll try with the default, thanks!

👍🏼 1

Nero Okwa

02/01/2024, 11:41 AM

Thanks @Francis Duval for the feedback. Surprised that Kedro-Viz takes up to 35 seconds to render. How many nodes do you have in your project? CC @Rashida Kanchwala

Francis Duval

02/01/2024, 3:07 PM

48 nodes, 74 datasets, but it was rendering slowly even when my project was small!

Nero Okwa

03/26/2024, 11:09 AM

CC @Rashida Kanchwala & @Stephanie Kaiser

Rashida Kanchwala

03/26/2024, 11:24 AM

We are releasing a fix for this. In the meantime, can you try `kedro viz run --ignore-plugins`?
