Francis Duval
01/24/2024, 3:28 PM
`kedro run` and `kedro viz run`:
```
Starting Kedro Viz ...
Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.
Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/24 09:38:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/01/24 09:38:29 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Starting Kedro Viz ...
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/24 11:39:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```
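(Editor's note: the `Service 'SparkUI' could not bind on port 4040` warning in the log above is usually benign. Another Spark session already holds the default UI port, and Spark walks forward to the next free one. A minimal stdlib sketch of that probe-and-advance behaviour, illustrative only and not Spark's actual code:)

```python
import socket

def first_free_port(start: int = 4040, tries: int = 16) -> int:
    """Mimic the Spark UI's behaviour: try 4040, then 4041, and so on."""
    for port in range(start, start + tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
                return port  # free: Spark would serve its UI here
            except OSError:
                continue  # taken by an earlier session; try the next port
    raise RuntimeError(f"no free port in [{start}, {start + tries})")

print(first_free_port())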
Nok Lam Chan
01/24/2024, 4:36 PM
Francis Duval
01/24/2024, 4:52 PM
Nok Lam Chan
01/24/2024, 4:52 PM
Nok Lam Chan
01/24/2024, 4:53 PM
Francis Duval
01/24/2024, 4:55 PM
Francis Duval
01/24/2024, 4:55 PM
Nok Lam Chan
01/24/2024, 4:57 PM
Francis Duval
01/24/2024, 5:09 PM
Nok Lam Chan
01/24/2024, 5:11 PM
Could you check your `catalog.yml`, to see if there are any Spark things or external packages that may warrant some attention?
Francis Duval
01/24/2024, 5:12 PM
```yaml
# You can define Spark-specific configuration here.
spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true
# https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner
spark.scheduler.mode: FAIR
```
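(Editor's note: a hook that calls `SparkConf().setAll()`, as the project's `SparkHooks` later in this thread does, consumes a dict like the one above as `(key, value)` pairs, and YAML booleans arrive as Python `bool`s. A hypothetical pre-processing helper, not part of Kedro or Spark, just to illustrate the mapping:)

```python
def to_spark_pairs(params: dict) -> list[tuple[str, str]]:
    """Normalise a parsed spark.yml dict into string (key, value) pairs."""
    pairs = []
    for key, value in params.items():
        if not key.startswith("spark."):
            raise ValueError(f"not a Spark property: {key!r}")
        if isinstance(value, bool):
            # YAML `true` loads as Python True; Spark property values are strings
            value = str(value).lower()
        pairs.append((key, str(value)))
    return pairs

print(to_spark_pairs({
    "spark.driver.maxResultSize": "3g",
    "spark.sql.execution.arrow.pyspark.enabled": True,
}))
# [('spark.driver.maxResultSize', '3g'),
#  ('spark.sql.execution.arrow.pyspark.enabled', 'true')]
```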
I think it's because I installed kedro-mlflow and in the end didn't use it.
Nok Lam Chan
01/24/2024, 5:16 PM
`kedro viz run --ignore-plugins`. I think this option is only available recently; I am using the latest myself, 7.1.0.
Francis Duval
01/24/2024, 5:19 PM
Nok Lam Chan
01/24/2024, 5:26 PM
Check your `settings.py` if you are sharing the same conda env for different projects.
Francis Duval
01/24/2024, 5:28 PM
Francis Duval
01/24/2024, 7:15 PMfrom kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession
class SparkHooks:
@hook_impl
def after_context_created(self, context) -> None:
"""Initialises a SparkSession using the config
defined in project's conf folder.
"""
# Load the spark configuration in spark.yaml using the config loader
parameters = context.config_loader["spark"]
spark_conf = SparkConf().setAll(parameters.items())
# Initialise the spark session
spark_session_conf = (
SparkSession.builder.appName(context.project_path.name)
.enableHiveSupport()
.config(conf=spark_conf)
)
_spark_session = spark_session_conf.getOrCreate()
_spark_session.sparkContext.setLogLevel("WARN")
I've never defined any hook, so I guess this has been created by some command I did.Francis Duval
01/24/2024, 7:28 PM
```python
# Instantiated project hooks.
from ibc_codes.hooks import SparkHooks  # comment that out

# Hooks are executed in a Last-In-First-Out (LIFO) order.
HOOKS = (SparkHooks(),)  # comment that out
```
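(Editor's note: the `--ignore-plugins` suggestion works because Kedro discovers plugin hooks, such as kedro-mlflow's, through Python entry points: anything installed in the active environment is picked up whether or not the project uses it, which is why a shared conda env can pull in unexpected hooks. An illustrative stdlib sketch of that discovery mechanism; the `kedro.hooks` group name follows Kedro's plugin documentation, the rest is simplified:)

```python
from importlib.metadata import entry_points

def discover_hook_plugins(group: str = "kedro.hooks") -> list[str]:
    """List entry-point names registered in the given group."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a dict-like object
    found = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return sorted(ep.name for ep in found)

# Whatever is installed in the current environment shows up here,
# used by the project or not:
print(discover_hook_plugins())
```

Roughly speaking, `--ignore-plugins` asks Kedro-Viz to skip loading what this discovery step finds.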
Nok Lam Chan
01/25/2024, 11:21 AM
`all` or `spark`?
Nok Lam Chan
01/25/2024, 11:25 AM
Francis Duval
01/26/2024, 5:48 PM
`all`! I'll try with the default, thanks!
Nero Okwa
02/01/2024, 11:41 AM
Francis Duval
02/01/2024, 3:07 PM
Nero Okwa
03/26/2024, 11:09 AM
Rashida Kanchwala
03/26/2024, 11:24 AM
`kedro viz run --ignore-plugins`