charles
04/20/2023, 12:35 PM
Can I reference a value from my local/parameters.yml file in the catalog? My entry:
```
parsed_documents: # Just one document for now.
  type: json.JSONDataSet
  filepath: 's3://mybucket/${env}/myjson.json'
```
local/parameters.yml file entry: env: "main"
In kedro ipython, trying to load it I am getting:
```
DataSetError: Failed while loading data from data set JSONDataSet(filepath=mybucket/${env}/myjson.json, protocol=s3, save_args={'indent': 2}).
```
So ${env} is not being resolved: the path stays as mybucket/${env}/myjson.json.
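The plain config loader does not substitute ${...} placeholders from parameters.yml into the catalog. A minimal sketch of one way to get interpolation in kedro 0.18.x (an assumption about the setup, not necessarily the poster's) is TemplatedConfigLoader with a globals file holding env:
```
# settings.py — hedged sketch:
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    # conf/base/globals.yml would then define: env: "main"
    "globals_pattern": "*globals.yml",
}
```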
Leo Cunha
04/20/2023, 12:56 PM
cli.py?

Merel
04/20/2023, 3:17 PM
pyspark 3.4.0 was released on the 13th of April and has broken our pyspark-iris starter. I've written up my findings so far in an issue: https://github.com/kedro-org/kedro-starters/issues/123 but it could be I've been approaching this all wrong, and I've now reached the point where I could really use some help figuring out what is going on 🙏

Beltra909
04/21/2023, 7:06 AM
I'm getting:
```
DataSetError: Failed while loading data from data set
ParquetDataSet(filepath=<my file_path>,
load_args={'engine': pyarrow}, protocol=s3, save_args={'engine': pyarrow}).
AioSession.__init__() got an unexpected keyword argument 'target_options'
```
I have tried different versions of fsspec, s3fs, kedro and python and I get the same issue. Here is what I am using currently: Python 3.10.10, Kedro 0.18.7, s3fs 2023.3.0, fsspec 2023.3.0, aiobotocore 2.4.2, pandas 1.5.3. Pip check does not show any broken requirements. Has anyone experienced this problem before? Extensive googling didn't show any result...

Si Yan
04/21/2023, 8:11 PM

Rob
04/22/2023, 6:09 PM
I'm trying to switch my dataset paths with a storage_type variable; this is how my globals YAML looks:
```
storage_mode: "local"
storage:
  local: "data/"
  gcp: "gs://my-bucket/data/"
data:
  {% if storage_mode == 'local' %}
  storage_type: ${storage.local}
  {% elif storage_mode == 'gcp' %}
  storage_type: ${storage.gcp}
  {% endif %}
  player_tags: ${storage_type}/01_player_tags
  raw_battlelogs: ${storage_type}/02_raw_battlelogs
  raw_metadata: ${storage_type}/03_raw_metadata
  enriched_data: ${storage_type}/04_enriched_data
  curated_data: ${storage_type}/05_curated_data
  viz_data: ${storage_type}/06_viz_data
  feature_store: ${storage_type}/07_feature_store
  model_registry: ${storage_type}/08_model_registry
```
I'm not familiar with this type of syntax, and I'm getting a ScannerError.
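A ScannerError usually means the Jinja tags reached the YAML parser unrendered: the globals file is read as plain YAML, so {% if %} blocks trip the scanner. A Jinja-free sketch, assuming the mode can instead be switched per Kedro environment:
```
# conf/base/globals.yml — hedged sketch, plain YAML only:
storage_type: "data/"
# conf/gcp/globals.yml could override with:
# storage_type: "gs://my-bucket/data/"

# the path entries then reference it as before, e.g.:
# player_tags: ${storage_type}/01_player_tags
```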
Jason
04/24/2023, 1:33 PM
Is there a recommended way to organise the data folder per dataset, like this?
```
dataset1
|--01_raw
|--02_intermediate
|--03_primary
|--...
dataset2
|--01_raw
|--02_intermediate
|--03_primary
|--...
```
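The catalog itself is indifferent to the folder layout, since every entry carries its own filepath; a minimal sketch with hypothetical entry names:
```
# catalog.yml — hypothetical names, per-dataset subfolders:
dataset1_raw:
  type: pandas.CSVDataSet
  filepath: data/dataset1/01_raw/input.csv

dataset2_raw:
  type: pandas.CSVDataSet
  filepath: data/dataset2/01_raw/input.csv
```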
Giulio Morina
04/25/2023, 10:51 AM

Balazs Konig
04/25/2023, 4:49 PM

Claire BAUDIER
04/26/2023, 8:47 AM
I would like to run a node with "params", but using a file different from the default parameters.yml file. Here is what I have in mind, based on one of the documentation examples:
```
from kedro.config import ConfigLoader
from kedro.framework.project import settings

conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = ConfigLoader(conf_source=conf_path, env="local")
params = conf_loader.get("other_parameters_file.yml")

# in node definition
def increase_volume(volume, step):
    return volume + step

# in pipeline definition
node(
    func=increase_volume,
    inputs=["input_volume", "params:step_size"],
    outputs="output_volume",
)
```
And the parameter step_size would be in other_parameters_file.yml.
My question is whether this is feasible with kedro. If so, how should it be done?
Thanks a lot for your help!
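Worth noting: the default parameters pattern matches any file named parameters*, so a file like conf/base/parameters_other.yml is merged into params automatically. On recent 0.18.x the pattern itself can also be widened; a hedged sketch, keeping the hypothetical file name from above and assuming a kedro version whose config loaders accept config_patterns:
```
# settings.py — sketch:
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "parameters": [
            "parameters*",
            "parameters*/**",
            "other_parameters_file*",  # merge the extra file into params
        ],
    }
}
```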
Iñigo Hidalgo
04/26/2023, 3:16 PM
```
simple_conn_pt_model_filter_predict:
  date_column: date
  window_length: 0d
  gap: 0d
  check_groups: null
  continue_if_missing: true
```
I am trying to edit the parameter gap through kedro run --pipeline ... --params=..., but I need to overwrite the whole dictionary.
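If the CLI really does replace the nested dict wholesale, one workaround sketch (an assumption: running programmatically instead of kedro run) is to pass the full dictionary via extra_params with only gap changed:
```
# Hedged sketch — programmatic run with extra_params:
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())
extra_params = {
    "simple_conn_pt_model_filter_predict": {
        "date_column": "date",
        "window_length": "0d",
        "gap": "1d",  # the only value actually being edited
        "check_groups": None,
        "continue_if_missing": True,
    }
}
with KedroSession.create(extra_params=extra_params) as session:
    session.run()  # pass pipeline_name=... as in the CLI call
```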
Juan Diego
04/26/2023, 3:42 PM
Is there a way to get the version of a kedro package? It would be useful to raise an error when it doesn't match the one expected by a launcher.

Agnaldo Luiz
04/27/2023, 12:04 PM
Is it possible to use a value from credentials.yml inside a catalog filepath, like this?
```
#credentials.yml
win_user: 'user01'

#catalog.yml
data:
  type: pandas.ExcelDataSet
  filepath: C:\Users\${win_user}\data.xlsx
```
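Credentials are only injected into a dataset's credentials argument, not templated into filepath. A hedged alternative sketch, assuming the username can live in a globals file instead:
```
# settings.py — sketch:
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}

# conf/base/globals.yml (hypothetical) would hold:
# win_user: user01
# so catalog.yml can template: filepath: C:\Users\${win_user}\data.xlsx
```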
Rishabh Kasat
04/27/2023, 2:08 PM
```
kedro.framework.cli.utils.KedroCliError: No module named 'pyspark_llap'
Run with --verbose to see the full exception
Error: No module named 'pyspark_llap'
```
Season Yang
04/27/2023, 4:03 PM
We've hit a dependency conflict on ipython and would love to get help from the team. Under the same release 0.18.7 for both kedro and kedro-starter with python 3.8, kedro pins ipython~=8.1 (https://github.com/kedro-org/kedro/blob/main/test_requirements.txt#L22) while kedro-starter's pyspark template restricts ipython>=7.31.1, <8.0 (https://github.com/kedro-org/kedro-starters/blob/main/pyspark/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/requirements.txt#L3).
Would really appreciate any help on this! Thank you in advance!
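One workaround sketch, assuming nothing else in the project still needs ipython 7: loosen the starter's pin in the generated src/requirements.txt so it is satisfiable together with kedro's pin:
```
# src/requirements.txt — hedged local edit:
ipython>=8.1,<9.0
```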
Kelsey Sorrels
04/27/2023, 10:56 PM

Jo Stichbury
04/28/2023, 4:11 PM

Darshan
04/29/2023, 5:55 AM

Rob
04/29/2023, 10:01 PM
How can I set credentials in catalog.yml for a parquet of type spark.SparkDataSet? I'm trying to use the .json credentials file from Google Cloud, but I don't know how to define it in the catalog.
Thanks in advance 🙂
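A hedged sketch of one common approach: spark.SparkDataSet reads through the Spark session rather than fsspec credentials, so the service-account JSON is usually wired into Spark's configuration (assuming the GCS connector is on the classpath) and the catalog entry stays credential-free:
```
# conf/base/spark.yml — sketch, GCS connector properties:
spark.hadoop.google.cloud.auth.service.account.enable: true
spark.hadoop.google.cloud.auth.service.account.json.keyfile: /path/to/key.json

# catalog.yml — sketch:
# my_data:
#   type: spark.SparkDataSet
#   filepath: gs://my-bucket/data/my_data.parquet
#   file_format: parquet
```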
Darshan
04/30/2023, 6:51 AM
```
companies:
  type: pandas.CSVDataSet
  filepath: s3://<your-bucket>/companies.csv
```
This is a sample provided by Kedro with the AWS Step Functions deployment guide; it might be useful.

Sebastian Cardona Lozano
05/01/2023, 5:03 PM

Vandana Malik
05/02/2023, 9:34 AM
My hooks are not running. In settings.py:
```
HOOKS = (ProjectHooks(), DataValidationHook())
CONTEXT_CLASS = ProjectContext
```
context.py:
```
from pathlib import Path
from typing import Any, Dict, Union

from kedro.framework.context import KedroContext


class ProjectContext(KedroContext):
    """Project context.

    Users can override the remaining methods from the parent class here,
    or create new ones (e.g. as required by plugins).
    """

    hooks = ProjectHooks()

    def __init__(
        self,
        package_name: str,
        project_path: Union[Path, str],
        env: str = None,
        extra_params: Dict[str, Any] = None,
    ):
        """Init class."""
        super().__init__(package_name, project_path, env, extra_params)
        self.hooks = DataValidationHook()
        self._spark_session = None
        self._experiment_tracker = None
        self._setup_env_variables()
        self._init_common_env_vars()
        self.init_spark_session()
```
Can you guide me where I should look, or what to modify, to work out why the hooks are not running?
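One thing worth checking, as a hedged observation rather than a confirmed diagnosis: in kedro 0.18.x only the hooks listed in settings.py HOOKS (plus plugin-provided ones) are registered with the hook manager, so hooks attached to the context class (hooks = ... or self.hooks = ...) are never called. A minimal sketch of registration on its own:
```
# settings.py — sketch; the import path is a hypothetical project module:
from my_project.hooks import DataValidationHook, ProjectHooks

HOOKS = (ProjectHooks(), DataValidationHook())
```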
Jordan
05/02/2023, 11:16 AM
In a notebook I can load "my_metrics" after running %load_ext kedro.ipython.
However, in a standalone file, when I am creating the catalog as follows:
```
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_path = Path(".").resolve()
metadata = bootstrap_project(project_path)

with KedroSession.create(metadata.package_name, project_path) as session:
    context = session.load_context()
    catalog = context.catalog
    data = catalog.load("my_metrics")
```
I get the following error:
```
DataSetError: Loading not supported for 'MetricsDataSet'
```
If this is true, why does it load in a notebook?

Adrien
05/02/2023, 11:34 AM
I'm getting this error from Vertex AI:
```
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_cpus, cause=null; Failed to create custom job for the task. Task: Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726; Failed to create external task or refresh its state. Task: Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726; Failed to handle the pipeline task. Task: Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726
```
I checked the quota specified, but that's not the problem: it's set to 1 and I specify 0.2 CPUs for each node (kedro vertexai starter guide). I think it comes from GCP, but I don't know which configuration to update.
Has someone faced the same bug, or has an explanation? I've been stuck on this issue for days and can't find the solution...

Thaiza
05/02/2023, 11:54 AM

Afaque Ahmad
05/02/2023, 11:59 AM
I'm migrating a project from v0.16.x to 0.18.7. Is there a checklist of steps that I can follow for a smooth migration?

fmfreeze
05/02/2023, 5:22 PM

Flavien
05/03/2023, 10:26 AM
I'm setting up a kedro project on Databricks (and have good hope to convince my team to go for kedro). The documentation is very well written, thanks for that. Scrolling through the messages in Slack, I did not find a way to directly use the spark object, the SparkSession provided directly in the Databricks notebooks. Is there any way to do so?
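A hedged sketch of one way this is often handled, assuming kedro 0.18.x where the after_context_created hook is available: SparkSession.builder.getOrCreate() attaches to the already-running session, which on Databricks is the same object the notebook exposes as spark:
```
# hooks.py — sketch:
from kedro.framework.hooks import hook_impl
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        # On Databricks this returns the notebook's active session
        # instead of building a new one.
        self._spark = SparkSession.builder.getOrCreate()
```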
05/03/2023, 10:37 AMimport os
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.runner import SequentialRunner
from hooks import ControlTableHooks
if __name__ == "__main__":
bootstrap_project(os.path.abspath(os.environ.get("PROJECT_PATH")))
os.chdir(os.environ.get("PROJECT_PATH"))
with KedroSession.create(env=os.environ.get("kedro_environment")) as session:
runner = SequentialRunner()
context = session.load_context()
pipeline = context.pipelines[os.environ.get("pipeline_name")]
catalog = context.catalog
runner.run(pipeline, catalog)
result_dict = {"message": "Success"}
Any help?
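A hedged observation rather than a confirmed diagnosis: on kedro 0.18.x, pipelines moved off the context into a module-level registry, so the lookup would read:
```
# sketch for kedro 0.18.x:
from kedro.framework.project import pipelines

pipeline = pipelines[os.environ.get("pipeline_name")]
```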
Pavan Naidu
05/03/2023, 10:10 PM