Ricardo Araújo
03/10/2023, 7:21 PM
Ian Whalen
03/10/2023, 7:23 PM
ParallelRunner) you’d probably have to use some black magic, which wouldn’t be too nice.
Maybe this is helpful: https://stackoverflow.com/questions/29589327/train-multiple-models-in-parallel-with-sklearn
Ricardo Araújo
03/10/2023, 10:28 PM
Deepyaman Datta
03/10/2023, 11:08 PM
Ricardo Araújo
03/10/2023, 11:16 PM
consumer that gets a single number and does something with it. We also have generator, which generates a list of numbers of unknown size. It should be possible to connect the two by declaring that generator creates a sequence of outputs that consumer can use; Kedro would then take care of this under the hood by instantiating multiple consumer pipelines, each passed one element provided by generator.
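A rough way to picture the fan-out being asked for, in plain Python rather than Kedro. The names generator, consumer and fan_out below are illustrative stand-ins, not Kedro APIs; the sketch simply calls consumer once per element of whatever generator returns.

```python
from typing import Callable, Iterable, List


def generator() -> List[int]:
    """Stand-in for the generator pipeline: its output size is not known up front."""
    return [3, 1, 4, 1, 5]


def consumer(number: int) -> int:
    """Stand-in for the consumer pipeline: takes one number and does something with it."""
    return number * number


def fan_out(items: Iterable[int], step: Callable[[int], int]) -> List[int]:
    """Hypothetical helper: run `step` once per item, which is what the framework
    would have to do by instantiating one consumer per element."""
    return [step(item) for item in items]


results = fan_out(generator(), consumer)
```

The length of the list is only known after generator has run, so the number of consumer calls cannot be fixed at the point where the pipeline is declared. That seems to be the crux of the request.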
Deepyaman Datta
03/10/2023, 11:31 PM
Ricardo Araújo
03/10/2023, 11:34 PM
kedro.runner.TaskRunner that kinda would do the trick. It allowed a pipeline name to be passed and instantiated inside a node with remapped inputs.
Deepyaman Datta
03/12/2023, 2:05 PM
Ricardo Araújo
03/12/2023, 2:21 PM
Deepyaman Datta
03/12/2023, 2:28 PM
Ricardo Araújo
03/12/2023, 2:30 PM
kedro.framework.session.get_current_session seems to have been deprecated.
Deepyaman Datta
03/13/2023, 12:27 AM
Ricardo Araújo
03/13/2023, 11:30 AM
from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession

with KedroSession.create(package_name="<project_name>") as session:
    context = session.load_context()
    catalog = context.catalog
    catalog.load('data')
will work.
Deepyaman Datta
03/13/2023, 11:31 AM
with KedroSession.create() as session:
    context = session.load_context()
    catalog = context.catalog
Ricardo Araújo
03/13/2023, 11:32 AM
Deepyaman Datta
03/13/2023, 11:32 AM
package_name or import configure_project, or at least I didn't)
Ricardo Araújo
03/13/2023, 11:32 AM
Deepyaman Datta
03/13/2023, 11:34 AM
PartitionedDataSet can be a good way to define once in a structure for each dataset, but that requires writing the pipeline in a different way and has its own limitations, so I think what you say is good for now.
(diamonds) deepyaman@Deepyamans-MacBook-Air spaceflights % kedro run
[03/13/23 07:29:03] INFO Kedro project spaceflights session.py:340
[03/13/23 07:29:04] WARNING /opt/miniconda3/envs/diamonds/lib/python3.10/site-packages/kedro/framework/project/__init__.py:359: UserWarning: An error occurred while importing the 'spaceflights.pipelines.data_science' module. Nothing defined therein will be returned by 'find_pipelines'. warnings.py:109
Traceback (most recent call last):
  File "/opt/miniconda3/envs/diamonds/lib/python3.10/site-packages/kedro/framework/project/__init__.py", line 357, in find_pipelines
    pipeline_module = importlib.import_module(pipeline_module_name)
  File "/opt/miniconda3/envs/diamonds/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/deepyaman/github/kedro-org/kedro/spaceflights/src/spaceflights/pipelines/data_science/__init__.py", line 3, in <module>
    from .pipeline import create_pipeline # NOQA
  File "/Users/deepyaman/github/kedro-org/kedro/spaceflights/src/spaceflights/pipelines/data_science/pipeline.py", line 10, in <module>
    context = session.context
AttributeError: 'KedroSession' object has no attribute 'context'
  warnings.warn(
INFO Loading data from 'companies' (CSVDataSet)... data_catalog.py:343
INFO Running node: preprocess_companies_node: preprocess_companies([companies]) -> [preprocessed_companies] node.py:327
[03/13/23 07:29:05] INFO Saving data to 'preprocessed_companies' (ParquetDataSet)... data_catalog.py:382
INFO Completed 1 out of 3 tasks sequential_runner.py:85
INFO Loading data from 'shuttles' (ExcelDataSet)... data_catalog.py:34
So maybe just something to be aware of... I can also raise an issue on this later.
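
One way to picture the behaviour in the log above, and to make it fail loudly instead. This is a plain-Python sketch, not Kedro's implementation: load_modules is a hypothetical helper that imports each module and, on failure, warns and skips it (as the warning text says find_pipelines does). The module name no_such_module_for_demo is deliberately fake. A warnings filter then promotes that warning to an exception.

```python
import importlib
import warnings
from types import ModuleType
from typing import Dict, List


def load_modules(names: List[str]) -> Dict[str, ModuleType]:
    """Hypothetical warn-and-skip loader: a failed import is reported, not raised."""
    loaded = {}
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except Exception as exc:
            warnings.warn(f"An error occurred while importing the {name!r} module: {exc}")
    return loaded


names = ["json", "no_such_module_for_demo"]

# Default behaviour: the broken module is only reported as a warning
# and is simply missing from the result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    modules = load_modules(names)

# Stricter behaviour: promote UserWarning to an error so the problem stops the run.
strict_failed = False
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        load_modules(names)
    except UserWarning:
        strict_failed = True
```

Whether that is appropriate for a real run is a judgment call, since it promotes every UserWarning, not just this one. The same kind of filter can be set from outside a Python program with the PYTHONWARNINGS environment variable, for example error::UserWarning.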