user
01/04/2023, 5:28 PM

user
01/04/2023, 5:28 PM

user
01/04/2023, 5:28 PM

dor zazon
01/05/2023, 8:29 AM

Rafael Gildin
01/05/2023, 5:52 PM

Danhua Yan
01/06/2023, 5:03 PM
pandas.ParquetDataSet, spark.SparkDataSet, pickle.PickleDataSet, and using yml configs to save:
dataset:
  type: pandas.ParquetDataSet
  filepath: some_path
  versioned: true
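A minimal sketch of what that versioned entry corresponds to in code, assuming kedro 0.18.x's pandas.ParquetDataSet; the timestamp below is made up for illustration:

from kedro.extras.datasets.pandas import ParquetDataSet
from kedro.io import Version

# With versioned: true, each save lands under <filepath>/<timestamp>/<filename>;
# a specific version can also be requested explicitly (hypothetical timestamp):
dataset = ParquetDataSet(
    filepath="some_path",
    version=Version(load="2023-01-06T17.03.00.000Z", save=None),
)
df = dataset.load()  # loads that version instead of the latest one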
Jaakko
01/06/2023, 6:05 PM

Brandon Meek
01/07/2023, 3:03 AM

Sergei Benkovich
01/09/2023, 1:41 PM
credentials.yml:
dev_s3:
  aws_access_key_id: AWS_ACCESS_KEY_ID
  aws_secret_access_key: AWS_SECRET_ACCESS_KEY
catalog.yml:
observations:
  type: pandas.CSVDataSet
  filepath: "${s3.raw_observations_path}/commercial/observations/observations.csv"
  credentials: "${dev_s3}"
main.py
runner = SequentialRunner()
project_path = Path(__file__).parent.parent.parent
conf_path = f"{project_path}/{settings.CONF_SOURCE}"
conf_loader = CONFIG_LOADER_CLASS(conf_source=conf_path, env="local", globals_pattern="globals*")
parameters = conf_loader.get("parameters*", "parameters*/**")
credentials = conf_loader.get("credentials*", "credentials*/**")
catalog = conf_loader.get("catalog*", "catalog*/**")
data_catalog = DATA_CATALOG_CLASS(
    data_sets={
        "observations": CSVDataSet.from_config("observations", catalog["observations"]),
    },
    feed_dict={"params": parameters},
)
result = runner.run(data_extraction.create_pipeline(), data_catalog)
return result
settings.py
CONF_SOURCE = "conf"
# Class that manages how configuration is loaded.
from kedro.config import TemplatedConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
"globals_pattern": "*globals.yml",
}
# Class that manages the Data Catalog.
from <http://kedro.io|kedro.io> import DataCatalog
DATA_CATALOG_CLASS = DataCatalog
Can't get over this error… would appreciate any help :)
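A minimal sketch of an alternative wiring, assuming the catalog entry refers to the credentials key by name (credentials: dev_s3) rather than through a ${...} template, and reusing conf_path from the main.py above; an illustration, not a confirmed fix for the error:

from kedro.config import TemplatedConfigLoader
from kedro.io import DataCatalog

conf_loader = TemplatedConfigLoader(conf_source=conf_path, env="local")
catalog_conf = conf_loader.get("catalog*", "catalog*/**")
credentials = conf_loader.get("credentials*", "credentials*/**")
parameters = conf_loader.get("parameters*", "parameters*/**")

# from_config resolves each entry's `credentials: <key>` against the dict above,
# instead of instantiating CSVDataSet by hand.
data_catalog = DataCatalog.from_config(catalog_conf, credentials=credentials)
data_catalog.add_feed_dict({"params": parameters})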
Seth
01/09/2023, 2:12 PM

Deepyaman Datta
01/09/2023, 2:55 PM
PartitionedDataSet
...
Let's say I have a catalog entry like:
my_pds:
  type: PartitionedDataSet
  path: data/01_raw/subjects
  dataset:
    type: my_project.io.MyCustomDataSet
And data like:
data/01_raw/subjects/C001/scans/0.png
data/01_raw/subjects/C001/scans/1.png
data/01_raw/subjects/C001/scans/2.png
data/01_raw/subjects/C001/test_results.csv
data/01_raw/subjects/C001/notes.png
data/01_raw/subjects/C002/scans/0.png
data/01_raw/subjects/C002/scans/1.png
data/01_raw/subjects/C002/scans/2.png
data/01_raw/subjects/C002/test_results.csv
data/01_raw/subjects/C002/notes.png
data/01_raw/subjects/T001/scans/0.png
data/01_raw/subjects/T001/scans/1.png
data/01_raw/subjects/T001/scans/2.png
data/01_raw/subjects/T001/test_results.csv
data/01_raw/subjects/T001/notes.png
What do you think the resulting partitions would be?
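A sketch of my reading of the default behaviour (an illustration, not the thread's answer): with no filename_suffix, every file under path becomes a partition, keyed by its path relative to path with the extension kept, and each value is a lazy-load callable backed by MyCustomDataSet.

from typing import Any, Callable, Dict, List

def inspect_partitions(my_pds: Dict[str, Callable[[], Any]]) -> List[str]:
    """Node that receives the PartitionedDataSet as a dict of lazy loaders."""
    for partition_id, load_partition in sorted(my_pds.items()):
        print(partition_id)   # e.g. "C001/notes.png", "C001/scans/0.png", ...
        _ = load_partition()  # materialises this partition with MyCustomDataSet
    return sorted(my_pds)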
Damian Fiłonowicz
01/10/2023, 9:43 AM

dor zazon
01/10/2023, 9:56 AM

dor zazon
01/10/2023, 9:56 AM

dor zazon
01/10/2023, 9:56 AM

Matthias Roels
01/10/2023, 2:10 PM
The get_current_session function was removed. Is there a particular reason why? And is it possible to get the same functionality differently?
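A minimal sketch of the pattern that replaces it in kedro 0.18.x, as I understand it: create an explicit KedroSession rather than fetching a global current one. The project path below is an assumption.

from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_path = Path.cwd()  # assumed to be the project root
bootstrap_project(project_path)

with KedroSession.create(project_path=project_path) as session:
    context = session.load_context()  # gives access to the catalog, params, etc.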
Anderson Luiz Souza
01/10/2023, 3:12 PM

Pedro Abreu
01/10/2023, 6:12 PM
rich or Databricks has invalidated the previous approach?
We're using kedro 0.18.3.
Sidharth Jahagirdar
01/10/2023, 8:53 PM

Dustin
01/10/2023, 10:01 PM
When I run kedro viz, an error message "No such command 'viz'" shows, and kedro -h doesn't list a 'viz' option either. I followed the pandas-iris template project.
Ricardo Araújo
01/10/2023, 11:52 PM

Sergei Benkovich
01/11/2023, 7:14 AM
observations:
  type: pandas.CSVDataSet
  filepath: "${s3.raw_observations_path}/observations.csv"
  credentials: dev_s3
Lorenzo Castellino
01/11/2023, 8:21 AM
plotly.JSONDataSet. The pipeline runs fine, the plot is saved to disk and displayed in the experiment tracking section, but the styling applied in the fig.update_layout() call seems to be skipped, as you can see from the second image. Everything else is displayed as desired (including menu and hover data).
Any clue on what could be the issue here?
This is the code present in the node that outputs it:
import plotly.graph_objects as go
from numpy.typing import NDArray


def plot_loadings(loadings: NDArray) -> go.Figure:
    fig = go.Figure(layout_yaxis_range=[-1, 1], layout_xaxis_range=[-1, 1])
    fig.add_traces(
        go.Scattergl(
            x=loadings[:, 0],
            y=loadings[:, 1],
            mode="markers",
            hovertext=[f"Var{i+1}" for i in range(loadings.shape[0])],
        )
    )
    # One dropdown button per principal component, for each axis.
    x_buttons = []
    y_buttons = []
    for i in range(loadings.shape[1]):
        x_buttons.append(
            dict(
                method="update",
                label=f"PC{i + 1}",
                args=[
                    {"x": [loadings[:, i]]},
                ],
            )
        )
        y_buttons.append(
            dict(
                method="update",
                label=f"PC{i + 1}",
                args=[
                    {"y": [loadings[:, i]]},
                ],
            )
        )
    fig.update_layout(
        updatemenus=[
            dict(buttons=x_buttons, direction="up", x=0.5, y=-0.1, active=0),
            dict(
                buttons=y_buttons,
                direction="down",
                x=-0.01,
                y=0.5,
                active=(1 if loadings.shape[1] > 1 else 0),
            ),
        ]
    )
    fig.update_layout(
        {
            "title": {"text": "Loadings Plot", "x": 0.5},
            "width": 1000,
            "height": 1000,
        }
    )
    return fig
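A small sketch for narrowing the problem down (an illustration, not a diagnosis; the filepath is made up): save and reload the figure through kedro's plotly.JSONDataSet outside the pipeline and check whether the update_layout styling survives the round trip.

from kedro.extras.datasets.plotly import JSONDataSet

ds = JSONDataSet(filepath="data/08_reporting/loadings_plot.json")  # hypothetical path
ds.save(fig)               # serialise the figure to plotly JSON
fig_roundtrip = ds.load()
fig_roundtrip.show()       # check whether the layout/updatemenus are preserved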
Jorge sendino
01/11/2023, 4:59 PM
VersionNotFoundError: Did not find any versions for SparkDataSet. My kedro version is 0.18.3. Any idea how to solve it?
Lorenzo Castellino
01/11/2023, 5:32 PM
tracking.JSONDataSet? I thought about (and tried) flattening the dictionary in the node. It works, but is there another way I'm missing to achieve a more pleasing result?
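A minimal sketch of the kind of flattening mentioned above (the helper name and separator are my assumptions, not from the thread):

def flatten(metrics: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten a nested metrics dict, e.g. {"model": {"f1": 0.9}} -> {"model.f1": 0.9}."""
    flat = {}
    for key, value in metrics.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep=sep))
        else:
            flat[new_key] = value
    return flat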
Patrick H.
01/11/2023, 5:43 PM

Matthias Roels
01/11/2023, 7:17 PM
The OmegaConfLoader class is available in the main branch of kedro. When is it expected to be included in a release (v0.18.5 maybe)?
user
01/11/2023, 11:48 PM

Javier Hernandez
01/12/2023, 1:51 PM

Walber Moreira
01/12/2023, 2:21 PM