Lucie Gattepaille
11/02/2022, 1:01 PM
Yetunde
11/02/2022, 1:05 PM
Jo Stichbury
11/02/2022, 1:21 PM
Lucie Gattepaille
11/02/2022, 2:27 PM
def compare_passenger_capacity(preprocessed_shuttles: pd.DataFrame):
    return preprocessed_shuttles.groupby(["shuttle_type"]).mean().reset_index()
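A quick way to check what this node returns is to run it on a few made-up rows outside Kedro. The sketch below is only an illustration: the `toy_shuttles` data is invented, and `numeric_only=True` is an assumption added here because recent pandas versions refuse to average text columns such as `engine_type` rather than dropping them silently.

```python
import pandas as pd


def compare_passenger_capacity(preprocessed_shuttles: pd.DataFrame) -> pd.DataFrame:
    # Average the numeric columns per shuttle type.
    # numeric_only=True is an assumption for newer pandas versions,
    # where non-numeric columns would otherwise raise an error.
    return (
        preprocessed_shuttles.groupby(["shuttle_type"])
        .mean(numeric_only=True)
        .reset_index()
    )


# Invented sample rows, only for illustration (not the tutorial data).
toy_shuttles = pd.DataFrame(
    {
        "shuttle_type": ["Type V5", "Type V5", "Type F5"],
        "engine_type": ["Quantum", "Plasma", "Quantum"],
        "passenger_capacity": [4, 2, 10],
    }
)

summary = compare_passenger_capacity(toy_shuttles)
print(summary)
```

The result has one row per `shuttle_type` and a `passenger_capacity` column holding the group mean, which is the shape the plot configuration expects for its `x` and `y` fields.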
In data_processing/pipeline.py, I replaced the existing code with the following:
from kedro.pipeline import Pipeline, node
from kedro.pipeline.modular_pipeline import pipeline

from .nodes import create_model_input_table, preprocess_companies, preprocess_shuttles
from .nodes import compare_passenger_capacity


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=preprocess_companies,
                inputs="companies",
                outputs=["preprocessed_companies", "companies_columns"],
                name="preprocess_companies_node",
            ),
            node(
                func=preprocess_shuttles,
                inputs="shuttles",
                outputs="preprocessed_shuttles",
                name="preprocess_shuttles_node",
            ),
            node(
                func=create_model_input_table,
                inputs=["preprocessed_shuttles", "preprocessed_companies", "reviews"],
                outputs="model_input_table",
                name="create_model_input_table_node",
            ),
            node(
                func=compare_passenger_capacity,
                inputs="preprocessed_shuttles",
                outputs="shuttle_passenger_capacity_plot",
                #name="shuttle_passenger_capacity_plot_node"
            ),
        ],
        namespace="data_processing",
        inputs=["companies", "shuttles", "reviews"],
        outputs=["model_input_table", "shuttle_passenger_capacity_plot"],
    )
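For anyone puzzling over the `namespace`, `inputs` and `outputs` arguments in the call above, here is a rough stand-alone sketch of the naming rule as the Kedro documentation describes it: dataset names listed in `inputs` or `outputs` keep their plain name, and every other dataset gets the namespace as a prefix. It does not import Kedro, `resolve_name` is a hypothetical helper written for this illustration, and it is not Kedro's actual implementation.

```python
from typing import Iterable


def resolve_name(dataset: str, namespace: str, exposed: Iterable[str]) -> str:
    # Hypothetical helper: names listed in inputs/outputs stay unchanged,
    # all other dataset names are prefixed with "<namespace>.".
    return dataset if dataset in set(exposed) else f"{namespace}.{dataset}"


# The inputs and outputs passed to pipeline() in the code above.
exposed = [
    "companies",
    "shuttles",
    "reviews",
    "model_input_table",
    "shuttle_passenger_capacity_plot",
]

# A few of the dataset names used by the nodes above.
datasets = [
    "companies",
    "preprocessed_companies",
    "preprocessed_shuttles",
    "shuttle_passenger_capacity_plot",
]

resolved = {name: resolve_name(name, "data_processing", exposed) for name in datasets}
for original, final in resolved.items():
    print(f"{original} -> {final}")
```

Under this rule, intermediate datasets such as `preprocessed_shuttles` pick up the `data_processing.` prefix, while the plot output keeps its plain name as long as it is listed in `outputs`.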
and finally in catalog.yml, I added:
shuttle_passenger_capacity_plot:
  type: plotly.PlotlyDataSet
  filepath: data/08_reporting/shuttle_passenger_capacity_plot.json
  plotly_args:
    type: bar
    fig:
      x: shuttle_type
      y: passenger_capacity
      orientation: v
    layout:
      xaxis_title: Shuttles
      yaxis_title: Average passenger capacity
      title: Shuttle Passenger capacity
This created the plot in Kedro-Viz (note that the orientation in the tutorial was wrong; it needs to be v to make sense). BUT, as you will see if I manage to share the picture, I also end up with a dataset called data_processing.shuttle_passenger_capacity_plot (probably some namespace misunderstanding on my part).
Jo Stichbury
11/02/2022, 2:36 PM
data_processing too, although you could equally well put it in the data_science pipeline. When I update the text, I'll add a third pipeline for reporting.
Lucie Gattepaille
11/02/2022, 3:05 PM
Jo Stichbury
11/02/2022, 3:53 PM