#questions

Jo Stichbury

01/03/2023, 2:55 PM
Please could I ask for some help with the Kedro Viz example that uses Plotly? I've made some minor changes to the spaceflights tutorial example to add a reporting pipeline that uses Plotly express and Plotly graph objects (in order to improve the documentation in this area, as per this PR). I made a few changes to the example code in the original docs, so that there's a node each for express/graph objects, named uniquely. The graph objects node works perfectly and I see a plot.
def compare_passenger_capacity_go(preprocessed_shuttles: pd.DataFrame):

    data_frame = preprocessed_shuttles.groupby(["shuttle_type"]).mean().reset_index()
    fig = go.Figure(
        [
            go.Bar(
                x=data_frame["shuttle_type"],
                y=data_frame["passenger_capacity"],
            )
        ]
    )
    
    return fig
However, the code for Plotly express isn't working in a kedro run.
def compare_passenger_capacity_exp(preprocessed_shuttles: pd.DataFrame):
    fig = px.bar(
        data_frame=preprocessed_shuttles.groupby(["shuttle_type"]).mean().reset_index(),
        x="shuttle_type",
        y="passenger_capacity",
    )
    return fig
The error returned is
PlotlyDataSet(filepath=/Users/jo_stichbury/Documents/GitHub/stichbury/kedro-projects/kedro-tutorial/data/08_reporting/shuttle_passenger_capacity_plot_exp.json, load_args={}, 
plotly_args={'fig': {'orientation': h, 'x': shuttle_type, 'y': passenger_capacity}, 'layout': {'title': Shuttle Passenger capacity, 'xaxis_title': Shuttles, 'yaxis_title': Average 
passenger capacity}, 'type': bar}, protocol=file, save_args={}, version=Version(load=None, save='2023-01-03T14.43.36.537Z')).
Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0] but received: shuttle_type
Before the holiday, I did a fair amount of trial and error to rewrite the function according to various Stack Overflow searches, but I couldn't find a way to fix it. 🚨 Please could I get some help from anyone who knows this code (maybe @Rashida Kanchwala?) or anyone who is familiar with Plotly to get the compare_passenger_capacity_exp function working? 🚨 My example is here so I hope it's just a matter of taking it and revising the function in the nodes.py file for the reporting pipeline. I should point out that it doesn't currently work on 0.18.4 (see this issue) so it's necessary to test against 0.18.3 (using the 'old' dataset notation) for now. Everything in my example is working apart from this node.

Rashida Kanchwala

01/03/2023, 4:00 PM
Hi Jo, in Kedro there are two types of Plotly datasets: plotly.PlotlyDataSet and plotly.JSONDataSet. PlotlyDataSet only works with plotly.express, and it does all the Plotly computations under the hood based on the plotly_args you have provided in the catalog.yml. So if you are using that, all you need to do is return the DataFrame from the node function:
# This function uses plotly.express
def compare_passenger_capacity_exp(preprocessed_shuttles: pd.DataFrame):
    return preprocessed_shuttles.groupby(["shuttle_type"]).mean().reset_index()
If you use the JSONDataSet, then you need to call px.bar() in the node function, but you don't provide any plotly_args. In your catalog.yml you simply do this:
shuttle_passenger_capacity_plot_exp:
  type: plotly.JSONDataSet
  filepath: data/08_reporting/shuttle_passenger_capacity_plot_exp.json
  versioned: true
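I understand your confusion; it seems you have done both (returning a px.bar() figure from the node while the catalog entry is a PlotlyDataSet with plotly_args), and that's why your kedro run fails.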
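For the first option, a catalog entry might look roughly like the sketch below. It is reconstructed from the plotly_args printed in the error output earlier in the thread, so the exact entry in the example project is an assumption. The node feeding it would return the grouped DataFrame rather than a figure.

```yaml
# Sketch only: rebuilt from the plotly_args shown in the error message above;
# the real catalog.yml in the example project may differ.
shuttle_passenger_capacity_plot_exp:
  type: plotly.PlotlyDataSet
  filepath: data/08_reporting/shuttle_passenger_capacity_plot_exp.json
  versioned: true
  plotly_args:
    type: bar                # which plotly.express function to use
    fig:                     # keyword arguments for that function
      x: shuttle_type
      y: passenger_capacity
      orientation: h
    layout:                  # layout settings applied to the figure
      xaxis_title: Shuttles
      yaxis_title: Average passenger capacity
      title: Shuttle Passenger capacity
```

With an entry like this, the plotting arguments live in the catalog and the node stays a plain data transformation; with the JSONDataSet entry, the node builds the figure itself and the catalog carries no plotly_args.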

Jo Stichbury

01/03/2023, 4:21 PM
Aha, thanks @Rashida Kanchwala! That all works now in my example. A great case where deleting code fixed the problem 🌟