Please could I ask for some help with the Kedro Viz example? (Kedro #questions)


Jo Stichbury

01/03/2023, 2:55 PM

Please could I ask for some help with the Kedro Viz example that uses Plotly? I've made some minor changes to the spaceflights tutorial example to add a reporting pipeline that uses Plotly Express and Plotly Graph Objects (to improve the documentation in this area, as per this PR). I adjusted the example code from the original docs slightly so that there is a separate, uniquely named node for each of Express and Graph Objects. The Graph Objects node works perfectly and I see a plot.

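```python
import pandas as pd
import plotly.graph_objects as go


def compare_passenger_capacity_go(preprocessed_shuttles: pd.DataFrame):
    # Average each numeric column per shuttle type (numeric_only avoids errors on text columns)
    data_frame = (
        preprocessed_shuttles.groupby(["shuttle_type"]).mean(numeric_only=True).reset_index()
    )
    fig = go.Figure(
        [
            go.Bar(
                x=data_frame["shuttle_type"],
                y=data_frame["passenger_capacity"],
            )
        ]
    )
    return fig
```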

However, the Plotly Express code isn't working in a `kedro run`:

```python
import pandas as pd
import plotly.express as px


def compare_passenger_capacity_exp(preprocessed_shuttles: pd.DataFrame):
    fig = px.bar(
        data_frame=preprocessed_shuttles.groupby(["shuttle_type"]).mean(numeric_only=True).reset_index(),
        x="shuttle_type",
        y="passenger_capacity",
    )
    return fig
```

The error returned is:

```
PlotlyDataSet(filepath=/Users/jo_stichbury/Documents/GitHub/stichbury/kedro-projects/kedro-tutorial/data/08_reporting/shuttle_passenger_capacity_plot_exp.json, load_args={}, 
plotly_args={'fig': {'orientation': h, 'x': shuttle_type, 'y': passenger_capacity}, 'layout': {'title': Shuttle Passenger capacity, 'xaxis_title': Shuttles, 'yaxis_title': Average 
passenger capacity}, 'type': bar}, protocol=file, save_args={}, version=Version(load=None, save='2023-01-03T14.43.36.537Z')).
Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0] but received: shuttle_type
```

Before the holiday, I did a fair amount of trial and error to rewrite the function based on various Stack Overflow searches, but I couldn't find a way to fix it. 🚨 Please could I get some help from anyone who knows this code (maybe @Rashida Kanchwala?) or anyone who is familiar with Plotly to get the `compare_passenger_capacity_exp` method working? 🚨 My example is here, so I hope it's just a matter of taking it and revising the method in the `nodes.py` file for the reporting pipeline. I should point out that it doesn't currently work on 0.18.4 (see this issue), so it's necessary to test against 0.18.3 (using the 'old' dataset notation) for now. Everything in my example is working apart from this node.

Rashida Kanchwala

01/03/2023, 4:00 PM

Hi Jo. In Kedro there are two types of Plotly dataset: `plotly.PlotlyDataSet` and `plotly.JSONDataSet`. `PlotlyDataSet` only works with Plotly Express, and it does all the Plotly computations under the hood based on the `plotly_args` you provide in `catalog.yml`. So if you use that dataset, all you need to do is return the data frame from the node function:

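```python
import pandas as pd


# For use with plotly.PlotlyDataSet: the dataset builds the Plotly Express
# figure itself from plotly_args, so the node only returns the data
def compare_passenger_capacity_exp(preprocessed_shuttles: pd.DataFrame):
    return preprocessed_shuttles.groupby(["shuttle_type"]).mean(numeric_only=True).reset_index()
```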

If you use `JSONDataSet`, then you need to call `px.bar()` in the node function, but you don't provide any sort of `plotly_args`. In your `catalog.yml` you simply do this:

```yaml
shuttle_passenger_capacity_plot_exp:
  type: plotly.JSONDataSet
  filepath: data/08_reporting/shuttle_passenger_capacity_plot_exp.json
  versioned: true
```

I understand your confusion: it seems you have done both, and that's why your `kedro run` fails.

Jo Stichbury

01/03/2023, 4:21 PM

Aha, thanks @Rashida Kanchwala! That all works now in my example. A great case where deleting code fixed the problem 🌟
