I want to ensure that the Data Update Node's outpu...
# questions
a
I want to ensure that the Data Update Node's output can be the input for the Excel Export Node. The problem I have is that the SQL Data Update node will have to run a stored procedure to update the SQL table. 1. SQL Data Update Node: This node receives a set of parameters defined in
parameters.yml
. It runs a SQL stored procedure to generate the latest data in the SQL database. Current node has None as output but I prefer to return the
EIU_sql
which should become the input for my next node.
Copy code
# Generates EIU data using a SQL stored procedure and stores it in the database.
node(
   func=generate_EIU,
   inputs=["params:EIU"],
   # preferred outputs
   # outputs=["EIU_sql"],
   # current output
   outputs=None, 
   name="generate_EIU",
   namespace="EIU"
),
1. Excel Data Export Node: The second node takes as input the SQL table that were updated or generated by the first node. It is responsible for writing the data from these SQL tables into an Excel file.
Copy code
# Transforms EIU data to Excel format and saves it to a file.
node(
   func=EIU_sql_to_excel,
   inputs=["EIU_sql"],
   # preferred outputs
   # outputs=["EIU_excel"],
   # current output
   outputs=None,
   name="EIU_sql_to_excel",
   namespace="EIU"
),
Additionally, I've defined the SQL table and the Excel output location in the
catalog.yml
file.
Copy code
EIU_sql:
  type: pandas.SQLQueryDataset
  sql: "SELECT ..."
  credentials: cred

EIU_excel:
  type: pandas.ExcelDataSet
  filepath: data/03_primary/EIUdata.xlsx
  load_args:
    sheet_name: data
  save_args:
    sheet_name: data
functions
Copy code
def generate_EIU(parameters: Dict[str, Any]) -> None:
    """
    Generates EIU data using a SQL stored procedure and stores it in the database.

    Input:
    - parameters: A dictionary containing configuration parameters

    Output:
    - None
    """
    # ... (code for generating and storing EIU data)

def EIU_sql_to_excel(
    EIU_sql: Optional[pd.DataFrame]) -> None:
    """
    Transforms EIU data to Excel format and saves it to a file.

    Input:
    - EIU_sql: Yearly EIU data in a DataFrame.

    Output:
    - None
    """
    # ... (code for saving EIU data to Excel)
Appreciate any tips on how can I simplify these nodes and functions and link them together. Currently they're not "properly" linked by the output/input. The stored procedure takes some time to complete. Once I have the updated SQL tables, I'll primarily use those. However, some teams need Excel files, which is why I also generate Excel outputs.
What I wanted to try is to call
catalog.load("EIU_sql")
within the
generate_EIU
function and return `EIU_sql`table after the stored procedure is completed.
I managed to resolve this, there was a lack of understanding from my part on how the Kedro dataset works, but it's good for now.
👍 1