# questions
a
I have the following function that I want to execute within an isolated pipeline (data refresh pipeline). It's not supposed to have any input or output because its purpose is to execute a stored procedure that refreshes the database before I can use SQL queries to fetch the updated tables. Therefore, I intend to create a node without any specified input or output. However, I encountered difficulties in running the node without any input (set to None) and output (also set to None). As a workaround, I have set a DataFrame as both the input and output, although as you can see, these dataframes are not being used at all. Any best practices or tips on this kind of node?
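import pandas as pd

def exec_test(companies: pd.DataFrame) -> pd.DataFrame:
    # `companies` is a dummy pass-through so the node has an input and an output
    db_connection = None
    try:
        # connect_to_database() is my own helper (not shown here)
        db_connection = connect_to_database()
        print("Executing SP")
        # execute the stored procedure (T-SQL arguments are comma-separated)
        db_connection.execute(
            "EXEC spUpdateData '20230601', 'parameter2', 'parameter3', 'etc'")
        print("SP completed")
    except Exception as e:
        # print error
        print(f"Database connection error: {str(e)}")
    finally:
        # close the connection even if the stored procedure fails
        if db_connection is not None:
            db_connection.close()
    return companies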
n
Either use a dummy input and output, or use a hook to make sure it executes before the pipeline run or before a specific node.
a
Alright, I'll explore the hook concept
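For anyone landing on this thread later, here is a minimal sketch of what the hook approach suggested above might look like, assuming this is a Kedro project. `hook_impl`, `before_pipeline_run` and `before_node_run` come from Kedro's hook mechanism; the class name `RefreshDatabaseHooks`, its `refresh` and `node_name` parameters, and the node name `fetch_tables` are made up for illustration. The `ImportError` fallback is only there so the snippet can be tried without Kedro installed.

```python
from typing import Any, Callable, Dict, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Assumption for illustration only: a no-op stand-in so the sketch
    # can be run outside a Kedro environment.
    def hook_impl(func):
        return func


class RefreshDatabaseHooks:
    """Hypothetical hook class: calls `refresh` before the run, or before one node."""

    def __init__(self, refresh: Callable[[], None], node_name: Optional[str] = None):
        self.refresh = refresh      # e.g. a function that executes the stored procedure
        self.node_name = node_name  # None means "refresh once before the whole run"

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        # Only refresh here when no specific node was requested.
        if self.node_name is None:
            self.refresh()

    @hook_impl
    def before_node_run(self, node: Any) -> None:
        # Refresh right before the named node, and only that node.
        if self.node_name is not None and node.name == self.node_name:
            self.refresh()


# In the project's settings.py the hook would be registered roughly like this
# (refresh_database is a hypothetical zero-argument version of exec_test):
# HOOKS = (RefreshDatabaseHooks(refresh=refresh_database),)
```

With this, the stored procedure no longer needs to be a node at all, so the dummy DataFrame input and output go away. Note that `before_pipeline_run` would fire for every pipeline run in the project, so passing `node_name` (or checking `run_params`) may be needed to limit it to the refresh case.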