# questions
j
Dear all, we are trying to set up collaborative experiment tracking in Azure Blob Storage according to Experiment tracking in Kedro-Viz. We have abfs working for our datasets in our data catalogue. However, when we use the same abfs path for SESSION_STORE_ARGS, we get the error attached. We are wondering if anybody has a working setup for collaborative experiment tracking in Azure and could kindly give some insights? Thank you so much! PS: We have set the credentials both as credentials.yml and as environment variables as outlined here without success: Publish and share on Azure — kedro-viz 10.0.0 documentation
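For reference, a minimal sketch of the collaborative session-store setup in `settings.py`, following the pattern from the linked experiment-tracking docs with the S3 path swapped for an abfs one. The container name, folder, and local path below are placeholders, not the actual setup:

```python
# settings.py -- sketch per the Kedro-Viz experiment-tracking docs;
# "my-container" and the local path are placeholders.
from pathlib import Path

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore

SESSION_STORE_CLASS = SQLiteStore
SESSION_STORE_ARGS = {
    "path": str(Path(__file__).parents[2] / "data"),
    "remote_path": "abfs://my-container/session-store/",
}
```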
👀 1
n
@Ravi Kumar Pilla cc
j
PPS: Here is a screenshot of our credentials.yml
r
Hi Johann, Thank you for using experiment tracking on viz. The doc link you shared is for hosting static Kedro Viz instance on Azure. Could you see if you have the setup mentioned here - https://docs.kedro.org/projects/kedro-viz/en/stable/experiment_tracking.html#collaborative-experiment-tracking ? Thank you
j
Hi Ravi, the URL you posted is the same one I posted originally. This is exactly our setup, except that we use Azure instead of AWS; everything else is identical.
r
okay, can you please let me know the kedro viz version you are using?
j
kedro-viz 9.2.0
👍 1
python 3.12.*
actually we get the error when we execute `kedro run`
👍 1
so when we execute `kedro run`, we get the error shown in the screenshots. It seems the session store SQLite database on Azure Blob Storage somehow needs to be initialised by a proper experiment run first. But we are following the tutorial for kedro-viz experiment tracking: https://docs.kedro.org/projects/kedro-viz/en/stable/experiment_tracking.html#collaborative-experiment-tracking
r
Thanks for the information Johann. We haven't tried experiment tracking on Azure Blob Storage before. @Rashida Kanchwala do you remember trying it on Azure before?
👍 1
j
we would also be willing to implement a bug fix and open a PR for this if anybody can give us a sketch of how to approach it.
👍🏼 1
👍 1
afk now, back tomorrow morning CET 😉. thanks!
n
I'd suggest trying to initialise a connection with SQLiteSessionStore directly.
r
Collaborative experiment tracking uses fsspec, so ideally it should work with Azure. During development we only tested it with AWS.
👍 1
j
yeah, but for Azure, Kedro doesn't know where or how to pick up the credentials.
see screenshots
n
```python
class SQLiteStore(BaseSessionStore):
    """Stores the session data on the sqlite db."""

    def __init__(self, *args, remote_path: Optional[str] = None, **kwargs):
        ...
```
It should support `abfs` as Rashida suggested. It would be easier to test with the class directly, without involving all the Kedro stuff.
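A minimal sketch of what testing the store mechanics directly could look like, using plain sqlite3 with a local folder standing in for the abfs container. The table schema and the copy-based "upload" are illustrative stand-ins, not Kedro-Viz internals:

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Work in a throwaway directory so the sketch is self-contained.
workdir = Path(tempfile.mkdtemp())

# 1. Create the local session-store database, as a first `kedro run` would.
local_db = workdir / "session_store.db"
conn = sqlite3.connect(local_db)
# Illustrative schema only -- the real table is managed by Kedro-Viz.
conn.execute(
    "CREATE TABLE IF NOT EXISTS session_store (id TEXT PRIMARY KEY, blob TEXT)"
)
conn.commit()
conn.close()

# 2. "Upload" it. In the real setup this goes through fsspec to an
#    abfs:// path; here a local folder stands in for the container.
remote_container = workdir / "remote_container"
remote_container.mkdir(exist_ok=True)
shutil.copy(local_db, remote_container / local_db.name)
```

If the credentials problem is on the fsspec/adlfs side, swapping the copy step for an `fsspec.open("abfs://...")` write should reproduce the error outside Kedro.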
r
One limitation with collaborative experiment tracking: credentials must be stored as environment variables, set through an `export` command.
r
I think they set the below credentials -
```shell
export AZURE_STORAGE_TENANT_ID="your-app-tenant-id"
export AZURE_STORAGE_CLIENT_ID="your-app-client-id"
export AZURE_STORAGE_CLIENT_SECRET="your-app-client-secret-value"
```
We might need to test this and document in the docs what the credentials should be.
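To rule out a variable simply not being visible to the process, a quick stdlib check along these lines can help. The variable names are the service-principal ones from the export commands above:

```python
import os

# The three service-principal variables from the export commands above.
REQUIRED_AZURE_VARS = (
    "AZURE_STORAGE_TENANT_ID",
    "AZURE_STORAGE_CLIENT_ID",
    "AZURE_STORAGE_CLIENT_SECRET",
)


def missing_azure_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]
```

Calling `missing_azure_credentials()` right before `kedro run` shows at a glance which variables the process actually sees.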
j
What Ravi says is correct. We do set the environment variables correctly.
@Nok Lam Chan We are setting the SQLiteStore as the session store in settings.py according to the Kedro documentation; there, we do not instantiate the class ourselves. Since you are describing exactly that, where (i.e. in which script) and how would we do it? And would that solve the problem for other users anyway?
@Ravi Kumar Pilla Could you kindly look at the attached screenshots and read the error message? I am confident that our environment variables are set correctly. I think it is an actual Kedro bug. If you confirm this, I will create a ticket and we can discuss a technical solution there.
r
I want to understand whether the problem is on our end or in fsspec, as it might be the latter. Do you also save the datasets in your DataCatalog to Azure Blob Storage, and does that work fine?
j
@Rashida Kanchwala yes, saving all our datasets to Azure Blob Storage works. Here is one example of a dataset defined in our data catalog, which we indeed find stored on Azure Blob as a CSV file:
```yaml
example:
  type: pandas.CSVDataset
  filepath: abfs://lab-bdschad/data/stuff/example.csv
  versioned: true
  credentials: azure_blob_storage
```
somehow I think that the provided Azure credentials do not match what abfs expects. abfs requires a connection string or a key rather than a service principal; that would be my gut feeling at least. What do you think?
The error message also says that it requires a key or a connection string. I think it would be good if there were environment variables for these Azure parameters.
r
This is what it said online:
1. Obtain the connection string for the Azure Blob Storage account. This can be found in the Azure Portal under Access keys in the storage account settings.
2. Set the AZURE_STORAGE_CONNECTION_STRING environment variable in the local environment to the connection string value.
1
👏 1
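For anyone checking their connection string, the value from Access keys is a semicolon-separated list of `Key=Value` pairs; a small stdlib parser makes it easy to verify that `AccountName` and `AccountKey` are present. The example string below is made up, but the key names follow Azure's documented connection-string format:

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure storage connection string into its Key=Value parts."""
    return dict(
        part.split("=", 1)  # split on the first '=' only; values may contain '='
        for part in conn_str.strip().strip(";").split(";")
        if part
    )


# A made-up example in the documented format.
example = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=mystorageaccount;"
    "AccountKey=abc123==;"
    "EndpointSuffix=core.windows.net"
)
parts = parse_connection_string(example)
print(parts["AccountName"])  # mystorageaccount
```

Splitting on the first `=` only matters because the base64-encoded `AccountKey` usually ends in `=` padding characters.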
j
sounds great! we'll try it out right now
@Rashida Kanchwala thanks, that did the trick. With AZURE_STORAGE_CONNECTION_STRING it is indeed working.
i would propose updating these parts of the documentation: Experiment tracking in Kedro-Viz — kedro-viz 10.0.0 documentation, and Publish and share on Azure — kedro-viz 10.0.0 documentation. Should I make a PR for this?
r
Thank you, that would be very helpful. We don't need to modify the "Publish and Share" section, as it works well and is focused purely on the flowchart, not on experiment tracking. However, it would be great to add a section for the Azure credentials required in the first document 🙂
👍 1
j
@Rashida Kanchwala I'll do that! However, I would like to note that the Publish and share on Azure section does not work for us as it is, but it does work with AZURE_STORAGE_CONNECTION_STRING. I would recommend also updating Publish and share on Azure: https://docs.kedro.org/projects/kedro-viz/en/stable/publish_and_share_kedro_viz_on_azure.html#set-credentials
r
oh interesting, in that case please add that as well
👍 1
r
Happy to hear this resolved the issue. As a note, in the publish and share docs we focused on setting credentials that are picked up by an Azure Service Principal. I am curious whether your setup differs from the docs in any way. Thank you
j
I'll post updates here as soon as we've created a PR.
thankyou 1