Hey Folks I am looking for a way to mount AWS EFS volume to Kedro #plugins-integrations

Vishal Pandey

09/25/2024, 8:47 AM

Hey folks, I am looking for a way to mount an AWS EFS volume to my Kedro pipeline, which will be executed by Kubeflow. I am using the Kubeflow plugin. The config has the two options below for volumes, and I am not sure which one serves which purpose.

1.

volume:

    # Storage class - use null (or no value) to use the default storage
    # class deployed on the Kubernetes cluster
    storageclass: # default

    # The size of the volume that is created. Applicable for some storage
    # classes
    size: 1Gi

    # Access mode of the volume used to exchange data. ReadWriteMany is
    # preferred, but it is not supported on some environments (like GKE)
    # Default value: ReadWriteOnce
    #access_modes: [ReadWriteMany]

    # Flag indicating if the data-volume-init step (copying raw data to the
    # fresh volume) should be skipped
    skip_init: False

    # Allows to specify user executing pipelines within containers
    # Default: root user (to avoid issues with volumes in GKE)
    owner: 0

    # Flag indicating if volume for inter-node data exchange should be
    # kept after the pipeline is deleted
    keep: False

2.

  # Optional section to allow mounting additional volumes (such as EmptyDir)
  # to specific nodes
  extra_volumes:
    tensorflow_step:
    - mount_path: /dev/shm
      volume:
        name: shared_memory
        empty_dir:
          cls: V1EmptyDirVolumeSource
          params:
            medium: Memory

Vishal Pandey

09/26/2024, 6:05 AM

@Artur Dobrogowski Could you share some thoughts here?

marrrcin

09/26/2024, 8:24 AM

1. Is used as a "main" volume, which will be mounted under /home/kedro/data

2. Is used for "extras", meaning your use-case specific volumes: if you need some additional volume for any purpose, you can attach it using this method. The most common use case is the one in the example: extending /dev/shm for distributed training in PyTorch (Kubernetes has problems with that).

👍 1

Vishal Pandey

09/26/2024, 8:28 AM

@marrrcin Can I conclude that the first volume is the one that needs to be configured if I want to use the EFS file system? Also, the storage class is something I need to check with the Kubernetes cluster manager for the EFS I want to mount.

👌 1

Vishal Pandey

09/30/2024, 11:57 AM

@marrrcin This is how our EFS file system is used as a PVC volume in our Kubernetes cluster:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv-kubeflow
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 100Gi
  csi:
    driver: efs.csi.aws.com
    volumeHandle: "fs-02d6475f7552a3c13:/data"
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  volumeMode: Filesystem

So, as per our discussion, storageClassName: efs-sc is what we need to use as the storage class, right?

👌 1
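In Kubernetes, a pod does not mount a PersistentVolume directly; it mounts a PersistentVolumeClaim that is bound to it. The sketch below builds such a claim for the PV shown above and prints it as JSON, which kubectl can also apply. The claim name and namespace are hypothetical placeholders, and whether the Kubeflow plugin's volume section would reuse a claim like this is not established in this thread.

```python
import json


def build_pvc_manifest(claim_name, pv_name, storage_class, size, namespace):
    """Build a PersistentVolumeClaim that binds statically to an existing PV."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name, "namespace": namespace},
        "spec": {
            # Must be compatible with the access modes declared on the PV.
            "accessModes": ["ReadWriteMany"],
            # Must match the PV's storageClassName for the binding to succeed.
            "storageClassName": storage_class,
            # Pins the claim to one specific, pre-created PV.
            "volumeName": pv_name,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    manifest = build_pvc_manifest(
        claim_name="data-pvc-kubeflow",  # hypothetical name
        pv_name="data-pv-kubeflow",      # PV name from the manifest above
        storage_class="efs-sc",
        size="100Gi",
        namespace="kubeflow",            # hypothetical namespace
    )
    print(json.dumps(manifest, indent=2))
```

Setting volumeName is what ties the claim to the existing PV rather than asking the storage class to provision a new, empty volume.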

Vishal Pandey

10/01/2024, 8:35 AM

@marrrcin I set the above storage class in the config as shown below:

  # Optional volume specification
  volume:
    storageclass: efs-sc

    access_modes: [ReadWriteMany]

    # Flag indicating if the data-volume-init step (copying raw data to the
    # fresh volume) should be skipped
    skip_init: False

    # Allows to specify user executing pipelines within containers
    # Default: root user (to avoid issues with volumes in GKE)
    owner: 0

    # Flag indicating if volume for inter-node data exchange should be
    # kept after the pipeline is deleted
    keep: False

Logs:

INFO     Loading data from companies (CSVDataset)...    data_catalog.py:539
INFO     Running node: preprocess_companies_node: preprocess_companies([companies]) -> [preprocessed_companies]    node.py:364
DEBUG    Inside Preprocess Companies    nodes.py:32
DEBUG    Checking EFS Mount now    nodes.py:33
DEBUG    ['01_raw']    nodes.py:34

This only shows ['01_raw'], so it seems the EFS is still not accessible. Any pointers?
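A directory listing alone cannot tell whether the data directory is backed by EFS or by something else. A minimal diagnostic sketch that could be called from inside a node is shown below; it assumes a Linux container and the /home/kedro/data mount path mentioned earlier in the thread. The function name describe_mount is made up for illustration.

```python
import os


def describe_mount(path, mounts_file="/proc/mounts"):
    """Return basic facts about `path`: existence, mount status, fs type, entries."""
    info = {
        "path": path,
        "exists": os.path.isdir(path),
        "is_mount": os.path.ismount(path),
        "fs_type": None,
        "entries": [],
    }
    if info["exists"]:
        info["entries"] = sorted(os.listdir(path))
    # /proc/mounts lines look like: "<device> <mount point> <fs type> <options> 0 0"
    real = os.path.realpath(path)
    try:
        with open(mounts_file) as fh:
            for line in fh:
                parts = line.split()
                if len(parts) >= 3 and parts[1] == real:
                    info["fs_type"] = parts[2]
    except OSError:
        pass  # not Linux, or /proc is unavailable
    return info


if __name__ == "__main__":
    # Mount path taken from the thread; adjust if the image uses another one.
    print(describe_mount("/home/kedro/data"))
```

An EFS-backed mount would normally report an NFS filesystem type such as nfs4. If is_mount is False, or fs_type is something else, the directory is probably coming from the container image or from a different volume.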

Vishal Pandey

10/01/2024, 11:35 AM

@Nok Lam Chan Could you also look into this thread?

Nok Lam Chan

10/01/2024, 11:36 AM

Unfortunately, I have never used Kubeflow myself, so I can't be of much help here.

👍 1

Vishal Pandey

10/01/2024, 11:41 AM

@em-pe As we discussed, I tried doing that, but with no success. Could you also look into it?
