# plugins-integrations
v
Hey folks, I am looking for a way to mount an AWS EFS volume into my Kedro pipeline, which will be executed by Kubeflow. I am using the Kubeflow plugin. The config has the two options below for volumes, and I am not sure which one serves which purpose.
1.
volume:

    # Storage class - use null (or no value) to use the default storage
    # class deployed on the Kubernetes cluster
    storageclass: # default

    # The size of the volume that is created. Applicable for some storage
    # classes
    size: 1Gi

    # Access mode of the volume used to exchange data. ReadWriteMany is
    # preferred, but it is not supported on some environments (like GKE)
    # Default value: ReadWriteOnce
    #access_modes: [ReadWriteMany]

    # Flag indicating if the data-volume-init step (copying raw data to the
    # fresh volume) should be skipped
    skip_init: False

    # Allows to specify user executing pipelines within containers
    # Default: root user (to avoid issues with volumes in GKE)
    owner: 0

    # Flag indicating if volume for inter-node data exchange should be
    # kept after the pipeline is deleted
    keep: False
2.
  # Optional section to allow mounting additional volumes (such as EmptyDir)
  # to specific nodes
  extra_volumes:
    tensorflow_step:
    - mount_path: /dev/shm
      volume:
        name: shared_memory
        empty_dir:
          cls: V1EmptyDirVolumeSource
          params:
            medium: Memory
@Artur Dobrogowski Could you share your thoughts here?
m
1. Is used as the "main" volume, which will be mounted under /home/kedro/data.
2. Is used for "extras", meaning anything specific to your use case: if you need an additional volume for any purpose, you can attach it using this method. The most common use case is the one in the example: extending /dev/shm for distributed training in PyTorch (Kubernetes has problems with that).
👍 1
v
@marrrcin Can I conclude that the first volume is the one I need to configure if I want to use EFS? Also, the storage class is something I need to check with the Kubernetes cluster manager for the EFS I want to mount.
👌 1
@marrrcin This is how our EFS is exposed as a persistent volume in our Kubernetes cluster:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv-kubeflow
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 100Gi
  csi:
    driver: efs.csi.aws.com
    volumeHandle: "fs-02d6475f7552a3c13:/data"
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  volumeMode: Filesystem
So, as per our discussion, storageClassName: efs-sc is what we need to use as the storage class, right?
👌 1
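For a claim to bind to a statically defined PersistentVolume like the one above, Kubernetes requires (among other things) that the storage class names match, that the volume offers every requested access mode, and that its capacity covers the requested size. A minimal sketch of that check in plain Python follows. The function names and dict layout are hypothetical, the claim keys simply mirror the plugin's volume section, and how the Kubeflow plugin actually creates its claim is not confirmed in this thread.

```python
# Binary-suffix multipliers for Kubernetes quantities such as "1Gi".
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a quantity like '100Gi' to bytes (binary suffixes only)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def binding_problems(pv_spec: dict, claim: dict) -> list:
    """List reasons why a claim could not bind to the given PersistentVolume spec."""
    problems = []
    if pv_spec.get("storageClassName") != claim.get("storageclass"):
        problems.append("storage class differs")
    missing = set(claim.get("access_modes", [])) - set(pv_spec.get("accessModes", []))
    if missing:
        problems.append(f"access modes not offered by the PV: {sorted(missing)}")
    if to_bytes(claim.get("size", "0")) > to_bytes(pv_spec["capacity"]["storage"]):
        problems.append("requested size exceeds PV capacity")
    return problems


if __name__ == "__main__":
    # Values copied from the PersistentVolume manifest in this thread.
    pv_spec = {
        "storageClassName": "efs-sc",
        "accessModes": ["ReadWriteMany"],
        "capacity": {"storage": "100Gi"},
    }
    # Hypothetical claim mirroring the plugin's volume keys.
    claim = {"storageclass": "efs-sc", "access_modes": ["ReadWriteMany"]}
    print(binding_problems(pv_spec, claim))  # prints []
```

An empty list only means these three criteria are compatible. The volume also has to be unclaimed, and the volume mode has to match.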
@marrrcin I set the storage class above in the plugin config as shown below:
  # Optional volume specification
  volume:
    storageclass: efs-sc

    access_modes: [ReadWriteMany]

    # Flag indicating if the data-volume-init step (copying raw data to the
    # fresh volume) should be skipped
    skip_init: False

    # Allows to specify user executing pipelines within containers
    # Default: root user (to avoid issues with volumes in GKE)
    owner: 0

    # Flag indicating if volume for inter-node data exchange should be
    # kept after the pipeline is deleted
    keep: False
Logs -
INFO     Loading data from companies     data_catalog.py:539
                             (CSVDataset)...                                    
                    INFO     Running node:                           node.py:364
                             preprocess_companies_node:                         
                             preprocess_companies([companies]) ->               
                             [preprocessed_companies]                           
                    DEBUG    Inside Preprocess Companies             nodes.py:32
                    DEBUG    Checking EFS Mount now                  nodes.py:33
                    DEBUG    ['01_raw']                              nodes.py:34
This only shows ['01_raw'], so it seems the EFS is still not accessible. Any pointers?
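The ['01_raw'] listing alone does not say whether the directory is backed by EFS or by the container's own filesystem. One way to narrow this down from inside a node is to log whether the data directory is a mount point and which NFS mounts are visible in the container. A rough sketch using only the standard library, assuming a Linux container and assuming that EFS shows up as an nfs4 mount. The helper names are hypothetical, and /home/kedro/data is the path mentioned in the reply above.

```python
import os
from pathlib import Path


def describe_path(path: str) -> dict:
    """Report whether `path` exists, is a mount point, and what it contains."""
    p = Path(path)
    return {
        "path": str(p),
        "exists": p.exists(),
        "is_mount": os.path.ismount(p),
        "entries": sorted(os.listdir(p)) if p.is_dir() else [],
    }


def find_nfs_mounts(fs_types=("nfs", "nfs4"), mounts_file="/proc/mounts") -> list:
    """Return (mount_point, fs_type) pairs for NFS-style mounts listed in mounts_file."""
    found = []
    try:
        with open(mounts_file) as fh:
            for line in fh:
                # Each line reads: device mount_point fs_type options ...
                fields = line.split()
                if len(fields) >= 3 and fields[2] in fs_types:
                    found.append((fields[1], fields[2]))
    except FileNotFoundError:
        pass  # not a Linux system, or /proc is not available
    return found


if __name__ == "__main__":
    print(describe_path("/home/kedro/data"))
    print(find_nfs_mounts())
```

If no NFS mount is listed, the container most likely sees only the files baked into the image or copied by the init step, not the EFS share.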
@Nok Lam Chan Can you also look into this thread?
n
Unfortunately, I have never used Kubeflow myself, so I can't be of much help here.
👍 1
v
@em-pe As we discussed, I tried doing that, but with no success. Can you also look into it?