Vishal Pandey
09/25/2024, 8:47 AM
volume:
  # Storage class - use null (or no value) to use the default storage
  # class deployed on the Kubernetes cluster
  storageclass: # default
  # The size of the volume that is created. Applicable for some storage
  # classes
  size: 1Gi
  # Access mode of the volume used to exchange data. ReadWriteMany is
  # preferred, but it is not supported on some environments (like GKE)
  # Default value: ReadWriteOnce
  #access_modes: [ReadWriteMany]
  # Flag indicating if the data-volume-init step (copying raw data to the
  # fresh volume) should be skipped
  skip_init: False
  # Allows specifying the user that executes pipelines within containers
  # Default: root user (to avoid issues with volumes in GKE)
  owner: 0
  # Flag indicating if the volume for inter-node data exchange should be
  # kept after the pipeline is deleted
  keep: False
2.
# Optional section to allow mounting additional volumes (such as EmptyDir)
# to specific nodes
extra_volumes:
  tensorflow_step:
  - mount_path: /dev/shm
    volume:
      name: shared_memory
      empty_dir:
        cls: V1EmptyDirVolumeSource
        params:
          medium: Memory
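In plain Kubernetes terms, an EmptyDir with medium: Memory is a tmpfs (RAM-backed) volume. As a rough illustration, the sketch below uses plain Python dicts to build the pod-spec fragment that the extra_volumes entry above corresponds to. It produces one entry for the pod's volumes list and one for the step container's volumeMounts. The helper name and the dict-based approach are hypothetical. The cls: V1EmptyDirVolumeSource key suggests the plugin builds kubernetes Python client objects rather than dicts.

```python
from typing import Any


def empty_dir_fragment(name: str, mount_path: str, medium: str = "") -> dict[str, Any]:
    """Return the pod-spec pieces for one EmptyDir volume (hypothetical helper).

    medium="Memory" asks Kubernetes to back the volume with tmpfs (RAM);
    an empty medium uses the node's default storage medium.
    """
    empty_dir: dict[str, Any] = {}
    if medium:
        empty_dir["medium"] = medium
    return {
        # Belongs in pod.spec.volumes
        "volume": {"name": name, "emptyDir": empty_dir},
        # Belongs in the step container's volumeMounts
        "volume_mount": {"name": name, "mountPath": mount_path},
    }


# Mirrors the extra_volumes entry for tensorflow_step above
fragment = empty_dir_fragment("shared_memory", "/dev/shm", medium="Memory")
print(fragment["volume"])
print(fragment["volume_mount"])
```

Files written to a memory-backed EmptyDir count against the container's memory limit. A large /dev/shm therefore needs a correspondingly large memory limit on the step.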
Vishal Pandey
09/26/2024, 6:05 AM
marrrcin
09/26/2024, 8:24 AM
/home/kedro/data
2. It is used for "extras", meaning volumes specific to your use case: if you need an additional volume for any purpose, you can attach it this way. The most common use case is the one in the example, extending /dev/shm for distributed training in PyTorch (Kubernetes has problems with that: there is no pod-level setting for the /dev/shm size, and the container runtime default is typically only 64 MB).
Vishal Pandey
09/26/2024, 8:28 AM
Vishal Pandey
09/30/2024, 11:57 AM
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv-kubeflow
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 100Gi
  csi:
    driver: efs.csi.aws.com
    volumeHandle: "fs-02d6475f7552a3c13:/data"
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  volumeMode: Filesystem
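Whether a claim ends up on this pre-created PV depends on Kubernetes' static binding rules. The sketch below is a simplified version of the main ones:

- The claim's storageClassName must equal the PV's.
- The PV must offer every access mode the claim requests.
- The PV's capacity must be at least the requested size.
- A claim can also pin a specific PV through volumeName.

The real binder also checks selectors, volume mode, node affinity, and whether the PV is already bound. If no existing PV matches a claim whose class has a provisioner, a new volume is typically provisioned dynamically instead. The quantity parser here handles only binary suffixes (Ki, Mi, Gi, Ti), and the pinned volumeName in the usage example is a hypothetical choice.

```python
from typing import Any

# Binary suffixes only; decimal (k, M, G) and exponent forms are not handled here.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> int:
    """Convert a Kubernetes-style quantity such as '100Gi' to bytes (simplified)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # plain byte count


def can_bind(pv: dict[str, Any], pvc: dict[str, Any]) -> bool:
    """Simplified check of whether a claim could bind to a given PV."""
    pv_spec, pvc_spec = pv["spec"], pvc["spec"]
    # A claim that names a specific PV only binds to that PV
    wanted = pvc_spec.get("volumeName")
    if wanted and wanted != pv["metadata"]["name"]:
        return False
    # Storage class names must match exactly (a missing class is treated as ""
    # here; in a real cluster an omitted class on a claim usually gets the default)
    if pv_spec.get("storageClassName", "") != pvc_spec.get("storageClassName", ""):
        return False
    # Every access mode the claim asks for must be offered by the PV
    if not set(pvc_spec["accessModes"]) <= set(pv_spec["accessModes"]):
        return False
    # The PV's capacity must cover the requested size
    requested = parse_quantity(pvc_spec["resources"]["requests"]["storage"])
    return parse_quantity(pv_spec["capacity"]["storage"]) >= requested


pv = {
    "metadata": {"name": "data-pv-kubeflow"},
    "spec": {
        "accessModes": ["ReadWriteMany"],
        "capacity": {"storage": "100Gi"},
        "storageClassName": "efs-sc",
    },
}
pvc = {
    "spec": {
        "accessModes": ["ReadWriteMany"],
        "resources": {"requests": {"storage": "1Gi"}},
        "storageClassName": "efs-sc",
        "volumeName": "data-pv-kubeflow",  # hypothetical: pin the claim to the PV above
    }
}
print(can_bind(pv, pvc))  # True
```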
So, as per our discussion, storageClassName: efs-sc is what we need to use as the storage class, right?
Vishal Pandey
10/01/2024, 8:35 AM
# Optional volume specification
volume:
  storageclass: efs-sc
  access_modes: [ReadWriteMany]
  # Flag indicating if the data-volume-init step (copying raw data to the
  # fresh volume) should be skipped
  skip_init: False
  # Allows specifying the user that executes pipelines within containers
  # Default: root user (to avoid issues with volumes in GKE)
  owner: 0
  # Flag indicating if the volume for inter-node data exchange should be
  # kept after the pipeline is deleted
  keep: False
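The volume section above ultimately turns into a PersistentVolumeClaim. The sketch below shows how that could work. Several parts are assumptions and not the plugin's actual code:

- It assumes storageclass maps to spec.storageClassName, access_modes to spec.accessModes, and size to spec.resources.requests.storage.
- The 1Gi fallback is borrowed from the template value quoted earlier in the thread.
- The function and claim names are hypothetical.

Per the config comments, a null storageclass leaves the class unset so the cluster default applies, and the default access mode is ReadWriteOnce.

```python
from typing import Any, Optional


def pvc_manifest(name: str, volume_cfg: dict[str, Any]) -> dict[str, Any]:
    """Build a PersistentVolumeClaim dict from a volume section (hypothetical mapping)."""
    spec: dict[str, Any] = {
        # Per the config comments, the default access mode is ReadWriteOnce
        "accessModes": list(volume_cfg.get("access_modes") or ["ReadWriteOnce"]),
        # Assumed fallback, taken from the template's size: 1Gi
        "resources": {"requests": {"storage": volume_cfg.get("size", "1Gi")}},
    }
    storage_class: Optional[str] = volume_cfg.get("storageclass")
    # Only set storageClassName when one is given; omitting it lets the
    # cluster's default StorageClass apply
    if storage_class:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


# The volume section from the config above, as it would look after YAML parsing
cfg = {
    "storageclass": "efs-sc",
    "access_modes": ["ReadWriteMany"],
    "skip_init": False,
    "owner": 0,
    "keep": False,
}
manifest = pvc_manifest("hypothetical-pipeline-data", cfg)
print(manifest["spec"])
```

A claim built this way does not reference the data-pv-kubeflow PV by name. Whether it binds to that PV or receives a newly provisioned volume depends on the cluster, for example on whether efs-sc has a dynamic provisioner and whether the PV is still Available.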
Logs:
INFO  Loading data from companies (CSVDataset)...  data_catalog.py:539
INFO  Running node: preprocess_companies_node: preprocess_companies([companies]) -> [preprocessed_companies]  node.py:364
DEBUG Inside Preprocess Companies  nodes.py:32
DEBUG Checking EFS Mount now  nodes.py:33
DEBUG ['01_raw']  nodes.py:34
This only shows ['01_raw'], so it seems the EFS is still not accessible.
Any heads-up?
Vishal Pandey
10/01/2024, 11:35 AM
Nok Lam Chan
10/01/2024, 11:36 AM
Vishal Pandey
10/01/2024, 11:41 AM
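Regarding the EFS question above, listing the data directory shows what is in it but not what backs it. The sketch below is a more informative check a node could log. It finds the mount in /proc/mounts that contains a given path and reports its source and filesystem type. An EFS volume mounted through the EFS CSI driver normally appears with type nfs4, while a volume from a block-storage class shows a local filesystem type instead. This only tells you whether the directory is on NFS at all; it cannot by itself identify which EFS directory or access point is mounted. The function names are hypothetical. /home/kedro/data is the mount point mentioned earlier in the thread. /proc/mounts exists only on Linux, and escaped characters in it (such as \040 for spaces) are not decoded here.

```python
import os
from typing import NamedTuple, Optional


class MountEntry(NamedTuple):
    source: str
    mount_point: str
    fs_type: str


def parse_mounts(text: str) -> list[MountEntry]:
    """Parse /proc/mounts-style lines: 'source mount_point fs_type options dump pass'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append(MountEntry(fields[0], fields[1], fields[2]))
    return entries


def mount_for(path: str, entries: list[MountEntry]) -> Optional[MountEntry]:
    """Return the mount whose mount point is the longest prefix of path.

    On ties, the later entry wins, because a later mount on the same
    mount point hides the earlier one.
    """
    path = os.path.normpath(path)
    best: Optional[MountEntry] = None
    for entry in entries:
        mp = entry.mount_point
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) >= len(best.mount_point):
                best = entry
    return best


def describe_data_dir(path: str = "/home/kedro/data", mounts_file: str = "/proc/mounts") -> str:
    """Summarize what backs `path`; meant to be logged from inside a node."""
    with open(mounts_file) as f:
        entry = mount_for(path, parse_mounts(f.read()))
    if entry is None:
        return f"{path}: no matching mount found"
    return f"{path}: mounted from {entry.source} ({entry.fs_type}) at {entry.mount_point}"


# Made-up sample input, only to show the parsing logic
sample = """overlay / overlay rw 0 0
127.0.0.1:/ /home/kedro/data nfs4 rw,relatime 0 0
tmpfs /dev/shm tmpfs rw 0 0"""
print(mount_for("/home/kedro/data/01_raw", parse_mounts(sample)))
# MountEntry(source='127.0.0.1:/', mount_point='/home/kedro/data', fs_type='nfs4')
```

Inside a node, logging describe_data_dir() next to the existing directory listing shows both what the directory contains and what it is mounted from.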