# questions
s
Hello! I have created a kedro matlab custom dataset which I would like to eventually contribute to the kedro datasets repo. May I ask how do I move forward?
matlab 5
m
Kedro-Datasets is part of the `kedro-plugins` repo: https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets
j
Amazing, @Samuel Lee SJ! As @marrrcin says, the way forward is to open a pull request on `kedro-plugins`, where `kedro-datasets` lives ⭐
I had my fair share of MATLAB at university 😄
👀 1
d
If you need any help making the contribution, please shout! We're more than happy to coach you through the process, what tests are required, etc. 🙂
a
As a former MathWorks and current Kedro software engineer, I am excited for this haha
🚀 5
❤️ 1
s
I have used some scipy plugins. Would I need to pull the relevant functions into the repo? Or should I lift the functions from the scipy repo and place them in the kedro datasets repo? @Juan Luis
Hello @datajoely, I am trying to install the test requirements, but I get a "command not found" error. The command I used was taken straight from CONTRIBUTING.md in the kedro-datasets repo:
make plugin=kedro-datasets install-test-requirements
j
That `make` command looks good. Can you paste the full error, @Samuel Lee SJ, so we can see what command is missing?
s
make : The term 'make' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the 
path is correct and try again.
At line:1 char:1
+ make plugin=kedro-datasets install-test-requirements
+ ~~~~
    + CategoryInfo          : ObjectNotFound: (make:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
I ran the command from the VS Code terminal, in a conda environment.
d
Are you on Windows? You may need to install make.
j
Yes, can you try `conda install make -c conda-forge` and repeat, @Samuel Lee SJ?
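A quick way to confirm whether `make` (or any other command) is visible from the active environment is to ask Python to look it up on `PATH`. This is only a minimal sketch using the standard library; the helper name `find_tool` is made up for illustration.

```python
import shutil


def find_tool(name):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    # Tools mentioned in this thread; adjust the list as needed.
    for tool in ("make", "pip", "conda"):
        print(f"{tool}: {find_tool(tool) or 'not found on PATH'}")
```

If `make` shows up as not found, the "not recognized as the name of a cmdlet" error above is expected, and installing it into the environment should resolve it.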
s
@Juan Luis I did and I managed to install make. Thank you.
👍🏼 1
@Juan Luis May I know the preferred Python version for kedro-datasets? I ask because I have encountered an error with install-test-requirements. I am running Python 3.8.18 and my setuptools is currently version 59.6.0. When I run the make command, it fails with ERROR: No matching distribution for setuptools >=61.2. Should I update my Python version so that I can support a higher version of setuptools? Or should I install a higher version of setuptools manually?
<some context> I have run "pip install --upgrade setuptools>=61.2" and it returns this error:
ERROR: Could not find a version that satisfies the requirement setuptools>=61.2 (from versions: 0.6b1, 0.6b2, 0.6b3, 0.6b4, 0.6rc1, 0.6rc2, 0.6rc3, 0.6rc4, 0.6rc5, 0.6rc6, 
0.6rc7, 0.6rc8, 0.6rc9, 0.6rc10, 0.6rc11, 0.7.2, 0.7.3, 0.7.4, 0.7.5, 0.7.6, 0.7.7, 0.7.8, 0.8, 0.9, 0.9.1, 0.9.2, 0.9.3, 0.9.4, 0.9.5, 0.9.6, 0.9.7, 0.9.8, 1.0, 1.1, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.2, 1.3, 1.3.1, 1.3.2, 1.4, 1.4.1, 1.4.2, 2.0, 2.0.1, 2.0.2, 2.1, 2.1.1, 2.1.2, 2.2, 3.0, 3.0.1, 3.0.2, 3.1, 3.2, 3.3, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.4, 3.5, 3.5.1, 3.5.2, 3.6, 3.7, 3.7.1, 3.8, 3.8.1, 4.0, 4.0.1, 5.0, 5.0.1, 5.0.2, 5.1, 5.2, 5.3, 5.4, 5.4.1, 5.4.2, 5.5, 5.5.1, 5.6, 5.7, 5.8, 6.0.1, 
6.0.2, 6.1, 7.0, 8.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.1, 8.2, 8.2.1, 8.3, 9.0, 9.0.1, 9.1, 10.0, 10.0.1, 10.1, 10.2, 10.2.1, 11.0, 11.1, 11.2, 11.3, 11.3.1, 12.0, 12.0.1, 12.0.2, 12.0.3, 12.0.4, 12.0.5, 12.1, 12.2, 12.3, 12.4, 13.0.1, 13.0.2, 14.0, 14.1, 14.1.1, 14.2, 14.3, 14.3.1, 15.0, 15.1, 15.2, 16.0, 17.0, 17.1, 17.1.1, 18.0, 18.0.1, 18.1, 18.2, 18.3, 18.3.1, 18.3.2, 18.4, 18.5, 18.6, 18.6.1, 18.7, 18.7.1, 18.8, 18.8.1, 19.0, 19.1, 19.1.1, 19.2, 19.3, 19.4, 19.4.1, 19.5, 19.6, 19.6.1, 19.6.2, 19.7, 20.0, 20.1, 20.1.1, 20.2.2, 20.3, 20.3.1, 20.4, 20.6.6, 20.6.7, 20.6.8, 20.7.0, 20.8.0, 20.8.1, 20.9.0, 20.10.1, 21.0.0, 21.1.0, 21.2.0, 21.2.1, 21.2.2, 22.0.0, 22.0.1, 22.0.2, 22.0.4, 22.0.5, 23.0.0, 23.1.0, 23.2.0, 23.2.1, 24.0.0, 24.0.1, 24.0.2, 24.0.3, 24.1.0, 24.1.1, 24.2.0, 24.2.1, 24.3.0, 24.3.1, 25.0.0, 25.0.1, 25.0.2, 25.1.0, 25.1.1, 25.1.2, 
25.1.3, 25.1.4, 25.1.5, 25.1.6, 25.2.0, 25.3.0, 25.4.0, 26.0.0, 26.1.0, 26.1.1, 27.0.0, 27.1.0, 27.1.2, 27.2.0, 27.3.0, 27.3.1, 28.0.0, 28.1.0, 28.2.0, 28.3.0, 28.4.0, 28.5.0, 28.6.0, 28.6.1, 28.7.0, 28.7.1, 28.8.0, 28.8.1, 29.0.0, 29.0.1, 30.0.0, 30.1.0, 30.2.0, 30.2.1, 30.3.0, 30.4.0, 31.0.0, 31.0.1, 32.0.0, 32.1.0, 32.1.1, 32.1.2, 32.1.3, 
32.2.0, 32.3.0, 32.3.1, 33.1.0, 33.1.1, 34.0.0, 34.0.1, 34.0.2, 34.0.3, 34.1.0, 34.1.1, 34.2.0, 34.3.0, 34.3.1, 34.3.2, 34.3.3, 34.4.0, 34.4.1, 35.0.0, 35.0.1, 35.0.2, 36.0.1, 36.1.0, 36.1.1, 36.2.0, 36.2.1, 36.2.2, 36.2.3, 36.2.4, 36.2.5, 36.2.6, 36.2.7, 36.3.0, 36.4.0, 36.5.0, 36.6.0, 36.6.1, 36.7.0, 36.7.1, 36.7.2, 36.8.0, 37.0.0, 38.0.0, 
38.1.0, 38.2.0, 38.2.1, 38.2.3, 38.2.4, 38.2.5, 38.3.0, 38.4.0, 38.4.1, 38.5.0, 38.5.1, 38.5.2, 38.6.0, 38.6.1, 38.7.0, 39.0.0, 39.0.1, 39.1.0, 39.2.0, 40.0.0, 40.1.0, 40.1.1, 40.2.0, 40.3.0, 40.4.0, 40.4.1, 40.4.2, 40.4.3, 40.5.0, 40.6.0, 40.6.1, 40.6.2, 40.6.3, 40.7.0, 40.7.1, 40.7.2, 40.7.3, 40.8.0, 40.9.0, 41.0.0, 41.0.1, 41.1.0, 41.2.0, 
41.3.0, 41.4.0, 41.5.0, 41.5.1, 41.6.0, 42.0.0, 42.0.1, 42.0.2, 43.0.0, 44.0.0, 44.1.0, 44.1.1, 45.0.0, 45.1.0, 45.2.0, 45.3.0, 46.0.0, 46.1.0, 46.1.1, 46.1.2, 46.1.3, 46.2.0, 46.3.0, 46.3.1, 46.4.0, 47.0.0, 47.1.0, 47.1.1, 47.2.0, 47.3.0, 47.3.1, 47.3.2, 48.0.0, 49.0.0, 49.0.1, 49.1.0, 49.1.1, 49.1.2, 49.1.3, 49.2.0, 49.2.1, 49.3.0, 49.3.1, 
49.3.2, 49.4.0, 49.5.0, 49.6.0, 50.0.0, 50.0.1, 50.0.2, 50.0.3, 50.1.0, 50.2.0, 50.3.0, 50.3.1, 50.3.2, 51.0.0, 51.1.0, 51.1.0.post20201221, 51.1.1, 51.1.2, 51.2.0, 51.3.0, 51.3.1, 51.3.2, 51.3.3, 52.0.0, 53.0.0, 53.1.0, 54.0.0, 54.1.0, 54.1.1, 54.1.2, 54.1.3, 54.2.0, 56.0.0, 56.1.0, 56.2.0, 57.0.0, 57.1.0, 57.2.0, 57.3.0, 57.4.0, 57.5.0, 58.0.0, 58.0.1, 58.0.2, 58.0.3, 58.0.4, 58.1.0, 58.2.0, 58.3.0, 58.4.0, 58.5.0, 58.5.1, 58.5.2, 58.5.3, 59.0.1, 59.1.0, 59.1.1, 59.2.0, 59.3.0, 59.4.0, 59.5.0, 59.6.0)        
ERROR: No matching distribution found for setuptools>=61.2
This is the same error I get when running
make plugin=kedro-datasets install-test-requirements
j
That's very weird, because there are versions well beyond 61.2 online 🤔 https://pypi.org/project/setuptools/68.2.2/ and they support Python 3.8. @Samuel Lee SJ, do you mind running `pip install "setuptools==61.2" -vvv` and pasting the (long) logs here? And also the output of `pip --version`.
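Since pip only offers releases that are compatible with the interpreter it runs under, it can help to check which Python, pip and setuptools the environment is really using. The snippet below is a minimal sketch of such a check using only the standard library; the function name `describe_environment` is hypothetical, and it is a complement to `pip --version`, not a replacement.

```python
import platform
import sys

try:
    from importlib import metadata  # available on Python 3.8+
except ImportError:  # older interpreters do not ship importlib.metadata
    metadata = None


def describe_environment(packages=("pip", "setuptools")):
    """Return interpreter details and the installed versions of `packages`."""
    info = {
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    for name in packages:
        if metadata is None:
            info[name] = "unknown (importlib.metadata unavailable)"
            continue
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it with the same `python` that the conda environment activates. If the reported interpreter version or path is not what you expect, the environment itself is the first thing to look at.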
s
@Juan Luis The output of pip install "setuptools==61.2" -vvv is too long to create a snippet or even paste into the chat, so I copied the tail end of it:
ERROR: No matching distribution found for setuptools==61.2
Exception information:
Traceback (most recent call last):
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 341, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 173, in _add_to_criteria
    raise RequirementsConflicted(criterion)
pip._vendor.resolvelib.resolvers.RequirementsConflicted: Requirements conflict: SpecifierRequirement('setuptools==61.2')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 95, in resolve
    collected.requirements, max_rounds=try_to_avoid_resolution_too_deep
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 472, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 343, in resolve
    raise ResolutionImpossible(e.criterion.information)
pip._vendor.resolvelib.resolvers.ResolutionImpossible: [RequirementInformation(requirement=SpecifierRequirement('setuptools==61.2'), parent=None)]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_internal\cli\base_command.py", line 173, in _main
    status = self.run(options, args)
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_internal\cli\req_command.py", line 203, in wrapper
    return func(self, options, args)
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_internal\commands\install.py", line 316, in run
    reqs, check_supported_wheels=not options.target_dir
  File "G:\miniconda3\envs\kedro\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 103, in resolve
    raise error from e
pip._internal.exceptions.DistributionNotFound: No matching distribution found for setuptools==61.2
Removed build tracker: 'C:\\Users\\samue\\AppData\\Local\\Temp\\pip-req-tracker-usv5jua5'
I also tried upgrading pip
(kedro) PS G:\Working\kedro-plugins> pip install --upgrade pip
Requirement already satisfied: pip in g:\miniconda3\envs\kedro\lib\site-packages (21.2.2)
Collecting pip
  Downloading pip-21.3.1-py3-none-any.whl (1.7 MB)
     |████████████████████████████████| 1.7 MB 6.8 MB/s
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.2.2
    Uninstalling pip-21.2.2:
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'g:\\miniconda3\\envs\\kedro\\scripts\\pip.exe'
Consider using the `--user` option or check the permissions.

(kedro) PS G:\Working\kedro-plugins> pip install --upgrade pip --user
Script file 'G:\miniconda3\envs\kedro\Scripts\pip-script.py' is not present.
(kedro) PS G:\Working\kedro-plugins> pip --version
Script file 'G:\miniconda3\envs\kedro\Scripts\pip-script.py' is not present.
Just for context I am running python=3.8.18
j
I think this is a known Windows error: `pip` cannot upgrade itself because it's in use. First, I assume pip is installed with conda, so it's better not to meddle with that. Can you run `conda list` to verify that, and then run `conda update pip`?
(Also, sorry this is taking so long 🙏🏼 I'm more than okay guiding you through the process, as long as you are patient enough to keep digging 😄)
And the other thing: maybe setuptools is installed through conda too. If that's the case, you can try `conda update setuptools`.
s
(I am more than happy to sit and learn. I was about to say sorry for taking up your time)
🙏🏼 1
I think the conda install worked, but now conda is giving me problems. As you mentioned earlier, I think I am getting a similar error:
(kedro) PS G:\Working\kedro-plugins\kedro-plugins> make plugin=kedro-datasets install-test-requirements
cd kedro-datasets && pip install ".[test]"
Script file 'G:\miniconda3\envs\kedro\Scripts\pip-script.py' is not present.
make: *** [Makefile:67: install-test-requirements] Error 105
Does this mean that I cannot use a conda environment?
j
Fair enough. At this point I suspect something got borked, because it's not normal for `pip-script.py` to be missing. As annoying as it is, can you try the following?
1. `conda deactivate`
2. Delete the `kedro` environment
3. Create a new one (you can keep using Python 3.8 if you like; for now it's supported but it will be dropped soon, so maybe consider Python 3.9+)
4. Activate it and run `conda install make -c conda-forge` as we did initially
5. See if `make ...` works
s
Does it matter which python version I create my matlab dataset on? Can I use python 3.10?
j
You can use 3.10, yes! It will be tested automatically on all supported Python versions.
s
Nice.
@Juan Luis I redid everything on python=3.10 and now I get a resolution error: an incompatibility between dask and kedro-datasets[test]. This is the tail end of the output:
ERROR: Cannot install dask[complete]==2021.12.0 and kedro-datasets[test]==1.8.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    kedro-datasets[test] 1.8.0 depends on dask~=2021.10; extra == "test"
    dask[complete] 2021.12.0 depends on dask 2021.12.0 (from <https://files.pythonhosted.org/packages/15/6d/99c63be3ea8a4a651d845addeea1f1b3bb8e5c6730bc26cfb6176631adf7/dask-2021.12.0-py3-none-any.whl> (from <https://pypi.org/simple/dask/>) (requires-python:>=3.7))

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit <https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts>
make: *** [Makefile:67: install-test-requirements] Error 1
j
okay, we're making progress
It turns out you don't need all those test dependencies to develop, @Samuel Lee SJ. This is a known issue: https://github.com/kedro-org/kedro-plugins/issues/415. To run the tests, you can install only the pytest plugins and the ruff linter:
pip install "pytest~=7.2" "pytest-cov~=3.0" "pytest-xdist[psutil]~=2.2.1" "pytest-mock>=1.7.1, <2.0" "ruff~=0.0.290"
And when running your tests, you can do `pytest -k matlab` to only run the matlab-related tests, instead of everything.
👍 1
s
@Juan Luis I managed to sort of install pytest, and while running
pytest -k matlab
I get the following errors:
(kedro) PS G:\Working\kedro-plugins\kedro-plugins\kedro-datasets\kedro_datasets> pytest -k matlab
rootdir: G:\Working\kedro-plugins\kedro-plugins\kedro-datasets
configfile: pyproject.toml
plugins: cov-3.0.0, forked-1.6.0, mock-1.13.0, xdist-2.2.1
collected 0 items                                                                                                                                                                            
G:\miniconda3\envs\kedro\lib\site-packages\coverage\inorout.py:507: CoverageWarning: Module kedro_datasets was never imported. (module-not-imported)
  self.warn(f"Module {pkg} was never imported.", slug="module-not-imported")
G:\miniconda3\envs\kedro\lib\site-packages\coverage\inorout.py:507: CoverageWarning: Module tests was never imported. (module-not-imported)
  self.warn(f"Module {pkg} was never imported.", slug="module-not-imported")
G:\miniconda3\envs\kedro\lib\site-packages\coverage\control.py:883: CoverageWarning: No data was collected. (no-data-collected)
  self._warn("No data was collected.", slug="no-data-collected")
WARNING: Failed to generate report: No data to report.
Then, when I run pytest -k matlab one directory higher, pytest runs the tests for all the datasets. Could I get advice on how to proceed?
j
Your first invocation is good, @Samuel Lee SJ, but you haven't written any tests containing the word `matlab` in the name yet. That's why it says "collected 0 items".
The coverage warnings are a consequence of that.
The `-k` flag matches against test (function) names or module names, for example `test_matlab.py` or `def test_matlab_dataset`.
But if you already have the implementation, you can submit a pull request already and we can help you with the tests.
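To make the `-k` behaviour more concrete, here is a deliberately simplified model of it in plain Python. It only covers a single plain keyword and ignores `and`/`or`/`not` expressions and markers, so treat it as an illustration of the idea and not as pytest's actual implementation. The helper `selected_by_keyword` and all the test names below are hypothetical.

```python
def selected_by_keyword(names, keyword):
    """Simplified model of `pytest -k KEYWORD` for one plain keyword.

    `names` stands for the names attached to one test, such as its module
    file name and its function name. The test counts as selected when the
    keyword is a case-insensitive substring of any of those names.
    """
    keyword = keyword.lower()
    return any(keyword in name.lower() for name in names)


# Hypothetical test names, for illustration only.
candidates = {
    "matched by module name": ["test_matlab_dataset.py", "test_save_and_load"],
    "matched by function name": ["test_datasets.py", "test_matlab_roundtrip"],
    "not matched": ["test_csv_dataset.py", "test_save_and_load"],
}

for label, names in candidates.items():
    print(label, "->", selected_by_keyword(names, "matlab"))
```

So once the new tests live in a module such as `test_matlab_dataset.py`, or the test functions have `matlab` in their names, `pytest -k matlab` has something to collect.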