Francis Duval
03/06/2024, 6:37 PM
black~=22.0
ipython>=8.10
jupyterlab>=3.0
kedro-datasets[pandas.CSVDataset, pandas.ExcelDataset, pandas.ParquetDataset, spark.SparkDataset, plotly.PlotlyDataset, plotly.JSONDataset, matplotlib.MatplotlibWriter, pickle.PickleDataset, tracking.JSONDataset, huggingface.HFTransformerPipelineDataset]>=1.0
kedro-telemetry>=0.3.1
kedro-viz>=6.7.0
kedro~=0.19.1
notebook
pytest-cov~=3.0
pytest-mock>=1.7.1, <2.0
pytest~=7.2
ruff~=0.0.290
scikit-learn~=1.0
seaborn~=0.12.1
pyspark~=3.5.0
langid~=1.1.6
pandas~=2.1.4
plotly~=5.18.0
nltk~=3.8.1
skorch~=0.15.0
First of all, I'm wondering why this is so different from the requirements.txt I get when running pip freeze > requirements.txt. There are many more packages with pip freeze. Also, pip freeze lists the package kedro-datasets, but does not enumerate the different types of datasets (pandas.CSVDataset, pandas.ExcelDataset, etc.). And why are the package requirements not strict (for instance, package_name==3.0)? It looks like this could lead to reproducibility problems.
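On the "not strict" point: specifiers such as pandas~=2.1.4 use the PEP 440 compatible-release operator, which means ">=2.1.4 and ==2.1.*". Bug-fix releases like 2.1.9 are therefore accepted, but 2.2.0 is not. The sketch below checks that rule for plain numeric versions only. The helper names parse_release and compatible_release are hypothetical; real tooling should rely on packaging.specifiers.SpecifierSet, which also handles pre-, post- and dev-releases.

```python
def parse_release(version: str) -> tuple:
    """Parse a plain release version such as '2.1.4' into (2, 1, 4)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec_version: str) -> bool:
    """Return True if candidate satisfies '~=spec_version' (PEP 440 compatible release).

    Simplified sketch: only plain numeric versions are supported
    (no pre-, post-, dev- or local segments).
    """
    spec = parse_release(spec_version)
    if len(spec) < 2:
        raise ValueError("~= needs at least two release components, e.g. ~=1.0")
    cand = parse_release(candidate)
    # Pad with zeros so that '2.2' compares like '2.2.0'.
    width = max(len(cand), len(spec))
    padded_cand = cand + (0,) * (width - len(cand))
    padded_spec = spec + (0,) * (width - len(spec))
    # ~=X.Y.Z means >=X.Y.Z AND ==X.Y.* (every component except the last must match).
    prefix = spec[:-1]
    return padded_cand >= padded_spec and padded_cand[: len(prefix)] == prefix


for spec, candidate in [("2.1.4", "2.1.9"), ("2.1.4", "2.2.0"), ("22.0", "22.12"), ("22.0", "23.1")]:
    print(f"{candidate} satisfies ~={spec}: {compatible_release(candidate, spec)}")
# 2.1.9 satisfies ~=2.1.4: True
# 2.2.0 satisfies ~=2.1.4: False
# 22.12 satisfies ~=22.0: True
# 23.1 satisfies ~=22.0: False
```

In other words, a ~= line pins the "compatible" series rather than one exact build, which is why the exact versions only appear once something like pip freeze records what was actually installed.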
When I install a new package in my environment, what should I do to update the requirements? At the end of the day, I just want my project to be as reproducible as possible. I'm a bit lost with all this! Maybe you can point me to the relevant documentation. Many thanks!
Deepyaman Datta
03/06/2024, 6:44 PM
pip freeze lists the exact packages in your installation, pinned and including transitive dependencies.
Deepyaman Datta
03/06/2024, 6:46 PM
Deepyaman Datta
03/06/2024, 6:49 PM
Juan Luis
03/06/2024, 6:54 PM
pip freeze doesn't account for extras (hence [optional-dependencies]...
Juan Luis
03/06/2024, 6:56 PM
requirements.txt is the closest thing we have now to standard lock files in Python (there have been several other attempts: Pipfile, Poetry lock files, PDM lock files, and more).
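To make the lock-file idea concrete, here is a minimal sketch of what a fully pinned file contains, built with the standard library's importlib.metadata: one name==version line per installed distribution, transitive dependencies included. It only approximates pip freeze (frozen_requirements is a hypothetical name); pip additionally handles editable and VCS installs and, unless --all is given, skips some packaging tools such as pip itself.

```python
import re
from importlib.metadata import distributions


def frozen_requirements() -> list:
    """Return sorted 'name==version' lines for every installed distribution.

    Rough emulation of `pip freeze`: direct and transitive dependencies
    alike, each pinned to the exact version found in this environment.
    """
    pins = {}
    for dist in distributions():
        try:
            name = dist.metadata["Name"]
        except KeyError:  # newer Pythons may raise for missing metadata fields
            name = None
        if not name:  # skip distributions with broken metadata
            continue
        key = re.sub(r"[-_.]+", "-", name).lower()  # PEP 503 normalized name
        # If the same project is found twice on sys.path, keep the first hit.
        pins.setdefault(key, f"{name}=={dist.version}")
    return [pins[key] for key in sorted(pins)]


if __name__ == "__main__":
    print("\n".join(frozen_requirements()))
```

Note that the output carries no extras: a requirement such as kedro-datasets[pandas.CSVDataset] only shows up as the base kedro-datasets pin, plus whatever packages the extra pulled in as separate lines.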
What pip-tools does is list the loosely specified (not fully resolved) dependencies in a file called requirements.in and then lock them in a requirements.txt: https://github.com/jazzband/pip-tools/
Francis Duval
03/06/2024, 7:02 PM
Francis Duval
03/06/2024, 7:43 PM
Deepyaman Datta
03/06/2024, 7:45 PM
poetry, you do poetry add, and it will manage the inserts for you.
For a vanilla requirements.txt file, I'm not aware of a way to "manage" it; I generally add entries by hand.
Juan Luis
03/06/2024, 10:17 PM
pip-compile -P nltk
Yury Fedotov
03/07/2024, 2:25 AM
pip freeze will almost always differ a lot from the contents of requirements.txt, as they serve different purposes:
• requirements.txt is used to reproduce your environment and should include only the packages that you actually call from your IDE (like linters) or import in your code (probably pandas, for example). And as others said above, ideally it should define loose requirements instead of exact versions.
• The output of pip freeze just logs all packages in your environment, including the "direct dependencies" listed in your requirements and their own dependencies (which you don't control). Another way to say this is that this output is the outcome of pip resolving your loosely defined requirements to specific versions of each package.
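The difference between the two files can be checked mechanically: every name in the freeze output that is not listed in requirements.txt came in as a transitive dependency (or was installed by hand). A minimal sketch, assuming simple requirement lines; the helper names and the pinned versions in the sample data are illustrative, not taken from this thread.

```python
import re

# Leading project name of a requirement line; stops at '[', '~', '=', '<', '>', ';', ' ', etc.
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonical(name: str) -> str:
    """Normalize a project name as in PEP 503 ('Scikit_Learn' -> 'scikit-learn')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines):
    """Yield canonical project names from requirements-style lines (simplified)."""
    for raw in lines:
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r / -e
            continue
        match = _NAME.match(line)
        if match:
            yield canonical(match.group(1))


def split_direct_and_transitive(requirements_txt: str, freeze_output: str):
    """Return (direct, transitive) sorted name lists for packages seen in freeze_output."""
    direct = set(requirement_names(requirements_txt.splitlines()))
    frozen = set(requirement_names(freeze_output.splitlines()))
    return sorted(frozen & direct), sorted(frozen - direct)


# Illustrative inputs only (not a real environment from this thread).
REQUIREMENTS_TXT = """\
pandas~=2.1.4
nltk~=3.8.1
"""

FREEZE_OUTPUT = """\
click==8.1.7
joblib==1.3.2
nltk==3.8.1
numpy==1.26.4
pandas==2.1.4
python-dateutil==2.8.2
pytz==2024.1
regex==2023.12.25
six==1.16.0
tqdm==4.66.2
tzdata==2024.1
"""

direct, transitive = split_direct_and_transitive(REQUIREMENTS_TXT, FREEZE_OUTPUT)
print("direct:", direct)
# direct: ['nltk', 'pandas']
print("transitive:", transitive)
# transitive: ['click', 'joblib', 'numpy', 'python-dateutil', 'pytz', 'regex', 'six', 'tqdm', 'tzdata']
```

Because the name pattern stops at "[", a line like kedro-datasets[pandas.CSVDataset]>=1.0 is reduced to kedro-datasets, mirroring how the freeze output names only the base package.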