Francis Duval
03/06/2024, 6:37 PM
black~=22.0
ipython>=8.10
jupyterlab>=3.0
kedro-datasets[pandas.CSVDataset, pandas.ExcelDataset, pandas.ParquetDataset, spark.SparkDataset, plotly.PlotlyDataset, plotly.JSONDataset, matplotlib.MatplotlibWriter, pickle.PickleDataset, tracking.JSONDataset, huggingface.HFTransformerPipelineDataset]>=1.0
kedro-telemetry>=0.3.1
kedro-viz>=6.7.0
kedro~=0.19.1
notebook
pytest-cov~=3.0
pytest-mock>=1.7.1, <2.0
pytest~=7.2
ruff~=0.0.290
scikit-learn~=1.0
seaborn~=0.12.1
pyspark~=3.5.0
langid~=1.1.6
pandas~=2.1.4
plotly~=5.18.0
nltk~=3.8.1
skorch~=0.15.0
First of all, I'm wondering why this is so different from the requirements.txt I get when running pip freeze > requirements.txt. There are many more packages with pip freeze. Also, pip freeze lists the package kedro-datasets, but does not enumerate the different dataset types (pandas.CSVDataset, pandas.ExcelDataset, etc.). And why are the version requirements not strict (for instance, package_name==3.0)? It looks like this could lead to reproducibility problems.
When I install a new package in my environment, what should I do to update the requirements? At the end of the day, I just want my project to be as reproducible as possible. I'm a bit lost about all this! Maybe you can direct me to the proper documentation. Many thanks!
Deepyaman Datta
03/06/2024, 6:44 PM
pip freeze lists the exact packages in your installation, pinned and including transitive dependencies.
Deepyaman Datta
03/06/2024, 6:46 PM
Deepyaman Datta
03/06/2024, 6:49 PM
Juan Luis
03/06/2024, 6:54 PM
pip freeze doesn't account for extras (hence [optional-dependencies]
...
Juan Luis
03/06/2024, 6:56 PM
requirements.txt is the closest thing we have now to a standard lock file in Python (there have been several other attempts: Pipfile, Poetry lock files, PDM lock files, and more).
What pip-tools does is keep the not-yet-resolved dependencies in a file called requirements.in and then lock them in a requirements.txt.
https://github.com/jazzband/pip-tools/
Francis Duval
03/06/2024, 7:02 PM
Francis Duval
03/06/2024, 7:43 PM
black~=22.0
ipython>=8.10
jupyterlab>=3.0
kedro-datasets[pandas.CSVDataset, pandas.ExcelDataset, pandas.ParquetDataset, spark.SparkDataset, plotly.PlotlyDataset, plotly.JSONDataset, matplotlib.MatplotlibWriter, pickle.PickleDataset, tracking.JSONDataset, huggingface.HFTransformerPipelineDataset]>=1.0
kedro-telemetry>=0.3.1
kedro-viz>=6.7.0
kedro~=0.19.1
notebook
pytest-cov~=3.0
pytest-mock>=1.7.1, <2.0
pytest~=7.2
ruff~=0.0.290
scikit-learn~=1.0
seaborn~=0.12.1
pyspark~=3.5.0
langid~=1.1.6
pandas~=2.1.4
plotly~=5.18.0
nltk~=3.8.1
skorch~=0.15.0
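A side note on the kedro-datasets line above: the bracketed names are extras, which belong to the requirement line and are not separate packages, so a frozen list only shows the base name. The following is a minimal sketch of how such a line splits into name, extras, and version spec. The regex and the helper split_requirement are illustrative assumptions that cover only simple lines like the ones in this thread; they are not pip's actual parser.

```python
import re

# Minimal pattern for lines like "name[extra1, extra2]>=1.0".
# This is a simplification: real requirement lines (PEP 508) can also
# carry environment markers, URLs, and more.
_REQUIREMENT = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"\s*(?:\[(?P<extras>[^\]]*)\])?"
    r"\s*(?P<spec>.*)$"
)


def split_requirement(line):
    """Return (name, extras, version_spec) for a simple requirement line."""
    match = _REQUIREMENT.match(line)
    if match is None:
        raise ValueError(f"not a simple requirement line: {line!r}")
    extras_text = match.group("extras") or ""
    extras = [item.strip() for item in extras_text.split(",") if item.strip()]
    return match.group("name"), extras, match.group("spec").strip()


name, extras, spec = split_requirement(
    "kedro-datasets[pandas.CSVDataset, pandas.ExcelDataset]>=1.0"
)
print(name)    # kedro-datasets
print(extras)  # ['pandas.CSVDataset', 'pandas.ExcelDataset']
print(spec)    # >=1.0
```

The extras only tell the installer which optional dependencies of kedro-datasets to pull in; those dependencies then appear in the frozen list under their own names.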
Deepyaman Datta
03/06/2024, 7:45 PM
With poetry, you do poetry add, and it will manage the inserts for you.
For a vanilla requirements.txt file, I'm not aware of a way to "manage" it; I generally add entries by hand.
Juan Luis
03/06/2024, 10:17 PM
pip compile -P nltk
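To make the idea of a lock step concrete, here is a rough sketch of what locking does for a single loose requirement such as nltk~=3.8.1: pick one concrete version that satisfies the constraint and pin it with ==. This is not how pip-tools is implemented. The helpers is_compatible and pick_pinned_version are hypothetical, they handle only plain numeric versions (no pre-releases or other PEP 440 forms), and the list of available versions is made up for illustration.

```python
def parse_version(text):
    """Turn '3.8.1' into (3, 8, 1). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def _pad(parts, length):
    """Pad a version tuple with zeros so that 22 and 22.0 compare equal."""
    return parts + (0,) * (length - len(parts))


def is_compatible(candidate, base):
    """Check the compatible-release rule: `~=base` means `>= base` with the
    same leading components (all components of base except the last)."""
    base_parts = parse_version(base)
    if len(base_parts) < 2:
        raise ValueError("~= needs at least two version components")
    width = max(len(base_parts), len(parse_version(candidate)))
    cand_parts = _pad(parse_version(candidate), width)
    prefix = base_parts[:-1]
    return cand_parts[: len(prefix)] == prefix and cand_parts >= _pad(base_parts, width)


def pick_pinned_version(base, available):
    """Return the newest available version that satisfies `~=base`."""
    compatible = [v for v in available if is_compatible(v, base)]
    if not compatible:
        raise LookupError(f"no available version satisfies ~={base}")
    return max(compatible, key=parse_version)


# Hypothetical list of published versions, for illustration only.
available = ["3.7", "3.8.1", "3.8.2", "3.9"]
pinned = pick_pinned_version("3.8.1", available)
print(f"nltk=={pinned}")  # nltk==3.8.2
```

The loose line stays in the hand-edited file and the pinned line goes into the generated one, which is why the two files look so different.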
Yury Fedotov
03/07/2024, 2:25 AM
The output of pip freeze will almost always differ a lot from the contents of requirements.txt, as they serve different purposes:
• requirements.txt is used to reproduce your environment and should include only packages that you actually call from your IDE (like linters) or import in your code (like pandas, probably). And as others said above, ideally it should define loose requirements instead of exact versions.
• The output of pip freeze just logs all packages in your env, including the "direct dependencies" mentioned in requirements.txt and their dependencies (which you don't control). Another way to say this is that this file records the outcome of pip resolving your loosely defined requirements to specific versions of each package.
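The split between direct and transitive dependencies described in the two points above can be sketched in a few lines of Python. The function names are hypothetical, and the example pins are illustrative values, not taken from anyone's real environment. installed_pins reads the current environment through the standard library's importlib.metadata, which is roughly the information a freeze reports.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name as PEP 503 describes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_pins():
    """Return {normalized name: version} for every installed distribution."""
    pins = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata is incomplete
            pins[normalize(name)] = dist.version
    return pins


def split_direct_and_transitive(pins, direct_names):
    """Separate pins you asked for from pins that came along as dependencies."""
    wanted = {normalize(name) for name in direct_names}
    direct = {n: v for n, v in pins.items() if n in wanted}
    transitive = {n: v for n, v in pins.items() if n not in wanted}
    return direct, transitive


# Illustrative pins: pandas is requested directly; numpy and python-dateutil
# are pulled in because pandas depends on them.
example_pins = {"pandas": "2.1.4", "numpy": "1.26.4", "python-dateutil": "2.9.0"}
direct, transitive = split_direct_and_transitive(example_pins, ["Pandas"])
print(direct)              # {'pandas': '2.1.4'}
print(sorted(transitive))  # ['numpy', 'python-dateutil']
```

Calling split_direct_and_transitive(installed_pins(), names_from_requirements) on a real environment would show how many of the frozen lines were never written by hand.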