In the Python project I work on at my workplace, we install some packages from PyPI and some private company packages from Gemfury, using a standard requirements file.
After reading this article, https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610, I became worried that our setup might be vulnerable to dependency confusion.
Our requirements file looks something like this:
```
--index-url <OUR_GEMFURY_URL>
--extra-index-url https://pypi.python.org/simple

aiohttp==3.7.1
simplejson==3.17.1
<our-package>==1.0.0
<our-other-package>==1.2.0
```
I tried reading some of pip's documentation, but I wasn't able to fully understand how it chooses which index to download a package from. For example, what happens if someone uploads a malicious version 1.0.0 of `<our-package>` to the public PyPI? How does pip know which of the two packages to take? Is there a way to tell pip that a specific package should only be searched for in the `--index-url` index?
How do you protect against dependency confusion in your code? Thanks for the help!
The article describes the algorithm pip uses when several indexes are configured: pip collects candidate versions from the `--index-url` and from every `--extra-index-url`, treats all of them as equally trustworthy, and installs the highest version that satisfies the requirement. So if your script requires `<our-other-package>>=1.2.0`, you can get a malicious package from the public PyPI server if it carries a higher version number than the one you intended to install.

The straightforward solution mentioned in the article is removing `--extra-index-url`.
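To make concrete why removing the extra index helps, here is a toy sketch of the selection behavior (my own illustration using the `packaging` library; pip's real resolver is more involved, and the version numbers are invented):

```python
from packaging.version import Version

# Candidate versions pip would collect for <our-other-package>
# from each of the configured indexes:
private_index = [Version("1.2.0")]   # our legitimate internal release
public_index = [Version("9999.0")]   # hypothetical malicious upload to PyPI

# With --extra-index-url, both indexes are treated as equally trusted:
# pip merges the candidates and installs the highest version that
# satisfies the requirement (>=1.2.0 here).
candidates = private_index + public_index
chosen = max(v for v in candidates if v >= Version("1.2.0"))
print(chosen)  # 9999.0 -> the public, potentially malicious package wins
```

The attacker never has to touch your private index: publishing a higher version number on the public one is enough.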
Once only the private index is configured: if `package==1.0` (internal or external) is present in the private PyPI server, it will be downloaded from there. External packages are fetched from the public PyPI server through the internal one, which caches them for future use.
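As a sketch of that setup (assuming an internal index that can proxy PyPI, such as devpi or Artifactory; the URL is a placeholder, not from the question):

```ini
# pip.conf (e.g. ~/.config/pip/pip.conf) -- placeholder internal URL
[global]
index-url = https://pypi.internal.example.com/simple
# Intentionally no extra-index-url: requests for public packages also
# go through the internal server, which proxies and caches PyPI.
```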
I'd also suggest pinning explicit versions in requirements.txt; this way you are aware of the versions you get and can do conscious upgrades by bumping them.
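If you want tooling to enforce the pinning (my suggestion, not something the article covers), pip-tools can compile a fully pinned file with hashes, and pip will then refuse anything that doesn't match:

```
# Compile a loose requirements.in into a fully pinned requirements.txt,
# including sha256 hashes for every artifact (pip-tools):
pip-compile --generate-hashes requirements.in

# With hashes present, pip verifies every downloaded file; a package
# substituted from another index fails the check and aborts the install:
pip install --require-hashes -r requirements.txt
```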
To sum up the guidelines (which are by no means exhaustive and do not protect against all possible security holes):

- Remove `--extra-index-url https://pypi.python.org/simple` from `pip.conf`, `requirements.txt`, and automation scripts, so the internal index is the only one pip talks to.
- Let the internal PyPI server proxy and cache the public index for external packages.
- Pin explicit versions in `requirements.txt`.
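Applied to the requirements file from the question, the result is the same pins with the public index line dropped:

```
--index-url <OUR_GEMFURY_URL>

aiohttp==3.7.1
simplejson==3.17.1
<our-package>==1.0.0
<our-other-package>==1.2.0
```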