What follows are a few key considerations about the security of the Python Package Index (PyPI) ecosystem, which is what you are usually relying on when you pip install <thing>. I'd like to make it very clear that I love and use PyPI on a regular basis, and I believe the team running it is doing a good job. Using open-source code to leverage other people's work and avoid reinventing the wheel is an absolutely good thing, but one that should be done with awareness of the risks inherent in trusting third-party code in your supply chain.
These points don’t follow any real theme other than “you may not have known or truly thought about the implications”.
Anyone can create a PyPI package
Packages on PyPI do not undergo any vetting and are generally not curated in the way that an app on the App Store or a package in Debian's apt repositories is. Code that is nicely packaged and obtained via the official pip installer deserves no more trust than code cloned from GitHub or copy/pasted from Pastebin.
If a person with malicious intent can convince you to install their Python package, they gain code execution on the system you install it on. This happens during installation, because building a package from a source distribution runs the package's own build code. It also happens whenever you import or use the library.
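The following sketch makes the import-time half of this concrete. It uses only the standard library and is deliberately harmless. It writes a throwaway module to a temporary directory, and that module's top-level statement runs as a side effect of being imported, before any of its functions are called. The module name demo_pkg_side_effect and the environment variable it sets are hypothetical. A real malicious package would do something far less benign at that point.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical package body. Every top-level statement runs the moment the
# module is imported, before the caller touches any of its functions.
MODULE_SOURCE = '''
import os

os.environ["DEMO_IMPORT_SIDE_EFFECT"] = "ran at import"

def useful_function():
    return "the functionality you actually wanted"
'''


def import_and_observe(module_name="demo_pkg_side_effect"):
    """Write a throwaway module to a temp dir, import it, and report its side effect."""
    os.environ.pop("DEMO_IMPORT_SIDE_EFFECT", None)  # start from a clean slate
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the new file is discoverable
        try:
            importlib.import_module(module_name)  # note: no function is called
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return os.environ.get("DEMO_IMPORT_SIDE_EFFECT")


if __name__ == "__main__":
    print(import_and_observe())  # ran at import
```

The value is set even though useful_function is never called. Importing the module is all it takes.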
PyPI doesn’t currently support cryptographic signing of packages
As outlined by Donald Stufft in this excellent post from 2013, signing packages in a system like PyPI, where anybody can upload and share code, is a very non-trivial problem to get right. Currently, only a few simple integrity mechanisms are at play with PyPI and pip.
First, pip uses TLS between your client and the PyPI servers. This gives you generally robust authentication, integrity, and confidentiality between you and PyPI.org. That protection extends no further, however. A compromise of PyPI.org itself, or of any of its upstream storage, is outside the threat model TLS aims to address.
Second (as described in the next point), you can pin a specific package by cryptographic hash, but this is essentially trust-on-first-use (TOFU). You have to obtain the hash at some point in time, either from PyPI.org or by downloading the package and hashing it yourself. If PyPI.org was already compromised when you recorded that hash, pinning it doesn't buy you anything.
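Trust-on-first-use can be sketched in a few lines. The first time a file is seen, its digest is recorded. Every later copy is then compared against that record. The in-memory dict used as a store, the file name, and the function name are all hypothetical. The point is that the check can only ever be as good as whatever was recorded first.

```python
from __future__ import annotations

import hashlib
import hmac


def tofu_check(store: dict[str, str], filename: str, data: bytes) -> bool:
    """Trust-on-first-use: record the first digest seen for a file, then verify later copies."""
    digest = hashlib.sha256(data).hexdigest()
    if filename not in store:
        # First use: there is nothing to compare against, so this copy is trusted as-is.
        store[filename] = digest
        return True
    # Later uses: compare against the recorded digest in constant time.
    return hmac.compare_digest(store[filename], digest)


if __name__ == "__main__":
    clean = {}
    print(tofu_check(clean, "example-1.0.tar.gz", b"original"))  # True (recorded)
    print(tofu_check(clean, "example-1.0.tar.gz", b"tampered"))  # False (mismatch caught)

    # If the first copy was already tampered with, TOFU happily pins the bad one.
    poisoned = {}
    print(tofu_check(poisoned, "example-1.0.tar.gz", b"tampered"))  # True
    print(tofu_check(poisoned, "example-1.0.tar.gz", b"original"))  # False
```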
You can pin a library by version and cryptographic hash
By design PyPI will not let a user modify or re-release an already released version of a package, which should prevent an attacker from switching out an existing versioned library for a malicious copy. Furthermore, you can pin the actual hash of the package which prevents even a compromised PyPI from being able to give you bad code.
A typical requirements.txt file pins each dependency to an exact version, using one <package>==<version> line per package. This prevents a newly released (and possibly malicious) version from being installed in its place.
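You can catch requirements that are not pinned with a rough check. The function below is a hypothetical helper, not part of pip. It deliberately understands only simple name==version lines, optionally with extras and trailing options. It is not a parser for the full requirement syntax. The package names in the example are placeholders.

```python
import re

# Simplified pattern for "name==version" (optionally with extras and trailing
# options/markers). The real requirement grammar is much richer than this.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*"   # project name
    r"(\[[^\]]*\])?"                 # optional extras, e.g. name[extra]
    r"\s*==\s*[^\s;,*]+"             # exact version; wildcards like 1.* are rejected
    r"(\s|;|$)"                      # followed by options, a marker, or end of line
)


def unpinned_requirements(lines):
    """Return requirement lines that do not pin an exact version with '=='."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --index-url
        if not PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    example = [  # hypothetical requirements.txt contents
        "# production dependencies",
        "example-lib==1.4.2",
        "other-lib>=2.0",
        "third-lib",
    ]
    print(unpinned_requirements(example))  # ['other-lib>=2.0', 'third-lib']
```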
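You can create a hash-pinned entry by looking up the hash on PyPI.org, or by downloading the package and hashing it yourself. The pip hash <file> command prints the SHA-256 in the required format. The corresponding requirements.txt entry then takes the form <package>==<version> --hash=sha256:<digest>, with one --hash option for each distribution file you are willing to accept.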
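Hashing the package yourself needs only the standard library. The sketch below streams a downloaded distribution file through SHA-256 and formats a hash-pinned requirement line. The helper names are hypothetical, and the sketch assumes you supply the project name and version yourself rather than reading them from the file. The digest it produces should match what pip hash reports for the same file, since both use SHA-256 by default.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashed_requirement(name, version, dist_path):
    """Build a 'name==version --hash=sha256:<digest>' line for requirements.txt."""
    return f"{name}=={version} --hash=sha256:{sha256_of_file(dist_path)}"
```

For example, hashed_requirement("example-lib", "1.4.2", "example_lib-1.4.2-py3-none-any.whl") would return a line ready to paste into requirements.txt. The name, version, and file name here are placeholders.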
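To have pip check hashes on install, run it in hash-checking mode with pip install --require-hashes -r requirements.txt. pip also switches to this mode automatically when any requirement in the file carries a --hash option. In this mode every requirement, including transitive dependencies, must be pinned with == and have a hash. If a downloaded file does not match, pip aborts the installation with an error stating that the packages do not match the hashes from the requirements file.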
PyPI supports strong 2FA but not for uploads
PyPI supports strong 2FA (both U2F security keys and app-based TOTP) but doesn't currently encourage or require users to enable it, so adoption is largely unknown. To a large degree this is beside the point, however, because uploads can be authenticated with either an API token or a username and password, with no 2FA requirement.
As mentioned earlier, if you're pinning the version or cryptographic hash, an attacker cannot replace that specific version, nor bypass hash checking. What they can do, however, is release a newer version of a package. If you're not pinned to a specific version or hash, you'll get the malicious package and be exposed to code execution. When you run an initial pip install <thing>, the security of your system depends entirely on the package's author having a strong password and excellent credential hygiene. Unfortunately, I suspect we'll see attackers increasingly target library authors' credentials as a way to gain a foothold in a wide variety of important systems.
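A toy resolver shows why an unpinned requirement is exposed. Given the versions available on the index, a bare requirement resolves to the newest one. An exact pin keeps resolving to the release you chose. This is a deliberately simplified, hypothetical model: it handles plain dotted integer versions only, with no pre-releases or range specifiers. It is not how pip's resolver is implemented.

```python
def parse_version(v):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def resolve(available, pinned=None):
    """Pick the version a toy installer would choose from the index."""
    if pinned is not None:
        if pinned not in available:
            raise LookupError(f"pinned version {pinned} is not on the index")
        return pinned
    # Unpinned: the newest release wins, regardless of who uploaded it.
    return max(available, key=parse_version)


index = ["1.0.0", "1.0.1"]
print(resolve(index))                  # 1.0.1
print(resolve(index, pinned="1.0.1"))  # 1.0.1

# An attacker holding the author's credentials uploads a new release.
index.append("1.0.2")
print(resolve(index))                  # 1.0.2 (the malicious upload)
print(resolve(index, pinned="1.0.1"))  # 1.0.1 (unchanged)
```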
The pypi.org page doesn’t offer any clues to trustworthiness
Some folks may look at a package's PyPI.org page for clues about the trustworthiness of the package and its author. Unfortunately, there are no solid signals, and some of the signals the page does show can trivially mislead a user.
A notable example is that an attacker who uploads a malicious package can specify any GitHub URL for their project. PyPI.org will then display that GitHub project's star and fork counts on the package's page. Simply following the standard packaging tutorial without changing its example GitHub URL in the package metadata produces a PyPI page claiming thousands of GitHub stars.
For example, if you point your package's project URL at urllib3's GitHub repository, your malicious PyPI page will display urllib3's GitHub statistics.
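The underlying problem is that the displayed statistics are keyed only on a URL string that the uploader controls. The sketch below models this with a hypothetical helper. It pulls an owner/repository pair out of a package's project URLs, as a naive statistics widget might. Nothing in it can tell whether the uploader actually owns that repository. It illustrates the trust gap and is not Warehouse's (PyPI's) actual code. The package metadata shown is made up.

```python
from urllib.parse import urlparse


def github_repo_from_urls(project_urls):
    """Return 'owner/repo' for the first github.com URL found, or None."""
    for url in project_urls.values():
        parsed = urlparse(url)
        if parsed.netloc.lower() in ("github.com", "www.github.com"):
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2:
                repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
                # Nothing here proves the uploader controls this repository.
                return f"{parts[0]}/{repo}"
    return None


# Hypothetical metadata for a malicious package that borrows urllib3's repository.
malicious_metadata = {"Homepage": "https://github.com/urllib3/urllib3"}
print(github_repo_from_urls(malicious_metadata))  # urllib3/urllib3
```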