flawed.net.nz

What follows are a few key considerations about the security of the Python Package Index (PyPI) ecosystem, which is what you rely on whenever you pip install <thing>. I’d like to make it very clear that I love and use pip on a regular basis, and I believe the team running it is doing a good job. Using open-source code to build on other people’s work and avoid reinventing the wheel is a good thing, but it should be done with awareness of the risks inherent in trusting third-party code in your supply chain.

These points don’t follow any real theme other than “you may not have known or truly thought about the implications”.

Anyone can create a PyPI package

Packages on PyPI do not undergo any type of vetting and are generally not curated in the way that an app on the App Store or a package in Debian’s apt repositories is. Being nicely packaged and obtained via the official pip installer does not make code any more trustworthy than code cloned from GitHub or copy/pasted from Pastebin.

If a person with malicious intent can convince you to install their Python package, they have code execution on the system you’re installing it on (both during installation and whenever you import or use the library).

PyPI doesn’t currently support cryptographic signing of packages

As outlined by Donald Stufft in this excellent post from 2013, signing packages in a system like PyPI, where anybody can upload and share code, is a very non-trivial problem to get right. Currently, there are only a few simple integrity mechanisms at play with PyPI and pip.

First, pip uses TLS between your client and the PyPI servers, meaning there’s a generally robust authentication, integrity and confidentiality process between you and PyPI.org. It extends no further, however: a compromise of PyPI.org or any of the upstream storage is outside the threat model that TLS addresses.

Second (as described in the next point), you can pin a specific package by cryptographic hash, but it’s somewhat trust-on-first-use (TOFU), as you have to obtain the hash either from PyPI.org or by downloading the package and hashing it yourself at some point in time. If we’re assuming a threat model in which PyPI.org was already compromised when you recorded the hash, this doesn’t buy you anything.

The great news is that there is work underway to introduce a signing system. This was proposed in PEP 458, has been funded by Facebook, and appears to be based on The Update Framework.

You can pin a version of a library by version and cryptographic hash

By design, PyPI will not let a user modify or re-release an already released version of a package, which should prevent an attacker from switching out an existing versioned library for a malicious copy. Furthermore, you can pin the actual hash of the package, which prevents even a compromised PyPI from serving you anything other than the file you originally hashed.

A typical requirements.txt file with a pinned version, which stops a newly released version from being installed in its place, looks like this:

example-pypackage==0.0.1

A hash-pinned requirements file can be created by either looking up the hash on PyPI.org or manually downloading and hashing the package. The corresponding requirements.txt entry then looks like this:

example-pypackage==0.0.1 --hash=sha256:25194e5b2b1991b9df35dc44ebf3ecd617f89a27d4cee1db68d13f4d3fd46c92
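If you go the “download and hash it yourself” route, the hashing step can be sketched in a few lines of Python. This is only an illustration, not how pip does it internally: the helper names `sha256_of_file` and `pinned_requirement` are my own, and it assumes you have already downloaded the wheel or sdist to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pinned_requirement(name, version, path):
    """Format a requirements.txt line pinning both the version and the file hash."""
    return f"{name}=={version} --hash=sha256:{sha256_of_file(path)}"
```

Calling `pinned_requirement("example-pypackage", "0.0.1", path_to_wheel)` produces a line in the same shape as the one above. Remember the TOFU caveat: the hash is only as trustworthy as the file you downloaded.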

To enforce hash checking for every requirement at install time, run pip like so:

pip install -r requirements.txt --require-hashes

If a downloaded package doesn’t match its pinned hash, you’ll see an error like this:

ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    example-pypackage==0.0.1 from https://files.pythonhosted.org/packages/ba/78/b75db0a6275f55983f78a49941c986f40a032f10543544833a16f2653e0d/example-pypackage==0.0.1-py3-none-any.whl#sha256=7512dada4722e14ffae959e6cd0043d4b3d4287d9212c50221eda202e34ce9d6 (from -r requirements.txt (line 1)):
        Expected sha256 25194e5b2b1991b9df35dc44ebf3ecd617f89a27d4cee1db68d13f4d3fd46c92
             Got        7512dada4722e14ffae959e6cd0043d4b3d4287d9212c50221eda202e34ce9d6
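Conceptually, the check behind that error is just “hash the bytes you received and compare against the pinned values”. The sketch below is my own simplified illustration of that idea and not pip’s actual implementation; `allowed_hashes` and `matches_pin` are hypothetical helpers, and it only understands `--hash=sha256:` options.

```python
import hashlib
import hmac
import re

# Matches each `--hash=sha256:<64 hex chars>` option on a requirements line.
HASH_OPTION = re.compile(r"--hash=sha256:([0-9a-f]{64})")


def allowed_hashes(requirement_line):
    """Extract every sha256 digest pinned on one requirements.txt line."""
    return set(HASH_OPTION.findall(requirement_line))


def matches_pin(data, requirement_line):
    """Return True if the SHA-256 of `data` equals one of the pinned digests."""
    actual = hashlib.sha256(data).hexdigest()
    return any(
        hmac.compare_digest(actual, expected)
        for expected in allowed_hashes(requirement_line)
    )
```

A line with no `--hash` option yields an empty set, so `matches_pin` returns False for it. That is deliberately strict: in this sketch, anything unpinned is treated as unverified.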

PyPI supports strong 2FA but not for uploads

PyPI supports strong 2FA (both U2F and app-based TOTP) but doesn’t currently encourage or mandate its use, so adoption is largely unknown. This matters less than it might seem, however, as uploading can be done with either a token or a username/password, with no 2FA requirement.

As mentioned earlier, if you’re pinning the version or cryptographic hash, an attacker cannot replace a specific version and certainly cannot bypass hash checking. What they can do, however, is release a newer version of a package, and if you’re not pinned to a specific version/hash, you’ll get the malicious package and be exposed to code execution. When you run an initial pip install <thing>, the security of your system is completely reliant on the author of that package having a strong password and excellent credential hygiene. Unfortunately, I suspect we’ll see attackers increasingly target library authors’ credentials as a way to get a foothold into a wide variety of important systems.
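One cheap way to spot this exposure is to scan your requirements file for entries that aren’t pinned to both an exact version and a hash. The following is a rough sketch under my own assumptions: `unpinned_requirements` is a hypothetical helper, it only handles the simple `name==version --hash=...` style shown above, and it ignores the more exotic parts of the requirements file format.

```python
import re

# A comment starts at the beginning of the line or after whitespace.
COMMENT = re.compile(r"(^|\s+)#.*$")


def unpinned_requirements(text):
    """Return requirement lines lacking an exact `==` version or a `--hash` option."""
    problems = []
    # Join backslash-continued lines so hashes on following lines are seen.
    joined = text.replace("\\\n", " ")
    for raw in joined.splitlines():
        line = COMMENT.sub("", raw).strip()
        if not line or line.startswith("-"):
            continue  # skip blanks and option lines such as `-r other.txt`
        if "==" not in line or "--hash=" not in line:
            problems.append(line)
    return problems
```

Anything this returns is a dependency where a newly uploaded release (or a swapped file) could end up on your system without you noticing.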

The PyPI.org page doesn’t offer any clues to trustworthiness

Some folks may look at the PyPI.org page for clues as to the trustworthiness of a package and the package’s author. Unfortunately there are no solid signals, and in fact some signals can trivially mislead a user.

A notable example is that an attacker who uploads a malicious package can specify any GitHub URL for their project, and PyPI.org will then display the number of stars and forks for that GitHub project on the package page. Simply following the standard packaging tutorial without changing the GitHub URL in the manifest results in a PyPI package page claiming thousands of stars on GitHub.

For example, using the following config for your package, which references urllib3’s GitHub repository:

import setuptools

with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

setuptools.setup(
    name="example-package-89723789423", 
    version="0.0.1",
    author="Example Author",
    author_email="author@example.com",
    description="A small example package",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/urllib3/urllib3",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
)

will display the GitHub stats for that repository on your malicious PyPI page:

[Image: PyPI GitHub stats]