We're releasing a security advisory for uv: CVE-2025-54368.
This advisory is for versions of uv up to and including v0.8.5. Users should upgrade to v0.8.6 or later.
This advisory concerns two distinct parsing differentials between uv's asynchronous ZIP parser and other ZIP parsers used in the Python ecosystem.
An attacker could exploit either of these differentials to craft a ZIP file that is extracted differently across uv and pip (or vice versa), giving a malicious result to the targeted installer type while presenting an innocent result to the other. Similarly, an attacker could contrive a ZIP that appears harmless to security scanners that use one ZIP parser, but that actually contains a malicious payload when extracted by other parsers. The attacker could do either of these while preserving the overall cryptographic digest of the ZIP.
Before we get into the full details, a TL;DR:
- Thanks to a triage effort with the Python Security Response Team and PyPI maintainers, we were able to determine that the differentials were not exploited via PyPI during the time they were present. The PyPI maintainers have added checks that will prevent the future uploading of distributions that could exploit these differentials.
- This advisory concerns uv in particular, but the underlying vulnerability (differentials in ZIP parsing due to the ZIP format's ambiguity) affects the Python ecosystem more broadly. While we've patched uv, other installers and consumers of ZIP distributions may need to take similar steps to reduce the risk of differential parses.
- For most users, the risk of exploitation is low: no malicious ZIPs were discovered on PyPI, and PyPI now guards against these parsing differentials with additional upload checks. However, users of third-party package indices are strongly encouraged to evaluate their non-PyPI sources for differential ZIPs.
- For most users, there should be no breaking changes from upgrading to uv v0.8.6: a review of the top 15,000 packages on PyPI revealed only three distributions with ZIP encoding errors, each of which is innocent (suggesting an archiving/repacking error rather than an attempt to exploit a differential). These types of innocent ZIP errors are permitted by uv, while potentially exploitable differentials are not. Users who do encounter breaking changes can set UV_INSECURE_NO_ZIP_VALIDATION=1 to restore the previous behavior, and should report the malformed distribution to its upstream.
While we believe the practical impact is low, we take the hypothetical risk of parser differentials very seriously. Out of an abundance of caution, we have assigned this advisory a CVE identifier and have given it a "moderate" severity suggestion.
With that out of the way, let's get into the details of the vulnerability, how it happened, and how we fixed it.
Distributions, ZIPs, and streaming
Like most other open source ecosystems, the Python packaging ecosystem is essentially in the business of throwing archives of package releases (called distributions) over the wire to installers (uv, pip, etc.).
There are two primary flavors of distributions in Python packaging: source and built distributions ("wheels"). Source distributions are essentially archives of source repository state at release time (with an accompanying build step), while wheels are pre-"built" archives of the package that can be unarchived into a Python environment with only a minimal amount of processing.
These two flavors represent a tradeoff: source distributions are "universal" but require a distribution-controlled build step (in the form of arbitrary code), while wheels don't need build-time code execution but may be constrained to a specific platform, Python version, architecture, etc.
Regardless of the flavor, distributions are frequently distributed as ZIP archives: wheels are always ZIPs, while source distributions can be either ZIPs or tarballs. Consequently, installers like uv and pip are in the business of downloading and extracting ZIP archives.
To stream or not to stream
One of the things that makes uv so fast is its ability to stream distributions directly from an index.
Streaming allows uv to download parts of the distribution, like the metadata needed for dependency resolution, which means that uv can defer downloading the entire distribution until the version to install is determined. This can add up to significant performance savings, as uv can often backtrack during resolution without having to download each candidate up-front.
To stream ZIP archives efficiently, uv uses an asynchronous ZIP parser. This parser takes advantage of some of the ZIP format's peculiarities, not least of which is its "header-last" design: instead of starting with a header, a well-formed ZIP begins with one or more local file entries.
Each local file entry has, in turn:
- A local file header, which provides (inter alia) the file's name, compression scheme, and compressed size;
- An encryption header, which doesn't matter for our purposes;
- The file data itself, which is compressed (or stored verbatim, if the "stored" compression scheme is used);
- A data descriptor, which also doesn't matter for our purposes.
The sequence of local file entries in a ZIP archive effectively makes up the contents of that archive. This layout makes the work of a streaming parser very simple: the parser can scan forward without needing to seek backwards or perform random access on a stream.
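As a rough illustration of that forward scan, the following Python sketch walks the local file entries of an in-memory archive using only the fixed 30-byte local file header. It is not uv's parser: the helper name scan_local_entries is hypothetical, and the sketch assumes no encryption headers, no data descriptors, and no ZIP64 sizes.

```python
import io
import struct
import zipfile

# Fixed-size part of a local file header (30 bytes, little-endian):
# signature, version needed, flags, compression method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
LOCAL_SIGNATURE = b"PK\x03\x04"


def scan_local_entries(data: bytes) -> list[tuple[str, int]]:
    """Hypothetical helper: walk local file entries front to back."""
    entries = []
    pos = 0
    while data[pos:pos + 4] == LOCAL_SIGNATURE:
        fields = LOCAL_HEADER.unpack_from(data, pos)
        flags, compressed_size = fields[2], fields[7]
        name_len, extra_len = fields[9], fields[10]
        if flags & 0x08:
            # Sizes would live in a trailing data descriptor; out of scope here.
            raise ValueError("data descriptors are not handled in this sketch")
        name_start = pos + LOCAL_HEADER.size
        name = data[name_start:name_start + name_len].decode("utf-8")
        entries.append((name, compressed_size))
        # Skip the name, the extra field, and the file data to reach the next entry.
        pos = name_start + name_len + extra_len + compressed_size
    return entries


# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("pkg/__init__.py", b"")
    zf.writestr("pkg/core.py", b"VALUE = 1\n")

entries = scan_local_entries(buf.getvalue())
print(entries)
```

Note that the loop never seeks backwards and never looks at the central directory, which is exactly what makes this style of parsing attractive for streaming.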
Following the local file entries is the ZIP "header," which ZIP calls the central directory. The central directory mirrors the local file entries: each local file entry has a corresponding central directory file header that contains the same size, compression scheme, and filename information, along with additional metadata (like filesystem permissions) to aid in extraction. Each central directory file header also cross-references its corresponding local file entry via an offset[1].
These central directory file headers are in turn followed by an end of central directory record (EOCDR). The EOCDR contains (inter alia) the offset to the start of the central directory.
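The "backwards" view of the same archive can be sketched in Python as well. The hypothetical helper below searches for the last EOCDR signature, reads the central directory offset from it, and then walks the central directory file headers. It assumes a non-ZIP64 archive and treats the offset as counted from the start of the file, which is only one possible reading.

```python
import io
import struct
import zipfile

EOCDR = struct.Struct("<4sHHHHIIH")            # 22 bytes, comment excluded
CENTRAL_HEADER = struct.Struct("<4s6H3I5H2I")  # 46 bytes, names excluded


def read_central_directory(data: bytes) -> list[tuple[str, int]]:
    """Hypothetical helper: return (name, local header offset) pairs."""
    eocdr_pos = data.rfind(b"PK\x05\x06")  # naive backwards search
    if eocdr_pos < 0:
        raise ValueError("no end of central directory record found")
    eocdr = EOCDR.unpack_from(data, eocdr_pos)
    total_entries, cd_offset = eocdr[4], eocdr[6]

    entries = []
    pos = cd_offset  # assumption: offset is counted from the start of the file
    for _ in range(total_entries):
        fields = CENTRAL_HEADER.unpack_from(data, pos)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("expected a central directory file header")
        name_len, extra_len, comment_len = fields[10:13]
        local_header_offset = fields[16]
        name_start = pos + CENTRAL_HEADER.size
        name = data[name_start:name_start + name_len].decode("utf-8")
        entries.append((name, local_header_offset))
        pos = name_start + name_len + extra_len + comment_len
    return entries


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("pkg/__init__.py", b"")
    zf.writestr("pkg/core.py", b"VALUE = 1\n")
archive = buf.getvalue()

central_entries = read_central_directory(archive)
print(central_entries)
```

Each pair links a central directory file header back to the local file entry it describes, which is the cross-reference that a forward-only parser never consults.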
The vulnerabilities
As hinted above, ZIP is an unusual file format by modern standards[2]: the "header" (central directory) is really a footer, one that doesn't immediately appear to be necessary to consult while parsing.
We can also see that the ZIP format incentivizes two very different parsing strategies:
- Forward stream parsing is encouraged by the fact that local file entries appear first and can be processed without random access. Moreover, the fact that the central directory appears mostly redundant with the local file entries makes it tempting to process it as little as possible, and instead rely solely on the local file entries as the ground truth for the archive's contents.
- Backwards scanning is encouraged by the fact that the central directory appears at the end of the ZIP, and that each "header" member (the EOCDR and each central directory file header) contains an offset to an earlier part of the archive.
We'll now see how the differences in these parsing strategies can lead to observable (and potentially exploitable) differentials.
Dangling files
A local file entry that's missing a corresponding central directory file header is considered invalid by the ZIP specification[3], but streaming parsers (like the one used in uv) typically ignore this requirement.
This gives the attacker the ability to create a ZIP that extracts differently across installers: an installer that processes the central directory will receive one set of files, while an installer that processes only the local file entries will receive a different set of files.
uv exhibits one variant of this differential, but the challenge of reconciling these two states is more general: both "sections" of the ZIP can contain duplicate file names, and the ZIP specification does not require any particular conflict resolution strategy.
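To make the dangling-file differential concrete, here is a minimal Python sketch. It builds a two-file archive, then rewrites the tail so that the central directory lists only the first file. The helper local_entry_names is a hypothetical forward scanner (not uv's code), and the byte surgery assumes a non-ZIP64 archive with stored, unencrypted entries.

```python
import io
import struct
import zipfile


def local_entry_names(data: bytes) -> list[str]:
    """Hypothetical forward scan that trusts only local file headers."""
    names, pos = [], 0
    while data[pos:pos + 4] == b"PK\x03\x04":
        compressed_size, _size, name_len, extra_len = struct.unpack_from(
            "<IIHH", data, pos + 18
        )
        names.append(data[pos + 30:pos + 30 + name_len].decode("utf-8"))
        pos += 30 + name_len + extra_len + compressed_size
    return names


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("pkg.py", b"print('hello')\n")
    zf.writestr("extra.py", b"print('surprise')\n")
original = buf.getvalue()

# Locate the central directory and measure its first file header.
eocdr_pos = original.rfind(b"PK\x05\x06")
(cd_offset,) = struct.unpack_from("<I", original, eocdr_pos + 16)
name_len, extra_len, comment_len = struct.unpack_from("<HHH", original, cd_offset + 28)
first_header_len = 46 + name_len + extra_len + comment_len

# Keep both local entries but only the first central directory header,
# then append a fresh EOCDR that claims a single entry.
new_eocdr = struct.pack(
    "<4sHHHHIIH", b"PK\x05\x06", 0, 0, 1, 1, first_header_len, cd_offset, 0
)
crafted = original[:cd_offset + first_header_len] + new_eocdr

central_names = zipfile.ZipFile(io.BytesIO(crafted)).namelist()
streamed_names = local_entry_names(crafted)
print("central directory view:", central_names)
print("local file entry view: ", streamed_names)
```

The same bytes (and therefore the same digest) yield two different file lists, depending on which part of the archive the consumer treats as the source of truth.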
"Doubled" ZIPs
A ZIP archive contains a single central directory, identified by a single EOCDR[4].
A ZIP parser is expected to scan (typically backwards[5]) through the file for the EOCDR[6], which then provides the start of the central directory via an offset field. The parser can then seek to this offset to parse the central directory, per above.
Unfortunately, the ZIP specification is ambiguous about the nature of this offset: it's not described as either absolute (i.e. from the start of the ZIP) or relative (i.e. from the EOCDR's own offset).
As a result of this specification-level ambiguity, real-world ZIP parsers interpret the offset differently: Python's zipfile interprets it as relative to the EOCDR, while uv's ZIP handling interprets it as absolute. Like with dangling local files, this is sufficient for a confusable extraction: some parsers will consult the "relative" central directory while cross-referencing local file entries, while others will consult the "absolute" central directory.
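The two readings can be compared on a "doubled" archive with a short Python sketch. Two single-file archives with identically sized names and payloads are concatenated, so the offset recorded in the final EOCDR is valid in both of them. The helper first_central_name is hypothetical, and modeling the "relative" reading as "EOCDR position minus the recorded central directory size" is an assumption made for illustration.

```python
import io
import struct
import zipfile


def make_zip(name: str, payload: bytes) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def first_central_name(data: bytes, cd_start: int) -> str:
    """Hypothetical helper: name in the central directory header at cd_start."""
    if data[cd_start:cd_start + 4] != b"PK\x01\x02":
        raise ValueError("no central directory file header at this offset")
    (name_len,) = struct.unpack_from("<H", data, cd_start + 28)
    return data[cd_start + 46:cd_start + 46 + name_len].decode("utf-8")


# Same name length and payload length, so both archives share one layout.
first = make_zip("good.py", b"x = 1\n")
second = make_zip("evil.py", b"x = 2\n")
doubled = first + second

# Read the central directory size and offset from the last EOCDR in the file.
eocdr_pos = doubled.rfind(b"PK\x05\x06")
cd_size, cd_offset = struct.unpack_from("<II", doubled, eocdr_pos + 12)

absolute_name = first_central_name(doubled, cd_offset)            # from file start
relative_name = first_central_name(doubled, eocdr_pos - cd_size)  # from the EOCDR
zipfile_names = zipfile.ZipFile(io.BytesIO(doubled)).namelist()

print("absolute reading:", absolute_name)
print("relative reading:", relative_name)
print("Python's zipfile:", zipfile_names)
```

The absolute reading lands in the first archive's central directory, while the relative reading (and Python's zipfile) lands in the second one.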
Fixing the vulnerabilities
We made the following changes to uv to address the differentials above:
- uv now reconciles the local file entries with the central directory file headers, and will refuse to process a ZIP that contains any inconsistencies between the two (in either direction).
- uv now fully consumes the central directory, including the "end of central directory" record[7], while streaming. This allows uv to assert that the ZIP stream is fully exhausted, and that there is no trailing data that could be interpreted as a second ZIP archive with its own central directory.
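The two checks above can be approximated in a few lines of Python. This is a sketch of the idea rather than uv's implementation: validate_stream and is_rejected are hypothetical names, only names and compressed sizes are compared, and ZIP64, encryption, and data descriptors are assumed to be absent.

```python
import io
import struct
import zipfile


def validate_stream(data: bytes) -> None:
    """Hypothetical forward-only validator; raises ValueError on a mismatch."""
    # 1. Collect local file entries, keyed by the offset of their header.
    pos, local = 0, {}
    while data[pos:pos + 4] == b"PK\x03\x04":
        csize, _size, name_len, extra_len = struct.unpack_from("<IIHH", data, pos + 18)
        local[pos] = (data[pos + 30:pos + 30 + name_len], csize)
        pos += 30 + name_len + extra_len + csize

    # 2. Collect central directory file headers, keyed by the offset they claim.
    central = {}
    while data[pos:pos + 4] == b"PK\x01\x02":
        (csize,) = struct.unpack_from("<I", data, pos + 20)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        (local_offset,) = struct.unpack_from("<I", data, pos + 42)
        central[local_offset] = (data[pos + 46:pos + 46 + name_len], csize)
        pos += 46 + name_len + extra_len + comment_len

    # 3. Both views must agree, in either direction.
    if local != central:
        raise ValueError("local file entries and central directory disagree")

    # 4. The EOCDR (plus its comment) must be the very last thing in the stream.
    if data[pos:pos + 4] != b"PK\x05\x06":
        raise ValueError("expected the end of central directory record")
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    if pos + 22 + comment_len != len(data):
        raise ValueError("trailing data after the end of central directory record")


def is_rejected(data: bytes) -> bool:
    try:
        validate_stream(data)
    except ValueError:
        return True
    return False


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("pkg.py", b"x = 1\n")
clean = buf.getvalue()

print("clean archive rejected:  ", is_rejected(clean))
print("doubled archive rejected:", is_rejected(clean + clean))
```

Because the validator consumes the stream strictly front to back, a second archive appended after the first EOCDR shows up as trailing data instead of silently replacing the central directory.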
In addition to the two differentials above, we've also made a number of proactive changes to uv's streaming ZIP parser to reject malformed or otherwise questionable ZIPs. These include rejecting:
- ZIPs with other mismatches between the local file entries and the central directory, including mismatches in a file's reported compressed and uncompressed sizes.
- ZIPs that contain mis-recorded compressed and uncompressed sizes, e.g. where the local file entry's claimed uncompressed size does not match the actual uncompressed size of the corresponding file data.
- ZIPs that contain incorrect or contextually invalid[8] CRC32 checksums.
- ZIPs that contain an EOCDR "comment field" that appears to contain ZIP control structures, such as a nested EOCDR.
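Some of these checks can be approximated with Python's standard zipfile module, which verifies CRC-32 values while reading. The sketch below is illustrative only: find_problems is a hypothetical helper, it covers just the size and CRC-32 checks (not the EOCDR comment check), and it relies on whatever zipfile itself is willing to parse.

```python
import io
import zipfile


def find_problems(data: bytes) -> list[str]:
    """Hypothetical helper: report size and CRC-32 problems in an archive."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                # Directory entries carry no data, so a non-zero CRC is suspect.
                if info.CRC != 0:
                    problems.append(f"{info.filename}: directory with non-zero CRC-32")
                continue
            try:
                payload = zf.read(info)  # raises BadZipFile on a CRC-32 mismatch
            except zipfile.BadZipFile as exc:
                problems.append(f"{info.filename}: {exc}")
                continue
            if len(payload) != info.file_size:
                problems.append(f"{info.filename}: uncompressed size mismatch")
    return problems


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("pkg.py", b"print('hello')\n")
clean = buf.getvalue()

# Flip part of the stored payload without updating the recorded CRC-32.
tampered = clean.replace(b"hello", b"jello")

clean_problems = find_problems(clean)
tampered_problems = find_problems(tampered)
print(clean_problems)
print(tampered_problems)
```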
These proactive changes may, but don't necessarily, reflect potential differential sources; they're more about bringing uv closer to the gold standard of full recognition before parsing. We believe that other consumers of ZIP archives in the Python ecosystem (including analysis tooling and other Python package installers) should consider similar hardening steps as a proactive measure.
Concluding notes
The fixes above have been released in uv 0.8.6.
We thank Caleb Brown (Google) and Tim Hatch (Netflix) for reporting these vulnerabilities to us. Additionally, we thank Seth Larson (PSF Security Developer-in-Residence) and Mike Fiedler (PyPI Safety and Security) for their triage efforts and coordination on a PyPI-level check that makes all Python users safer, regardless of their installer of choice. We encourage readers to review PyPI's security announcement as well.
Beyond these specific vulnerabilities, we observe that the ZIP format has a number of qualities that make it susceptible to parsing differentials:
- It has a many-versioned and legally encumbered specification[9];
- It has numerous specification-level ambiguities that encourage implementors to accept inputs liberally, including a lack of clarity around whether offsets to control structures are relative or absolute;
- It encourages techniques that are appropriate on "trusted" inputs (like eager decompression without checking the central directory) that are inappropriate on untrusted inputs (like ZIPs from third-party indices).
Separately, we observe that the differentials described above are not unique to stream parsing: they're more obvious in a streaming context, but the lack of clarity in the ZIP specification around intended parsing direction along with central directory location means that the same differentials can (and do) occur in ordinary "seeking" ZIP parsers as well.
Given these qualities, we suspect that ZIP parser differentials will continue to surface in both the Python ecosystem and other packaging ecosystems that make use of ZIP archives, regardless of parser structure. These differentials challenge long-standing assumptions about the "transitive" integrity[10] of ZIP archives, and will require ecosystem-wide coordination and standards efforts to address systematically.
Footnotes
- The ZIP format dates back to 1989, and includes provisions for technical constraints that aren't relevant to modern computing. Perhaps most notably, the ZIP format supports being partitioned across multiple "disks," so that a ZIP archive could be split across multiple floppies. This need for partitioning also likely informed the "header-last" design, as it allows the ZIP to be decompressed eagerly in a single pass (as well as amended without needing to rewrite the entire archive).
- The ZIP format's encouragement of "backwards" scanning is itself a rich differential source: the trailing structures of a ZIP archive can include arbitrary data, meaning that a backwards-searching parser can be confused by an attacker who places an EOCDR signature (or other ZIP structure signatures) in e.g. the "real" EOCDR comment field.
- We're glossing over the distinction between "ZIP" and "ZIP64" here. In short: there are really two different EOCDR records, and a ZIP parser must be prepared to handle both an EOCDR and an EOCDR64 while locating the central directory's start, if the former indicates the latter.
- ZIP requires that directory entries have a CRC32 of 0, so for example we reject ZIPs that have non-zero CRC32s in directory entries.
- APPNOTE section 1.4. This is also why we're only linking to the specification, not quoting it.
- i.e., whether a ZIP whose cryptographic digest or signature is trusted can be assumed to contain only files that are also trusted.