New Zip Bomb Stuffs 4.5PB of Data into 46MB File
ZIP files have been a handy way to compress information for easier transport and storage for decades. But the realities of 5.25-inch and 3.5-inch floppy longevity made relying on multi-disk zip archives a gamble when it came to long-term data preservation. For years, we’ve known it was possible to create a type of file known as a “zip bomb”: a seemingly small zip file containing layer after layer of nested zip archives, such that the fully unzipped data set is many orders of magnitude larger than the original file. A file of unknown provenance, named 42.zip, has floated around online for years, packing 4.5PB of data into a 42KB file by using this method. Anti-virus scanners and unzip applications now typically defuse zip bombs by refusing to be lured into unpacking layer after layer of recursive data.
Researcher David Fifield has developed his own type of zip bomb that improves (or “improves”) on this practice. His file is much larger, requiring a 46MB base file to expand into 4.5PB of data, but it doesn’t rely on recursion to achieve its compression.
Zip bombs have traditionally relied on recursion because the DEFLATE algorithm, the compression method most zip tools use, can’t achieve a compression ratio higher than 1032:1. If you want more compression than that, you have to nest archives inside one another. Fifield discovered a way to bypass this limit. As he writes on his blog:
This article shows how to construct a non-recursive zip bomb whose compression ratio surpasses the DEFLATE limit of 1032. It works by overlapping files inside the zip container, in order to reference a “kernel” of highly compressed data in multiple files, without making multiple copies of it. The zip bomb’s output size grows quadratically in the input size; i.e., the compression ratio gets better as the bomb gets bigger. The construction depends on features of both zip and DEFLATE—it is not directly portable to other file formats or compression algorithms. It is compatible with most zip parsers, the exceptions being “streaming” parsers that parse in one pass without first consulting the zip file’s central directory.
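The 1032:1 figure can be checked with a little arithmetic and a quick experiment. The sketch below assumes the usual explanation for the limit: DEFLATE's longest back-reference covers 258 bytes, and each one costs at least 2 bits (one for the length code, one for the distance code). It then compresses a run of zero bytes with Python's standard zlib module in raw DEFLATE mode to see how close a general-purpose compressor gets. The function name and the 10MB test size are illustrative choices, not anything taken from Fifield's write-up.

```python
import zlib

# Longest DEFLATE match is 258 bytes; the cheapest possible encoding of a
# match is 2 bits, so the best case is 258 * 8 / 2 output bits per input bit.
DEFLATE_LIMIT = 258 * 8 // 2  # 1032


def deflate_ratio(n_bytes: int, chunk: int = 1 << 20) -> float:
    """Compress n_bytes of zeros as raw DEFLATE and return input/output size."""
    # wbits=-15 selects a raw DEFLATE stream (no zlib or gzip wrapper).
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    zeros = bytes(chunk)
    written = 0
    remaining = n_bytes
    while remaining > 0:
        piece = zeros[: min(chunk, remaining)]
        written += len(comp.compress(piece))
        remaining -= len(piece)
    written += len(comp.flush())
    return n_bytes / written


if __name__ == "__main__":
    print("theoretical ceiling:", DEFLATE_LIMIT)
    print("zlib on 10MB of zeros:", round(deflate_ratio(10 * 1024 * 1024), 1))
```

The measured ratio should land a little below the ceiling, because every DEFLATE block also carries some header overhead.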
To make this method work, Fifield had to revisit how data is stored in zip files and choose an appropriate DEFLATE implementation.
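To see why overlapping files make the output grow quadratically, a back-of-the-envelope model helps. The sketch below is a deliberate simplification, not Fifield's actual construction: it assumes every file in the archive costs a fixed number of header bytes (the 100-byte figure is hypothetical), that all files share one compressed kernel, and that the kernel expands at the full 1032:1 ratio. It ignores the small amount of output contributed by the headers themselves.

```python
DEFLATE_RATIO = 1032      # ceiling cited in the article
PER_FILE_OVERHEAD = 100   # hypothetical header bytes per file (assumption)


def modeled_output(total_bytes: int, n_files: int) -> int:
    """Estimated unzipped size when n_files all reference one shared kernel."""
    kernel_bytes = total_bytes - n_files * PER_FILE_OVERHEAD
    if kernel_bytes <= 0:
        return 0  # no room left for a kernel
    # Every file decompresses the same kernel, so its output counts n_files times.
    return n_files * kernel_bytes * DEFLATE_RATIO


def best_output(total_bytes: int) -> int:
    """Output at the best split, which spends half the budget on headers."""
    n_files = max(1, total_bytes // (2 * PER_FILE_OVERHEAD))
    return modeled_output(total_bytes, n_files)


if __name__ == "__main__":
    for size in (1_000_000, 2_000_000, 4_000_000):
        print(size, best_output(size))
```

In this model the output is the product of two quantities (file count and kernel size) that both scale with the archive size, so doubling the archive roughly quadruples the unzipped total.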
He used bulk_deflate, a custom compressor “specialized for compressing a string of repeated bytes,” because it could pack data more densely than zlib, Info-ZIP, or Zopfli. While bulk_deflate outperforms these tools on this task, he notes that it isn’t as efficient in general use. He also had to use an extension of the zip standard known as ZIP64 to create a file with more than 281TB of output. With ZIP64, you can build a zip bomb of effectively unlimited size.
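The 281TB threshold appears to be consistent with the field widths of the original zip format, which stores each file's uncompressed size in 32 bits and the number of entries in 16 bits. The short calculation below multiplies those two maxima; the constant names are illustrative, and a real archive would fall slightly under this ceiling.

```python
# Limits of the classic (non-ZIP64) zip format, derived from its field widths.
MAX_FILES = 2**16 - 1        # 16-bit entry count in the end-of-central-directory record
MAX_FILE_SIZE = 2**32 - 1    # 32-bit uncompressed-size field per file

ZIP32_CEILING = MAX_FILES * MAX_FILE_SIZE

if __name__ == "__main__":
    print(f"{ZIP32_CEILING:,} bytes")              # 281,470,681,677,825 bytes
    print(f"about {ZIP32_CEILING / 10**12:.0f}TB") # about 281TB
```

ZIP64 widens both fields to 64 bits, which is why it removes the practical ceiling.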
There’s a great deal more information on Fifield’s blog, which steps through how the zip bomb was created, describes exactly how the construction manipulates the zip format, and evaluates whether compression algorithms other than DEFLATE can be used for the same idea. Bzip2, for example, can also be used to create zip bombs, though it isn’t quite as efficient at doing so.
Some anti-virus applications that detect recursive zip bombs can already detect this new construction as well, and Fifield thinks it’ll be fairly easy to defend against. Still, it’s an example of how code can be creatively modified to enable new types of high-compression files that weren’t previously known to be possible. While exceedingly simple, a zip bomb is in some ways analogous to a DoS attack against a single system: by hogging all available CPU, RAM, and storage resources, it can render a machine unresponsive and unavailable. It’s an attack method that dates to the early days of the internet (the first known zip bomb was uploaded in 1996). The continuing research on the topic is an interesting engineering story, even if the potential for a massive attack is fairly low.
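As a rough illustration of why this kind of archive should be easy to flag, the sketch below inspects a zip file's central directory with Python's standard zipfile module before extracting anything. It is a minimal heuristic under stated assumptions, not how any particular anti-virus product works: the function name and both thresholds are hypothetical, and the overlap test simply checks whether one entry's compressed data would run past the start of the next entry's header.

```python
import io
import zipfile


def looks_suspicious(data: bytes,
                     max_ratio: float = 1000.0,
                     max_total: int = 1 << 30) -> bool:
    """Heuristic check of a zip archive's metadata; thresholds are illustrative."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = sorted(zf.infolist(), key=lambda info: info.header_offset)

        # 1. Declared unzipped size is huge in absolute or relative terms.
        total = sum(info.file_size for info in infos)
        if total > max_total or total > max_ratio * len(data):
            return True

        # 2. Entries overlap: compressed data reaches past the next local header.
        for cur, nxt in zip(infos, infos[1:]):
            if cur.header_offset + cur.compress_size > nxt.header_offset:
                return True
    return False


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("notes.txt", "an ordinary file")
    print(looks_suspicious(buf.getvalue()))
```

Because the check reads only sizes and offsets from the directory, it costs almost nothing compared with actually decompressing the data, though a production scanner would also need to handle malformed or deliberately misleading metadata.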