Forcing Gzip Compression

Andy Martone <martone@google.com>
June 23, 2010

Gzip is Great!

~70% reduction in page size

~50% faster transfer time on DSL

~60% faster transfer time on dialup

This is old news, right?

Content-Encoding defined in HTTP 1.1 RFC 2616 (1999)

Browser support since IE 4, Netscape 6, Opera 5, etc.

Even Lynx 2.6 supports gzip!

What's the big deal?

At Velocity 2009, Tony Gentilcore reported that ~15% of web traffic is uncompressed.

There's no way that's all old browsers, right?

Some proxies and security software strip or mangle the Accept-Encoding header!

An idea

For requests with missing or mangled Accept-Encoding headers, inspect the User-Agent to identify browsers that should understand gzip.

Test their ability to decompress gzip.

If successful, send them gzipped content!

But... what about the RFC?

RFC 2616 states:

If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.

Looks like we found some wiggle room!

Testing for additional information

When to test

Inspect all requests missing a valid Accept-Encoding header.

Look at the User-Agent.

If it's a "modern" browser...

(IE 6+, Firefox 1.5+, Safari 2+, Opera 7+, Chrome)

And if the request does not have a special cookie...

Run a test.
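
The checks above can be sketched as one server-side predicate. This is a rough illustration, not the code behind the talk: the function name, the lower-cased header object, and the User-Agent patterns are all assumptions, and the patterns are deliberately loose.

```javascript
// Hypothetical helper, not from the talk: decides whether a request should
// receive the compression test. Header names are lower-cased, as Node.js
// presents them.
function shouldRunGzipTest(headers) {
  // Simplification: a header counts as valid if it mentions gzip at all.
  if (/\bgzip\b/i.test(headers['accept-encoding'] || '')) return false;

  // Rough User-Agent patterns for the browsers named above (assumed, untuned).
  const ua = headers['user-agent'] || '';
  const modernPatterns = [
    /MSIE ([6-9]|\d{2,})\./,            // IE 6+
    /Firefox\/(1\.[5-9]|[2-9]|\d{2,})/, // Firefox 1.5+
    /Opera[\/ ]([7-9]|\d{2,})/,         // Opera 7+
    /Chrome\//,                         // any Chrome
    /Safari\//                          // Safari; version is not checked here
  ];
  if (!modernPatterns.some((re) => re.test(ua))) return false;

  // Skip clients that already carry the test cookie (GZ=Z=0 or GZ=Z=1).
  return !/(?:^|;\s*)GZ=Z=[01](?:;|$)/.test(headers.cookie || '');
}
```

A request whose header arrives mangled (for example as AXXept-Encoding) has no accept-encoding entry at all, so it falls through to the User-Agent and cookie checks.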

Example request to test

GET / HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.1 GTBA
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
AXXept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: XXX
Cache-Control: max-age=0

Running the test

At the bottom of a page, inject JavaScript to:

Check for a cookie.

If absent, set a session cookie with a stop value.

Write out an iframe element to the page.

<script>
   if (!document.cookie.match(/GZ=Z=[01]/)) {
     document.cookie = 'GZ=Z=0; path=/';  // same path as the "compression ok" cookie
     var i = document.createElement('iframe');
     i.src = '/compressiontest/gzip.html';
     // Append the iframe to the document so the browser requests it.
     document.body.appendChild(i);
   }
</script>

Running the test

The browser then makes a request for the iframe contents.

GET /compressiontest/gzip.html HTTP/1.1
...
Cookie: GZ=Z=0
AXXept-Encoding: gzip,deflate

Running the test

The server responds with an HTML document containing a JavaScript block in the body, served with the following headers:

Content-Type: text/html
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, must-revalidate
Content-Encoding: gzip

The Pragma, Expires, and Cache-Control headers prevent this response from being cached.

The response body is always served compressed, regardless of the Accept-Encoding header on the request.
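
A minimal sketch of such a response, assuming a Node.js server. The helper name and the returned object shape are hypothetical, and the HTML body is passed in by the caller. Only `zlib.gzipSync` is a real library call here.

```javascript
const zlib = require('zlib');

// Hypothetical helper: wraps an HTML test page in the headers listed above
// and gzips the body unconditionally, ignoring the request's Accept-Encoding.
function buildGzipTestResponse(html) {
  const body = zlib.gzipSync(Buffer.from(html, 'utf8'));
  return {
    status: 200,
    headers: {
      'Content-Type': 'text/html',
      'Pragma': 'no-cache',
      'Expires': 'Fri, 01 Jan 1990 00:00:00 GMT',
      'Cache-Control': 'no-cache, must-revalidate',
      'Content-Encoding': 'gzip'
    },
    body: body // a Buffer holding the gzip stream
  };
}
```

A handler could then write `response.headers` and `response.body` to the socket as-is. A client that cannot decode gzip receives bytes it cannot parse as HTML.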

Running the test

If the browser understands the compressed response, it executes the JavaScript and sets the session cookie to a "compression ok" value.

<script>
  ...
  document.cookie = 'GZ=Z=1; path=/';
</script>

If the browser does not understand the response, the script never runs, so the test fails silently and the cookie keeps its stop value.

Forcing compression

Subsequent requests from the client will contain this session cookie with its updated value.

The server always sends compressed content to requests that contain the cookie with the "compression ok" value.

The first response carries the test itself, so the second request is the earliest one whose response can be compressed.

The server never sends the compression testing JavaScript to requests that contain the stop value in the cookie.
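
The server-side rules above could be expressed roughly as follows. The function names and the shape of the returned plan are assumptions for illustration, and the User-Agent check is left to the caller as a boolean.

```javascript
// Hypothetical helper: extracts the test state from a Cookie header.
// Returns '1' (compression ok), '0' (stop value), or null (no test cookie).
function readGzipState(cookieHeader) {
  const match = /(?:^|;\s*)GZ=Z=([01])(?:;|$)/.exec(cookieHeader || '');
  return match ? match[1] : null;
}

// Hypothetical helper: decides what to do with one request.
// `headers` uses lower-cased names; `isModernBrowser` comes from a
// separate User-Agent check that is not shown here.
function planResponse(headers, isModernBrowser) {
  const advertisesGzip = /\bgzip\b/i.test(headers['accept-encoding'] || '');
  const state = readGzipState(headers.cookie);
  return {
    // Compress when the client asks for it or has passed the test.
    compress: advertisesGzip || state === '1',
    // Inject the test only when nothing is known about the client yet.
    injectTest: !advertisesGzip && isModernBrowser && state === null
  };
}
```

With the stop value in place (`GZ=Z=0`), the plan is neither to compress nor to inject the test again, which matches the last rule above.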

It works!

Google Web Search successfully compresses a significant number of responses to requests that don't send a valid Accept-Encoding header.

Average size of the HTML contents of a search results page is about 34KB.

Gzip compresses the page by about 70% down to about 10KB!
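
As a quick check of the arithmetic, using only the two figures quoted above (the function name is made up for this sketch):

```javascript
// Size left after compression, given the original size and the fraction removed.
function compressedSizeKB(originalKB, reduction) {
  return originalKB * (1 - reduction);
}

console.log(compressedSizeKB(34, 0.70).toFixed(1)); // prints "10.2"
```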

This reduces page load time for affected requests by ~15%!

This provides a noticeable latency win and bandwidth savings for both Google and users.

Latency win at the tail end of the distribution!

Who are these users?

All browsers, all countries

Higher percentage of IE users than normal traffic

IE6 has a lower success rate than IE7 and IE8

4x more likely to be behind a known proxy

Cyclical pattern, dropping off on weekends

What not to do

The iframe URL MUST end in .html, and the response MUST be served with a Content-Type of text/html.

<iframe src="/compressiontest/gzip.html">

The following will show a user-visible error in various versions of IE:

<iframe src="/compressiontest">
<script src="/compressiontest/gzip.js">

Don't be afraid to be aggressive with cookie lifetime.

Original cookie expiration of 1 or 2 hours didn't provide much benefit.

Other considerations

Risk of false positives.

Advantage of using a session cookie: restarting the browser resets the behavior.