A faster web, part II
Author : Stephane Rodriguez
Creation date : May 1, 2001.
Topic : better use of the HTML markup language, part 2
Audience : standard, programmers
Keywords : HTML language, W3C, tags, optimization, HTTP transport, web publishing,
content management.





HTTP ? Oh yes !!!!

A short article about a rather amazing feature of the Internet, prompted by the fact that surfing is far too slow these days.

In part I, we mentioned that many content publishers totally disregard the small details of the content they publish. If it is not optimized and has real side effects for everyone who downloads their web pages, they don't mind. Whether HTML editors are adequate to produce today's web pages is an interesting topic, addressed in part I. What remains to be said is that HTML is the payload carried by the HTTP protocol, and that HTTP, though old, has several interesting design features.

One of them, which we are not going to detail much, is that any content can reach you through one or more proxy servers. Thanks to that, companies like Akamai started two years ago to install web proxy servers at many points around the globe, so that servers are closer to the ISP or even to you, with the beneficial effect of making heavily trafficked sites respond like normal sites. That's great and it should go on. There are several things to say about this "technology", but we'll keep that for a later discussion.

The real benefit of HTTP, regarded as an HTML container, is that it can process the content before it goes over the network. When you receive the content, the data is "un"processed back to its initial state before it is handed to the web browser and finally shown on screen. But nobody seems to mention that HTTP has these processing features. Which of them is of interest at this point?

Surprisingly, little is said about the HTTP compression feature which is, let's say it clearly, native and built by design into all 4.0+ web browsers, including IE and Netscape.

Whenever you click inside a web page, you send a new URL and invisible headers. Among these headers is "Accept-Encoding: gzip, deflate", which is part of HTTP 1.1 and which says: OK, I am a user agent (a web browser client), and I agree that you send me back compressed data, which I will decompress using my zip library implementation (that is, by calling the inflate() method). The server marks a compressed response with a "Content-Encoding" header. When decompression is finished, the web browser processes the resulting content as usual: it parses the HTML and shows the page.
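To make the exchange concrete, here is a minimal Python sketch of the client side. It is an illustration under stated assumptions, not how any particular browser is implemented: the function names request_headers and decode_body are hypothetical, and the "server" is simulated in memory with Python's standard gzip and zlib modules. The request header that HTTP 1.1 defines for announcing supported compression is Accept-Encoding, and the response says which scheme was used in Content-Encoding.

```python
import gzip
import zlib


def request_headers() -> dict:
    """Headers a compression-aware client might send (illustrative only)."""
    return {"Accept-Encoding": "gzip, deflate"}


def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Undo the compression named in the response's Content-Encoding header."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        # The server chose not to compress: the body is already usable.
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" is specified as a zlib-wrapped deflate stream.
        return zlib.decompress(body)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


html = b"<html><body><p>Hello</p></body></html>"
compressed = gzip.compress(html)  # stands in for what a server would send
restored = decode_body(compressed, "gzip")
print(restored == html)
```

The decoded bytes are what the browser then hands to its HTML parser, exactly as if no compression had taken place.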

What is the benefit of compression? HTML content is text, and statistically very repetitive text because of the HTML tags, so it opens the door wide to transporting much smaller content than the initial raw content. And with a lossless scheme such as zip (on Unix systems the equivalent tool is gzip, written about ten years ago by Jean-loup Gailly and Mark Adler; both rely on the same deflate algorithm) you are sure not to lose anything. If we assume that compressed HTML content is 50% of its original size, then transferring the page takes roughly half the usual time. And this has nothing to do with whether servers are scalable or whether the HTML content was made using an adequate HTML editor. No, it's pure gain.
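The size claim is easy to check for yourself. The following Python sketch builds a synthetic, tag-heavy page and measures how much gzip shrinks it. The page and the helper names sample_page and compression_ratio are made up for the illustration, and a generated table like this one is more repetitive than most real pages, so the ratio it prints is likely more flattering than what you would see on real content.

```python
import gzip


def sample_page(rows: int = 200) -> bytes:
    """Build a synthetic HTML page dominated by repeated tags."""
    cells = "".join(
        f'<tr><td class="cell">Item {i}</td><td class="cell">{i * 7}</td></tr>\n'
        for i in range(rows)
    )
    return f"<html><body><table>\n{cells}</table></body></html>".encode("utf-8")


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data)) / len(data)


page = sample_page()
ratio = compression_ratio(page)
print(f"raw: {len(page)} bytes, compressed/raw ratio: {ratio:.2f}")

# Lossless: decompressing gives back exactly the original bytes.
assert gzip.decompress(gzip.compress(page)) == page
```

Running the same measurement on a saved copy of one of your own pages gives a more honest figure for your site.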

What do we need to make it work? We need web servers that understand the "Accept-Encoding: gzip, deflate" header. This article would be useless if no top-5 web server had this feature built in. Of course, there is one: Microsoft IIS 5.0 and up. I don't necessarily like products from Microsoft, but I have to say that if small content is a concern for me (my clients send big reports, so everything would be easier this way), then I will install and use any server that serves this purpose.

As for the details with IIS 5.0, you have to make sure that the two DLL files compfilt.dll and gzip.dll are present on your system and listed in the IIS configuration panel. Once that is done, HTML content is compressed automatically whenever the web server sees the HTTP header mentioned above. More details are available on the Microsoft support center website.
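What a server does when it "sees the header" can be sketched in a few lines of Python. This is not IIS code and says nothing about how compfilt.dll works internally; it is only an assumed, simplified model of the negotiation, with hypothetical helper names accepts_gzip and build_response. It compresses the body only when the request's Accept-Encoding header lists gzip with a non-zero quality value, and it labels the result with Content-Encoding so the client knows to decompress.

```python
import gzip
from typing import Dict, Tuple


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the header value lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # malformed q-value: treat as refused
        return quality > 0
    return False


def build_response(body: bytes, request_headers: Dict[str, str]) -> Tuple[Dict[str, str], bytes]:
    """Compress the body only for clients that announced gzip support.

    Simplification: header names are looked up with exact case.
    """
    headers = {"Content-Type": "text/html"}
    if accepts_gzip(request_headers.get("Accept-Encoding", "")):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
        headers["Vary"] = "Accept-Encoding"  # caches must key on this header
    headers["Content-Length"] = str(len(body))
    return headers, body


page = b"<p>report line</p>\n" * 100
headers, payload = build_response(page, {"Accept-Encoding": "gzip, deflate"})
print(headers.get("Content-Encoding"), len(page), len(payload))
```

A client that sends no such header gets the page untouched, which is why nothing has to change on the client side.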

This looks fairly easy, especially when you know that IIS 5.0 costs virtually nothing. No installation or update is required on the client side. Amazing!!


Stephane Rodriguez, May 1, 2001.
...to be continued in part III.