[ Team LiB ] |
How to Optimize Your HTML

Optimizing HTML is a matter of using the fewest bytes to deliver a valid page that renders properly. There are a number of techniques you can use to shrink your HTML. These include removing whitespace, omitting optional closing tags and quotes, removing redundant tags and attributes, cutting comments, and minimizing HTTP requests. This last point is important to keep in mind. We'll delve more deeply into graphics in Chapter 12, "Optimizing Web Graphics." First, let's start with the DOCTYPE declaration.

Step 1: Choose the Right DOCTYPE

As far as speed goes, there are two things to consider when choosing a DOCTYPE and coding style: DOCTYPE switching and parsing speed. By now you've seen the three DTDs you can use at the top of HTML documents (see Chapter 5, "Extreme XHTML," for details). Depending on the DTD that you choose and some internal parameters the browser developers have chosen, browsers switch to one of two or three "modes" to render your HTML document: standards, almost standards, and quirks modes. For more information on "almost standards" mode, see http://www.mozilla.org/docs/web-developer/quirks/doctypes.html.

Standards Mode versus Quirks Mode

The "standards" mode can be fastest because the parsing code is smaller and less complex. The browser is most likely to invoke this mode when you use the strict DTD. Keep in mind that there's not a one-to-one relationship between the DOCTYPE you choose and the mode browsers switch into, although the DOCTYPE does have an influence.[13] For more details, see Matthias Gutfeldt's article on DOCTYPE switching, available at http://gutfeldt.ch/matthias/articles/doctypeswitch.html.
There are some tradeoffs, however. Strict means just that: strict. No deprecated tags allowed. Some designers may have to rethink how they lay out pages. A common method some authors use is to first mark up and validate the structure of their documents and then add presentation in the form of style sheets. As you can imagine, quirks mode by its very nature is slower to parse HTML than standards mode. The parsers for quirks mode are necessarily more complex to allow for all the permutations of looser legacy markup.

Coding Style

For HTML, your coding style can affect download and display speed. When you use the strict DTD and close all tags, the browser can use a faster parsing algorithm and does less work inserting and matching tags. Your pages will render faster but will be slightly larger because of the closing tags. On the other hand, omitting optional closing tags can yield smaller pages that will download faster but render slightly slower, yet they will still be valid. Either method is a valid approach, but the former makes for a smoother transition to XHTML. One approach is to fully optimize home pages (omit optional closing tags and quotes) and create well-formed interior pages (use XHTML-like HTML with closing tags and fully qualified attributes).

Step 2: Minimize HTTP Requests

Because each HTTP request takes an indeterminate amount of time, it's important to minimize the number of HTTP requests per page. As discussed earlier, the browser issues HTTP GET requests, and the HTTP server responds with the requested object. Each object (including HTML pages, images and multimedia, external style sheets, and JavaScript) takes one HTTP request. To get a better idea of how to improve the conversation between client and server, let's take a look at a real-world example. Let's optimize the prototype of Elivad.com's home page (see Figure 3.1). Elivad claims to boost click-through rates for online ads.

Figure 3.1. Elivad.com prototype with graphic rollovers.

Here's an abbreviated log file from loading Elivad.com's home page, which uses graphic rollovers:
Notice the number of HTTP requests. Each corresponds to an image, an external CSS or JavaScript file, or the HTML file itself. The total number of bytes transferred is 50,580 bytes, with 20 HTTP requests. The seven graphic rollovers account for 14 of these HTTP requests, two for each item. This home page takes about 15 seconds to download at 56Kbps. This is due to the total size of the page (50,580 bytes at 4,500 bytes per second) and the latency introduced by HTTP requests. Each HTTP request takes an indeterminate amount of time, depending on network conditions. Jakob Nielsen found that on average it takes about 1/2 to 2 seconds per HTTP request.[14]
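The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a rough model, not a measurement: the function names are hypothetical, and the `latency_per_request` default of 0.2 seconds is an assumed effective overhead per request (lower than the per-request range quoted above, on the assumption that the browser overlaps several requests).

```python
def transfer_seconds(total_bytes: int, bytes_per_second: float = 4500.0) -> float:
    """Time spent purely moving bytes over the modem, ignoring latency."""
    return total_bytes / bytes_per_second


def estimated_load_seconds(total_bytes: int, requests: int,
                           bytes_per_second: float = 4500.0,
                           latency_per_request: float = 0.2) -> float:
    """Rough load time: transfer time plus an assumed overhead per request.

    latency_per_request is an illustrative assumption, not a measured value.
    """
    return (transfer_seconds(total_bytes, bytes_per_second)
            + requests * latency_per_request)


if __name__ == "__main__":
    # Figures from the Elivad.com prototype: 50,580 bytes in 20 requests.
    print(f"transfer only: {transfer_seconds(50580):.2f} s")
    print(f"with latency:  {estimated_load_seconds(50580, 20):.2f} s")
```

Under these assumptions, about 11 seconds of the load time is raw transfer and the remainder is request overhead, which is why cutting requests matters as well as cutting bytes.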
An Optimized Example

Now let's take that same page and optimize it. By converting images and rollovers into text, consolidating and optimizing images, and optimizing JavaScript and style sheets, you can minimize the number of requests and speed up your pages. By replacing the graphic rollovers and JavaScript with CSS rollovers and any buttons with links and colored backgrounds, you can eliminate the majority of images. So instead of this:
Do this:
Here's the same page after replacing the graphic rollovers with CSS (see Figure 3.2).

Figure 3.2. Elivad.com after optimization.

Here's the log file after loading this optimized page:
Notice that the number of HTTP requests decreased from 20 to 6. By replacing the graphic rollovers with CSS, you eliminate 14 HTTP requests. The page now weighs in at 22,027 bytes and takes about six seconds to load on a 56K modem. The HTML is 41 percent smaller, having eliminated the JavaScript rollover code. The CSS is slightly larger, because of the CSS rollover added to the navigation bar links, but that's a small price to pay for the reduced HTTP request load. The images were also optimized to save space. Overall the page feels much faster and has the same functionality as before. Most importantly, we've brought the load time down below eight seconds.

Modern browsers and servers (HTTP 1.1) send out multiple simultaneous requests to save time. Even with HTTP keep-alive, each round trip adds more time because the message has to traverse the Internet from client to server and back again. Each hop in this path adds latency due to packet loss and network load. By minimizing the number of HTTP requests required by your pages, you can speed up their display and lower delay variability. This is especially important for any external files in the head of your document, which must be processed before the visible body content. Multiple external CSS and JavaScript files are now common, but they can dramatically slow down your pages. You'll learn how to defer, consolidate, and eliminate these external files in Chapter 7, "CSS Optimization," and Chapter 9, "Optimizing JavaScript for Download Speed."

Step 3: Remove Whitespace

The average web page has between 20 and 35 percent extra whitespace, according to Insider Software (http://www.insidersoftware.com/) and WebTrimmer (http://www.glostart.com/webtrimmer/). Browsers don't care how pretty your markup is; they're just looking between tags, real or implied. Those extra spaces, tabs, and returns make your markup easier to read but slower to display. Spaces between and inside tags can also be removed.
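To get a feel for how much of a page is whitespace, a short script can strip it and report the savings. The following Python sketch is only an illustration, not one of the tools named above: the function name and sample markup are hypothetical, and it assumes the page contains no pre, textarea, or script blocks, where whitespace can be significant.

```python
import re


def strip_markup_whitespace(html: str) -> str:
    """Remove indentation, trailing spaces, and blank lines from markup.

    Assumes no <pre>, <textarea>, or <script> content, where whitespace
    may be significant. Runs of spaces and tabs inside a line are
    collapsed to one space rather than removed outright.
    """
    # Drop leading and trailing whitespace on every line.
    lines = [line.strip() for line in html.splitlines()]
    # Discard blank lines and rejoin the rest.
    compact = "\n".join(line for line in lines if line)
    # Collapse any remaining runs of spaces or tabs to a single space.
    return re.sub(r"[ \t]+", " ", compact)


if __name__ == "__main__":
    sample = """
<table>
    <tr>
        <td>  Home  </td>
    </tr>
</table>
"""
    result = strip_markup_whitespace(sample)
    saved = len(sample) - len(result)
    print(result)
    print(f"saved {saved} of {len(sample)} bytes")
```

The sketch deliberately keeps one space where several appeared, because removing every space between inline elements can change how text renders.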
Indents are typically used by programmers and WYSIWYG editors to make the markup more legible and the document's structure more obvious. So instead of this:
Do this:
Or even better:
Removing whitespace and tightening things up saves over 50 percent for this code snippet. This whitespace is entirely unnecessary (with some exceptions for JavaScript) for browsers rendering HTML. They see the HTML file as a stream of bytes with tags interspersed around data. Indents and spaces before or at the end of lines are simply wasted bandwidth and are ignored by browsers. If necessary, you can re-beautify your markup for editing by using sophisticated text editors like BBEdit and Homesite or by using regular expressions or short shell scripts.

Step 4: Tighten Up Comma-Delimited Attributes

Some tags allow a comma-delimited list for attribute values. The most common are the keywords meta tag, the coordinate attribute used in image maps, and the <style> tag. Browsers and search engines ignore leading spaces before comma-delimited attributes. You can save some space by omitting spaces when using commas or omitting commas altogether for the keywords meta tag. See Chapter 15, "Keyword Optimization," for more details. The style tag allows comma-delimited lists. The same principle applies. So instead of this:
Do this:
Or even better:
Sharp-eyed readers will see that there is one additional optimization that can be made to this style sheet. For more details, see Chapter 7, "CSS Optimization."

Step 5: Omit Redundant Tags and Attributes

In many cases, attributes or tags are redundant and can be safely eliminated. Elements (formatting or otherwise) need to be placed only around blocks of HTML text. Here's an example:
This code becomes the following (which is invalid markup):
Even better, get rid of the deprecated <font> tag and remove optional closing tags, like this:
You could use an id instead of a class attribute to identify the list element here; however, id names must be unique within the document and cannot be used on multiple elements like class can.

Redundant Attributes

Redundant attributes generally specify the default attribute value for that particular element. These include align="left" for p, h1, h2, and other heading tags, and for table, tr, and td elements, as well as border="0" for non-linked images. Tables are a common place to find redundant attributes. The td tag can be aligned individually, but you also can control the alignment of an entire row by using the tr tag, which saves space. For strict XHTML compliance, use style sheets to align the contents of tables. So instead of this:
Do this:
Or even better:
Step 6: Omit Optional Quotes, If You Dare

The HTML 4.01 specification allows certain attribute values to be unquoted. Attribute strings that contain only alphanumeric characters (A-Z, a-z, 0-9), hyphens, periods, underscores, and colons can be unquoted. Any attribute values that include other characters must be quoted. This means that you can do this:
But not this:
Values with spaces, symbols, or links require quotes in HTML. For example:
Browsers are quite liberal in what they accept for HTML markup. Some sites take advantage of this and omit quotes entirely, violating the HTML Recommendation. Yahoo!, the busiest site on the web, omits quotes from their link tags. For example:
Note the lack of quotes here. Also notice those funny-looking URLs that start with r/. To abbreviate their URLs, Yahoo! uses redirects to save bandwidth. We'll discuss how to use automatic URL abbreviation in Chapter 4, "Advanced HTML Optimization," and go into more detail in Chapter 17, "Server-Side Techniques." Omitting quotes for URLs works on all current browsers and saves Yahoo! three percent off their home page,[15] but it is invalid HTML. However, there's no guarantee that future browsers won't require quotes around links. I recommend that you quote all attribute values that require them, and to get ready for XHTML, you may want to consider quoting all attribute values regardless of whether the quotes are needed in HTML.
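The HTML 4.01 rule described in this step is easy to check mechanically. Here is a minimal Python sketch of such a check; the function name is hypothetical, and it tests only the character rule stated above (letters, digits, hyphens, periods, underscores, and colons), nothing else about the attribute.

```python
import re

# Characters HTML 4.01 permits in an unquoted attribute value:
# letters, digits, hyphens, periods, underscores, and colons.
_UNQUOTED_OK = re.compile(r"[A-Za-z0-9._:-]+")


def can_omit_quotes(value: str) -> bool:
    """Return True if value may appear unquoted in valid HTML 4.01."""
    return _UNQUOTED_OK.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ["left", "main-nav", "100%", "r/xy", "Dr. Seuss"]:
        verdict = "quotes optional" if can_omit_quotes(value) else "quotes required"
        print(f"{value!r}: {verdict}")
```

Note that a relative URL such as r/xy fails the check because of the slash, which is why unquoted links are invalid even though browsers accept them.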
NOTE: If you are adventurous, you can learn more about Extreme HTML Optimization at http://www.webreference.com/authoring/languages/html/optimize/.

Step 7: Omit Optional Closing Tags

A number of elements in HTML don't technically require closing tags because the elements that follow them imply closure. These include p, li, option, and even body and html. Even table row and data closing tags (</td> and </tr>) are not technically required by the HTML specification, but Netscape 3 does not render tables properly without them. So instead of this:
Do this:
Even better:
Keep in mind that this practice violates XHTML where all tags must be closed and attributes fully qualified. Early versions of Netscape 6 also can fail to properly apply CSS and execute dynamically written external JavaScripts when elements are improperly nested. So, make sure that your elements are nested properly and your HTML is validated.[16]
Again, this is a tradeoff between page size and rendering speed. Using HTML with all tags closed makes your pages render slightly faster, whereas using HTML without optional closing tags makes your pages download faster but still validate. At current bandwidth-to-CPU-speed ratios, bandwidth is the limiting factor.

Step 8: Minimize Colors and Character Entities

You can save some space by optimizing your color references and character entities. In HTML 4.01, each offers numeric references and named references to colors and character entities. In some cases, the named reference is shorter; in others, the numeric reference is shorter. Choose the shortest reference to save a few bytes. As color attributes are deprecated in HTML 4.01 and XHTML, you can use style sheets to specify colors to save even more space, because in some cases, they can be optimized to use shorthand hexadecimal colors.

Character References

For characters outside of the default (or specified) character encoding scheme, you can use SGML character references to specify special characters. In HTML, you can specify character entities in two ways:
In some cases, using the numeric reference is shorter than the named reference and vice versa (for example, reg and deg). To see the full list of character entities, go to http://www.w3.org/TR/html401/sgml/entities.html.

Colors

Color attributes can be represented numerically or with one of 16 named colors. In HTML, you specify colors using RGB hexadecimal triplets, like #RRGGBB:
You also can specify colors using named colors. Because Internet Explorer 2 defined only 16 colors, there are 16 colors in the HTML 4.01 specification. Color names are case-insensitive. For example:
The W3C has deprecated the use of colors in HTML attributes in favor of style sheets, however, so this information applies primarily to style sheets. Named colors outside the 16 listed in the specification are not recommended because different browsers support different numbers and types of named colors. (Remember the infamous 216 browser-safe colors?) The hexadecimal values are unambiguous and can use fewer characters than their named equivalents, although some named colors are shorter than their hex equivalents ("red," for example). The sixteen named colors are listed here:
In modern browsers, version 3 and up, except for buggy behavior in Mac IE3,[17] RGB triplets can be abbreviated if the two digits in each of the R, G, and B hex pairs are the same, thus:
The browser automatically expands three-character colors into six by duplicating the R, G, and B values. You'll learn more about color abbreviation in Chapter 7, "CSS Optimization." For more information, see the HTML 4.01 specification.[18]
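Choosing the shortest color reference can also be automated. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the table holds the 16 named colors from the HTML 4.01 specification, and the three-character shorthand is assumed to be headed for a style sheet, where that abbreviation applies.

```python
import re

# The 16 named colors defined in the HTML 4.01 specification.
NAMED_COLORS = {
    "#000000": "black", "#c0c0c0": "silver", "#808080": "gray",
    "#ffffff": "white", "#800000": "maroon", "#ff0000": "red",
    "#800080": "purple", "#ff00ff": "fuchsia", "#008000": "green",
    "#00ff00": "lime", "#808000": "olive", "#ffff00": "yellow",
    "#000080": "navy", "#0000ff": "blue", "#008080": "teal",
    "#00ffff": "aqua",
}


def shortest_color(hex_color: str) -> str:
    """Return the shortest equivalent of a #RRGGBB color for a style sheet."""
    color = hex_color.lower()
    if not re.fullmatch(r"#[0-9a-f]{6}", color):
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    candidates = [color]
    red, green, blue = color[1:3], color[3:5], color[5:7]
    # Shorthand applies only when both digits of every pair match.
    if red[0] == red[1] and green[0] == green[1] and blue[0] == blue[1]:
        candidates.append("#" + red[0] + green[0] + blue[0])
    if color in NAMED_COLORS:
        candidates.append(NAMED_COLORS[color])
    # On ties, min() keeps the earliest candidate, so hex wins over a name.
    return min(candidates, key=len)


if __name__ == "__main__":
    for value in ["#FF0000", "#FFFFFF", "#336699", "#C0C0C0", "#123456"]:
        print(value, "->", shortest_color(value))
```

The same comparison works for character entities: measure both the named and numeric forms and keep whichever is shorter.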
Step 9: Cut the Comments

HTML comments are often used to mark major sections of documents. These comments can help teams of designers locate insertion points for new or changed content. Unfortunately, users have to download your entire HTML file including your comments, and browsers don't display them. Therefore, comments should be abbreviated. So instead of this:
Do this:
Abbreviating your comments can dramatically reduce file sizes, especially for heavily commented pages. This technique saves 31 bytes, or 75.6 percent (31/41).

Embed Labels in Elements

A more efficient technique is to eliminate comments entirely. Instead of peppering your code with placeholding comments for other designers to key off of, eliminate the comments by shunting them into surrounding elements. For example, you could shunt the preceding comment label into an element's id, like this:
This technique of shunting labels into elements saves 41 bytes or 85 percent over the original (41/48). By using a template system, you can include any comments you need within the template. A script could then periodically strip out all comments from the page, and output the final optimized page. We use this technique on Webreference.com's home page.[19]
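A comment-stripping script of the kind described above can be very small. The following Python sketch is a hypothetical stand-in, not the script used on Webreference.com: the function name and sample markup are invented for illustration, and it assumes the page has no script or style content wrapped in comment delimiters and no conditional comments that must survive.

```python
import re

# Non-greedy match so each comment ends at its own closing delimiter;
# DOTALL lets a comment span several lines.
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html: str) -> str:
    """Remove every HTML comment from a page.

    Assumes no script or style content is hidden inside comment
    delimiters; such pages need a real parser instead of a regex.
    """
    return _COMMENT.sub("", html)


if __name__ == "__main__":
    template = '<!-- navigation bar --><ul id="nav"><li>Home</ul>'
    page = strip_comments(template)
    print(page)
    print(f"saved {len(template) - len(page)} bytes")
```

Running this as a publishing step lets the template keep its full comments while visitors download only the stripped page.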
You can eliminate comments and id labels entirely. Using SSI or a content management system, you can merge separate files into one optimized template and have the best of both worlds. Editors can update only the parts of the page that need to change, and the server or content management system can assemble the optimized page.

Pages Are Not Digital Dumping Grounds

HTML files, especially crucial high-traffic pages, are not digital blackboards where designers can freely scrawl comment graffiti. They are not repositories for old or seldom-used commented-out blocks of markup. HTML documents should be designed to make it easy for users to get your information, not for the convenience of designers.

Step 10: Minimize alt Values

alt values are important for a number of reasons, not the least of which is the fact that there are over 49 million disabled people in the U.S. alone.[20] Vision-impaired users rely on alt values to navigate graphics-rich sites. Frequently we're seeing graphics-only designs with no alt values. This makes for a beautiful slow-loading site that is not usable with graphics turned off. Available for the applet, area, and img elements, alt attribute values can be optimized or, for non-functional graphics, eliminated altogether.
So instead of this:
Do this:
For functional graphics, alt values should be descriptive, not generic. Imagine that you are a visually impaired person surfing your site: what would you want to know about that image? So instead of this:
Do this:
Make your alt values short and sweet, and don't try stuffing them with too many keywords. For client-side image maps, alt values allow non-graphical browsers to present the map as a list of links identified by the alt labels.

Accessibility and Optimization

Section 508 of the Rehabilitation Act provides accessibility rules for electronic and information technology that is created or procured by a U.S. federal agency (http://www.section508.gov/). These rules were passed to ensure equal access for all to publicly available information. By providing alternative content, you can remove barriers for people with disabilities and maximize your potential audience. Optimization does not preclude accessibility; it actually enhances it if done properly (see the W3C's "Web Accessibility Guidelines" at http://www.w3.org/WAI/). Pages that use text to convey information display faster and are more easily read by people with or without disabilities.

Step 11: Minimize the head

Browsers interact with servers in discrete-sized messages and parse HTML pages sequentially. The head must be parsed before the rest of your page is rendered. By minimizing the size of the head of your pages, you can speed your content's initial display. This is especially important for busy home pages. Excess CSS or JavaScript, especially multiple external files, can both delay content display and lower your page's search engine relevance.

Minimize meta Tags

meta tags are HTML tags that you place in the head section of your page. They let you specify metadata information about your document in a variety of ways. meta tags are designed to help automated agents process metadata about your document and to help show where your document fits into the web. Many sites overuse meta tags, however, by stuffing in data that could best be omitted or handled more efficiently by the server.
The meta element identifies the properties of a document (such as author, expiration date, a list of keywords, and so on) and assigns values to those properties. For example, one way to specify the author of a document is to use the meta element, as follows:
The meta element specifies a property ("Author") and assigns a value to it ("Dr. Seuss"). meta tags can be used to specify default scripting, style sheet languages, or character sets. They also can be used to specify keywords to help search engines classify your pages. Additionally, meta tags can be used to augment HTTP headers sent from the server (although most of these are redundant) and have even been extended to include taxonomy classification systems like the Dublin Core.[21]
Here's a list of some popular meta tags:
The only meta tags you really need to use are the description and keywords tags, and possibly the default scripting language. All the others are superfluous or can be handled more efficiently by server settings. |