Workshop: High Performance Webpages
Date: Sunday, 4/15/2007
Presenters: Steve Souders and Tenni Theurer, Yahoo!
This is my first set of notes from O’Reilly’s Web 2.0 Expo. The workshop was “High Performance Webpages,” a dense, three-hour session with two leads on Yahoo!’s front-end optimization team (Steve Souders and Tenni Theurer.) This team has done some solid work breaking down the best practices for making a website speedy and responsive.
The presentation also advocated for optimization from the point of view of the user. Distinguishing between “response-time” optimization (request to page load) and “efficiency” optimization (minimization of resource use), the team made a very strong case that the latter optimization is premature. Having access to Yahoo!’s server logs, they have collected a large body of data that suggests that for rich sites like ours, 80%-90% of the user’s response time is eaten up by the client.
On a side note, I think Yahoo! has done the developer community a huge service by making all of this information public. This effort is part and parcel with their open-source release of the YUI toolkit. I’ve just added the YUI blog to my RSS reader.
The full set of rough notes is after the jump. They include empirical data from a number of experiments the team conducted around front-end performance, as well as a distilled set of 14 best practices for speeding up a web application.
Why Focus on Front-end Performance?
- Note: They defined response time as the time from when the initial request is sent to when the onload event is fired in the browser.
Q: Have there been any studies correlating drop rates with response times? A: Yahoo doesn’t have these numbers. Drop rates vary considerably across Yahoo’s properties. [A thought: drop rates are important, but so are the total number of page views per session. At Instructables, we've found that people are more willing to explore (and thus generate more clicks) if the site is snappier. For an ad-based model, this is key.]
The Browser Cache Experiment
This experiment tested how many users come to a site with a full versus an empty cache. (For context: with a full cache, there are 83% fewer bytes and 90% fewer HTTP requests than when the user’s cache is empty.)
The team added a 1px image to a page with response headers set so that 200 (new request) and 304 (conditional get; browser only checks the last-modified date) response codes could be compared. This was for the front page and search, discounting the activity of robots and spiders.
Results: Unique visitors with an empty cache dropped to an average steady state of about 45% within a week. Page views with an empty cache dropped to about 20% in the same time frame. Across properties, the percentages varied according to turnover, but not much. For example, unique visitors with an empty cache varied only between 40% and 60%.
Conclusion: Empty caches will be common, even at the steady state. You can’t optimize only for the full cache scenario.
Measuring Cookie Performance
Background: Processing of a cookie in the browser is initiated by the HTTP response for the main page, in the Set-Cookie field. Further HTTP requests to the same domain will send Cookie data back to the server. To measure cookie performance, the team varied cookie size (with gibberish in the cookies) to see what the delay would be in total response time.
Results: The delta in response time increased linearly, up to 80 ms at 3000 Bytes. Totaled with other cookies, they believed these delays were significant. Looking at other sites, they found that 20% of users have total cookie sizes up to 500 Bytes, 80% are up to 1000 Bytes, and 98% are below 1500 Bytes. Major sites have cookies on their front pages ranging from less than 100 Bytes (Yahoo, Google) up to 500 Bytes (MySpace).
Conclusions: Eliminate unnecessary cookies. Try to keep cookies at the subdomain level so that cookies on one subdomain aren’t loaded on every other property in the domain. Use the Path variable, if possible. Set the expires date or the cookie will be purged at the end of the session.
[Thought: This seems like a small issue if 98% of users are below 1500 Bytes (estimated at 40ms.) Also, in the short term, Instructables is also unlikely to push a bunch of unnecessary cookies into the user's browser, since we have only one dev team and no subdomains.]
[To look into, later: Does Firebug profile cookie processing time?]
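For illustration, a cookie scoped per the conclusions above would be set with a header along these lines (the name, value, domain, and path are invented):

```http
Set-Cookie: prefs=compact; Domain=www.example.com; Path=/app; Expires=Tue, 15 Apr 2008 20:00:00 GMT
```

Scoping to `www.example.com` and `/app` keeps the cookie off other subdomains and paths, and the explicit Expires date keeps it from being purged at the end of the session.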
Pros and Cons of Parallel HTTP Requests
Browsers will request a lot of different referenced files (CSS, JS, images, etc.) in parallel. IE generally does two at once, while Firefox can be configured to do any number, defaulting to four. These limits are per-domain, so even if the user agent isn’t configured to do more parallel downloads, developers sometimes use subdomains to increase the number of files that are downloaded in parallel.
One would think that more parallel downloads speeds things up. Is this the case? The team conducted an experiment with a single page containing images in blocks of twenty. They measured load times for both small images (<1Kb) and larger ones.
Results: The team found that, for small images in blocks of twenty, performance didn’t increase much after 2 parallel requests. And surprisingly, for large images, performance actually degraded very badly. They theorized that this was due to CPU thrashing and DNS lookups (using the subdomain solution.)
14 Rules for lowering Response time (in order of importance):
- Make fewer HTTP requests
- HTTP requests can be made in parallel, but these are the largest bottleneck.
- Image maps can be used so that fewer images are downloaded (e.g. for a toolbar). Use client-side image maps to get good cursor effects. The images do need to be contiguous, and maps are tedious and harder to maintain, but they really reduce the number of requests.
- CSS Sprites: the #1 way of reducing HTTP requests, using a large image and background-position in CSS. Savings on headers for each image. See http://alistapart.com/articles/sprites. Drawbacks: accessibility and printing.
- Inline Images: Binary data in the src attribute. However, doesn’t work in IE! It’s cool, since you can even put this data in a CSS file, which will be cached and keeps HTML responses small.
- Combine scripts and CSS files. Drawbacks here apply more to large, multi-product sites where there are large numbers of combinations of styles. This can be done at build time or it can be done dynamically by directing the script request to the app layer (e.g. a servlet that knows which scripts to combine and gzip.)
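A minimal sketch of the CSS sprite technique (class names, file name, and offsets are invented): one image holds every icon, and background-position selects which one shows, so the whole toolbar costs a single HTTP request.

```css
/* icons.png contains all toolbar icons stacked in 16px rows */
.icon        { background-image: url(icons.png); width: 16px; height: 16px; }
.icon-save   { background-position: 0 0; }
.icon-print  { background-position: 0 -16px; }  /* second row */
.icon-delete { background-position: 0 -32px; }  /* third row */
```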
- Use a CDN (Content Delivery Network, e.g. Akamai)
- Less important for smaller sites or for geographically centralized user bases.
- Add an Expires header
- Make sure that as many clients cache files, as often as possible. This applies to all files, not just images. Drawback: files must have their names changed (e.g. timestamps appended) when new versions are created.
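As a sketch, the far-future Expires header could be set in Apache with mod_expires (the "10 years" horizon and the MIME types are my choices, not prescriptions from the talk):

```apache
# Far-future Expires headers for static components. Because clients will
# not re-check these, rename the file (e.g. logo.20070415.gif) whenever
# its contents actually change.
ExpiresActive On
ExpiresByType image/gif "access plus 10 years"
ExpiresByType text/css "access plus 10 years"
ExpiresByType application/x-javascript "access plus 10 years"
```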
- Gzip components
- Don’t use Deflate, don’t Gzip binaries, and don’t Gzip anything less than 1KB.
- Edge cases: IE 6.0 has an issue with Gzip <1% of the time.
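A sketch of gzip in Apache 2.x, assuming mod_deflate is loaded (despite the directive name, mod_deflate sends gzip-encoded responses). Per the rule above, only text types are compressed; images and other binaries are left alone:

```apache
# Compress text components only -- binaries are already compressed,
# and files under ~1KB aren't worth the CPU.
AddOutputFilterByType DEFLATE text/html text/css application/x-javascript
```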
- Put CSS at the top
- IE won’t render a page until all of the CSS has been processed. If the stylesheet is at the end, you’ll get a white screen while you wait. Firefox renders as you go, which means you get the “flash of unstyled content.”
- Solution: Placing the stylesheet reference in HEAD avoids this issue in both browsers.
- Use LINK and not @import, since the latter will only get called at the end of the page.
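Side by side (the stylesheet name is a placeholder), the two ways of referencing a stylesheet:

```html
<!-- Preferred: LINK in HEAD; the browser fetches it immediately -->
<link rel="stylesheet" type="text/css" href="screen.css">

<!-- Avoid: @import defers the download, effectively to the end of the page in IE -->
<style type="text/css">
  @import url("screen.css");
</style>
```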
- Move JS to the bottom
- JS can’t be downloaded in parallel! Browsers have to wait, since scripts can create new requests.
- To avoid this issue, put script references as low in the page as you can, preferably at the very end.
- Note that in FF, CSS will block parallel downloads as well.
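Putting rules 5 and 6 together, a page skeleton looks like this (file names invented):

```html
<html>
  <head>
    <!-- stylesheets up top, so rendering can start immediately -->
    <link rel="stylesheet" type="text/css" href="screen.css">
  </head>
  <body>
    <p>Page content renders without waiting on any script downloads.</p>
    <!-- scripts at the very end: they would otherwise block parallel downloads -->
    <script type="text/javascript" src="app.js"></script>
  </body>
</html>
```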
- Avoid CSS expressions
- CSS expressions allow JS-like expressions to be embedded in CSS.
- This is an IE-only feature that is extremely inefficient, since CSS expressions are evaluated a lot (e.g. constantly, on mousemove). Avoid it.
- Hacky workarounds include expresions that overwrite themselves.
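For the record, a sketch of the anti-pattern and of the self-overwriting workaround (selector and breakpoint are invented; this is my reconstruction of the pattern, not code from the talk):

```css
/* Anti-pattern (IE only): re-evaluated constantly, e.g. on every mousemove */
#main {
  width: expression(document.body.clientWidth > 600 ? "600px" : "auto");
}

/* Workaround sketch: the expression assigns a static value to the same
   property, overwriting itself, so it only ever evaluates once. */
#main {
  width: expression(this.style.width =
    (document.body.clientWidth > 600 ? "600px" : "auto"));
}
```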
- Make JS and CSS external
- This is a tradeoff between page size (inline) and number of HTTP requests (external).
- Most of the time, given page views per user (i.e. the empty vs. full cache ratio), it’s better to keep your referenced files external.
- Workarounds: Post-onload downloading (using JS). You can even set a cookie once the components are downloaded so that subsequent requests keep references external.
- [Look this part up; confusing]
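My rough sketch of the post-onload downloading idea (the function name and URL-sniffing logic are mine, not Yahoo's): after window.onload, quietly fetch components a later page will need, so they are already in the browser cache when the user navigates there.

```javascript
// Pre-fetch components after the current page has finished loading.
// `doc` is passed in (normally `document`) to keep the helper testable.
function preloadComponents(urls, doc) {
  var added = [];
  urls.forEach(function (url) {
    var el;
    if (/\.js$/.test(url)) {
      el = doc.createElement('script');
      el.src = url;
    } else if (/\.css$/.test(url)) {
      el = doc.createElement('link');
      el.rel = 'stylesheet';
      el.href = url;
    } else {
      el = doc.createElement('img'); // images can be warmed the same way
      el.src = url;
    }
    doc.body.appendChild(el);
    added.push(el);
  });
  return added;
}

// In the page itself:
// window.onload = function () {
//   preloadComponents(['/js/next-page.js', '/css/next-page.css'], document);
// };
```

This is also where the cookie trick fits: once the components are cached, set a cookie so subsequent page views can go straight to the external references.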
- Reduce DNS Lookups
- Typically 20-120 ms, weighted to the low end. In addition, DNS lookups block parallel downloads.
- Time to Live (TTL) settings: Shorter means that you have more lookups. Longer means that your servers are less responsive to failover.
- Have Keep-Alive turned on, and use fewer subdomains.
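The Keep-Alive side of this is an Apache config sketch (the timeout values are illustrative defaults, not recommendations from the talk): persistent connections let one TCP connection, and one DNS lookup, serve many requests.

```apache
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
```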
- Minify JS
- You can also obfuscate, but the latter makes the code much harder to debug. Minify is safer.
- Dojo’s ShrinkSafe (http://dojotoolkit.org) is the minifier they mentioned.
- Avoid Redirects
- 3xx status codes slow things down (DNS lookups block all downloading.)
- If you have to use a redirect, set the expires header.
- Use web server settings (Alias, DirectorySlash, mod_rewrite, or CNAMES.)
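As one example of the server-side fix (the paths are invented): the most common accidental redirect is the missing trailing slash, which the server answers with a 301 round trip. An Alias can map the slashless URL straight to the content instead; DirectorySlash, mod_rewrite, or CNAMEs cover the other cases the talk listed.

```apache
# Map /astrology directly to content, instead of 301-redirecting
# the browser to /astrology/ first.
Alias /astrology /usr/local/apache/htdocs/astrology/index.html
```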
- Remove Duplicate Scripts
- Duplicate calls happen pretty often when dev teams aren’t paying attention.
- Script insertion functions are useful, since they can handle not only avoidance of duplicate calls, but also resolution of dependencies and versioning.
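A minimal script-insertion helper might look like this (my sketch; a real one would also handle dependencies and versioning, as the note above says). It records every src it has inserted and silently skips duplicates:

```javascript
// Tracks which scripts have already been requested on this page.
var insertedScripts = {};

// Insert a script tag for `src` unless it was already inserted.
// `doc` is passed in (normally `document`) to keep the helper testable.
function insertScriptOnce(src, doc) {
  if (insertedScripts[src]) {
    return false; // duplicate request avoided
  }
  insertedScripts[src] = true;
  var s = doc.createElement('script');
  s.src = src;
  doc.body.appendChild(s);
  return true;
}
```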
- Remove ETags
- ETags will cause headers to mismatch, causing re-downloading of referenced files. This is especially bad when you have more than one server.
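In Apache, turning ETags off is a one-liner, leaving caching to rely on Last-Modified alone:

```apache
# Default ETags include per-machine data, so across a server farm they
# never match and force needless re-downloads.
FileETag None
```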
- Make AJAX Cacheable and Small
- Apply rules 1-13 to AJAX content as well. This rule is actually a higher priority as AJAX request sizes increase.
- For dynamic data, you can use a time stamp in the URL instead of a random string so that the client can still cache the responses.
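The timestamp-in-URL idea can be sketched like this (the function name is mine): keying the URL off the data's last-modified time means identical data produces an identical URL, which the browser can serve from cache, while changed data produces a new URL and a fresh request. A random cache-buster string would defeat caching entirely.

```javascript
// Build an Ajax URL keyed to when the underlying data last changed.
function cacheableUrl(base, lastModifiedTs) {
  var sep = base.indexOf('?') === -1 ? '?' : '&';
  return base + sep + 't=' + lastModifiedTs;
}
```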
Resources to look up:
- High Performance Web Sites (Souders, the speaker)
- Firefox extension for analyzing performance (Fasterfox)
- IBM Page Detailer (Packet sniffer, Windows only)
- Yahoo User Interface Blog (YUIblog.com)
- Keynote and Gomez: Performance-measuring apps.
- YSlow – a soon-to-be-released Firefox extension that grades your code against the 14 rules.