Notes from JS Conference Workshop: Extreme Web Performance
Posted on Mar 16, 2017
WPO stands for Web Performance Optimization. It’s a real field of interest, similar to SEO. The Velocity conference deals exclusively with WPO.
Server optimization makes up 16% of the opportunity: if you optimize the server by 50%, you see a 10% overall improvement.
Front-end optimization makes up 84%: if you optimize the front end by 50%, you see a 45% overall improvement.
Popular Web Page Test Tool
O’Reilly even publishes a book about the site, called “Using WebPageTest”
The lady sitting next to me was knitting something instead of paying attention to the speaker. Seriously!?
Snapshot Filmstrip View
WebPageTest.org can show you a filmstrip view of snapshots of a page being loaded so you can see how the page loads over time
Speed Index (SI)
Speed Index is a metric for how fast ATF (above-the-fold) content is rendered. If you compare gmail.com to amazon.com, they both have the same page load time, but Amazon has a much better (lower) speed index: its ATF content displays much faster than Gmail’s, which leads to a better experience because users feel the site loaded faster. The speed index on the first viewing should be different from (and worse than) that of the 2nd and subsequent viewings, since later viewings should load content from cache. A speed index of 1000ms (1s) is good; whitehouse.gov has a speed index of 4.5s. Speed index varies with viewport and device because it depends on the ATF / viewport dimensions and other factors. Page load time is not the most important metric anymore; SI (Speed Index) is. TTFB (Time-To-First-Byte) is the time it takes before the first byte of a resource or asset is downloaded. TTFB is usually not a problem.
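Conceptually, Speed Index integrates visual incompleteness over time; a rough sketch (the function and the filmstrip sample values below are made up for illustration, not from the talk):

```javascript
// Sketch: Speed Index ≈ sum of (1 - visual completeness) over time.
// samples are hypothetical filmstrip measurements: at time t (ms),
// c is the fraction of ATF content already painted.
function speedIndex(samples) {
  let si = 0;
  for (let i = 0; i < samples.length - 1; i++) {
    const dt = samples[i + 1].t - samples[i].t;
    si += (1 - samples[i].c) * dt; // area above the completeness curve
  }
  return si;
}

// Both pages finish at 1000 ms, but the first paints 80% of ATF
// content by 500 ms, so it gets the better (lower) score.
const fast = speedIndex([{ t: 0, c: 0 }, { t: 500, c: 0.8 }, { t: 1000, c: 1 }]); // 600
const slow = speedIndex([{ t: 0, c: 0 }, { t: 500, c: 0 }, { t: 1000, c: 1 }]);   // 1000
```

This is why two pages with identical load times can feel very different.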
Google Lighthouse is an automated tool to improve the quality of your website / app by heavily testing it and producing an audit report.
httparchive.org, by Steve Souders, is like archive.org, but it keeps track of websites’ HTTP headers, etc. over time, including HAR files, which contain waterfall chart data.
100ms feels like immediate feedback
1s leads to a user losing their flow of thoughts
Google RAIL has the following goals with respect to user perception:
R = Response. Get a response within 100ms
A = Animation. You need 16ms per frame to achieve a 60 fps (frames per second) refresh rate and eliminate page jank / stutter
I = Idle. Maximize idle time; do deferred work in small chunks during idle periods
L = Load. The page should load within 1s.
There are many different browsers on mobile. Believe it or not, the “UC Browser” represents 15% of the mobile browser market on Android, mostly from users in Asia. Also, many users and apps load pages in “web views”, which are similar to browsers, e.g. Facebook’s web view. Some people privately post websites to themselves just to test how their site looks in Facebook’s web view. Many, many people (30%) view websites from within Facebook, so they’re not even viewing from a real browser.
Simulator = an app that looks like, but isn’t really, the real thing, e.g. the iOS simulator or Chrome DevTools responsive mode
Emulator = a virtual machine running the same code base as the real device, e.g. the Genymotion Android virtual machine. Unfortunately, though Genymotion is an Android emulator, it doesn’t come with Chrome, so you can’t test in Chrome.
WebPageTest.org uses real devices on the back end to run its tests. It doesn’t use simulators.
Samsung Remote Test Lab is a free tool for testing Samsung devices.
Mac / iOS has a network link conditioner to throttle connection bandwidth.
What Happens When You Load a Page
- DNS Resolution. Takes about 100ms per domain
- TCP Handshake
- TLS (https) negotiation
- HTTP Request
- Headers: User agent, more data about the request
- Server receives the HTTP request
- HTTP Response
- Browser parses HTML response
- Browser gets a list of additional resources and does the whole HTTP process for each resource
- Browser starts rendering the page
There’s no reason not to upgrade your server to HTTP 2.0. It has many nice features, including:
- Header Compression
Each HTTP header is about 1K per request, so compression is good. If there are 45 requests, that’s 45K from headers alone without compression.
- TCP Connection Reuse
With HTTP 1.x, you only get 6 concurrent connections per origin (domain, protocol, port). With HTTP 2.0, this is not an issue because requests are multiplexed over a single connection.
- Push to Cache
HTTP 2 requires TLS to encrypt all data so proxies can’t mess with data en route.
Users can see between a 15% – 60% performance improvement with HTTP 2.0. HTTP 2.0 requires TLS (https). Many CDNs offer HTTP 2.0 including CloudFlare.
Bandwidth and Latency
Only 25% of people in the world are on 4G.
2G vs 3G vs 4G vs Home Wifi. Even 4G still has high latency compared to home wifi.
Latency (data transfer delay) is different from bandwidth. Many times the real issue is with latency.
- keep-alive: keep the connection alive to reduce the number of TCP handshakes
- content-encoding: use gzip compression, etc, where applicable
- cache headers
- Request # 1
- max-age: specified on server, e.g. 2 days
- Request # 2
- File in Cache Expired
If the cache is expired (max-age exceeded), get the file only if it was modified. Possible responses are
- 304 (File not modified so don’t get it)
- 200 (File modified so get it)
- File in Cache Not Expired
If the file in the cache hasn’t expired, get the file from the cache without making a request.
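The decision tree above can be sketched as a small function (the function name and return values are my own, not from the talk):

```javascript
// Sketch of the browser's decision for a cached file under max-age.
// fetchedAt and now are timestamps in seconds; maxAge comes from the
// Cache-Control: max-age response header.
function cacheAction(fetchedAt, maxAge, now) {
  if (now - fetchedAt <= maxAge) {
    return "serve-from-cache"; // not expired: no request is made at all
  }
  // Expired: revalidate with the server, which answers 304
  // (reuse the cached copy) or 200 (file changed, download again).
  return "revalidate-with-server";
}

cacheAction(0, 172800, 3600);   // within the 2-day max-age
cacheAction(0, 172800, 200000); // past the 2-day max-age
```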
Service Workers
Available in some browsers, e.g. Chrome, a service worker is a background process on the client. It can answer for the server and replace all of the network steps. Service workers have their own cache.
- Enable gzip on text-based files. If enabled, it’ll say “content-encoding: gzip” in the headers.
- Reduce DNS resolution queries; each lookup costs about 50-120ms.
- Enable keep-alive to reduce TCP handshakes.
- Make static content expire in the far future, e.g. 2030. If you need to change a file, e.g. logo, change the filename or use a hash, e.g. logo.png?20170228 using an automated build tool.
If you’re on HTTP 1.1, where you’re limited to downloading 6 resources at a time, use different domains to download more than 6 at a time, even if the domains are CNAMEs to the same server. Based on tests, a maximum of 2 domains is ideal; sharding to more than 2 domains won’t make a difference. If you’re on HTTP 2, domain sharding isn’t necessary because HTTP 2 multiplexes requests over one connection.
Serve static content from a cookie-less domain.
Reduce cookie size / gzip cookies.
Redirects
E.g. whitehouse.gov redirects to www.whitehouse.gov. This redirect costs a delay of 300ms which, if our goal is to load ATF (above-the-fold) content within 1s, is 1/3 of our available time.
The speaker showed examples of ridiculous redirect chains.
Basic Optimization Priority
- Optimize Images
- Optimize HTML
- Optimize CSS
- Optimize JS
CSS blocks rendering. Browser must download CSS before it can render a page.
JS blocks parsing when the browser encounters a script tag. Because of this, JS used to go at the top of the page, but now you should put it at the bottom.
- Compress images.
- Be careful with images that can block a page, e.g. images inlined as base-64-encoded CSS properties. Base-64 encoding is okay for small images, but put them in a JSON file or similar, not in a CSS file (CSS blocks rendering).
- Embrace responsive images.
- Embrace SVG images which can be gzipped.
- Don’t use interstitial ad banners.
- Avoid client-side rendering.
- Embrace plain HTML for initial loading. This leads to an average 5x increase in page load performance.
- Announce DNS queries ASAP, e.g. by prefetching them using <link rel="dns-prefetch" href="http://newdomain.com">
- Minify and compress CSS
- Combine multiple CSS files even though HTTP 2 supports more than 6 parallel requests
- Add <link> for CSS include at top of HTML as soon as possible. Don’t use @import.
- Push CSS request on HTTP 2.
- Minify and compress JS
- Combine multiple JS files even though HTTP 2 supports more than 6 parallel requests
- Use non-blocking scripts, e.g. use <script defer>.
- Don’t use <script async> for loading jQuery, since scripts that depend on it may run before it arrives. Async is OK for loading tracking JS.
- Load JS on-demand when users will use it.
- Stop binding to the page onload event; bind to the document onload (DOMContentLoaded) event instead.
- Release the main JS thread ASAP, e.g. by using setTimeout(doSomethingHeavy, 0); or use web workers.
- Be careful with web fonts that need to be downloaded. It’s better to use a default font and then replace it with a web font for ATF (above-the-fold) content even if it causes a flash of changing font.
- Use tools to remove characters from font files that you won’t use.
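The setTimeout(doSomethingHeavy, 0) trick mentioned above can be wrapped into a chunked-processing helper; a sketch (the helper names are mine):

```javascript
// Sketch: split heavy work into chunks and yield to the event loop
// between chunks, so input handling and rendering aren't starved.
function chunkify(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

function processInChunks(items, size, workFn, done) {
  const chunks = chunkify(items, size);
  const results = [];
  (function next() {
    if (chunks.length === 0) return done(results);
    for (const item of chunks.shift()) results.push(workFn(item));
    setTimeout(next, 0); // release the main thread between chunks
  })();
}
```

A web worker is the stronger option when the heavy work doesn’t need DOM access.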
Rendering / Painting
Avoid writing code that depends on the CPU to repaint the screen; code that uses the GPU is better. This matters for scrolling, transitions, and animations. For example, CSS transforms and opacity are GPU-accelerated, but border-radius, gradients, shadows, and filters are CPU-based. To visually see where page jank / paint flashing occurs, open Chrome DevTools > Rendering > Paint Flashing.
The new enemy is not the user but the designer! Designers need to know how to design pages in such a way that they won’t adversely affect performance.
On 3G networks, there’s a minimum of 600ms of network connection overhead. If our goal is to load the ATF content in 1s, that leaves only 400ms. We only care that the ATF content loads in 1s because user perception of a fast-loading page matters more than actually loading the entire page in 1s, which is much harder to accomplish. On mobile, an ATF load within 1s allows at most 1 RTT (round trip), which means the response must be about 14 KB compressed; anything larger causes another round trip. Therefore, to achieve a 1s ATF load time, we must combine HTML, CSS, and JS into one single file that is <= 14KB compressed. To help with this, we can separate ATF content from the rest of the content.
- Load CSS media queries asynchronously
- Use server-side device-detection libraries like WURFL or DeviceAtlas
Navigation Timing API
To help with analyzing performance, use
- the available timestamps, e.g. navigationStart, responseStart, and loadEventEnd
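A sketch of deriving phase durations from Navigation Timing-style timestamps (the millisecond values below are made up for illustration; in a browser they come from performance.timing):

```javascript
// Hypothetical timestamps shaped like the Navigation Timing API's
// performance.timing, expressed as ms offsets for readability.
const t = {
  navigationStart: 0,
  domainLookupStart: 5,
  domainLookupEnd: 105,  // ~100 ms DNS resolution
  connectStart: 105,
  connectEnd: 150,       // TCP (+ TLS) handshake
  requestStart: 150,
  responseStart: 420,
  responseEnd: 600,
  loadEventEnd: 1400,
};

const dnsMs = t.domainLookupEnd - t.domainLookupStart; // 100
const ttfbMs = t.responseStart - t.navigationStart;    // 420 (usually not the problem)
const loadMs = t.loadEventEnd - t.navigationStart;     // 1400 (over the 1s RAIL goal)
```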
Extreme Optimization Tips
- Deliver ATF content in 14 KB compressed
- Embed all CSS and JS that is actually needed
- If there’s available space within the 14KB max, embed a logo and/or low-res images
- Don’t use blocking web fonts
- Consider implementing AMP (Accelerated Mobile Pages)
- Consider SD (standard definition) vs HD (high definition) images for different connection speeds
- Create a custom cache
- Predict the near future, e.g.
- bind to the mousedown event instead of the click event. This will save 100ms. Amazon does this.
- Start actions on hover event to save 200 ms (use this technique with caution)
- Prefetch the next resources, e.g. <link rel="prefetch" href="">
- Prerender resources, e.g. <link rel="prerender" href="">
- Lazy load resources, e.g. images. Facebook does multistage downloading of content: it doesn’t download BTF (below-the-fold) content unless people scroll.
- Don’t unnecessarily use redirects so you can reduce the number of requests.
- Load the ATF content in 1s and defer the rest of the content.
- Zopfli: 8% improved compression over gzip and deflate.
- Brotli: supported in Chrome 49+, over TLS only (Facebook is using this). 15-20% better than gzip.
Alternative Image Formats
- Zopfli PNG
Use the <picture> tag to let the browser decide which image format to download.
Behave like a magician
Trick users into feeling that your website is super fast, e.g.
- When users click the Facebook “Like” button, the response is instant but in reality, it’s not done processing yet. Same for Gmail.
- Render placeholders instead of progress indicators since the latter makes users feel your site is slow.
The conference presentation is available at http://tinyurl.com/fwd17perf