Posted September 25, 2018 by Nolan Lawson in Web.
We all want to make faster websites. The question is just what to measure, and how to use that information to determine what’s “slow” and what could be made faster.
So in this post, I want to demystify some of these concepts, and offer techniques for accurately measuring what’s going on when we render things on the web.
The web rendering pipeline
When we use the handy Performance profiler in the Chrome Dev Tools, we see something like this:
This is a view of the CPU costs of our component, in terms of milliseconds on the UI thread. To break things down, here are the steps required:
- Calculate style – taking a CSS stylesheet and matching its selector rules with elements in the DOM. This is also known as “formatting.”
- Calculate layout – taking the CSS styles we calculated in the previous step and figuring out where the boxes should be laid out on the screen. This is also known as “reflow.”
- Render – the process of actually putting pixels on the screen. This often involves painting, compositing, GPU acceleration, and a separate rendering thread.
All of these steps incur CPU costs, and therefore all of them can impact the user experience. If any one of them takes a long time, it can lead to the appearance of a slow-loading component.
The naïve approach
Style and layout calculations, however, are 100% measurable because they block the main thread. And yes, this is true even with something like Firefox’s Stylo engine – even if multiple threads can be employed to speed up the work, ultimately the main thread has to wait on all the other threads to deliver the final result. This is just the way the web works, as specc’ed.
What to measure
As it turns out, requestAnimationFrame will be our main tool of choice, but there’s a problem. As Jake Archibald explains in his excellent talk on the event loop, browsers disagree on where to fire this callback:
Now, per the HTML5 event loop spec, requestAnimationFrame is indeed supposed to fire before style and layout are calculated. Edge has already fixed this in v18, and perhaps Safari will fix it in the future as well. But that would still leave us with inconsistent behavior in IE, as well as in older versions of Safari and Edge.
Also, if anything, the spec-compliant behavior actually makes it more difficult to measure style and layout! In an ideal world, the spec would have two timers – one for requestAnimationFrame, and another for requestAnimationFrameAfterStyleAndLayout (or something like that). In fact, there has been some discussion at the WHATWG about adding an API for this, but so far it’s just a gleam in the spec authors’ eyes.
Unfortunately, we live in the real world with real constraints, and we can’t wait for browsers to add this timer. So we’ll just have to figure out how to crack this nut, even with browsers disagreeing on when requestAnimationFrame should fire. Is there any solution that will work cross-browser?
Cross-browser “after frame” callback
There’s no solution that will work perfectly to place a callback right after style and layout, but based on the advice of Todd Reifsteck, I believe this comes closest:
Let’s break down what this code is doing. In the case of spec-compliant browsers, such as Chrome, it looks like this:
rAF fires before style and layout, but the next setTimeout fires just after those steps (including “paint,” in this case).
And here’s how it works in non-spec-compliant browsers, such as Edge 17:
rAF fires after style and layout, and the next setTimeout happens so soon that the Edge F12 Tools actually render the two marks on top of each other.
So essentially, the trick is to queue a setTimeout callback inside of a rAF, which ensures that the second callback happens after style and layout, regardless of whether the browser is spec-compliant or not.
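As a minimal sketch of the trick described above (the helper name afterFrame and the env parameter are assumptions made for illustration, not names from this post), it might look like this:

```javascript
// Hypothetical helper: the name `afterFrame` is an assumption.
// `env` defaults to the global object and exists only so the schedulers
// can be swapped out (for example, in a test).
function afterFrame(callback, env = globalThis) {
  env.requestAnimationFrame(() => {
    // In spec-compliant browsers, rAF runs before style and layout, so we
    // queue a task that should run once this frame's rendering work is done.
    env.setTimeout(callback, 0);
  });
}

// Usage in a browser: place the end mark after style and layout.
// (Skipped in environments without requestAnimationFrame.)
if (typeof requestAnimationFrame === 'function') {
  afterFrame(() => performance.mark('end'));
}
```

The callback passed to afterFrame is only ever scheduled, never called synchronously, so it is safe to call right after modifying the DOM.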
Downsides and alternatives
Now to be fair, there are a lot of problems with this technique:
- setTimeout is somewhat unpredictable in that it may be clamped to 4ms (or more in some cases).
- If there are any other setTimeout callbacks that have been queued elsewhere in the code, then ours may not be the last one to run.
- In the non-spec-compliant browsers, doing the setTimeout is actually a waste, because we already have a perfectly good place to set our mark – right inside the rAF!
However, if you’re looking for a one-size-fits-all solution for all browsers, rAF + setTimeout is about as close as you can get. Let’s consider some alternative approaches and why they wouldn’t work so well:
rAF + microtask
rAF + requestIdleCallback
Calling requestIdleCallback from inside of a requestAnimationFrame will indeed capture style and layout:
However, if the microtask version fires too early, I would worry that this one would fire too late. The screenshot above shows it firing fairly quickly, but if the main thread is busy doing other work, rIC could be delayed a long time waiting for the browser to decide that it’s safe to run some “idle” work. This one is far less of a sure bet than setTimeout.
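For reference, a rough sketch of the rAF + requestIdleCallback variant could look like the following. The helper name afterFrameWhenIdle and its parameters are hypothetical; the optional timeout is the standard requestIdleCallback option and is included here only as one possible way to bound the delay.

```javascript
// Hypothetical helper: names and parameters are assumptions for illustration.
// `timeoutMs` (optional) is passed through as requestIdleCallback's
// `timeout` option, which asks the browser to run the callback once that
// much time has passed, even if it never found idle time.
function afterFrameWhenIdle(callback, timeoutMs, env = globalThis) {
  env.requestAnimationFrame(() => {
    const options = timeoutMs === undefined ? undefined : { timeout: timeoutMs };
    env.requestIdleCallback(callback, options);
  });
}

// Usage in a browser that supports requestIdleCallback:
if (typeof requestAnimationFrame === 'function' &&
    typeof requestIdleCallback === 'function') {
  afterFrameWhenIdle(() => performance.mark('end'), 100);
}
```

Even with a timeout, the mark can land well after style and layout have finished, so any measurement taken this way may include idle time.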
rAF + rAF
This one, also called a “double rAF,” is a perfectly fine solution, but compared to the setTimeout version, it probably captures more idle time – roughly 16.7ms on a 60Hz screen, as opposed to the standard 4ms for setTimeout – and is therefore slightly less accurate.
You might wonder about that, given that I’ve already talked about setTimeout(0) not really firing in 0 (or even necessarily 4) milliseconds in a previous blog post. But keep in mind that, even though setTimeout() may be clamped by as much as a second, this only occurs in a background tab. And if we’re running in a background tab, we can’t count on rAF at all, because it may be paused altogether. (How to deal with noisy telemetry from background tabs is an interesting but separate question.)
So rAF + setTimeout, despite its flaws, is probably still better than rAF + rAF.
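For comparison, a double rAF can be sketched as follows (the helper name afterDoubleRaf and the env parameter are assumptions for illustration):

```javascript
// Hypothetical helper: the name `afterDoubleRaf` is an assumption.
// The inner callback runs at the start of the *next* frame, so on a 60Hz
// screen it may fire up to roughly 16.7ms after the first frame's work.
function afterDoubleRaf(callback, env = globalThis) {
  env.requestAnimationFrame(() => {
    env.requestAnimationFrame(() => callback());
  });
}

// Usage in a browser:
if (typeof requestAnimationFrame === 'function') {
  afterDoubleRaf(() => performance.mark('end'));
}
```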
Not fooling ourselves
In any case, whether we choose
setTimeout or double
As an example, let’s consider what would happen if our style and layout costs weren’t just triggered by the event loop – that is, if our component were calling one of the many APIs that force style/layout recalculation, such as getBoundingClientRect().
If we call
The important point here is that we’re not doing anything any slower or faster – we’ve merely moved the costs around. If we don’t measure the full costs of style and layout, though, we might deceive ourselves into thinking that calling getBoundingClientRect() is slower than not calling it! In fact, though, it’s just a case of robbing Peter to pay Paul.
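One way to keep the comparison honest is to measure from the start of rendering to a point after style and layout, whether or not the component forces layout along the way. Here is a minimal sketch, assuming a hypothetical measureRender helper and a render function supplied by the caller (neither name comes from this post):

```javascript
// Hypothetical helper: `measureRender`, `name`, and `render` are assumptions.
// `env` defaults to the global object so that browser globals are used;
// it is a parameter only to make the function testable.
function measureRender(name, render, env = globalThis) {
  const perf = env.performance;
  perf.mark(`${name}:start`);

  // `render` may or may not call something like getBoundingClientRect().
  // Either way, its style/layout cost ends up inside the measure below.
  render();

  env.requestAnimationFrame(() => {
    env.setTimeout(() => {
      perf.mark(`${name}:end`);
      perf.measure(name, `${name}:start`, `${name}:end`);
    }, 0);
  });
}
```

In a browser, the resulting entry can then be read back with performance.getEntriesByName(name), and two variants of the same component can be compared on their total duration rather than on their JavaScript time alone.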
It’s worth noting, though, that the Chrome Dev Tools have added little red triangles to our style/layout calculations, with the message “Forced reflow is a likely performance bottleneck.” This can be a bit misleading in this case, because again, the costs are not actually any higher – they’ve just moved to earlier in the trace.
(Now it’s true that, if we call getBoundingClientRect() repeatedly and change the DOM in the process, then we might cause layout thrashing, in which case the overall costs would indeed be higher. So the Chrome Dev Tools are right to warn folks in that case.)
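To illustrate the difference, here is a rough sketch of the thrashing pattern next to a batched version. The function names are hypothetical, and the height calculation is just a placeholder for whatever DOM change a real component would make.

```javascript
// Interleaved reads and writes: every write invalidates layout, so each
// following getBoundingClientRect() may force a fresh synchronous layout.
function resizeInterleaved(elements) {
  for (const el of elements) {
    const { width } = el.getBoundingClientRect(); // read (may force layout)
    el.style.height = `${width / 2}px`;           // write (invalidates layout)
  }
}

// Batched: do all the reads first, then all the writes, so that at most
// one forced layout is needed for the whole batch.
function resizeBatched(elements) {
  const widths = elements.map((el) => el.getBoundingClientRect().width);
  elements.forEach((el, i) => {
    el.style.height = `${widths[i] / 2}px`;
  });
}
```

Both versions leave the elements in the same final state when the widths do not depend on the heights being set; only the number of forced layouts along the way should differ.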
However, it’s important to understand how the HTML5 event loop works, and to place performance marks at the appropriate points in the component rendering lifecycle. This can help avoid any mistaken conclusions about what’s “slower” or “faster” based on an incomplete view of the pipeline, and ensure that style and layout costs are accounted for.
I hope this blog post was useful, and that the art of measuring client-side performance is a little less mysterious now. And maybe it’s time to push browser vendors to add requestAnimationFrameAfterStyleAndLayout (we’ll bikeshed on the name though!).
Thanks to Ben Kelly, Todd Reifsteck, and Alex Russell for feedback on a draft of this blog post.