Translator's note: we have long neglected the topics of browsers, CSS, and accessibility, and return to them with a translation of this overview article (originally published in February 2020). I am especially interested in your opinion on the server-side rendering technique mentioned here, and on how urgently a full-fledged book on HTTP/2 is needed. But first things first.
This post describes techniques for speeding up the loading of front-end applications and thereby improving their usability.
Let's take a look at the general architecture of the frontend. How do you ensure that critical assets load first and maximize the likelihood that those assets will already end up in the cache?
I will not dwell on how the backend should deliver resources, on whether your page should be a client-side application at all, or on how to optimize your application's rendering time.
Overview
Let's break the loading of an application into three separate stages:
- Primary rendering - how long will it take before the user sees something?
- App download - how long will it take before the user can use the app?
- Next page - how long will it take before the user can open the next page?
Until the primary render, the user cannot see anything at all. Rendering a page requires at least the HTML document, but in most cases additional resources such as CSS and JavaScript files have to be loaded as well. Once these are available, the browser can start rendering to the screen.
In this post, I will be using WebPageTest waterfall diagrams. The request waterfall for your site will look something like this.
Along with the HTML document, a set of other files is loaded, and the page is rendered once they have all been downloaded. Note that the CSS files are loaded in parallel, so each additional request adds only a small delay.
Reducing the number of render-blocking requests
Stylesheets and (by default) script elements block rendering of any content below them.
There are several ways to fix this:
- Put `script` tags at the bottom of the `body`
- Load scripts asynchronously using `async`
- Inline small pieces of JS or CSS if they need to be loaded synchronously
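As a sketch, these options might look like this (file names are illustrative):

```html
<!doctype html>
<html>
  <head>
    <!-- A small chunk of critical CSS inlined: no extra request blocks rendering -->
    <style>body { margin: 0; font-family: system-ui, sans-serif; }</style>
    <!-- async: downloaded in parallel, executed without blocking the parser -->
    <script async src="/analytics.js"></script>
  </head>
  <body>
    <p>This content can render before any external script has finished.</p>
    <!-- Classic approach: the main bundle at the very bottom of the body -->
    <script src="/app.js"></script>
  </body>
</html>
```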
Avoid chains of render-blocking requests
It's not just the number of render-blocking requests that can slow your site down. The size of each resource matters, as does the moment at which the browser discovers that the resource needs to be loaded.
If the browser learns that a file is needed only after another request has completed, you end up with a chain of sequential requests. This can happen for several reasons:
- `@import` rules in CSS
- Web fonts referenced from a CSS file
- `link` or `script` tags injected with JavaScript
Consider this example:
One of the CSS files on this site contains an `@import` rule that loads a Google font. As a result, the browser must perform the following requests one by one, in this order:
- The HTML document
- The application CSS
- The Google Fonts CSS
- The Google font WOFF file (not shown in the waterfall)
To fix this, first move the Google Fonts CSS request from the `@import` to a `link` tag in the HTML document. This shortens the chain by one link.
For an even greater speedup, embed the Google Fonts CSS directly into your HTML document or your CSS file.
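The first fix might look like this (the font URL is illustrative):

```html
<!-- Replaces `@import url("https://fonts.googleapis.com/css?family=Roboto");`
     in the application CSS: the browser now discovers the font CSS while
     parsing the HTML, instead of only after the application CSS has loaded -->
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto">
```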
(Keep in mind that the CSS response from Google Fonts depends on the user agent. If you make the request with IE8, the CSS will point to an EOT (Embedded OpenType) file; IE11 will get a WOFF file, and modern browsers will get WOFF2. If you are happy with older browsers falling back to system fonts, however, you can simply copy and paste the contents of the CSS file.)
Even after the page starts rendering, the user may still be unable to read it, since no text is displayed until the font has loaded. This can be avoided with the `font-display: swap` property, which Google Fonts now uses by default.
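In a self-hosted `@font-face` rule the property looks like this (font name and file path are placeholders):

```css
@font-face {
  font-family: "Roboto";
  src: url("/fonts/roboto.woff2") format("woff2");
  /* swap: show text immediately in a fallback font,
     then swap in the web font once it has loaded */
  font-display: swap;
}
```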
Sometimes it is not possible to get rid of the request chain entirely. In such cases, consider using a `preload` or `preconnect` link tag. For example, the site shown above could connect to fonts.googleapis.com before the actual CSS request is made.
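A sketch of both hints (hostnames follow the Google Fonts example above; the preload path is illustrative):

```html
<!-- Set up DNS, TCP and TLS to the font hosts before the CSS asks for them -->
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>

<!-- Or fetch a known-critical file early, before the parser would discover it -->
<link rel="preload" href="/css/base.css" as="style">
```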
Reusing server connections to speed up requests
Typically, establishing a new server connection requires three round trips between the browser and the server:
- DNS lookup
- Establishing a TCP connection
- Establishing an SSL connection
Once the connection is established, at least one more round trip is needed to send the request and download the response.
As the waterfall below shows, connections are initiated to four different servers: hostgator.com, optimizely.com, googletagmanager.com, and googleapis.com.
However, subsequent requests to the same server can reuse the existing connection. That is why `base.css` and `index1.css` load quickly: they are also hosted on hostgator.com.
Reducing file size and using content delivery networks (CDNs)
Along with connection reuse, two factors you control affect request duration: the size of the resource and the location of your servers.
Send the user the minimum amount of data required, and make sure it is compressed (for example, with Brotli or gzip).
Content delivery networks (CDNs) provide servers in many different locations, so there is a good chance one of them is close to your users. Instead of connecting to your central application server, users connect to the nearest CDN server, which significantly shortens the data path to and from the server. This is especially useful for static resources such as CSS, JavaScript, and images, because they are easy to distribute.
Bypassing the network with service workers
Service workers allow you to intercept requests before they go out to the network. This means the first render can happen almost instantly!
Of course, this only works if the network is not needed to produce the response. The response must already be cached, so this mainly helps users who are loading your application again.
The service worker shown below caches the HTML and CSS required to render the page. On a repeat load, the application tries to serve the cached resources and falls back to the network if they are unavailable.
```javascript
self.addEventListener("install", e => {
  // waitUntil keeps the service worker installing until caching has finished
  e.waitUntil(
    caches.open("v1").then(cache => cache.addAll(["/app", "/app.css"]))
  );
});

self.addEventListener("fetch", event => {
  event.respondWith(
    caches.match(event.request).then(cachedResponse => {
      // Serve from the cache, falling back to the network
      return cachedResponse || fetch(event.request);
    })
  );
});
```
For more information on preloading and caching resources with service workers, see this tutorial.
Application download
Okay, our user can already see something. What else is needed before they can use our application?
- Loading the application code (JS and CSS)
- Loading the most important data for the page
- Loading additional data and images
Note that it is not only loading data over the network that can slow down rendering: once the code has loaded, the browser still needs to parse, compile, and execute it.
Splitting the bundle: load only the necessary code and maximize cache hits
By splitting the bundle, you load only the code needed for the current page instead of downloading the entire application. Splitting also means that parts of the bundle can be cached independently, so unchanged parts stay cached even when other parts of the code change and have to be re-fetched.
Typically, the code consists of three different types of files:
- Code specific to this page
- Shared application code
- Third party modules that rarely change (great for caching!)
Webpack can split out shared code automatically to reduce the total download size; this is done with optimization.splitChunks. Be sure to enable the runtime chunk so that chunk hashes remain stable and long-term caching actually pays off. Ivan Akulov has written an in-depth guide on Webpack code splitting and caching.
Page-specific code cannot be split off automatically; you have to identify the pieces that can be loaded separately yourself. This is often a specific route or set of pages. Use dynamic imports to lazy-load such code.
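For example, a route map can defer loading page code until the page is first visited. This is a sketch: the route and module names are hypothetical, and stub loaders stand in for real `import()` calls so the example is self-contained:

```javascript
// Each route maps to a loader; in a real app the loader would be
// something like () => import("./TodoList.js"). Loaded modules are
// cached so the code for a page is fetched at most once.
function createLazyRouter(routes) {
  const cache = new Map();
  return function loadPage(path) {
    if (!cache.has(path)) {
      // Store the promise itself so concurrent navigations share one request
      cache.set(path, routes[path]());
    }
    return cache.get(path);
  };
}

// Stub loader standing in for a dynamic import:
let loads = 0;
const loadPage = createLazyRouter({
  "/todos": () => {
    loads += 1;
    return Promise.resolve({ render: () => "todo list" });
  },
});

loadPage("/todos"); // first visit: loader runs
loadPage("/todos"); // second visit: cached promise reused
console.log(loads); // 1
```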
Splitting the bundle means more requests are needed to fully load your application. But as long as the requests run in parallel, this is not a big problem, especially on a site served over HTTP/2. Note the first three requests in this waterfall:
However, this waterfall also shows two requests executed sequentially. These chunks are needed only on this page and are loaded dynamically via an `import()` call.
This can be fixed by inserting a `preload` link tag if you know in advance that these chunks will definitely be needed.
However, as you can see, the gain in speed in this case may be small compared to the total page load time.
In addition, preloading is sometimes counterproductive and can delay other, more important files. Check out Andy Davies' post on font preloading, which shows how loading fonts before the render-blocking CSS can delay the primary render.
Loading page data
Most likely, your application exists to display some data. Here are a few tips for loading that data early and avoiding rendering delays.
Don't wait for bundles, start loading data right away.
A chain of sequential requests can appear here as a special case: the application bundle is loaded first, and only then does that code request the page data.
There are two ways to avoid this:
- Embed page data into HTML document
- Start requesting data via an inline script inside the document
Embedding data in HTML ensures that your application doesn't have to wait for it to load. It also reduces the overall complexity of the application by not having to handle loading state.
However, this idea is less attractive if fetching the data significantly delays the response of your document, since that slows down the initial render too.
In this case, or when serving a cached HTML document using a service worker, you can embed an inline script in the HTML that will load this data. It can be provided as a global promise, like this:
```javascript
window.userDataPromise = fetch("/me")
```
Then your application can start rendering immediately if the data is already available, or wait for the promise to resolve.
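On the application side this might look as follows (a sketch: `startApp` and the promise shape are assumptions, and a resolved promise stands in for the request started by the inline script):

```javascript
// The app awaits the request started by the inline script
// instead of issuing a second one after the bundle has loaded.
async function startApp(dataPromise, renderApp) {
  const user = await dataPromise; // already in flight since the HTML was parsed
  return renderApp(user);
}

// Stub standing in for window.userDataPromise and the render function:
startApp(
  Promise.resolve({ name: "Ada" }),
  user => `Hello, ${user.name}`
).then(html => console.log(html)); // Hello, Ada
```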
Both of these methods require knowing exactly what data the page will display before the application starts rendering. That is usually easy for user-specific data (name, notifications, ...), but harder for page-specific content. Try to identify your most important pages and write dedicated logic for each of them.
Don't block rendering while waiting for non-essential data
Sometimes generating the page data requires slow, complex backend logic. In such cases, it helps to first load a simplified version of the data, if that is enough to make your application functional and interactive.
For example, an analytics tool might first load the list of all charts and only then fill them with data. This way the user can immediately look for the chart they are interested in, and you can spread the backend requests across different servers.
Avoid chains of sequential data requests
This advice may seem to contradict my previous point about deferring non-essential data to a second request. However, avoid chains of sequential requests when a later request in the chain gives the user no new information.
Instead of first asking which user is logged in and then requesting the list of groups that user belongs to, return the list of groups together with the user information. You can use GraphQL for this, but a custom endpoint like `user?includeTeams=true` works fine too.
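As a sketch, the combined request might look like this (the endpoint and response shape are assumptions from the text; a stub stands in for the network so the example is self-contained):

```javascript
// One request returns the user together with their teams,
// avoiding a second round trip after the first response arrives.
async function loadUserWithTeams(fetchFn) {
  const res = await fetchFn("/user?includeTeams=true");
  return res.json();
}

// Stub standing in for fetch():
const stubFetch = () =>
  Promise.resolve({
    json: () => Promise.resolve({ name: "Ada", teams: ["Admins", "Editors"] }),
  });

loadUserWithTeams(stubFetch).then(user => console.log(user.teams.length)); // 2
```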
Server side rendering
Server-side rendering here means rendering the application on the server in advance, so that a complete HTML page is served in response to the document request. That way the client sees the whole page without waiting for additional code or data to load!
Since the server sends plain static HTML to the client, the application is not yet interactive at this stage. It still needs to be downloaded, re-run its rendering logic, and then attach the required event listeners to the DOM.
Use server-side rendering when the non-interactive content is valuable on its own. It also works well if the HTML rendered on the server can be cached and then served to all users without delay on the first document request. For example, server-side rendering is great if you are rendering a blog with React.
Read this article by Michal Janaszek; it describes well how to combine service workers with server-side rendering.
Next page
At some point, the user working with your application will need to go to the next page. When the first page is open, you are in control of everything that happens in the browser, so you can prepare for the next interaction.
Prefetching Resources
Prefetching the code needed to display the next page helps avoid delays during user navigation. Use `<link rel="prefetch">` tags or `webpackPrefetch` for dynamic imports:
```javascript
import(
  /* webpackPrefetch: true, webpackChunkName: "todo-list" */
  "./TodoList"
)
```
Consider how much data your users have and how much bandwidth, especially on mobile connections. On the mobile version of a site, and when data-saver mode is enabled, be less aggressive with prefetching.
Strategically select the data that your users need most.
Reuse the data that is already loaded
Cache Ajax data locally in your application to avoid unnecessary requests later. If the user navigates from the group list to the Edit Group page, the transition can be instantaneous by reusing the data fetched earlier.
Note that this will not work well if your objects are frequently edited by other users, since the data you loaded can quickly become stale. In such cases, try showing the existing data in read-only mode first while fetching the updated data in the background.
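A minimal sketch of such a cache (names are assumptions; `fetchFn` is injected so the example is self-contained):

```javascript
// Cache responses by URL so revisiting a page reuses data already loaded.
const apiCache = new Map();

function cachedFetchJson(url, fetchFn) {
  if (!apiCache.has(url)) {
    // Store the promise so parallel callers share one request
    apiCache.set(url, fetchFn(url).then(r => r.json()));
  }
  return apiCache.get(url);
}

// Stub standing in for the network:
let requests = 0;
const stubFetch = () => {
  requests += 1;
  return Promise.resolve({
    json: () => Promise.resolve([{ id: 1, name: "Admins" }]),
  });
};

cachedFetchJson("/groups", stubFetch); // group list page
cachedFetchJson("/groups", stubFetch); // edit page: served from cache
console.log(requests); // 1
```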
Conclusion
In this article, we looked at a number of factors that can slow down a page at various stages of loading. Use tools such as Chrome DevTools, WebPageTest, and Lighthouse to determine which of these tips are relevant to your application.
In practice, it is rarely possible to carry out comprehensive optimization. Determine what is most important to your users and focus on that.
As I worked on this article, I realized I held a deeply ingrained belief that making many requests is bad for performance. That was true in the past, when every request required a separate connection and browsers allowed only a few connections per domain. With HTTP/2 and modern browsers, this problem has largely disappeared.
Still, there are good arguments for splitting requests. It lets you load only the strictly necessary resources and make better use of cached content, since only the files that have changed need to be re-downloaded.