Evaluating the performance of your own website is very easy today. One or two clicks, and Google or another service spits out results with concrete suggestions for improvement. Wonderful. At least for the first optimization run. But when it comes to fine-tuning, switching hosts or cleaning up WordPress, it becomes important to understand which tools actually measure load time, and how to deal with this data.
A few years ago, a customer contacted us in chat. He had just migrated his website and was comparing its performance at his old host with its performance at Raidboxes. He told us that the migration had not been worth it for a performance increase of only 9 points on Google PageSpeed Insights.
In fact, we get requests like this all the time. That's why I took a look at what information tools like Google PageSpeed Insights actually provide for interpretation, and how they measure performance and load time. To be honest, I was a bit surprised by the results. The meaning of the values is usually explained very well and in detail. However, the help pages of the test providers do not go into detail on two points:
- Which tool is suitable for which purpose?
- What data can be interpreted and used?
Tools like Google PageSpeed Insights do not measure the speed of your website
As discussed in another blog post, tests like Google PageSpeed Insights do not measure your site's load time but rather its optimization potential. They determine how well your site performs against a predefined set of performance criteria. In addition, the tests provide instructions on how to realize this potential. But there is one thing these tests explicitly do not do: measure page load time.
Google itself puts it like this:
PageSpeed Insights measures ways to increase the performance of a website in the following areas:
- Time taken to load content visible without scrolling: The time between requesting a new page and the browser rendering the content without scrolling.
- The time it takes to fully load the page: Time from requesting a new page to the browser fully rendering the page.
So you can see: Google doesn't measure the speed, but the "possibilities to increase the performance". A crucial difference. It also means that the results don't tell you how fast the page, or the area visible without scrolling, actually loads.
Performance tools like PageSpeed Insights show where you can quickly gain a lot of performance.
But this is not a problem, because the tools still provide valuable optimization data, even if they do not measure load time. For important optimization steps, such as the use of caching or image compression, the results of such tests have the greatest added value.
However, as soon as it comes to load time optimization of an already optimized website, these tests can only provide limited insights. In such a case, you need to perform a real performance measurement. This is especially true when you change hosting providers: no matter how good the new web server may be, if the website itself is full of unresolved issues, even a change of infrastructure will achieve relatively little.
For such a "real" performance measurement, you can use tools such as webpagetest.org.
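If you want to get a feel for raw response times yourself, a simple stopwatch around a request already yields time data you can compare. This is only a rough sketch, not a replacement for the tools above: it measures server response time from a single location, not the full browser rendering that services like webpagetest.org capture.

```python
import time
from typing import Callable

def average_time(action: Callable[[], object], runs: int = 3) -> float:
    """Run an action several times and return the average duration in seconds.

    Averaging over several runs smooths out network and server jitter.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Example with a real page fetch (needs network access, hence commented out;
# the URL is a placeholder):
# import urllib.request
# avg = average_time(lambda: urllib.request.urlopen("https://example.com").read())
# print(f"average fetch time: {avg:.3f} s")
```

Because the result is an actual duration in seconds, it supports exactly the kind of before/after comparisons discussed below.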
With such a test, the customer would have been able to see in a comparison exactly which performance gains his site achieved after the switch, and where.
And that brings me to the second point of this post: tools like PageSpeed Insights in particular tempt you to compare values that are only suited for this to a limited extent, or not at all. Because when you work with point scores or grading systems, you quickly run into a situation that I call the school grade dilemma in this article.
The school grades dilemma: grades are not suitable for comparisons
Tools like Google PageSpeed Insights or Yahoo's YSlow output two types of data:
- a score for the performance of the website
- concrete advice on how to improve this score
The scores are on a scale from 0 to 100, with 100 being the best result. So far so clear. And intuitively accessible. Especially because the ratings are supported by a traffic light system.
But when it comes to comparing two sites on the basis of these scores, interpreting the results is no longer so easy. In fact, it is incredibly difficult, if not impossible. Anyone can see that a site scoring 90 is better than one scoring 80. But one statement can no longer be made: by what factor is the site with the 90 score better than the other one?
And this describes the problem at its core: grading systems simply do not allow such statements. You know this from your school days: the person sitting next to you got a C, but you got a B. Even if only one or two points separate you, the result is categorically different. And without knowing the grading scheme of the exam, it is impossible to say how close the result really was.
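The dilemma can be made concrete with a toy scoring scheme. The thresholds below are invented purely for illustration (real tools use far more complex criteria): two pairs of pages can show the exact same score gap while differing by very different amounts of real load time.

```python
def score(load_time_s: float) -> int:
    """Map a load time in seconds to a score.

    The thresholds are invented for illustration only; real tools
    like PageSpeed Insights score very differently.
    """
    if load_time_s < 1.0:
        return 90
    if load_time_s < 3.0:
        return 80
    return 50

# The same 10-point gap can hide very different realities:
print(score(0.9), score(1.1))  # → 90 80 -- pages only 0.2 s apart
print(score(0.2), score(2.9))  # → 90 80 -- pages 2.7 s apart
```

The 10-point difference alone tells you nothing about how far apart the two pages really are; only the underlying times do.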
The reason for this limited significance is the so-called scale level of the measurement data. However, I do not want to go into this in detail here. For more details on scale levels and the permissible arithmetic operations, just take a look at Wikipedia.
Back to our example from the beginning: no one is able to say exactly by which factor the old and the new site differ. Such a statement is only possible with a real speed measurement.
Time measurements provide the best loading time data
The most valuable data for comparisons, for planning optimization measures and so on, are in any case time measurements. They have a true zero point, which makes them a ratio scale. Tools that measure load time therefore allow all kinds of statements and comparisons.
So if you measure a page load time of 2.712 seconds before an optimization measure and a value of 2.133 seconds after the conversion, you can make the following statements based on this data:
- The site is 21 per cent faster after the changeover than before the changeover
- The optimised aspect is responsible for more than one fifth of the page performance. (one of the most important pieces of information ever!)
- All further optimization measures can be set in relation to this value. An optimization that would bring 9 per cent more speed but disproportionately more effort can thus be prioritised differently from a measure that saves correspondingly more loading time.
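The statements above follow from simple arithmetic on the two measured values. A small sketch, using the example figures from this post:

```python
def improvement(before_s: float, after_s: float) -> dict:
    """Comparisons that time measurements (a ratio scale) make possible."""
    saved = before_s - after_s
    return {
        "seconds_saved": round(saved, 3),
        "percent_faster": round(saved / before_s * 100, 1),
        "speedup_factor": round(before_s / after_s, 2),
    }

print(improvement(2.712, 2.133))
# → {'seconds_saved': 0.579, 'percent_faster': 21.3, 'speedup_factor': 1.27}
```

None of these three numbers can be derived from two scores like 80 and 90, which is exactly why a score comparison cannot replace a time measurement.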
If the client from the example case had measured from the beginning with a tool like webpagetest.org, he would have seen that the performance of his site more than doubled in the relevant areas.
Conclusion: Knowledge about the type and quality of measurement data is only the beginning
So, for a meaningful comparison of two or more websites, at least the following two conditions must be met:
- The tool used must measure the right things - the relevant parts of the website. When changing hosters, for example, you should not rely exclusively on a test that primarily looks at on-page factors.
- The data used must allow a meaningful comparison. Normally, you want to know by what factor an optimization has actually improved your website. Only with this information can you forecast, for example, the improvement in conversion rate.
Admittedly: knowing the right data is just the beginning. Of course, you also need to know how to properly test page performance and read the resulting data sets. That's why we deal with these two topics in detail in other blog posts.
However, understanding the data and the permissible conclusions that can be drawn from it is the basis for all further optimization steps. And it helps to take the right and most sensible optimization measures.