The school grade dilemma: A typical problem in performance comparison

Jan Hornung Last updated 21.01.2020

Evaluating the performance of your own website is very easy today. One or two clicks and Google or another service spits out results with concrete suggestions for improvement. Wonderful. At least for the first optimization run. But at the latest when you are fine-tuning, changing hosting providers or cleaning up WordPress, it becomes important to understand which tools actually measure the loading time and how to work with this data.

Recently, a customer wrote to us via the support chat. He had just migrated his site and was comparing its performance at the old host with its performance at RAIDBOXES. He told us that the migration had not really been worth it for a performance increase of only 9 points in Google PageSpeed Insights.

In fact, we get such requests again and again. That's why I took a look at what information tools like Google PageSpeed Insights actually provide for interpreting their results, and how they measure performance or loading time. To be honest, the result surprised me a bit. Because: the meaning of the values is usually explained very well and in detail. However, the help pages of the test providers do not go into detail on two points:

  • Which tool is suitable for which purpose?
  • What data can be interpreted and used and how?

Tools like Google PageSpeed Insights don't measure the speed of your site

It was already a topic in a previous blog post: tests such as Google PageSpeed Insights do not measure the loading time of your site, but its optimization potential. They determine how well your site meets a predefined set of performance-relevant criteria. In addition, the tests provide instructions for making use of this potential. However, there is one thing that such tests explicitly do not do: measure the load time.

Google puts it like this:

PageSpeed Insights measures ways to increase the performance of a page in the following areas:

  • Time required to load the content visible without scrolling: Time taken from a user requesting a new page to the browser rendering the content visible without scrolling.
  • Time required to fully load the page: Time taken from a user requesting a new page to the browser fully rendering the page.

You see, Google doesn't measure speed, it measures "ways to increase performance". A crucial difference. And it also means that you cannot tell from the results how fast the site, or the area visible without scrolling, actually loads.

Performance tools like PageSpeed Insights show you where you can quickly gain a lot of performance.

This is not a problem in itself, because the tools still provide valuable data for optimization, even if they do not measure the load time. The results of such tests are most useful for large optimization steps, such as introducing caching or image compression.

Figure: Excerpt from a Google PageSpeed Insights test. Incidentally, a score of 85 points or more would earn a green mark. One thing the test does not do: systematically measure the loading time.

However, as soon as it comes to optimizing the load time of an already optimized site, these tests can only provide limited insights. In such a case, you need to perform a real performance measurement. This is especially true when changing hosting providers. Because no matter how good the web server is: if the site itself is full of unresolved issues, even a change of infrastructure will achieve relatively little.

For such a "real" performance measurement you can use a tool that records actual load times, e.g. webpagetest.org.

With one of these tests, the customer would have been able to compare exactly which parts of his site had which performance gains after the change.
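As a rough illustration of what a time-based measurement looks like, a minimal Python sketch follows. It is not how any particular tool works internally: the function names `fetch_page` and `measure_seconds` are made up for this example, and the sketch only times the download of the HTML document, not the rendering in a browser.

```python
import time
import urllib.request
from typing import Callable, List


def fetch_page(url: str) -> None:
    """Download the HTML document at `url` and discard the body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()


def measure_seconds(action: Callable[[], None], runs: int = 5) -> List[float]:
    """Run `action` several times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        action()
        durations.append(time.perf_counter() - start)
    return durations
```

Called as `measure_seconds(lambda: fetch_page("https://example.com/"))` (a placeholder URL), it returns a list of durations in seconds. Because single runs fluctuate, it makes sense to compare a robust value such as the median before and after a change.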

And that brings me to the second point of this post: tools like PageSpeed Insights in particular tempt you to compare values that are suitable for comparison only to a limited extent or not at all. Because when you work with scores or grading systems, you quickly get into a situation that I call the school grade dilemma in this article.

The school grade dilemma: grades are not suitable for comparisons

Tools like Google PageSpeed Insights or Yahoo's YSlow output two types of data:

  • a grade (score) for page performance
  • specific advice on how to improve this grade

The scores are on a scale from 0 to 100, with 100 being the best score. So far so clear. And intuitively accessible to every user. Especially because the ratings are supported by a traffic light system.

But when it comes to comparing two sites based on these ratings, interpreting the results is no longer so easy. In fact, it is incredibly difficult, if not impossible. Everyone can see that the site with the 90 rating is better than the one with the 80 rating. But one question can no longer be answered: by what factor is the site with the 90 rating better than the other?

And this is the core of the problem: grading systems simply do not allow such statements. You know this from your schooldays: the person sitting next to you got a C, but you got a B. Even if only one or two points separate you, the result looks fundamentally different. And without knowing the grading scale of the paper, it is impossible to say how close the result was.
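The school example can be sketched in a few lines of Python. The grading scale below (`THRESHOLDS`, `GRADES`) is purely hypothetical and only serves to show how a grade hides the distance between two results.

```python
from bisect import bisect_right

# Hypothetical grading scale: minimum points needed to leave each lower grade.
THRESHOLDS = [60, 70, 80, 90]
GRADES = ["F", "D", "C", "B", "A"]


def grade(points: int) -> str:
    """Map a point total to a letter grade using the hypothetical scale."""
    return GRADES[bisect_right(THRESHOLDS, points)]


# One point apart, but different grades:
print(grade(79), grade(80))  # prints "C B"
# Nine points apart, but the same grade:
print(grade(80), grade(89))  # prints "B B"
```

Anyone who only sees the grades cannot tell whether two results were one point or nine points apart, let alone by what factor one was better.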

The reason for this limited informative value is the so-called level of measurement (scale level) of the data. However, I do not want to go into this in more detail here. For more details on levels of measurement and the arithmetic operations they permit, a look at Wikipedia is sufficient.

Back to our example from the beginning: neither the customer nor anyone else can say exactly by which factor the old and the new site differ. Such a statement is only possible with a real speed measurement.


Timing measurements provide the best load time data

The most valuable data for comparisons, for preparing optimization measures and so on are always time measurements. Because these have a true zero point that you can use as a reference. Tools that measure the load time therefore allow all kinds of statements and comparisons.

So if you measure a page load time of 2.712 seconds before an optimization measure and 2.133 seconds after the change, you can make the following statements based on this data:

  • The load time is about 21 percent shorter after the change than before.
  • The optimized aspect accounted for more than a fifth of the page load time. (One of the most important pieces of information of all!)
  • All further optimization measures can be set in relation to this value. Thus, an optimization that would save a further 9 percent of load time, but means disproportionately more effort, can be prioritized differently than a measure that saves correspondingly more loading time.
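The first statement above can be checked with a few lines of arithmetic. The following Python sketch uses the two load times from the example; the function names are chosen for illustration only.

```python
def relative_reduction(before: float, after: float) -> float:
    """Share of the original load time that was saved."""
    return (before - after) / before


def speedup_factor(before: float, after: float) -> float:
    """How many times faster the page loads after the change."""
    return before / after


before_s = 2.712  # load time before the optimization, in seconds
after_s = 2.133   # load time after the optimization, in seconds

print(f"Load time reduced by {relative_reduction(before_s, after_s):.1%}")  # 21.3%
print(f"Speed-up factor: {speedup_factor(before_s, after_s):.2f}x")         # 1.27x
```

Both values are only meaningful because seconds have a true zero point. The same division applied to two scores, say 90 and 80, would produce a number, but not one that can be interpreted as a factor.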

If the customer from the example case had measured from the beginning with a tool like webpagetest.org, he would have seen that the performance of his site more than doubled in the relevant areas.

Conclusion: Knowledge about the type and quality of measurement data is only the beginning

Thus, for a meaningful comparison of two or more sites, at least the following two conditions must be met:

  • The tool used must measure the right things - i.e. the relevant parts of the site. When changing hosting providers, for example, you should not rely exclusively on a test that primarily looks at on-page factors.
  • The data used must allow a meaningful comparison. Normally, you want to know by which factor an optimization has improved your own site. Only with this information can you, for example, make a forecast about the improvement of the conversion rate.

Granted: knowing the right data is just the beginning. Of course, you also need to know how to properly test page performance and read the data sets. That's why we'll be looking at these two topics in detail in upcoming blog posts.

However, understanding the data and the permissible conclusions that can be drawn from it is the basis for all further optimization steps. And it helps to take the right and most sensible optimization measures.

Jan Hornung has been a RAIDBOXER from the very beginning and is Head of Support. At BarCamps and WordCamps he loves to talk about PageSpeed and website performance. The best way to bribe him is with an espresso - or a Bavarian pretzel.
