TL;DR: Ranking factors studies are detrimental to the legitimacy of our industry, and as search professionals we have a responsibility to properly interpret them for non-SEOs.
Recently SEMrush posted a study on Google’s top ranking factors. The study was not unlike many other studies published each year. It used a statistically significant data set to draw parallels between common metrics and high position (or ranking) in Google.
However, the conclusions they came to and reported to the industry were not entirely correct.
Defining Ranking Factors and Best Practices
It may help first to define “ranking factors”.
Ranking factors are those elements which, when adjusted on or around a website, directly cause a change in position in a search engine (in this case, Google).
“Best practices” are different. Best practices are tactics which, when implemented, have shown a high correlation to better performance in search results.
XML sitemaps are an excellent example. Creating and uploading an XML sitemap is a best practice. The existence of the sitemap does not lead directly to better rankings. However, providing the sitemap to Google allows them to crawl and understand your site more efficiently.
When Google understands your site better, it can lead to better rankings. But an XML sitemap is not a ranking factor.
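If you want a concrete picture of what that best practice looks like, here is a minimal sketch in Python that builds a bare-bones sitemap.xml with the standard library (the URLs are hypothetical placeholders):

```python
# Minimal sketch: generate a basic sitemap.xml with Python's standard library.
# The URLs below are hypothetical placeholders.
import xml.etree.ElementTree as ET

def build_sitemap(urls, path="sitemap.xml"):
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"  # standard sitemap protocol namespace
    urlset = ET.Element("urlset", xmlns=ns)
    for loc in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog/ranking-factors-vs-best-practices/",
])
```

You would then submit that file to Google (for example, through Search Console); the benefit comes from helping Google crawl efficiently, not from the file’s mere existence.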
The only ranking factors that we know about for sure are the ones that Google specifically mentions. These tend to be esoteric, like “high authority” or “good content” or even “awesomeness”. Google generally doesn’t provide specific ranking factors because any time that they do, webmasters go overboard. Remember the link wars of 2011-2014? They have learned their lesson.
For more on ranking factors and correlation vs. causation, check out this definition by Searchmetrics.
Understanding Correlation vs. Causation
The discussion of correlation vs. causation is not new. Dave Davies wrote a great post on this back in 2013 which still rings true.
Here’s another way to think of correlation vs. causation:
A large percentage of high-ranking websites probably have XML sitemaps. This is a correlation. The XML sitemap did not cause the site to obtain a high ranking.
This would be like saying that if you eat sour cream, you will get into a motorcycle accident, based on a chart showing the two trends moving in lockstep over the same period (a classic spurious correlation).
There are plenty more examples of strange correlations like this one.
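To make the correlation/causation point concrete, here is a small, self-contained Python sketch with invented numbers: any two series that merely trend in the same direction will produce a Pearson correlation close to 1, with no causal relationship whatsoever.

```python
# Toy illustration with invented numbers -- not real data.
# Two unrelated series that both drift upward correlate almost perfectly.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

sour_cream_per_capita = [4.1, 4.3, 4.4, 4.6, 4.9, 5.0]   # made-up yearly values
motorcycle_accidents  = [310, 325, 330, 345, 360, 372]    # made-up yearly values

print(round(pearson(sour_cream_per_capita, motorcycle_accidents), 3))  # close to 1, yet no causation
```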
Let’s take another example.
High amounts of direct traffic were shown to have a strong correlation with better rankings in the SEMrush study. This was a very controversial finding, because it was presented as “direct traffic is the number one ranking factor.”
While the data is likely accurate, what does it really mean?
Let’s start by defining Direct Traffic. This is traffic that came to a website URL with no referrer header (i.e. the visitor didn’t come to the site via email, search, or links from another site). Thus, it includes any traffic for which Google Analytics (or the platform in question) cannot determine a referrer. Direct Traffic is basically the bucket for “we don’t know where it came from.” Sessions are misattributed to Direct Traffic all the time, and some studies have shown that as much as 60% of direct traffic could actually be organic traffic. In other words, it’s not a very reliable metric.
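To see why that bucket fills up with misattributed visits, here is a simplified sketch of referrer-based channel classification. This is an illustration under my own assumptions, not Google Analytics’ actual logic: anything that arrives without a referrer header lands in “direct,” even if the visitor really came from search (mobile apps, privacy tools, and untagged links can all strip the referrer).

```python
# Simplified channel classification -- an illustrative sketch, NOT Google Analytics' real code.
from typing import Optional
from urllib.parse import urlparse

SEARCH_ENGINE_HOSTS = {"www.google.com", "www.bing.com", "duckduckgo.com"}  # illustrative list

def classify_session(referrer: Optional[str]) -> str:
    if not referrer:
        return "direct"  # the "we don't know where it came from" bucket
    host = urlparse(referrer).netloc.lower()
    if host in SEARCH_ENGINE_HOSTS:
        return "organic"
    return "referral"

print(classify_session("https://www.google.com/search?q=ranking+factors"))  # organic
print(classify_session(None))  # direct
print(classify_session(""))    # direct -- a stripped referrer looks identical to a typed URL
```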
Let’s assume for a moment that Direct Traffic is a reliable metric. If a site has high direct traffic, it is also likely to have a strong brand, high authority, and loyal users. All of these things can help SEO ranking. But the connection is indirect.
There are many other good arguments that have debunked the concept of direct traffic as a ranking factor specifically. Any good SEO should read and understand them.
Moving on from Direct Traffic, Searchmetrics falls victim to this correlation/causation problem as well in their latest Travel Ranking Factors study, where they assert that word count and number of images are both ranking factors for the travel industry. Google has directly debunked the word count assertion, and the number of images claim is so silly I had to ask John Mueller about it directly for this article:
If you read between the lines, you can tell John thinks treating a certain number of images as a ranking factor is foolish, and that the number of images a good page uses can vary widely.
It is much more likely that a fuller treatment of the keyword in question is the actual ranking factor rather than strictly “word count,” and that good quality travel sites simply tend to have lots of images.
If you want even more proof word count is a silly metric for any industry, just check out the top result for “is it Christmas?” (h/t Casey Markee)
This site has been in the #1 spot since at least 2008, and it literally has one word on the entire site. But that one word fully answers the intention of the query.
While Searchmetrics does a nice job of defining ranking factors, their use of that term in relation to this graph is irresponsible. These should be labeled “correlations” or similar, not “ranking factors.”
This is the crux of the matter. Studies using statistically significant samples, correlation analysis, or even machine learning models like Random Forest (which SEMrush used) can be accurate. I have no doubt that the results of all of the studies mentioned were accurate, as long as the data that was fed into them was accurate. However, the problem came not in the data itself, but in the interpretation and reporting of that data, namely when they listed these metrics as “ranking factors”.
Evaluate the Metrics Used
This raises the need to use common sense to evaluate things that you read. For example, a study may claim that time on site is a ranking factor.
First, you have to question where that data came from, since it’s a site-specific metric that few would know or be able to guess at without website or analytics access. Most of the time, this sort of data comes from third-party plugins or toolbars that record users’ behavior on sites. The problem with this is that the data set will never be as complete as site-specific analytics data.
Second, you have to consider the metric itself. Here’s the problem with metrics like time on site and bounce rate. They’re relative.
After all, some industries (like maps or yellow pages) thrive on a high bounce rate. It means the user got what they needed and went on their way having had a good experience and being likely to return.
For a time on site example, let’s say you want to consult with a divorce attorney. If you’re smart, you use incognito mode (where most/all plugins are disabled) to do this search and the subsequent site visits. Otherwise your partner might see the sites in your browsing history, or start seeing ads targeted based on your visits.
Imagine your partner seeing this in the Facebook news feed when he or she thinks your marriage is solid:
Facebook ad example from https://www.easyagentpro.com/blog/real-estate-divorce-ads/
So for an industry like divorce attorneys, time on site data is likely to be either heavily skewed or not readily available.
But Google Owns an Analytics Platform!
Some of you will say that Google has access to this data through Google Analytics, and that’s absolutely true. However, there has never been a positive correlation shown between having an active Google Analytics account and ranking better on Google. Here’s a great article on The SEM Post that goes into more detail on this.
Google Analytics is only installed on 83.3 percent of “websites we know about”, according to W3Techs. That’s a lot, but it isn’t every website, even if we do assume this is a representative sample. Google simply could not feed a signal into its algorithm that is missing for nearly 20 percent of sites.
Finally, some will make the argument that Chrome can collect direct traffic data. This has the same problem as Google Analytics, though: at last check, Chrome commanded an impressive 54% market share (according to StatCounter). That’s sizeable, but data covering only slightly more than half of all browser traffic is not a reliable enough source on which to base a ranking factor.
Doubling Down on Bad Information
Many of you have read this thinking that yes, we know all that. After all, we’re search professionals. We do this every day. We know that a graph that says direct traffic or bounce rate is a ranking factor has to be taken with a grain of salt.
The danger is when this information gets shared outside of our industry. We all have a responsibility to use our powers for good; we need to educate the world around us about SEO, not perpetuate stereotypes and myths.
I’m going to pick on Larry Kim for a minute here, who I think is a great guy and a very smart marketer. He recently posted the SEMrush ranking factor graph on Inc.com along with a well-reasoned article about why he thinks the study has value.
I had the opportunity to catch up with Larry by phone prior to finishing this article, and he impressed upon me that his intention with his post was to investigate the claim of direct traffic as a ranking factor. He felt that if a study showed that direct traffic had a high correlation with good search ranking, there had to be something more there.
I told him that while I don’t agree with everything in his article, I understand his train of thought. What I would like to see more of from everyone in the industry is understanding that outside our microcosm of keywords and SERP click-through rates, SEO is still a “black box” in many people’s minds.
Because SEO is complicated and confusing, and there’s a lot of bad information out there, we need to do everything that we can to clarify charts and studies and statements. The specific problem I have with Larry’s article is that lots of people outside of SEO read Inc. This includes many high-level decision makers who don’t necessarily know the finer points of SEO.
In my opinion, Larry sharing the graph as “ranking factors” and not debunking the obviously false information contained in the graph was not responsible. For example, any CEO looking at that graph could reasonably assume that his/her meta keywords hold some importance to ranking (not a lot based on the position on the graph, but some).
However, no major search engine has used meta keywords for regular SERP rankings (Google News is different) since at least 2009. The implication that they matter is objectively false information.
We have a responsibility as SEO professionals to stop the spread of bad or incomplete information. SEMrush published a study that was objectively valid, but the subjective interpretation of it created problems. Larry Kim republished the subjective interpretation without effectively qualifying it.
Always and Never Do Not Exist in SEO
Last week, I met with a new client. They had been struggling to include five supplemental links in all of their content because at some point, an SEO told them they should ALWAYS link out to at least five sources on every article. Another client had been told they should NEVER link out from their website to anything.
Anyone who knows about SEO knows that either one of these statements is bad advice and patently false information.
We as SEO professionals can help stem the tide of these mythical “revelations” by emphasizing to our clients, our readers, and our colleagues that ALWAYS and NEVER do not exist in SEO. There are simply too many variables to say anything definitively is or is not a ranking factor unless a search engine has specifically stated that it is.
It Happens Every Day
Literally every day something is taken out of context, misattributed, or has a correlation incorrectly treated as causation. Just recently, Google’s Webmaster Trends Analyst John Mueller said, in response to a tweet from Bill Hartzer, that TTFB is not a ranking factor.
TTFB, for the non-SEOs reading, is “Time to First Byte”. It refers to how quickly your server starts responding to the first request for your page.
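If you want a rough way to check your own TTFB, here is a minimal Python sketch (it assumes the third-party requests library, and the URL is a placeholder); dedicated tools like WebPageTest or curl’s timing output will give you more precise numbers:

```python
# Rough TTFB measurement -- a sketch, not how Google measures anything.
import time
import requests  # third-party: pip install requests

def time_to_first_byte(url: str) -> float:
    start = time.perf_counter()
    resp = requests.get(url, stream=True, timeout=10)  # stream=True defers the body download
    next(resp.iter_content(chunk_size=1), b"")          # read just the first byte of the body
    return time.perf_counter() - start

print(f"TTFB: {time_to_first_byte('https://www.example.com/'):.3f}s")  # placeholder URL
```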
Google has said on multiple occasions that speed is a ranking factor. What they have not said is exactly how it is measured. So Mueller says TTFB is not a ranking factor. Let’s assume he’s telling the truth and this is fact.
This does not mean you don’t have to worry about speed, or that you don’t have to be concerned with how quickly your server responds. In fact, he qualifies it in his tweet: TTFB is a “good proxy,” but don’t “blindly focus” on it. There are myriad other ranking factors that could be negatively impacted by your TTFB. Your user experience may be poor if your TTFB is slow. Your site may not earn high mobile usability scores if your TTFB is slow.
Be very careful how you interpret information. Never take it at face value.
Mueller said TTFB is not a ranking factor. Now I know that is fact and I can point to his tweet when necessary. But I will not stop including TTFB in my audits; I will not stop encouraging clients to get this as low as possible. This statement changes nothing about how SEO professionals will do their jobs, and only serves to confuse the larger marketing community.
It is our responsibility to separate SEO fact from fiction; to interpret statements from Google as carefully as possible, and to generally dispel the myth that there is anything you ALWAYS or NEVER do in SEO.
Google uses over 200 ranking factors, or so they say. Chasing these mystical metrics is hard to resist – after all, as SEOs, we are data driven – sometimes to a fault.
When you interpret ranking factor studies, use a critical eye. How was the data collected, processed and correlated? If the third party is making a claim that something is a ranking factor, does it make sense that Google would use it?
And finally, does learning that x or y is or is not a ranking factor change anything about the recommendations you will make to your client or boss? The answer to that last one is almost always “no.” Too much depends on other factors, and knowing something is or is not a ranking factor is generally not actionable.
There’s no ALWAYS or NEVER in SEO, and if we want SEO to continue to grow as a discipline, we need to get serious about explaining that. It’s time to take the responsibility we have to the outside world more seriously.
Searchmetrics and SEMRush were asked for comment, but did not respond prior to press time.
Correlations are important, but we cannot deny that there are some ranking factors.
If we talk about RankBrain, it focuses on some factors:
1. Time spent on webpages – if users love the content, they will stay on the site, and content is also an important factor for ranking any website.
2. CTR – how many people click on the link.
If Google sees these factors, it will give you a boost.
Precisely what I mean, @sameer. The two “ranking factors” you have listed are in fact not ranking factors according to Google. While experience and observation may suggest a different outcome, and some people don’t like to trust Google (I choose to, but many do not), there are specific posts from Google employees denying that either of the two items you mentioned is a ranking factor. To state that they are ranking factors without effectively qualifying the claim (“Google says they aren’t, but I don’t trust them”) is irresponsible.
Hi Jenny, interesting read.
We really did shake up the industry with our study, didn’t we? 🙂 So many debates. And SEOs argue about algorithms, data, conclusions – literally anything. What they all have in common is millions of opinions. Everyone is backing their opinions up with their own experience, which is unique, but very limited. The experience of one marketer truly differs from what another has tested, and if you chat with SEOs around the world, you’ll discover various techniques that work there but haven’t worked in the US for a few years already.
I strongly disagree with the harm statement. Studies are not about conclusions that are comfortable; studies are about uncovering trends that are beyond one person’s experience. And often studies reveal a breakthrough that the majority won’t accept immediately. History knows many examples, and they all prove it’s worth it to keep digging.
What the SEO industry has on a very large scale is guesswork. Even in your post you mention “if you read between the lines” about John’s tweet. I personally don’t see anything between the lines there. You do. And someone else would interpret it completely differently. With studies based on data there is less guesswork than with John’s tweets, but they could still be flawed.
When we saw the results of the study, we were surprised ourselves. But we will always stick to what’s true, not what is socially acceptable. You’d be surprised how much positive feedback along the lines of “yes, I see the same thing” we got. A lot of thought leaders supported us, from various markets across the world.
So here we are, asking ourselves: have the results resonated because they were not true and we just have to keep on digging, or has the study really uncovered the truth and a lot of people are too conservative and scared to accept it?
Hi Olga, thanks for taking the time to comment. As you know, I think very highly of both you and SEMrush in general. My concern, and one many of my colleagues share, is not that the studies themselves exist (they are very important and often quite instructive) but quite simply the use of the phrase “ranking factors”. I think @alanbleiweiss hits the nail on the head when he says that calling these studies ranking factors alienates the very industry the tool providers depend on for existence. Perhaps I didn’t explain it well enough in my post, but when people outside of SEO see that direct traffic, for example, is considered a ranking factor, SEOs everywhere have their phones and emails blow up with CEOs who want to know what we can do to get more direct traffic, since Inc or Fortune or some other well-meaning but non-SEO source republished the story. Since that’s not a logical or sustainable strategy for better SEO positioning, we spend a lot of time re-educating the client that could have been spent implementing sustainable strategies that would help them gain more revenue.
If we could simply apply the scientific method and nomenclature to the studies (“for sites that ranked in the top 10, our study found that 97.4% of them also had high direct traffic” for example), then I think the problem would not exist, and we in the SEO space could say “hey, SEMRush saw a really interesting thing in this study; let’s talk about it”, instead of spending all of our energy on trying to fact-check and add caveats.
Just to add to my comment below… I welcome the opportunity to talk about this with SEMrush or indeed, any other person or organization that does these studies. The “problem” as I see it is largely in semantics, and therefore is probably something easily solvable with some open communication. You have my email if this interests you, @olga_andrienko. 🙂
We can discern some ranking factors through experimentation, but not through correlation studies as you rightfully point out. I hope our industry can now refocus our efforts on experimentation or better statistical models that are more useful to the industry. Unfortunately, it tends to be the case in science that the more rigorous and controlled the study, the less broadly applicable its results.
Totally agreed.
We loved your comment on the Moz blog about what ranking factors studies should develop into. Correlation is off the list for us, and we used another algorithm initially, but we might switch things around and take the path you have described. This was one of the options we discussed internally at the end of last year.
Really good.
Jenny,
Thank you for writing this post. The fact that some “ranking factor” publishers turn to arrogant defiance when challenged by people who do the work – while those publishers merely “gather raw data” and don’t do the actual work on the scale or to the extent that those in the field do – is, at its base, beyond disrespectful.
The fact that unsuspecting newcomers to the industry, or those who have some experience yet need advanced guidance, read such garbage and have it end up influencing their understanding is insulting, to say the least.
And when that is met with an attitude of “we disagree with your opinion of the data” or “you’re free to post your view in the comments”, it’s infuriating, and it alienates the very industry such brands rely upon for their revenue.