Understanding (Not Provided) Keyword Data with Two Simple Methods

In the ever-changing landscape of online marketing, there’s one thing we’re absolutely sure of:

Knowledge is power.

From discerning our best referral sources, to identifying the obstacles that keep users from converting on our clients’ sites, to getting a good hold on what exactly people are looking for on the web, the core of our online marketing strategy is a need for information.

Luckily, for the past several years Google Analytics and other top providers of web analytics have provided treasure troves of data with pre-built systems that allow us to slice and dice that data to provide actionable insights. We think it’s pretty nifty.

Well, sort of.

There’s a glaring hole in all our web data that continues to grow and compromise our rock-solid insights. One that was just a pesky annoyance when it was first introduced but has steadily grown into an unavoidable obstacle.

(Not Provided) Keyword Data: the online marketer’s and SEO’s constant source of irritation.

At SteadyRain, we’ve been practicing the art of white-hat SEO for the past few years and have continuously evolved our tactics to ensure that as the internet matures, our clients still see success from well-coordinated web development, compelling content and technical SEO. To show our value, we provide our clients with comprehensive reporting that highlights not only Organic Traffic, but Organic Traffic from Non-Branded Keywords, which is the metric that we seek to improve consistently through SEO efforts.

The problem with that metric? A big ol’ chunk of it is hiding in the (Not Provided) keyword data.

After several months of declining visits from Non-Branded queries to multiple clients’ sites, mirrored by an increase in (Not Provided) data, we decided to stop scratching our heads and find a way to estimate which types of keywords were falling into the (Not Provided) bucket.

After researching the topic and gleaning insights from top blog posts by industry leaders like Avinash Kaushik, we’ve created two simple formulas to apply to our clients’ (Not Provided) data, giving us a more accurate idea of how effective SEO campaigns are at driving organic traffic through Non-Branded keyword phrases.

And since knowledge is power, we thought we’d share the simple methods we’ve developed and get some feedback on what we’ve come up with from others who share our pain.

A Few Notes on Terminology

Before diving into our methodology, it’s important to understand how we classify keyword data and exactly what (not provided) is and means for our clients. In 2011, Google responded to persistent complaints about privacy and how Google search user data was collected and used by introducing secured search for Google users, i.e. search while logged in to a Google account. 

At the time, the majority of the individuals who were searching while logged in were Gmail users. Since then, Google has expanded its product offerings, including Google Drive, Google Plus and more, making it even more likely that someone is logged in while searching and browsing. Over time, Google’s push to have more users signed in to its products has led to more and more Google searches taking place at the secured https://www.google.com, and therefore to a greater proportion of (not provided) keyword data. The proportion of (not provided) keywords in any website’s organic search keyword referrals varies widely, with some sites losing only 20-30% of keyword data to (not provided) terms while others lose between 60 and 70%.

We consider the keywords that are not being hidden behind the (not provided) secure search data hole to be “known keywords” and typically split them into two categories to analyze SEO success: Branded and Non-Branded.

A Branded keyword is any word or phrase that contains a brand name or brand variation that an individual might type into a search engine to get to your page specifically. For example, at SteadyRain, we consider any variation of our brand name (steady rain, steadyrain, steadyrain St. Louis) as well as the names of our management team to be Branded terms.

Anything that doesn’t meet the requirements listed above is considered a Non-Branded keyword. Non-Branded terms typically include queries like “web development” and “online marketing.” From an SEO perspective, we view these terms as especially valuable because they give us the opportunity to get in front of potential customers who may not know about SteadyRain specifically but are in need of our services.

To summarize: we bucket all of the keywords from organic search into Branded, Non-Branded and (Not Provided) as the first step of our analysis of keyword data, regardless of which of the following two methods we’re using.
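As a minimal sketch of this bucketing step, assuming the keyword report has been exported as (keyword, visits) pairs: the function names and the brand pattern below are illustrative assumptions, not part of Google Analytics, and the pattern would need to be swapped for your own brand terms.

```python
import re
from collections import Counter

# Hypothetical brand pattern: matches "steadyrain" and "steady rain".
BRAND_PATTERN = re.compile(r"steady\s*rain", re.IGNORECASE)


def bucket_keyword(keyword):
    """Return the bucket name for a single organic keyword."""
    if keyword.strip().lower() == "(not provided)":
        return "not_provided"
    if BRAND_PATTERN.search(keyword):
        return "branded"
    return "non_branded"


def bucket_visits(rows):
    """Sum visits per bucket from an iterable of (keyword, visits) pairs."""
    totals = Counter()
    for keyword, visits in rows:
        totals[bucket_keyword(keyword)] += visits
    return totals


# Made-up rows for illustration only.
rows = [
    ("steadyrain", 10),
    ("Steady Rain St. Louis", 5),
    ("web development", 7),
    ("(not provided)", 20),
]
print(bucket_visits(rows))
```

Every keyword lands in exactly one bucket, so the three totals always add up to total organic visits.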

Explaining Not Provided Keyword Data Using “The Control Group Ratio Method” 

The first method is based on the assumption that the (Not Provided) keywords appear with similar ratios of Branded to Non-Branded terms as the known keywords. 

For the purposes of this post, we’re going to use some very basic numbers in the illustrations that make it super easy to understand where we’re coming from. The month-over-month charts are real data adjusted using the formulas you’ll see below. 

To access the report we’re using in Google Analytics, you’ll need to go into your account, set the desired date range, select Traffic Sources, then Sources, then Search and Organic. From this reporting view, you’ll be able to see the keywords and landing pages that sent visits to your site from Organic Search. Now, the first step in applying this method is to take your keyword data and separate it into the three basic buckets we talked about before: Branded, Non-Branded & (Not Provided). 

Not Provided Keywords - Total Organic Keywords

Obtaining this data requires you to create a Branded Organic Traffic advanced segment within your Google Analytics profile. Here’s a screenshot of one we’ve created, and a link to the segment for you to easily add it to your Analytics account - you’ll simply have to edit the RegEx to include your brand name, variations of that brand name, and other terms (such as prominent individuals within your company and your major product names) that drive organic traffic to your site.  

Not Provided Keywords - Branded Organic Traffic Segment
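For a long list of brand terms, it can help to generate the pattern rather than type it by hand. The sketch below is one way to do that in Python; the helper name and the sample terms (including the placeholder person name) are assumptions made for illustration, and the resulting pattern should be checked against the regular expression syntax your analytics tool accepts before you paste it into a segment.

```python
import re


def build_brand_regex(terms):
    """Join brand terms into a single alternation pattern."""
    patterns = []
    for term in terms:
        words = term.split()
        if words:
            # Allow optional whitespace between words so that
            # "steady rain" also matches "steadyrain".
            patterns.append(r"\s*".join(re.escape(word) for word in words))
    if not patterns:
        raise ValueError("at least one non-empty brand term is required")
    return "|".join(patterns)


# Hypothetical terms; "jane doe" stands in for a prominent individual.
pattern = build_brand_regex(["steady rain", "jane doe"])
print(pattern)
```

Escaping each word keeps characters such as the period in "St." from being read as regex wildcards.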

 

Once you’ve established the three buckets of traffic, you’ll calculate the ratio with which the Branded and Non-Branded queries appear within the known keywords and apply the ratio to your (Not Provided) keywords. This will give you an estimate of the number of keywords that fall into the Branded and Non-Branded buckets that were previously classified as Not Provided keywords. Keep in mind the assumption all of this started with: Branded and Non-Branded keywords occur with similar ratios in known and unknown keyword data sets.
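The arithmetic behind this step can be sketched in a few lines of Python. The function name and the sample numbers are hypothetical and chosen only to make the math easy to follow; they are not client data.

```python
def control_group_ratio(branded, non_branded, not_provided):
    """Split (Not Provided) visits using the Branded share of known keywords."""
    known = branded + non_branded
    if known == 0:
        raise ValueError("no known keyword visits to derive a ratio from")
    branded_share = branded / known
    est_branded = not_provided * branded_share
    est_non_branded = not_provided - est_branded
    return {
        "branded": branded + est_branded,
        "non_branded": non_branded + est_non_branded,
    }


# Hypothetical month: 300 Branded, 500 Non-Branded, 200 (Not Provided) visits.
# Branded share of known keywords is 300 / 800 = 37.5%, so 75 of the 200
# hidden visits are attributed to Branded and 125 to Non-Branded.
print(control_group_ratio(300, 500, 200))
# {'branded': 375.0, 'non_branded': 625.0}
```

The adjusted totals still sum to the original 1,000 organic visits; the method only redistributes the hidden portion.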

 

Not Provided Keywords - Adjusted Keyword Data

 

This calculation allows us to go from month-over-month data like this:

Not Provided Keywords - Organic Keyword Distribution

To this:

Not Provided Keywords - Adjusted Organic Keyword Distribution

Because this method is grounded in speculation and assumptions, we naturally poked a few holes in our own methodology and found that there are a few flaws that this method doesn’t take into account:

  • The proportion of (Not Provided) keywords has grown consistently since Google began securing search, and it now ranges from 30% of the keywords for one of our clients to 60% or 70% for others. When only 30% of the data is known, using it to infer the breakdown of Branded and Non-Branded queries for the remaining 70% isn’t very statistically sound.

  • Some clients, especially those in more technical fields (including SteadyRain), may have more technologically-savvy potential customers who use more Google products (for example, this blog post was written in a Google Doc). Therefore, their search behavior may differ greatly from that of people who do not use Google products and don’t search while signed in.

While this is not a perfect system for understanding your lost data, it can help to give you quick insight into whether or not your SEO efforts are effective beyond keyword rankings.

Keeping the flaws of this method in mind, we created a second approach that also helps us glean some insight into the (Not Provided) keyword data for our clients.

Explaining (Not Provided) Keyword Data Using the Landing Page Method

The landing page method is based on two primary assumptions:

  • The majority of keywords driving traffic to the home page are Branded keywords.

  • The majority of keywords driving traffic to service, sustaining & blog pages are Non-Branded keywords.

While Google has so rudely robbed us of the organic keyword data, they’ve graciously left behind another source of great insight: Organic Landing Page data. Many SEOs have turned to Landing Page data to glean insights from (Not Provided) terms over the past few years, and this is a method that we’ve consistently been employing as well, by looking at this view of our data:

 

Not Provided Keywords - Landing Page Data

 

In the interest of creating a numerical representation of this tactic, we created a consistent mathematical method that allows us to keep inferring the nature of the (Not Provided) keyword data.

So, taking our hypothetical keyword data again:

 

Not Provided Keywords - Organic Keywords

Not Provided Keywords - Adjusted Keyword Data
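As a sketch of one way the landing page assumptions could translate into arithmetic: (Not Provided) visits that landed on the home page are counted as Branded, and those that landed anywhere else are counted as Non-Branded. The function name, the page paths and the numbers are hypothetical, and this is one reading of the method rather than the exact formula.

```python
def landing_page_method(known_branded, known_non_branded,
                        not_provided_by_page, home_pages=("/",)):
    """Split (Not Provided) visits by the page each visit landed on."""
    est_branded = 0
    est_non_branded = 0
    for page, visits in not_provided_by_page.items():
        if page in home_pages:
            est_branded += visits       # home page traffic is assumed Branded
        else:
            est_non_branded += visits   # all other pages are assumed Non-Branded
    return {
        "branded": known_branded + est_branded,
        "non_branded": known_non_branded + est_non_branded,
    }


# Hypothetical (Not Provided) visits broken out by landing page.
not_provided_by_page = {
    "/": 120,
    "/services/web-development": 50,
    "/blog/example-post": 30,
}
print(landing_page_method(300, 500, not_provided_by_page))
# {'branded': 420, 'non_branded': 580}
```

The `home_pages` argument is there because some sites serve the home page under more than one path.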

 

So again, we’ve gone from month-over-month data that looks like this:

Not Provided Keywords - Organic Keyword Distribution

To month-over-month data that looks like this:

Not Provided Keywords - Adjusted Organic Keyword Distribution

Again, as this is based on speculation and assumptions, we’ve acknowledged that this method is flawed due to the occurrence of the following situations:

  • For some of our clients, especially really well-known brand names, people often searched for their products with a combination of Branded and Non-Branded terms, like “service offering brand name,” and landed on a service or sustaining page, violating our second assumption.

  • On the other side of that same coin, several of our clients’ homepages rank really well for Non-Branded terms that describe their products, services and expertise. Therefore, Non-Branded terms often send traffic to their home page. Alas, a violation of the first assumption.

One way that we’ve mitigated the flaws of each of the methods has been to use the two methods in tandem, looking at the ratio of Branded and Non-Branded terms that cause visitors from organic search to land on the home page and then applying that ratio to landing page data. We’ve also provided our clients with ranges of potential adjusted Branded to Non-Branded keyword data.

For example, taking the data shown in the month-over-month graphs into consideration, we might estimate that it’s likely the site had between 398 and 468 visits from Branded keywords in January and in the same time frame, between 551 and 613 visits from Non-Branded keywords.
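Turning two or more adjusted estimates into a reportable range is a matter of taking the lowest and highest value per bucket. The sketch below assumes each method returns a dictionary with "branded" and "non_branded" totals; the function name and the input numbers are hypothetical and are not the figures quoted above.

```python
def estimate_ranges(*estimates):
    """Return a (low, high) pair per bucket across several estimates."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    return {
        bucket: (
            min(estimate[bucket] for estimate in estimates),
            max(estimate[bucket] for estimate in estimates),
        )
        for bucket in estimates[0]
    }


# Hypothetical adjusted totals from the two methods for the same month.
ratio_estimate = {"branded": 375, "non_branded": 625}
landing_page_estimate = {"branded": 420, "non_branded": 580}

print(estimate_ranges(ratio_estimate, landing_page_estimate))
# {'branded': (375, 420), 'non_branded': (580, 625)}
```

Reporting the range rather than a single number keeps the uncertainty of both methods visible to the client.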

In the end, the insights unearthed by going through this process to further understand the (Not Provided) keywords are the most important part of the analysis. In applying these formulas to the data, we’re able to get a better idea of which pages within our clients’ sites are ranking for various generic and Branded terms, which allows us to recommend ways to create meaningful experiences for users on those pages and turn as many visitors from organic search as possible into potential customers for our clients.

Now that we’ve shared our thought process behind two simple and straightforward ways to estimate the distribution of Branded and Non-Branded keywords contained within (Not Provided) keyword data, we’re curious: how have you evaluated these keywords? What tactics are working for you? Share with us in the comments!

For more information about our online marketing campaigns or reporting, contact one of our online marketing specialists today.