Google Opens Up About Manual WebSpam Removals

On the first of March, Google Product Manager Jake Hubert announced a new interactive infographic by Google called How Search Works. The infographic allows visitors to get insight into what happens between the request of a search query and the retrieval of results, which Google says happens “billions of times a day in the blink of an eye.” The graphic is broken into three sections, which briefly go into crawling and indexing, the algorithm, and Google’s measures against webspam, both automatic and manual.

The piece is a follow-up to one released last year, The Story of Send, which reveals the path of an email around the world. The new graphic is relatively simple and aesthetically pleasing; it would serve as a nice introduction to search, but experienced webmasters are unlikely to glean anything new from it. The third section, Fighting Spam, however, has some interesting information about Google’s penalization process.

This section details the types of webspam Google most often takes action against, with a chart showing the prevalence of various techniques over time. They explain the process of notifying spammy site owners and receiving reconsideration requests, with timelines for these as well. Perhaps the most interesting feature of the new graphic, however, is a section with live webspam screenshots. When you click “See what we’ve removed lately” you can see a semi-live feed of sites that Google has manually removed from their index.

Usually the feed shows sites removed in the past hour, but only about 46 at a time – which is why RustyBrick created a webspam archive. Their archive has, at the time of writing, more than 3,000 screenshots pulled from the infographic. There is also a search to see if one of your own sites was included. (Remember, however, that the search only checks if your site is among the manual webspam examples given by Google, not if your site was automatically penalized or manually removed but not included.)

Also included with the How Search Works infographic is a 43-page guide detailing how Google manually evaluates quality. The Search Quality Rating Guidelines document, however, is actually a much-condensed version of an earlier one never released publicly, but leaked among webmasters. The earlier, 161-page list of guidelines was intended to be a handbook for quality raters to use when evaluating relevance of sites and whether or not they should be manually removed from Google’s search index.

Search Engine Land posted a very long list of all of the changes they found between the most recent leaked document and the one made public by Google with How Search Works. While some of the changes are to be expected (they removed details about how to use the rating interface), others are a little more interesting. They emphasize that quality raters “do not directly impact Google’s search result rankings or ranking algorithms,” and a number of actual URLs used as examples have been removed. Search Engine Land’s Matt McGee gives two possible explanations for the removal of the examples: that Google wants to stay vague about the actual examples and ranking system, or that they did not want to have to constantly maintain this list as the internet is an ever-changing place.

Despite the cutbacks to the document, it is not often that Google opens up like this and makes any element of the ranking process public. Though hopefully you are not creating spammy web content yourself, it can be very useful to have this peek into exactly what the search giant considers spammy or low-quality. This knowledge can help pull a site that is on the edge of being treated as webspam back to safety.

Though the whole guide is a useful read (and trimmed down to 43 pages it’s unlikely to be a very time-consuming read, either), the conclusion gives a good half-page summary of exactly what characteristics belong to webspam pages and which belong to good pages. In particular, if a rater is unsure of whether to mark the page spam or not, they are told to ask themselves the following questions:

* Does the page provide the user with a good search experience?
* Does the page contain original content that would be helpful to users?
* Do you think the page should be included in a set of search results?
* Is the page designed for users? Is there a human element to the page?
* If you removed the PPC ads and copied text from the page, is there useful content remaining?

At a bare minimum, these are good qualities to keep in mind anytime you build a website.

A post by Adrienne Erin in SPN