On the first of March, Google Product Manager Jake Hubert announced a new interactive infographic by Google called How Search Works.
The infographic allows visitors to get insight into what happens
between the request of a search query and the retrieval of results,
which Google says happens “billions of times a day in the blink of an
eye.” The graphic is broken into three sections, which briefly go into
crawling and indexing, the algorithm, and Google’s measures against
webspam, both automatic and manual.
The piece is a follow-up to one released last year, The Story of Send,
which reveals the path of an email around the world. The new graphic is
relatively simple and aesthetically pleasing; it would serve as a nice
introduction to search, but experienced webmasters are unlikely to glean
anything new from the piece. The third section, Fighting Spam, however,
has some interesting information about Google’s penalization process.
This section details the types of webspam Google most often takes
action against, with a chart showing the prevalence of various
techniques over time. It also explains the process of notifying the owners
of spammy sites and receiving reconsideration requests, with timelines for
both. Perhaps the most interesting
feature of the new graphic, however, is a section with live webspam
screenshots. When you click “See what we’ve removed lately” you can see a
semi-live feed of sites that Google has manually removed from its
index.
Usually the feed shows sites removed in the past hour, but only about 46 at a time, which is why RustyBrick created a webspam archive.
Their archive has, at the time of writing, more than 3,000 screenshots
pulled from the infographic. There is also a search to see if one of
your own sites was included. (Remember, however, that the search only
checks if your site is among the manual webspam examples given by
Google, not if your site was automatically penalized or manually removed
but not included.)
Also included with the How Search Works infographic is a 43-page guide detailing how Google manually evaluates quality. The Search Quality Rating Guidelines
document, however, is actually a much-condensed version of an earlier
one never released publicly, but leaked among webmasters. The earlier,
161-page list of guidelines was intended to be a handbook for quality
raters to use when evaluating relevance of sites and whether or not they
should be manually removed from Google’s search index.
Search Engine Land posted a very long list
of all of the changes they found between the most recent leaked
document and the one made public by Google with How Search Works. While
some of the changes are to be expected (they removed details about how
to use the rating interface), others are a little more interesting. They
emphasize that quality raters “do not directly impact Google’s search
result rankings or ranking algorithms,” and a number of actual URLs used
as examples have been removed. Search Engine Land’s Matt McGee gives
two possible explanations for the removal of the examples: that Google
wants to stay vague about the actual examples and ranking system, or
that it does not want to have to constantly maintain the list, since the
internet is an ever-changing place.
Despite the cutbacks to the document, it is not often that Google
opens up like this and makes any element of the ranking process public.
Though hopefully you are not creating spammy web content yourself, it
can be very useful to have this peek into exactly what the search giant
considers spammy or low-quality. This knowledge can help pull a site that
is on the edge of being treated as webspam back to safety.
Though the whole guide is a useful read (and, trimmed down to 43 pages,
it's unlikely to be very time-consuming, either), the conclusion
gives a good half-page summary of which characteristics belong
to webspam pages and which belong to good pages. In particular, if a
rater is unsure of whether to mark the page spam or not, they are told
to ask themselves the following questions:
* Does the page provide the user with a good search experience?
* Does the page contain original content that would be helpful to users?
* Do you think the page should be included in a set of search results?
* Is the page designed for users? Is there a human element to the page?
* If you removed the PPC ads and copied text from the page, is there useful content remaining?
At a bare minimum, these are good questions to keep in mind anytime you build a website.
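If you want to run your own pages through the rater questions, a simple checklist can help. The sketch below is only an illustration: the class, field, and function names are made up for this example, the yes/no answers still have to come from a human reviewer, and nothing here reflects how Google's raters or algorithms actually record their judgments.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PageReview:
    """A reviewer's yes/no answers to the five rater questions (hypothetical names)."""
    good_search_experience: bool              # Good search experience for the user?
    original_helpful_content: bool            # Original content that helps users?
    belongs_in_results: bool                  # Should it appear in a set of results?
    designed_for_users: bool                  # Designed for users, with a human element?
    useful_without_ads_and_copied_text: bool  # Anything useful left without PPC ads and copied text?


def failed_checks(review: PageReview) -> List[str]:
    """Return the names of the questions the reviewer answered 'no'."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


# Example: a page that passes the first three questions but fails the last two.
review = PageReview(True, True, True, False, False)
print(failed_checks(review))
# prints ['designed_for_users', 'useful_without_ads_and_copied_text']
```

Any "no" answer is simply a prompt to take a second look at that page; the list does not score or rank anything.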
A post by Adrienne Erin in SPN
Google Opens Up About Manual WebSpam Removals