Mon, 17 Dec 2007 13:15:04 by Kerry Dye
I wrote a while ago about ways to exclude
parts of your site from the search engines. In the section about robots.txt removal I
noted that the response was not instant, but I didn't elaborate any more than
that, so I thought I would revisit that subject having observed the response to
some of my robots.txt changes over the last few months.
In Google, the bottom line on removals is
that a page won't be removed until the spider revisits it. On a small
site that is visited often, this happens really quickly. However, on a larger
site with many pages, it may be months before the spider revisits a given page.
You might have thought that it would work
in a different way - that if you added a directory to the robots.txt file, then
the first time that robots.txt file was downloaded, Google would go "Aha!" and
remove from its index anything that matched that rule. But that isn't what
happens; it is done on a page by page basis as the spider finds the page, which
is then matched to the rules in the robots.txt.
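As a rough illustration of that page by page matching, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt rule, the example.com URLs and the blocked_urls helper are made up for the example, and this only mimics the matching step; it is not how Google itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes a directory of near-duplicate pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Each URL is checked individually, in the order it is "revisited".
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical pages the spider comes back to over time.
urls = [
    "http://www.example.com/products/widget.html",
    "http://www.example.com/print/widget.html",
]

print(blocked_urls(ROBOTS_TXT, urls))
```

Only the URL under /print/ matches the rule, and it is only identified when that particular URL is checked, which is why a rarely visited page can linger in the index.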
I have had a lot of SEO success with
removing "low value" pages from Google - pages that are almost-duplicates. But
because removal happens page by page, the effects can take different
amounts of time to show themselves. With a small site, the results are pretty
quick - only days may pass before the site races up the rankings with its new,
more relevant page selection. In the case of larger sites with tens of
thousands of pages, the result is far more gradual, as each page is revisited
less often, and the removal process is much slower.
Is there a solution? Well, Google
Webmaster Tools allows you to submit removal requests for URLs that you want removed, but each one has
to be entered by hand, which is time consuming for more than a handful of pages
(ask my colleague Pete - he removed nearly 300 URLs for a client). However, this is the only quick way to do it (and it still takes
a couple of days to be implemented).
If the pages you want removed are deep links,
low down in the crawling hierarchy of the site, one possibility for speeding
things up is to provide a sitemap-like page of links to them, linked temporarily
from your home page (which you remove when it has done its job).
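A temporary link page like this can be generated rather than written by hand. The following is a minimal sketch, assuming you already have the list of deep URLs; the build_link_page function, the page title and the example.com URLs are all hypothetical.

```python
from html import escape


def build_link_page(urls, title="Temporary link list"):
    """Return a minimal HTML page containing one link per URL."""
    items = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical deep pages that you want the spider to revisit sooner.
deep_urls = [
    "http://www.example.com/print/widget.html",
    "http://www.example.com/print/gadget.html",
]

page = build_link_page(deep_urls)
print(page)
```

Save the output as a page on your site, link to it from the home page, and take both the link and the page down once the removals have gone through.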
The final option is just patience -
something that search engine optimisers are quite good at. Eventually the
pages will be removed, and your site should climb the results as the
ranking of the remaining pages improves.
Kerry Dye, Campaign Delivery Manager