Google Crawling Through Forms and Drop Downs: SEO Effects
Mon, 15 Sep 2008 14:09:14 by Kerry Dye

I’ve increasingly noticed internal site search results pages, with weird search terms, appearing in Google’s index for some of my clients. Historically, we have never automatically prevented search results from being indexed, because they are usually only accessible by a form submission. However, it is something that we routinely monitor by looking at which pages of the site are indexed.

We’ve known for a while that (a) Google doesn’t like indexing search results and (b) indexing them often adds a lot of pages with duplicate page titles and meta tags, because they are all generated from one template page. So the appearance of these results was puzzling: was someone deliberately adding these pages to Google?

A trawl through the blogosphere brought up a much more likely explanation than some sort of SEO sabotage: Google has started submitting forms itself as a method of discovering new pages.

According to Matt Cutts (back on Apr 11 2008): “Now Google is finding ways to crawl through forms and drop-down boxes. We only do this for a small number of high-quality sites right now, and we’re very cautious and careful to do the crawling politely and abide by robots.txt. If you’d prefer that Google not crawl urls like this, you can use robots.txt to block the urls that would be discovered by crawling through a form.”

It’s nice that our sites are considered “high-quality”, but adding potentially hundreds of search results pages isn’t always helpful. It hasn’t been in the cases I’ve looked at, although in the case Matt was talking about in his post it probably would have been.

The Google Webmaster Central blog post is much more explicit in saying “If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page” and “This change also does not affect the crawling, ranking, or selection of other web pages in any significant way.”

So they are saying that it doesn’t affect how a site is ranked, which is amazing. Back in the days of the supplemental index, before it was integrated into the main index, results from internal search inevitably ended up there. Although Google’s algorithm has changed considerably, it is unlikely to rank these samey pages any higher now than it did back then.

So I’ll still be choosing not to let these results be indexed in the majority of cases, because they don’t provide useful landing pages for visitors and they generally show only a snippet of text from pages that are already indexed. Generally, robots.txt is a good fail-safe method, but another tactic would be to use the “noindex, follow” version of the robots meta tag, which would let Google read the page and discover any new links without indexing the results page itself.
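
As a rough illustration of the two options, here is a minimal Python sketch that checks a robots.txt rule against a couple of URLs using the standard library’s `urllib.robotparser`. The `/search` path, the example.com URLs and the helper name `is_crawlable` are assumptions made for the example, not details from any client site; adjust the `Disallow` path to wherever your own search results live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes internal search results live under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

# The alternative tactic: a robots meta tag placed in the <head> of the
# search results template instead of blocking the URL in robots.txt.
NOINDEX_FOLLOW_TAG = '<meta name="robots" content="noindex, follow">'

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="Googlebot"):
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # A search results URL of the kind a form submission would produce.
    print(is_crawlable("https://www.example.com/search?q=blue+widgets"))
    # An ordinary content page that should stay crawlable.
    print(is_crawlable("https://www.example.com/products/blue-widget"))
```

Note that the two tactics are alternatives for any given URL: if robots.txt blocks a page, Google never fetches it and so never sees the “noindex, follow” tag on it.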

Kerry Dye
Campaign Delivery Manager

