Google Crawling Through Forms and Drop Downs: SEO Effects
15th September 2008 by Kerry Dye
I’ve increasingly noticed pages with weird search terms appearing in Google’s index for some of my clients’ sites. Historically, we have never automatically prevented internal search results from being indexed, because they are usually only accessible by a form submission. However, we routinely monitor this by checking which pages of each site are indexed.
We’ve known for a while that (a) Google doesn’t like indexing search results and (b) indexing them often adds a lot of pages with duplicate page titles and meta tags, because they are all generated from one template page. So the appearance of these results was puzzling. Was someone deliberately adding these pages to Google?
A trawl through the blogosphere brought up a much more likely explanation than some sort of SEO sabotage: Google has started submitting forms itself as a method of discovering new pages.
According to Matt Cutts (back on Apr 11 2008): “Now Google is finding ways to crawl through forms and drop-down boxes. We only do this for a small number of high-quality sites right now, and we’re very cautious and careful to do the crawling politely and abide by robots.txt. If you’d prefer that Google not crawl urls like this, you can use robots.txt to block the urls that would be discovered by crawling through a form.”
It’s nice that our sites are considered “high-quality”, but adding potentially hundreds of search results pages to the index isn’t always helpful. It hasn’t been in the cases I’ve looked at, although it probably would have been in the case Matt was talking about in his post.
The Google Webmaster Central blog post is much more explicit in saying “If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page” and “This change also does not affect the crawling, ranking, or selection of other web pages in any significant way.”
So they are saying that it doesn’t affect how a site is ranked, which is amazing. Back in the days of the supplemental index, before Google integrated it into the main index, results from internal search inevitably ended up there. Although Google’s algorithm has changed considerably since then, it is unlikely to rank these samey pages any higher now than it did back then.
So I’ll still be choosing not to let these results be indexed in the majority of cases, because they don’t provide useful landing pages for visitors and they generally show only a snippet of text from pages that are already indexed. Generally, robots.txt is a good fail-safe method, but another tactic would be to use the “noindex, follow” version of the robots meta tag, which would let Google read the page and discover any new links but wouldn’t index the results page itself.
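As a rough illustration of the robots.txt approach, the sketch below uses Python’s standard urllib.robotparser module to check that a rule really does block search results pages while leaving normal pages crawlable. The /search path, the example.com URLs and the is_crawlable helper are all assumptions made up for this example; your own site’s search URLs will almost certainly look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, assuming internal search results live under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


# Illustrative URLs only: a search results page and an ordinary content page.
for url in (
    "https://www.example.com/search?q=widgets",
    "https://www.example.com/products/blue-widget",
):
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

The meta tag alternative is a single line in the head of the search results template: <meta name="robots" content="noindex, follow">. Note that the two methods don’t combine on the same URL: if robots.txt blocks the page, Google never fetches it and so never sees the meta tag.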