
Why Is Google Not Indexing Pages Although They Are In Sitemap.xml?
Tue, 28 Jul 2009 by Kerry Dye


This is something that often gets asked. It is more of a problem for larger sites than smaller ones. Small sites of a dozen pages rarely have any issues with all of their pages being found.

However, if your website has several hundred or several thousand pages, you might find that the number of indexed URLs is lower than the number of pages in your Sitemap. Google themselves say that they don’t guarantee to index all the web pages that you submit.

Here are some of the top reasons why your XML Sitemap numbers might be different:

(1) Image URLs in Sitemap – Google doesn’t index images from Sitemaps. They say: “we don’t index images directly (instead, we index the page that contains the image). As a result, direct image URLs in your Sitemap won’t be indexed”.

(2) There are duplicate URLs in your Sitemap. This shouldn’t happen with a good XML Sitemap generator, but it is always something that you should check for.

(3) The data is out of date – Google describe the numbers as a “close approximation” which might not be 100% accurate. They talk about the fact that their systems are ever changing and that there might be a lag between calculation and publication.

(4) You have pages that are undervalued by Google. This is the old Supplemental Index problem again. Undervalued pages are not visited often, and may not be indexed at all. There are many reasons for undervalued pages, and you need to raise the authority of the site/pages in order to get them indexed.

(5) You have pages that are orphaned – that appear only in the XML Sitemap and are not linked from anywhere else on the site. This often causes pages to be undervalued, because the value of a page is at least partly defined by the number of links to it. In theory, XML Sitemap content does not have to be accessible by crawling (Google says a Sitemap is a good way to provide pages accessible by Ajax that can’t be crawled by Googlebot), but in practice this can still be a barrier to being indexed.

(6) You have a crawling problem on your website. This may cause orphaned pages as above, or crawl problems such as spider traps may be stopping Googlebot from reaching the important parts of your site, leaving pages un-crawled that should otherwise be crawled. To check for this, the best place to look in Webmaster Tools is the Crawl Stats page – if there are huge peaks lasting just one day, or a number of pages listed as a “high” that very exceeds

(7) You may have pages being hit by a duplicate content filter. If Google sees pages as too similar to each other, it may not index all the different variations of the page. This can apply to database-driven (cookie cutter) pages as well as complete word-for-word duplications.
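
Several of the checks above (image URLs, duplicate URLs and orphaned pages) can be run against your own Sitemap before you start worrying about Google. Below is a minimal sketch in Python, not a definitive tool. It assumes a standard sitemaps.org `<urlset>` file, and it assumes you already have a list of the URLs a crawler found by following links on your site. The names `sitemap_urls`, `audit`, `IMAGE_EXTENSIONS` and `crawled_urls` are made up for this illustration, and the list of image extensions is only a guess at what you might want to flag.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org <urlset> files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Assumption: extensions we treat as "direct image URLs". Adjust to taste.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def sitemap_urls(xml_text):
    """Return every <loc> value in the Sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def audit(xml_text, crawled_urls):
    """Flag image URLs, duplicate URLs and possible orphans in a Sitemap.

    crawled_urls is assumed to be the set of URLs reachable by following
    links on the site (for example, exported from your own crawler).
    """
    urls = sitemap_urls(xml_text)
    counts = Counter(urls)
    return {
        # Reason (1): direct image URLs, judged by the path's extension.
        "images": sorted(
            {u for u in urls if urlparse(u).path.lower().endswith(IMAGE_EXTENSIONS)}
        ),
        # Reason (2): URLs listed more than once.
        "duplicates": sorted(u for u, n in counts.items() if n > 1),
        # Reason (5): URLs in the Sitemap that the crawl never reached.
        "orphans": sorted(set(urls) - set(crawled_urls)),
    }


# A tiny made-up Sitemap to show the shape of the report.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
  <url><loc>http://www.example.com/logo.png</loc></url>
  <url><loc>http://www.example.com/hidden</loc></url>
</urlset>"""

report = audit(SAMPLE, {"http://www.example.com/", "http://www.example.com/about"})
for reason, found in report.items():
    print(reason, found)
```

Note that a URL can show up under more than one heading: in the sample, the image URL is also reported as an orphan because the crawl never reached it. Exact string comparison is used throughout, so URLs that differ only by a trailing slash or by letter case are treated as different pages here.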



