Why Is Google Not Indexing Pages Although They Are In Sitemap.xml?
28th July 2009 by Kerry Dye
This question comes up often. It is more of a problem for larger sites than smaller ones: small sites of a dozen pages rarely have trouble getting all of their pages found.
However, if your website has several hundred, or several thousand pages, you might find that the number of indexed URLs is less than the number of pages in your Sitemap. Google themselves say that they don’t guarantee that they will index all the web pages that you submit.
Here are some of the top reasons why your XML Sitemap numbers might be different:
(1) Image URLs in Sitemap – Google doesn’t index images from Sitemaps. They say: “we don’t index images directly (instead, we index the page that contains the image). As a result, direct image URLs in your Sitemap won’t be indexed”.
(2) There are duplicate URLs in your Sitemap. This shouldn’t happen with a good XML Sitemap generator, but it is always something that you should check for.
(3) The data is out of date – Google describe the numbers as a “close approximation” which might not be 100% accurate. They talk about the fact that their systems are ever changing and that there might be a lag between calculation and publication.
(4) You have pages that are undervalued by Google. This is the old Supplemental Index problem again. Undervalued pages are not visited often, and may not be indexed at all. There are many reasons for undervalued pages, and you need to raise the authority of the site/pages in order to get them indexed.
(5) You have pages that are orphaned – that appear only in the XML Sitemap and not elsewhere in the site. This often causes pages to be undervalued, because the value of a page is at least partly defined by the number of links pointing to it. In theory, XML Sitemap content does not have to be accessible by crawling (Google says a Sitemap is a good way to provide pages accessible by Ajax that can’t be crawled by Googlebot), but a lack of internal links can still be a barrier to being indexed.
(6) You have a crawling problem on your website. This may cause orphaned pages as above, or crawl problems such as spider traps may stop Googlebot from reaching the important parts of your site, leaving pages un-crawled that should otherwise be crawled. To check for this, the best place to look in Webmaster Tools is the Crawl Stats page – if there are huge peaks lasting just one day, or a number of pages listed as a “high” that very exceeds
(7) You may have pages being hit by a duplicate content filter. If Google sees pages as too similar to each other, it may not index all the different variations of the page. This can apply to database-driven (cookie cutter) pages as well as complete word-for-word duplications.
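Some of the checks above, namely image URLs (1), duplicate URLs (2) and orphaned pages (5), can be run against your own Sitemap before you submit it. The following is a minimal Python sketch, not a definitive tool. The function names, the list of image extensions and the sample data are all assumptions made for illustration. It compares URLs as exact strings, so it will not catch near-duplicates such as trailing-slash variants, and the set of internally linked URLs has to come from your own crawl of the site.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed list of image extensions; extend it to suit your site.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def sitemap_urls(xml_text):
    """Return every <loc> value found in a urlset Sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def audit_sitemap(xml_text, linked_urls=None):
    """Report duplicate URLs, direct image URLs and (optionally) orphans.

    linked_urls is a hypothetical input: the set of URLs your own crawl
    found by following internal links. If omitted, orphans are not checked.
    """
    counts = Counter(sitemap_urls(xml_text))
    report = {
        "duplicates": sorted(u for u, n in counts.items() if n > 1),
        "images": sorted(
            u for u in counts
            if urlsplit(u).path.lower().endswith(IMAGE_EXTENSIONS)
        ),
    }
    if linked_urls is not None:
        report["orphans"] = sorted(u for u in counts if u not in linked_urls)
    return report


# Made-up sample Sitemap for demonstration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/a</loc></url>
  <url><loc>http://www.example.com/a</loc></url>
  <url><loc>http://www.example.com/b</loc></url>
  <url><loc>http://www.example.com/logo.png</loc></url>
</urlset>"""

if __name__ == "__main__":
    crawled = {"http://www.example.com/a"}
    for name, urls in audit_sitemap(SAMPLE, linked_urls=crawled).items():
        print(name, urls)
```

If the count of unique, non-image URLs from a check like this is already lower than the number submitted, part of the gap in the reported figures is explained before Google’s own indexing decisions come into it.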