| Why Is Google Not Indexing Pages Although They Are In Sitemap.xml? |
| Tue, 28 Jul 2009 by Kerry Dye

This is something that often gets asked. It is more of a problem for larger sites than smaller ones. Small sites of a dozen pages rarely have any trouble getting all of their pages found. However, if your website has several hundred or several thousand pages, you might find that the number of indexed URLs is lower than the number of pages in your Sitemap. Google themselves say that they don’t guarantee that they will index all the web pages that you submit.

Here are some of the top reasons why your XML Sitemap numbers might be different:

(1) There are image URLs in your Sitemap. Google doesn’t index images from Sitemaps (they say “we don’t index images directly (instead, we index the page that contains the image). As a result, direct image URLs in your Sitemap won’t be indexed”).

(2) There are duplicate URLs in your Sitemap. This shouldn’t happen with a good XML Sitemap generator, but it is always something that you should check for.

(3) The data is out of date. Google describe the numbers as a “close approximation”, which might not be 100% accurate. They explain that their systems are ever changing and that there might be a lag between calculation and publication.

(4) You have pages that are undervalued by Google. This is the old Supplemental Index problem again. Undervalued pages are not visited often, and may not be indexed at all. There are many reasons for undervalued pages, and you need to raise the authority of the site or its pages in order to get them indexed.

(5) You have pages that are orphaned: they appear only in the XML Sitemap and not elsewhere in the site. This often causes pages to be undervalued, because the value of a page is at least partly defined by the number of links pointing to it. Although, in theory, XML Sitemap content does not have to be accessible by crawling (Google says a Sitemap is a good way to provide pages accessible by Ajax that can’t be crawled by Googlebot), this can still be a barrier to being indexed.
(6) You have a crawling problem on your website. This may cause orphaned pages as above, or crawl problems such as spider traps may be stopping Googlebot from reaching the important parts of your site, leaving pages un-crawled that should otherwise be crawled. To check for this, the best place to look in Webmaster Tools is the Crawl Stats page: if there are huge peaks lasting just one day, or a number of pages listed as a “high” that greatly exceeds …

(7) You may have pages being hit by a duplicate content filter. If Google sees pages as too similar to each other, it may not index all the different variations of the page. This can apply to database-driven (cookie cutter) pages as well as complete word-for-word duplications. |
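Several of the checks above (image URLs, duplicate URLs and orphaned pages) can be partly automated. What follows is only a minimal sketch in Python, not an official tool: the function names `sitemap_urls` and `audit_sitemap`, the list of image extensions, and the `linked_urls` argument (a set of URLs you have collected yourself by crawling your own internal links) are all assumptions made for illustration. It assumes a standard Sitemap file using the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit

# Namespace used by standard XML Sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed list of extensions that mark a direct image URL; extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def sitemap_urls(xml_text):
    """Return every <loc> value from a Sitemap, in document order."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def audit_sitemap(xml_text, linked_urls=None):
    """Report duplicate URLs, direct image URLs and possible orphans.

    linked_urls is a hypothetical input: the URLs reachable by following
    links within the site. If it is omitted, no orphan check is made.
    """
    urls = sitemap_urls(xml_text)
    counts = Counter(urls)
    duplicates = sorted(u for u, n in counts.items() if n > 1)
    image_urls = sorted(
        u for u in counts
        if urlsplit(u).path.lower().endswith(IMAGE_EXTENSIONS)
    )
    orphans = []
    if linked_urls is not None:
        linked = set(linked_urls)
        orphans = sorted(u for u in counts if u not in linked)
    return {
        "total": len(urls),
        "unique": len(counts),
        "duplicates": duplicates,
        "image_urls": image_urls,
        "orphans": orphans,
    }


# Made-up example data, not taken from a real site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
  <url><loc>http://www.example.com/images/logo.png</loc></url>
  <url><loc>http://www.example.com/landing</loc></url>
</urlset>"""

report = audit_sitemap(
    SAMPLE,
    linked_urls={"http://www.example.com/", "http://www.example.com/about"},
)
print(report)
```

A check like this only covers the Sitemap file itself. It cannot tell you whether Google undervalues a page or filters it as duplicate content, so treat the report as a first pass before looking at Webmaster Tools.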



