Mon, 17 Nov 2008 11:35:55 by Joe Bursell
Back in June 2005 Google introduced the sitemap protocol to enable webmasters to keep Google informed about new and updated web pages and increase site indexing in Google.
Three and a half years later the sitemap.xml has become almost ubiquitous, and justifiably so. Although it is not an SEO's silver bullet for indexing and crawling, it is a powerful tool if implemented properly. A few foibles still cause errors in Google Webmaster Tools, so here's an overview of common problems and how to fix them.
Possibly the most irritating problem (irritating because it's easy to overlook or misinterpret) is your robots.txt blocking URLs that appear in the sitemap. Google may review your sitemap and report that some URLs are blocked by a robots.txt "Disallow: /xyz" rule. This is more a comment than an error: if you want the blocked pages indexed, allow them in robots.txt. If not, simply review them to make sure you've got it right, and ignore the Google warning.
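If you'd rather catch these before Google does, the comparison can be scripted. Here's a minimal sketch in Python, assuming you already have the text of your robots.txt and sitemap.xml to hand. The function names sitemap_urls and blocked_urls are my own, and the standard library's robots.txt parser does not match rules in exactly the way Googlebot does, so treat the output as a prompt for review, not a verdict.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by the sitemap protocol for every element.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Illustrative data only; www.example.com stands in for your own site.
    robots = "User-agent: *\nDisallow: /xyz\n"
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://www.example.com/</loc></url>"
        "<url><loc>http://www.example.com/xyz/page.html</loc></url>"
        "</urlset>"
    )
    for url in blocked_urls(robots, sitemap_urls(sitemap)):
        print("Blocked by robots.txt:", url)
```

Anything it prints is either a page you meant to block (so take it out of the sitemap, or ignore the warning) or a page you blocked by accident.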
This next one can be a little tricky to unravel, but it's worth it, as fixing any 301/302 redirect errors may unlock a heap of latent sitemap value. The bottom line is that your generated XML sitemap can't contain URLs that redirect. If you have a page that has been moved, and that move was handled with a 301 or 302, Google will not recognise the redirecting URL in your sitemap; list the destination URL instead. Google will only index a page that is "OK", or more correctly "200 OK", and you can't force it to behave differently with your XML sitemap. The "200 OK" is the HTTP status the server returns in the response header for that page. You can check your page header statuses with an HTTP Header Checker. Pages that show anything other than a 200 should be reviewed and dealt with. As an aside, here's W3C's list of header statuses.
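If you have more than a handful of URLs, a small script can stand in for the header checker. The sketch below is one way to do it in Python, not the only way: it sends a HEAD request to each URL without following redirects, then sorts the answers into "ok" (200), "redirect" (301/302) and "review" (anything else). The names fetch_status, classify and audit are mine, and some servers answer HEAD differently from GET, so double-check anything surprising in a browser.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send a HEAD request and return (status, Location header).

    http.client never follows redirects, so a 301/302 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def classify(status):
    """Map an HTTP status code to the action this article suggests."""
    if status == 200:
        return "ok"
    if status in (301, 302):
        return "redirect"
    return "review"


def audit(urls, fetch=fetch_status):
    """Return {url: (verdict, status, location)} for each URL."""
    report = {}
    for url in urls:
        status, location = fetch(url)
        report[url] = (classify(status), status, location)
    return report
```

Calling audit() with the list of URLs from your sitemap gives you a dictionary to work through: leave the "ok" entries alone, swap each "redirect" entry for the URL in its Location header, and investigate the rest.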
There are a number of ways to check the validity of your sitemap.xml once it's generated, but that may be a step too far, and here's why. If you review your robots.txt before generating the XML sitemap, you can identify what should and should not be allowed, and use this to check against any sitemap errors. Review your site to check HTTP header statuses, and fix accordingly. Use a robust, trusted sitemap generator. There are many options but I like this one best.
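To show what the end product of those checks looks like, here's a minimal sketch that writes a bare-bones sitemap from a list of URLs you've already vetted. It's no substitute for a proper generator (it only emits the required loc element and skips optional ones such as lastmod), and build_sitemap is a name I've made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace required on the root element by the sitemap protocol.
SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap.xml document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serialising.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Illustrative URLs only: pages that return 200 and are allowed in robots.txt.
    print(build_sitemap(["http://www.example.com/", "http://www.example.com/about.html"]))
```

Feed it only the URLs that return a 200 and aren't disallowed in robots.txt, and the two problems above shouldn't turn up in Webmaster Tools.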
Joe Bursell, Campaign Delivery Manager