SEARCH MARKETING BLOG

SEO Speak: Crawlable content

We’ve talked in previous blog posts about Google crawling the content on your website and the importance of having content that Google can see. So what is crawlable content?

Crawlable content is anything on your website that Google can read. Google uses a robot (or spider) called Googlebot to read pages on the internet, and there are some things it can’t read, such as the content of frames or iframes.

There are several ways to check whether search engines can read the content of your website.

One method is to view the site in a text-based browser such as Lynx and compare what you see with how the site looks in a normal browser such as Google Chrome. If you can’t see something on your website when using Lynx, it’s likely that Google can’t see that content either.
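Lynx itself is the tool to use for this check. As a rough illustration of what a text-only view keeps and drops, here is a minimal Python sketch, assuming you have the page’s HTML as a string. It uses the standard library’s html.parser to collect the visible text, skip script and style contents, and only note the address of any frame or iframe, since that content is not part of the page itself. The class and function names are hypothetical, and this is an approximation, not how Lynx or Googlebot actually render a page.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Rough stand-in for a text-based browser such as Lynx (an assumption).

    Visible text is collected, script/style contents are skipped, and
    frames/iframes are only noted, because their content lives elsewhere.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []  # visible text fragments, in page order
        self.notes = []  # placeholders for framed content
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in ("iframe", "frame"):
            src = dict(attrs).get("src") or ""
            self.notes.append(f"[{tag}: {src}]")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_only_view(html):
    """Return (visible text, list of frame/iframe notes) for an HTML string."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.notes
```

If an important heading or paragraph only shows up in the notes list (because it sits in an iframe) or not at all (because script writes it), that is the same warning sign as it being missing in Lynx.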

You can also use Google Webmaster Tools to view pages on your website as Googlebot sees them. Under the Labs section of Google Webmaster Tools there is a “Fetch as Googlebot” option, which shows you the code of a page as Google’s spider sees it. We posted last year that Googlebot appears to have a 100 KB file size limit, so this tool is important for making sure that Google can see all of the content on your pages.
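To see whether a page could be affected by that apparent limit, you can measure the size of its HTML and look at what falls past the cut-off. The sketch below assumes the 100 KB figure from our earlier post (an observation, not an official Google number) and that you already have the page source as a string, for example saved from the browser. The names are hypothetical.

```python
# Assumed cut-off, based on the 100 KB observation above; not an official figure.
SIZE_LIMIT_BYTES = 100 * 1024


def check_page_size(html, limit=SIZE_LIMIT_BYTES):
    """Report a page's encoded size and the HTML that falls past the cut-off."""
    raw = html.encode("utf-8")
    over = max(0, len(raw) - limit)
    # Decode the tail leniently in case the cut lands inside a multi-byte character.
    tail = raw[limit:].decode("utf-8", errors="ignore")
    return {
        "size_bytes": len(raw),
        "bytes_over_limit": over,
        "text_after_limit": tail,
    }
```

If "text_after_limit" contains content you care about, such as body copy or internal links, it is worth checking that part of the page with Fetch as Googlebot.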

Some key things to avoid are mentioned in Google’s Webmaster Guidelines and include iframes, frames, session IDs and JavaScript. Google has said that it can now read Flash, but building websites in Flash is still not recommended, as you will see if you view such a site in a text browser.
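A quick way to find these items in your own pages is to scan the HTML for them. The Python sketch below flags frames and iframes, script tags (external and inline), Flash files embedded with embed or object, and links whose query string seems to carry a session ID. The list of session parameter names is an assumption that covers some common ones; your platform may use others. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

# Hypothetical list of query-string names that often carry session IDs.
SESSION_PARAMS = {"sid", "sessionid", "session_id", "phpsessid", "jsessionid"}


class CrawlRiskScanner(HTMLParser):
    """Records tags and links of the kinds the guidelines warn about."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("frame", "iframe"):
            self.findings.append(f"{tag}: {attrs.get('src') or ''}")
        elif tag == "script":
            # External scripts have a src; anything else is inline JavaScript.
            self.findings.append("script: " + (attrs.get("src") or "inline"))
        elif tag in ("embed", "object"):
            source = attrs.get("src") or attrs.get("data") or ""
            if source.lower().endswith(".swf"):
                self.findings.append(f"flash: {source}")
        elif tag == "a" and attrs.get("href"):
            href = attrs["href"]
            query = urlparse(href).query
            names = {n.lower() for n in parse_qs(query, keep_blank_values=True)}
            # Some servers append ";jsessionid=..." to the path instead.
            if names & SESSION_PARAMS or ";jsessionid=" in href.lower():
                self.findings.append(f"session id link: {href}")


def scan_for_crawl_risks(html):
    """Return a list of findings, in the order they appear in the page."""
    scanner = CrawlRiskScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

Not every finding is a problem: a script that only handles analytics is harmless, while one that writes out your main content is not. The scan simply tells you where to look.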

You also need to make sure that you don’t have a spider trap on your site. A spider trap catches search engine spiders in a loop, so they keep discovering an infinite number of pages that don’t really exist on the site. This can stop your site from being indexed correctly, because Google spends its effort crawling pages that aren’t there instead of the pages that do exist. Check the listings Google has for your website by performing a site: search in Google (e.g. site:yoursite.com) to make sure that the pages Google has found on your site actually exist and aren’t the result of a spider trap.
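If you copy the URLs from a site: search into a list, a couple of simple heuristics can help spot likely spider trap output. This Python sketch assumes two common patterns: a path segment repeated many times (as happens when a broken relative link keeps nesting itself) and an unusually deep path. Both thresholds and all names are hypothetical and should be tuned for your site; a flagged URL is a prompt to investigate, not proof of a trap.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical thresholds; tune them for your own site's structure.
MAX_DEPTH = 8
MAX_SEGMENT_REPEATS = 2


def spider_trap_signals(url):
    """Return the reasons a URL looks like the output of a crawling loop."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    reasons = []
    if len(segments) > MAX_DEPTH:
        reasons.append(f"path depth {len(segments)}")
    counts = Counter(segments)
    repeated = sorted(s for s, n in counts.items() if n > MAX_SEGMENT_REPEATS)
    if repeated:
        reasons.append("repeated segments: " + ", ".join(repeated))
    return reasons


def flag_suspect_urls(urls):
    """Map each suspicious URL to its reasons; clean URLs are left out."""
    flagged = {}
    for url in urls:
        reasons = spider_trap_signals(url)
        if reasons:
            flagged[url] = reasons
    return flagged
```

The example.com addresses used when trying this out are placeholders; substitute the listings Google shows for your own domain.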

If there are sections of your website that could create crawling issues for Google, there are things you can do about them. Ideally, tweak the content on your site so that it is crawlable; that is best practice. If there are certain pages that you know could cause an issue which can’t be fixed, consider blocking them in your robots.txt file.
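Before uploading a change to robots.txt, it helps to confirm that the rule blocks the pages you mean and nothing else. Python’s standard urllib.robotparser module can test this offline. In the sketch below, the /calendar/ section and the example.com URLs are hypothetical stand-ins for a problem area on your own site, and the parser’s matching is a close but not guaranteed match for how Googlebot interprets every rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /calendar/ stands in for a section that can't be fixed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
"""


def is_crawlable(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_crawlable on a calendar URL should return False, while the rest of the site stays crawlable. Running a handful of your important URLs through it is a cheap guard against a Disallow line that is broader than intended.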

Making sure your content is readable is an important part of the SEO you carry out on your website, and viewing your site in Google Webmaster Tools and in a text-based browser are good ways to make sure that search engines see your site as you intend.

This entry was posted in SEO Blog by Emily Mace.

About Emily Mace

Emily joined Vertical Leap as an SEO Campaign Delivery Manager in 2008, having gained wide search marketing experience as a web developer, SEO specialist and trainer for local government departments and Tourism South East. Emily gained the Google Analytics Individual Qualification in 2011 and regularly blogs on the technical aspects of SEO, sharing her expertise with our readers.