Wed, 17 Oct 2007 12:34:24 by Kerry Dye
Controlling "link juice" (a more generic term than Google's PageRank) has been
a hot topic in the Vertical Leap office recently. It therefore seems a good
time to write a blog about the different ways this can be done, when you might
want to use each of them, and why.
Method 1: robots.txt
This is a file that you place in the root of your site. When a spider/robot
comes to visit your site, this is the first file that it accesses. Using the
information here, it looks at your site, excluding any pages that you don't
want it to visit.
For the purposes of this blog, I am limiting my mention of robots.txt to SEO
uses, but it can also be used for directing other robots, such as content
scrapers. For search engine optimisation, we use robots.txt quite
specifically. It is the best way to prevent pages from appearing in the index.
Listing them in robots.txt with the Disallow directive will stop a search
engine from looking at a page and putting it in the results for searches. If a
page that you don't actually want appearing is in the results, then adding it
to robots.txt will mean that it is removed, although the reaction isn't instant.
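As a rough illustration of how a Disallow rule is read, here is a minimal Python sketch using the standard library's urllib.robotparser. The /print/ and /accessible/ directories and the www.example.com address are made up for the example, and real search engine spiders may interpret rules in their own way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are assumptions
# chosen for illustration, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /accessible/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/products/widget.html", "/print/widget.html"):
        url = "http://www.example.com" + path
        print(path, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

The "User-agent: *" line applies the rules to every robot; a rule set for one named spider would replace the asterisk with that spider's name.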
So why would you want to keep pages out of the search engines? Well, in SEO
the primary reason is duplicate content. Often the site owner or creator is
unwittingly creating duplicate content. Common routes for this to happen are
printable versions of pages or accessible versions of pages. With exactly the
same content, it is difficult for a search engine to work out which is the most
important version, and this can result in the devaluation of all of the
versions, which impacts on your website's visibility overall.
The secondary reason for using robots.txt is low-value pages. Again, this is
normally something done unwittingly as part of the site build process. Recent
examples I've seen causing this are mostly forms. From a usability point of
view, a form pre-populated with a product or the result of a search is great
for the user, but for a search engine it creates potentially hundreds of
similar pages with only a single element that differs.
Method 2: rel="nofollow" on links
This is where you add the rel="nofollow" attribute to the <a> tag of the
relevant link. It is often used to control spamming, e.g. by Wikipedia, by
discouraging people from posting links just for the sake of gaining link juice.
On some sites it is used for all external links for this reason. From an SEO
point of view it can be used to avoid leakage, i.e. to reduce the number of
off-page links that are counted, since these can lower the importance rating of
a page.
Also, from the point of view of improving search engine rankings, this
attribute can be used for directing page importance. Matt Cutts of Google
advocates its use for this reason. Read Joe's blog for information about using
this on internal links to control link juice and the pros and cons.
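To see which links on a page carry the attribute, something like the following Python sketch could be used. It relies only on the standard library's html.parser; the LinkAuditor class, the audit_links helper and the sample markup are all invented for this example, and it says nothing about how any particular search engine treats the links it finds.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Sorts the href of each <a> tag by whether rel contains nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return  # a named anchor rather than a link
        # rel can hold several space-separated values, e.g. "external nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def audit_links(html):
    """Return (followed, nofollowed) lists of link targets found in html."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.followed, auditor.nofollowed


# Hypothetical page fragment: one internal link, one nofollowed external link.
SAMPLE_HTML = """
<p>
  <a href="/products/">Our products</a>
  <a href="http://www.example.org/" rel="nofollow">A link left in a comment</a>
</p>
"""

if __name__ == "__main__":
    followed, nofollowed = audit_links(SAMPLE_HTML)
    print("followed:", followed)
    print("nofollowed:", nofollowed)
```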
The disadvantage of nofollow compared to robots.txt is that it is
inconsistently applied by the search engines. Whilst it causes Google and
MSN/Live to ignore the link completely, Yahoo does follow the link whilst
discounting its value, and Ask does not support the attribute and so ignores it
entirely. This has been evaluated several times experimentally, and means
nofollow is not entirely useful for controlling duplicate content and low-value
pages. To absolutely ensure something doesn't list, you need to use robots.txt.
For controlling the value of spam and external links, it sort of works;
certainly the fact that many would-be spammers believe that all search engines
function the same way as Google and MSN means it does discourage bad linking
practices, although it also devalues what could be valid links when applied
systematically.
Method 3: noindex metatag
Primarily the noindex metatag is used as an alternative to robots.txt, especially where
you do not have access to the root of the site to change the content of robots.txt,
such as in a hosted environment.
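For reference, the metatag sits in the head of the page as a meta element named "robots" whose content includes "noindex". The Python sketch below checks a page for it using the standard library's html.parser; the RobotsMetaFinder class, the has_noindex helper and the sample page are assumptions made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, follow".
        for directive in (attributes.get("content") or "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """Return True if the page carries a robots metatag containing noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


# Hypothetical printable version of a page that should stay out of the index.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Printable version</title>
    <meta name="robots" content="noindex, follow">
  </head>
  <body>Same content as the main page.</body>
</html>
"""

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```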
Whilst this sounds good, unfortunately, just like nofollow, this appears to be
handled inconsistently across the search engines. Peculiarly, this time it's
MSN and Yahoo that show the page, although slightly differently. Unlike
nofollow, which is newsworthy because of the spam issue, noindex has had almost
no blogs focussing on it apart from this relatively unscientific one from Matt,
so it is not possible to say with complete confidence that this is 100% right.
So if you have access to it, robots.txt is the best method to use.
So there you have my take on what methods
are available. I will follow this blog soon with some examples of where
Vertical Leap have used these to get better results for our clients.
Kerry Dye, Campaign Delivery Manager