
Latent Semantic Indexing: LSI
Thu, 4 Oct 2007 13:22:33 by Joe Bursell

Latent Semantic Indexing (LSI) is a method for retrieving related, relevant information from a large collection of documents. Taken word by word, it means:
  • Latent: a characteristic that is present in an undeveloped or hidden form
  • Semantic: relating to the meaning of words
  • Indexing: a way to highlight a particular trend or condition

LSI therefore discovers trends in the way words are used and puts those trends to good use in a search context.

Historically LSI has been applied with varying success to structured databases, and currently it forms part of the search process for engines such as Google.

'Back in the day', human-powered search was the norm: people collected and organized websites that were submitted to them or that they found. They added their finds to a directory, and that directory was the resource you used to find information. Yahoo! began as a solely human-powered search facility, and the Open Directory Project still operates in this fashion, with many search engines using its directory to power their searches. (There are issues associated with this approach.)

LSI attempts to fix the main problem with searches built on human-edited directories: they rely heavily on keywords. For example:

  • You write a piece on office politics
  • You use, say, a pirate ship analogy
  • That analogy contains the words "captain" or "keel haul"
  • Those words are used more frequently than "office politics"
  • A directory editor will see an "office politics" piece that is populated with pirate-related words, so will exclude it from the search results for the phrase "office politics"

LSI would be able to recognize that, while the piece talks about pirates, office politics is central to its overall meaning. It would therefore serve the piece up in a search for "office politics" and also in a search for "pirates".
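To make the idea a little more concrete, here is a minimal sketch of the usual textbook form of LSI: count the words in each document, take a truncated singular value decomposition (SVD) of the resulting term-document matrix, and compare queries with documents in the reduced space. The three-document corpus, the choice of two latent dimensions and the helper names (count_vector, lsi_scores) are all made up for illustration, and "keelhaul" is written as one word to keep the tokenizing trivial. It is not a description of how any particular search engine works.

```python
import numpy as np

# Hypothetical toy corpus (an assumption for illustration only).
docs = [
    "office politics captain captain keelhaul keelhaul",  # 0: the analogy piece
    "office office politics politics manager",            # 1: a plain office piece
    "captain captain keelhaul keelhaul ship",             # 2: a plain pirate piece
]

vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}


def count_vector(text):
    """Bag-of-words counts over the fixed vocabulary; unknown words are ignored."""
    vec = np.zeros(len(vocab))
    for word in text.split():
        if word in index:
            vec[index[word]] += 1
    return vec


# Term-document matrix: one row per term, one column per document.
A = np.column_stack([count_vector(doc) for doc in docs])

# Truncated SVD: keep only the k strongest latent dimensions (k = 2 is arbitrary here).
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]

# Project every document into the latent space (one column per document).
doc_vectors = U_k.T @ A


def lsi_scores(query):
    """Cosine similarity between the query and each document in the latent space.

    Assumes the query contains at least one word from the vocabulary.
    """
    q = U_k.T @ count_vector(query)
    norms = np.linalg.norm(doc_vectors, axis=0) * np.linalg.norm(q)
    return (q @ doc_vectors) / norms


for query in ("office politics", "captain keelhaul"):
    scores = lsi_scores(query)
    ranking = np.argsort(-scores)  # document numbers, best match first
    print(query, "->", [(int(i), round(float(scores[i]), 2)) for i in ranking])
```

In this toy corpus the analogy piece (document 0) uses pirate words twice as often as office words, yet it should come second for both queries rather than being dropped from either. With only three tiny documents this shows the mechanics and little more; whatever benefit the approach has would only show up on a much larger collection.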

Overall, it appears that as search engines evolve, developers are improving the algorithmic components that can mimic, in some small way, our ability to understand words and content in their proper context.

The way LSI is progressing, it looks like plain-speaking, well-written content will be the order of the day, which is a good thing. When it comes to writing content, the best approach might be simply to write with passion and focus, from a position of knowledge: writing for people rather than for search engines.

Joe Bursell
Campaign Delivery Manager

