How Google Uses Latent Semantic Indexing

Most people who have used the Internet, or even know
what it is, know what Google is. But most people don’t
know exactly what it is that Google does.

Or rather, what makes Google do what it does. Google
searches are able to be so accurate in part because of
Latent Semantic Indexing (LSI).

Latent Semantic Indexing allows a search engine to
determine what a page is about beyond simply matching
the keywords typed in by the user.

It adds an important step to the document indexing
process. In addition to recording which keywords a
document contains, LSI examines the document collection
as a whole to see which other documents contain some of
the same words.
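
To make the indexing step a little more concrete, here is a
minimal sketch of recording which words each document
contains across a whole collection. The function name
build_term_document_matrix and the three sample documents
are made up for illustration, and this is only the textbook
starting point for LSI, not a description of what any real
search engine runs.

```python
from collections import Counter


def build_term_document_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    # Count the words of each document separately.
    counts = [Counter(doc.lower().split()) for doc in docs]
    # The vocabulary is every distinct word seen anywhere in the collection.
    vocabulary = sorted(set().union(*counts))
    # One row per term, one column per document.
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Hypothetical sample collection.
docs = [
    "car engine repair",
    "automobile engine service",
    "banana bread recipe",
]
vocabulary, matrix = build_term_document_matrix(docs)
engine_row = matrix[vocabulary.index("engine")]
```

Each row shows, for one word, which documents in the
collection use it. That view of the collection as a whole
is what the later LSI steps work from.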

By placing importance on related words, or words in
similar positions, LSI has the net effect of lowering
the value of pages that only match one specific term
without backing it up with related terms.

Search engines such as Google try to figure out phrase
relationships when they are processing keyword
queries, which in turn improves the rankings of pages
with related phrases.

This happens even when those pages are not focused on
the target theme. Some pages are too focused on one
phrase and they tend to rank worse than you would
expect them to.

In fact, some are even filtered out for being
over-optimized. Pages that are focused on a wider net of
related keywords tend to have more stable rankings.

Although the LSI algorithm doesn’t understand anything
about what the words mean, the patterns it notices
make the search engine look extremely intelligent.

Latent Semantic Indexing

Latent semantic indexing is quite complex, and
understanding it fully usually requires a degree in
math.

There are a few methods that can be used to index and
retrieve all the pages relevant to the user’s query.

The obvious method of retrieving the relevant pages is
by matching words from a search query to the same text
found within the web pages that are available.

The problem with simple word matching is that it is
extremely inaccurate. This is because there are so
many ways for a user to express the concept they are
looking for.

This is known as synonymy. This also happens because
many words have multiple meanings. This is known as
polysemy.

With synonymy, the user’s query may not actually match
the text on the relevant pages, so they will be
overlooked. The problem of polysemy means the terms
in a user’s query will often match terms in irrelevant
pages.
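
Both failure modes are easy to reproduce with a toy version
of simple word matching. The sketch below assumes a
hypothetical keyword_match function and three made-up
pages. It only illustrates the problem and is not how any
real search engine matches queries.

```python
def keyword_match(query, pages):
    """Return names of pages that contain every word of the query exactly."""
    query_words = set(query.lower().split())
    return [
        name
        for name, text in pages.items()
        if query_words <= set(text.lower().split())
    ]


# Hypothetical pages, keyed by a made-up page name.
pages = {
    "auto_shop": "automobile repair and service",
    "river_guide": "fishing spots along the river bank",
    "finance": "open a savings account at the bank",
}

# Synonymy: the relevant page says "automobile", so "car repair" finds nothing.
synonym_miss = keyword_match("car repair", pages)

# Polysemy: "bank" matches both the river page and the finance page.
polysemy_hit = keyword_match("bank", pages)
```

The first query misses a page the user would want, and the
second returns a page the user probably does not want.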

Latent semantic indexing, or LSI, is an attempt to
overcome this problem by looking at the patterns of
words distributed across the entire web.

Pages that have many words in common are considered
semantically close in meaning.

Pages that contain few words in common are
semantically distant. The result is a relatively
accurate similarity value that is calculated for
every content word or phrase.
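
One common way to turn "words in common" into a number is
cosine similarity over word counts. The sketch below is an
illustration under that assumption. The function name and
the sample texts are made up, and real LSI compares pages
in a reduced space rather than on raw word counts like
this.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Similarity from 0.0 (no shared words) to 1.0 (same word proportions)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical page texts.
close = cosine_similarity("car engine repair shop", "car engine service shop")
distant = cosine_similarity("car engine repair shop", "banana bread recipe")
```

Here close comes out at 0.75, because three of the four
words are shared, and distant comes out at 0.0, because
nothing is shared.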

In response to a query, the LSI database will return
the pages it judges to be relevant to that query.

The LSI algorithm doesn’t understand anything about
word meanings and does not require an exact match to
return useful web pages.
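
For the curious, here is a small sketch of how a page can
be returned without an exact match. It follows the textbook
form of LSI: build a term-document matrix, keep only the
strongest few patterns (a truncated singular value
decomposition), and compare the query to the documents in
that reduced space. Everything in it is an assumption made
for illustration: the function names, the five sample
documents, the choice of k=2, and the use of simple power
iteration in place of a proper linear algebra library. It
says nothing about what Google actually runs.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are word counts."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]


def top_eigenpairs(sym, k, iterations=500):
    """Top-k eigenpairs of a small symmetric matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0] * n
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        value = sum(
            vec[i] * sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)
        )
        pairs.append((value, vec))
        # Deflate so the next pass finds the next strongest pattern.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def cosine(u, v):
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def lsi_scores(query, docs, k=2):
    """Score each document against the query in a k-dimensional latent space."""
    vocab, a = term_document_matrix(docs)
    n_terms, n_docs = len(vocab), len(docs)
    # Gram matrix A^T A; its eigenvectors are the right singular vectors of A.
    gram = [
        [sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
        for i in range(n_docs)
    ]
    pairs = top_eigenpairs(gram, k)
    q = Counter(query.lower().split())
    # A^T q: raw word overlap between the query and each document.
    atq = [sum(a[t][j] * q[vocab[t]] for t in range(n_terms)) for j in range(n_docs)]
    # Fold the query into the latent space: (v_i . A^T q) / sigma_i^2.
    q_hat = [sum(v[j] * atq[j] for j in range(n_docs)) / val for val, v in pairs]
    # Document j is represented by the j-th entry of each kept singular vector.
    return [cosine(q_hat, [v[j] for _, v in pairs]) for j in range(n_docs)]


# Hypothetical toy corpus: three pages about vehicles, two about baking.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "banana bread recipe",
    "bread baking recipe",
]
scores = lsi_scores("car", docs)
```

In this toy corpus the second page never contains the word
"car", yet it scores close to 1 for the query "car",
because it shares "engine" and "repair" with a page that
does. The two baking pages score close to 0. Nothing in the
code knows what any of the words mean. The scores come only
from the patterns of which words appear together.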

Ok, Ok, I know. This was probably a very boring article
to read, but it proves the point that even the most
simple things we take for granted with technology, such as
using a search engine, are actually very complex subjects that
require years of study.

Thank you for reading Nathan’s Plain Tech Talk.
