Phrases That Pay - A Look at the Way Search Engines May Be Reading Your Website
Phrase-based indexing and retrieval (PaIR)
Internet Search Engines strive to provide the most relevant results for users' queries. In deciding which results to present to a user, they assess pages against many thousands or even millions of criteria. One of the areas they look at is semantically similar or related phrases. Phrase-based indexing and information retrieval (PaIR) is an approach used to identify the most semantically relevant results from a set of data.
When we enter a query into a Search Engine, we are typically entering more than one word: a phrase consisting of 2, 3, 4 or 5 words. If that phrase were interpreted in a purely Boolean context, much of the information returned would be irrelevant to the intended query. This is because each word would be judged on its own merit, with no attempt to group the words in the query semantically in order to match and retrieve relevant information. For example, if we were to enter the words London Eye, we would get a raft of results relating to London and an equivalent raft relating to eyes. The result set would be unlikely to focus on the landmark Ferris wheel that is a popular tourist attraction and is often referred to as the London Eye.
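The difference between judging each word on its own and treating the query as a phrase can be shown with a minimal sketch. The three documents, their names and both function names below are invented for illustration and are not taken from any real Search Engine.

```python
import re

# Hypothetical toy documents, invented for illustration only.
DOCS = {
    "wheel": "The London Eye is a giant Ferris wheel on the South Bank.",
    "city": "London is the capital of England.",
    "optician": "Book an eye test at our clinic in London today.",
}

def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_any_word(query, docs):
    """Word-by-word matching: a document matches if it contains any query word."""
    wanted = set(tokens(query))
    return sorted(name for name, text in docs.items() if wanted & set(tokens(text)))

def match_phrase(query, docs):
    """Phrase matching: the query words must appear adjacent and in order."""
    q = tokens(query)
    hits = []
    for name, text in docs.items():
        t = tokens(text)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            hits.append(name)
    return sorted(hits)

print(match_any_word("London Eye", DOCS))  # all three documents match
print(match_phrase("London Eye", DOCS))    # only the Ferris wheel document matches
```

Word-by-word matching returns the page about the city and the optician's page alongside the landmark, while phrase matching keeps only the page that actually discusses the London Eye.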
Phrases for ranking
This is where phrase-based indexing can really help. It works by using phrases to index, rank, search and describe web page content. It uses an algorithm to find groups of words or phrases that are related. Related phrases are ones that are commonly used in discussion of a topic. They are the phrases or expressions that are expected to be used within the content of such a discussion.
In PaIR, distinct phrases are identified in content and a subset of information is also obtained. This subset details related phrases: those that are considered semantically similar to the primary phrases identified initially.
Good and bad phrases
The distinct phrases identified from content are further refined into good and bad phrases. A good phrase is one that meets certain criteria. These criteria include that the phrase is found in a minimum number of documents and that it occurs a minimum number of times overall. Another important criterion is whether a phrase occurs an expected number of times. There is an expected frequency built into the algorithm for certain phrases. The absolute values of these frequencies are known only to the Search Engine writers themselves. Once phrases are grouped into good and bad, the good phrases are further subdivided into distinct groupings. These groupings reflect the different formats in which a phrase appears within a page, for example within header tags, in emboldened text or as anchor text for external links.
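The first two criteria above (a minimum number of documents and a minimum number of occurrences overall) can be sketched as follows. The threshold values and the function name are assumptions chosen for illustration; the real values are not published, and the expected-frequency test is left out because its details are not known.

```python
from collections import Counter

# Hypothetical thresholds; real Search Engines do not publish theirs.
MIN_DOCS = 2    # phrase must appear in at least this many documents
MIN_TOTAL = 3   # phrase must occur at least this many times overall

def classify_phrases(doc_phrases, min_docs=MIN_DOCS, min_total=MIN_TOTAL):
    """Split candidate phrases into (good, bad) sets.

    doc_phrases is a list with one entry per document; each entry is the
    list of phrases found in that document, repeats included.
    """
    doc_count = Counter()    # number of documents containing each phrase
    total_count = Counter()  # number of occurrences across all documents
    for phrases in doc_phrases:
        total_count.update(phrases)
        doc_count.update(set(phrases))
    good = {
        p for p in total_count
        if doc_count[p] >= min_docs and total_count[p] >= min_total
    }
    bad = set(total_count) - good
    return good, bad

# Hypothetical phrase lists for three documents.
corpus = [
    ["london eye", "london eye", "ferris wheel"],
    ["london eye", "river thames"],
    ["ferris wheel", "ferris wheel"],
]
good, bad = classify_phrases(corpus)
print(sorted(good), sorted(bad))
```

Here "river thames" is classed as bad because it appears only once, in a single document, while the other two phrases clear both thresholds.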
Frequency matters for SEO
One of the important criteria for whether a phrase is relevant is whether it occurs an expected number of times. There is an expected frequency built in for certain phrases. When a user enters a query, the algorithm identifies the phrases present within it, then looks at those phrases and identifies related phrases. The combination of these semantic groups is used to build the result set and to refine and condense it so that only the results most relevant and appropriate to the query are returned.
Phrases can also be extended by the algorithm. For example, if a user enters United States, a weighting system would suggest that United States of America could be relevant in answering the query.
Where a user enters more than one phrase in the query, it will be broken down into sub-queries, with the first sub-query matched against primary distinct phrases and the second matched against related phrases. The two are then combined to produce and weight the result set.
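As a rough sketch of how related phrases might feed into the weighting of a result set, the example below scores each document on the query phrase itself plus any related phrases it contains. It is simplified to a single query phrase, and the related-phrase table, the index and both weights are hypothetical values invented for illustration.

```python
# Hypothetical related-phrase table: query phrase -> phrases expected alongside it.
RELATED = {"london eye": {"ferris wheel", "south bank"}}

# Hypothetical index: document name -> set of phrases found in that document.
INDEX = {
    "a": {"london eye", "ferris wheel", "south bank"},
    "b": {"london eye"},
    "c": {"ferris wheel"},
}

PRIMARY_WEIGHT = 1.0  # assumed weight for containing the query phrase itself
RELATED_WEIGHT = 0.5  # assumed weight for each related phrase present

def rank(query_phrase, index=INDEX, related=RELATED):
    """Return (document, score) pairs, highest score first."""
    expected = related.get(query_phrase, set())
    scores = {}
    for doc, phrases in index.items():
        score = 0.0
        if query_phrase in phrases:
            score += PRIMARY_WEIGHT
        score += RELATED_WEIGHT * len(expected & phrases)
        if score > 0:
            scores[doc] = score
    # Sort by descending score, then by name so ties are ordered predictably.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

print(rank("london eye"))
```

Document "a" ranks first because it contains the query phrase and both related phrases, which is the behaviour the article describes: pages that use the expected phrases for a topic are treated as more relevant.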
Phrase based footprints
Phrasing can also be used to eliminate duplicates among indexed documents. Consider a document from which distinct phrases and related phrases have been analysed. The algorithm may take the first five distinct phrases and consider those the most relevant for the document. These would define a footprint of the document, so that on subsequent index updates the footprint could be compared to determine whether the document has changed. By the same token, two documents that share a footprint could be flagged as likely duplicates.
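A footprint of this kind could be sketched as below. Hashing the phrases with SHA-256 is an assumption made here to keep the footprint compact; the article only says that the first five distinct phrases define it, and the function names and sample phrases are hypothetical.

```python
import hashlib

def footprint(ranked_phrases, size=5):
    """Build a footprint from the first `size` phrases of a document.

    ranked_phrases is assumed to be ordered from most to least relevant.
    """
    top = ranked_phrases[:size]
    joined = "\n".join(top)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def has_changed(old_phrases, new_phrases, size=5):
    """Compare footprints from two index passes over the same document."""
    return footprint(old_phrases, size) != footprint(new_phrases, size)

# Hypothetical phrase lists from two crawls of one page.
first_crawl = ["london eye", "ferris wheel", "south bank", "river thames", "tourist attraction"]
second_crawl = first_crawl + ["opening hours"]  # top five phrases unchanged
third_crawl = ["london eye", "ticket prices"] + first_crawl[2:]

print(has_changed(first_crawl, second_crawl))  # False: the top five phrases are the same
print(has_changed(first_crawl, third_crawl))   # True: a top-five phrase was replaced
```

Because only the top phrases feed the footprint, small edits further down a page leave it unchanged, which is a trade-off of this simplified approach.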
Phrasing and Google
It is likely that PaIR is being used by Google and the other Search Engines. If you want to rank well it pays to consider phrasing your content in such a way that you include relevant, distinct and similar phrases throughout your documents so that the Search Engines find what they are expecting.
James Evans
http://www.seotranscript.com
Articles, Transcripts and Captioning Services for SEO online media
james.evanns@seotranscript.com