From Search Query to Answer: Natural Language Search

Inbenta achieves a high self-service rate thanks to its Natural-Language-based Semantic Search Engine. The engine matches unstructured, misspelled, ambiguous and situation-specific user queries with structured, specific, concise FAQ titles, so that our customers' customers find the information they want quickly and easily.
 

One of the biggest challenges in Natural Language Processing is dealing with ambiguity, which is a defining characteristic of natural language.
 
For example, consider the phrase “run a test”. The word “run” usually refers to the act of running; it is only because of this particular context that it means something like “perform”. The difference is obvious to anyone familiar with English, but not to a computer.
 
And even if a computer can analyze the syntax of a phrase like “run a test”, it still won't see any difference between that phrase and “run a race”, even though the two phrases mean completely different things!
 
That’s why Inbenta’s Semantic Search takes the analysis further by analyzing the contextual meaning of words, allowing it to choose the best definition of the word “run” even when the syntax and the word itself are the same.
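Letting context pick a word's sense can be illustrated with the simplified Lesk algorithm. This classic textbook technique chooses the sense whose "signature" words overlap most with the surrounding words. The sketch below is a minimal version of that idea and is not Inbenta's method. The sense inventory `SENSES_OF_RUN` and its signature words are hypothetical.

```python
# Hypothetical sense inventory for "run": each sense lists "signature" words
# that tend to co-occur with it. Illustrative only, not Inbenta's dictionary.
SENSES_OF_RUN = {
    "move quickly on foot": {"race", "marathon", "track", "mile", "sprint", "fast"},
    "perform or execute": {"test", "program", "experiment", "script", "check", "job"},
}

def disambiguate(context_words, senses):
    """Simplified Lesk: pick the sense whose signature shares the most words with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense, signature in senses.items():
        overlap = len(signature & context)
        if overlap > best_overlap:  # on a tie, the earlier sense is kept
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate(["run", "a", "test"], SENSES_OF_RUN))  # perform or execute
print(disambiguate(["run", "a", "race"], SENSES_OF_RUN))  # move quickly on foot
```

The syntax of the two phrases is identical. Only the object noun ("test" or "race") tips the overlap count toward one sense.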
 
This document offers a glimpse of how this process actually works.
 
Let’s imagine you work in a Telecom business, and you are in charge of creating FAQs, a knowledge base, and all the material that you will publish online for your customers to find for themselves.
 
For this particular example, suppose you have an FAQ describing the costs and plans your clients can use to call other countries, with the title:
 
“What is the price for international calls?”
 
Everything seems fine, right? But here comes the problem: it is unlikely that your customers will search for this FAQ using the same keywords you used to write it.
 
Instead, they will type queries like:
 
“how much wll me cost to call to francw”
 
Your users will always write queries that describe their particular situation. Their words will therefore differ from yours, because yours try to describe a more general scenario.
 
The first thing Inbenta does is to apply a sophisticated spell correction algorithm.

Spelling

In the example above, “wll” is not an English word and is therefore not part of our extensive English dictionary, so it needs to be corrected.
 
We could fix this misspelling with “will”, “well” or “wall”. However, the most probable correction is “will”, because it makes the most sense in the context of the whole sentence.
 
There is another spelling issue: “francw”. Again, possible corrections include “francs”, “franc” and “France”, although the most plausible one is, of course, the name of the European country. Deciding which candidate is “the most plausible” is easy for humans but really hard for computers. Inbenta's large English dictionary (French, Spanish and many other dictionaries are also available) records how probable different word combinations are. These probabilities are based both on the specifics of the actual knowledge base and on general features of the language.
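The "generate candidates, then rank them by context" idea can be sketched in two steps. The first step collects the dictionary words within one edit of an unknown token, using the edit-distance-1 generator popularized by Peter Norvig's spelling corrector. The second step prefers the candidate that co-occurs most often with its neighbours. This is a minimal sketch, not Inbenta's algorithm. `VOCABULARY` and `BIGRAM_COUNTS` are a toy lexicon and made-up counts standing in for the dictionary statistics described above.

```python
import string

# Hypothetical toy lexicon and bigram counts; the numbers are invented.
VOCABULARY = {"how", "much", "will", "well", "wall", "me", "it", "cost", "to",
              "call", "france", "franc", "francs"}
BIGRAM_COUNTS = {
    ("much", "will"): 40, ("will", "me"): 25, ("much", "well"): 2,
    ("to", "france"): 30, ("to", "franc"): 1, ("to", "francs"): 1,
}

def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def context_score(prev_word, candidate, next_word):
    """Score a candidate by its co-occurrence with both neighbours (add-one smoothing)."""
    left = BIGRAM_COUNTS.get((prev_word, candidate), 0) + 1
    right = BIGRAM_COUNTS.get((candidate, next_word), 0) + 1
    return left * right

def correct_query(query):
    """Lower-case the query and replace each unknown word with its best in-vocabulary candidate."""
    words = query.lower().split()
    corrected = []
    for i, word in enumerate(words):
        if word in VOCABULARY:
            corrected.append(word)
            continue
        candidates = edits1(word) & VOCABULARY
        if not candidates:
            corrected.append(word)  # no candidate within one edit: leave it untouched
            continue
        prev_word = corrected[i - 1] if i > 0 else None
        next_word = words[i + 1] if i + 1 < len(words) else None
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates),
                   key=lambda c: context_score(prev_word, c, next_word))
        corrected.append(best)
    return " ".join(corrected)

print(correct_query("how much wll me cost to call to francw"))
# how much will me cost to call to france
```

For “wll” the candidates are “will”, “well” and “wall”. “will” wins because it fits between “much” and “me”. In the same way, “france” beats “franc” and “francs” after “to”.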
 

Search

 
From this point on, Inbenta no longer works with the original misspelled query; instead, it looks for an answer to this corrected version:
 
 
“how much will me cost to call to France?”
 
The next step is to resolve any ambiguities in the sentence (and in natural language, there are always ambiguities!)
 

Ambiguity

 
Inbenta uses several tools to accomplish this: Domain Dictionaries, Syntactic Analysis, Local Grammars and Lexical Functions. Performing the correct syntactic and semantic analysis is crucial to finding relevant answers.
 
Through Syntactic Analysis, Inbenta determines the precise function of every word in this particular question. The result of this analysis is as follows:

[Figure: part-of-speech analysis of the corrected query]

Further analysis of the sentence depends on many factors, including the domain dictionary, disambiguation rules and context.
 
For example, as a noun the word “call” can have various meanings unrelated to phone calls, such as “urge”, “decision” or “cry”. But as a verb, and in the context of the telecommunications industry, “call” unambiguously refers to calling by phone.
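Part of speech and domain narrow down a word's meaning together. This can be sketched as a lookup keyed by (word, part of speech), where each sense is tagged with the domains in which it is the preferred reading. The sense inventory and the `resolve_sense` helper are hypothetical. The part-of-speech tag is assumed to come from the syntactic analysis step.

```python
# Hypothetical sense inventory keyed by (word, part of speech); each sense is
# tagged with the domains where it is preferred. Illustrative only.
SENSES = {
    ("call", "NOUN"): [
        {"sense": "urge", "domains": {"general"}},
        {"sense": "decision", "domains": {"sports", "general"}},
        {"sense": "cry", "domains": {"general"}},
        {"sense": "phone call", "domains": {"telecom"}},
    ],
    ("call", "VERB"): [
        {"sense": "name", "domains": {"general"}},
        {"sense": "telephone", "domains": {"telecom"}},
    ],
}

def resolve_sense(word, pos, domain):
    """Return the senses for (word, pos), keeping only those tagged with the domain if any are."""
    candidates = SENSES.get((word, pos), [])
    in_domain = [s["sense"] for s in candidates if domain in s["domains"]]
    return in_domain or [s["sense"] for s in candidates]

print(resolve_sense("call", "VERB", "telecom"))  # ['telephone']
print(resolve_sense("call", "NOUN", "general"))  # ['urge', 'decision', 'cry']
```

A single remaining sense means the word is resolved. Several senses mean other signals, such as context or disambiguation rules, are still needed.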
 
Next, Inbenta performs a semantic analysis to determine the relative importance of every word in the sentence. Users often include information in their queries that is not essential to finding a relevant answer. These extra words, which the search industry calls noise, frequently lead to irrelevant results. Our system, however, can determine which words are worth taking into account because they carry meaning. Based on the current knowledge base and Inbenta's semantic-statistic algorithm, Inbenta assigns the following relative semantic weight to each term in the query:
 

[Figure: relative semantic weight (%) of each term in the query]

 
As you can see, not every term in the search query has the same importance. In this particular case, “cost” and “France” account for almost 70% of the query's total semantic weight, so we are now close to finding a relevant answer to this question.
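Inbenta's semantic-statistic algorithm is proprietary. Still, the general idea of giving rare, meaning-bearing words more weight than common function words can be sketched with inverse document frequency (IDF), a standard information-retrieval weighting. The weights are normalized so they sum to 100%. This is a stand-in technique, not Inbenta's method. The document frequencies in `DOC_FREQ` are invented for illustration, so the percentages will not match the figure above.

```python
import math

# Hypothetical: number of FAQs (out of 2,500) in which each term appears.
DOC_FREQ = {"how": 900, "much": 400, "will": 700, "me": 1200, "cost": 60,
            "to": 2300, "call": 300, "france": 15}
TOTAL_FAQS = 2500

def semantic_weights(terms):
    """Assign each term an IDF weight, log(N / df), and normalize the weights to percentages."""
    raw = {t: math.log(TOTAL_FAQS / DOC_FREQ.get(t, 1)) for t in terms}
    total = sum(raw.values())
    return {t: 100 * w / total for t, w in raw.items()}

weights = semantic_weights(["how", "much", "will", "me", "cost", "to", "call", "france"])
for term, pct in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:>7}: {pct:5.1f}%")
```

With these made-up counts, “france” and “cost” receive the two largest shares. A near-ubiquitous word like “to” gets almost none, which is how noise words lose their influence.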
 
Inbenta has an extensive dictionary containing hundreds of thousands of terms and many kinds of semantic relationships called “Lexical Functions”. These Lexical Functions cover simple relations, such as synonymy, but also more complex relations that depend on the context in which a term appears.
 
Thanks to the dictionary and its Lexical Functions, Inbenta can recognize phrases that have the same meaning even when they use different words to express it. With this “meaning algorithm”, we can measure how close or far apart two sentences are in meaning.
 
Among many others, the dictionary contains these two Lexical Functions:
 
  • The term “international” is related to “nation”, and therefore to the names of all the different nations. That tells Inbenta that “France” and “international” are somehow related.

[Figure: Semantic Tree]

  • Also, this particular user asked about “cost”, which is how users perceive what they pay for a service. However, the author of the FAQ used the term “price”. This is essentially the same concept, seen from the perspective of the party that is paid rather than the party that pays. Therefore, “cost” and “price” are related.

[Figure: Paraphrase]

 
By applying these two Lexical Functions, Inbenta knows that the user query is related to the concepts of ‘pricing’ and ‘international’, even though the query never actually uses these words.
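Applying Lexical Functions can be pictured as walking a graph of labelled relations outward from the query terms. The sketch below does a breadth-first expansion up to a fixed number of hops. It is a minimal illustration, not Inbenta's Lexicon. The relation labels, the graph entries and `expand_concepts` are all hypothetical.

```python
from collections import deque

# Hypothetical relation graph: term -> list of (relation label, related concept).
LEXICAL_RELATIONS = {
    "france": [("instance_of", "nation")],
    "nation": [("related_adjective", "international")],
    "cost": [("converse", "price")],  # same concept, payer vs. payee perspective
}

def expand_concepts(terms, max_depth=2):
    """Breadth-first expansion through the relation graph; maps each concept to its hop distance."""
    reached = {t: 0 for t in terms}
    queue = deque((t, 0) for t in terms)
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow relations beyond max_depth hops
        for _relation, target in LEXICAL_RELATIONS.get(term, []):
            if target not in reached:
                reached[target] = depth + 1
                queue.append((target, depth + 1))
    return reached

print(expand_concepts(["cost", "france"]))
# {'cost': 0, 'france': 0, 'price': 1, 'nation': 1, 'international': 2}
```

Although neither “price” nor “international” appears in the query, both become reachable from it. The hop distance can later be used to give indirect matches less credit than direct ones.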
 
The Lexicon grows every day through the work of our Computational Linguists, who keep extending its semantic relationships and increasing the complexity of these Lexical Functions.
 
In this particular case, our patented “Semantic Coincidence Algorithm”, combined with our extensive Lexicon, computes a “Semantic Coincidence Score” of precisely 57.49% between the user query and this FAQ.
 
If no other FAQ has a higher Semantic Coincidence Score, the FAQ titled “What is the price for international calls?” will be shown first on the results page.
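The patented Semantic Coincidence Algorithm is not public. The general shape of "score every FAQ against the weighted query, then show the best one first" can still be sketched with a simple stand-in. In this version the score is the share of the query's semantic weight that an FAQ matches. Direct matches earn full credit and matches through a related concept earn partial credit. `RELATED`, `coincidence_score`, `rank_faqs`, the 0.8 discount, the query weights and the FAQ term lists are all hypothetical. FAQ titles are assumed to be already reduced to lemmas (e.g. “calls” to “call”).

```python
# Hypothetical related-concept table (see the Lexical Functions above).
RELATED = {"cost": {"price"}, "france": {"international"}}

def coincidence_score(query_weights, faq_terms, related_discount=0.8):
    """Return a score in [0, 1]: matched query weight divided by total query weight."""
    faq = set(faq_terms)
    matched = 0.0
    for term, weight in query_weights.items():
        if term in faq:
            matched += weight                     # exact match: full credit
        elif RELATED.get(term, set()) & faq:
            matched += related_discount * weight  # related concept: partial credit
    return matched / sum(query_weights.values())

def rank_faqs(query_weights, faqs):
    """Sort (score, title) pairs by descending score; the first entry is shown first."""
    scored = [(coincidence_score(query_weights, terms), title)
              for title, terms in faqs.items()]
    return sorted(scored, reverse=True)

query = {"cost": 0.35, "france": 0.35, "call": 0.2, "how": 0.05, "me": 0.05}
faqs = {
    "What is the price for international calls?": ["price", "international", "call"],
    "How do I cancel my contract?": ["cancel", "contract"],
}
for score, title in rank_faqs(query, faqs):
    print(f"{score:.2%}  {title}")
```

The international-calls FAQ ranks first even though it shares only “call” literally with the query. Its score comes mostly from the related-concept matches.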
 
Again, the author of the FAQ only wrote “What is the price for international calls?”; they didn't have to enter alternative phrasings, add tags, edit synonym index files or review tedious lists of vocabulary alternatives.
 
At Inbenta, we believe that knowledge base creators and editors should only have to worry about WHAT the content is, not HOW it will be found. Inbenta takes on that responsibility, so your knowledge managers will have more time to write MORE and BETTER content.
 
It is also important to note that a semantic search engine must respond very quickly, even when it has to search thousands of FAQs while thousands of users ask different questions at the same moment.
 
In this example, our search engine took only about 3.3 milliseconds (0.0033376 seconds) to find the right answer to this question, searching a knowledge base of more than 2,500 FAQs.