From search query input to search results output: Natural language search

Natural language search is the key beyond the keyword

From search query input to search results output, our semantic search engine, which is rooted in natural language search, delivers self-service rates that lead the AI industry. The engine is robust enough to match unstructured, misspelled, ambiguous and situation-specific user search queries with structured, specific and concise FAQ titles. The result: your customers find the information they want quickly and easily, with an unprecedented rate of success.

Inbenta natural language processing rises to the challenge

One of the biggest challenges in NLP is dealing with ambiguity, a core characteristic of natural language. For example, consider the phrase “run a test”. The word “run” most often refers to the act of running; only in this particular context does it mean something like “perform”. The difference is obvious to someone familiar with English, but not to a computer. Even if a computer can analyze the syntax of a phrase like “run a test”, syntax alone shows no difference between that phrase and “run a race”, yet the meanings of the two phrases are totally different. For us, it’s a piece of cake.

Understanding the meaning behind the words

In the case above, our semantic search engine takes the analysis further by digging into the contextual meaning of words, allowing it to choose the best definition of the word “run”, even when the syntax and the word itself are the same.
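As a rough illustration of choosing a sense from context, here is a minimal Python sketch of overlap-based disambiguation (a simplified Lesk-style approach). It is not Inbenta's method; the `SENSES` inventory, its context words and the `disambiguate` helper are all hypothetical.

```python
# Hypothetical sense inventory: each sense of "run" lists context words
# that typically accompany it. These entries are illustrative only.
SENSES = {
    "run": {
        "perform": {"test", "experiment", "program", "check", "report"},
        "move fast on foot": {"race", "marathon", "mile", "track", "sprint"},
    }
}


def disambiguate(word, phrase):
    """Pick the sense whose context words overlap most with the phrase."""
    context = set(phrase.lower().split()) - {word}
    senses = SENSES[word]
    # max() returns the first sense on a tie, so dict order acts as a default.
    return max(senses, key=lambda sense: len(senses[sense] & context))


print(disambiguate("run", "run a test"))
print(disambiguate("run", "run a race"))
```

Both phrases have the same syntax, so the only signal this sketch uses is the neighbouring word ("test" versus "race").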

Here’s a perfect example of how this process works

Let’s imagine you work for a telecom business, in charge of creating FAQs, a knowledge base, and all the material that you’ll publish online for your customers to find the answers and documents for themselves. For this particular example, you have the following FAQ describing the cost and plan for your clients to call other countries:
“What is the price for international calls?”
Everything seems fine, right? Here comes the problem: the probability that your customers will search for this FAQ using the same keywords that you used in the content is remote.
Instead, they will type queries like: “how much wll me cost to call to francw”
Your users will always use search queries that describe their particular situation, and therefore their words will be different from yours because yours will try to describe a more general scenario.

First, we apply a sophisticated spell correction algorithm

In the example above “wll” is not an English word, and therefore not part of our extensive English dictionary, so it needs to be corrected. We could actually fix this spelling issue with “will”, “well” or “wall”. However, the most probable correction would be “will”, as that is the one that would make the whole sentence clear.
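To make the candidate-generation step concrete, here is a minimal sketch of a common textbook technique: generate every string one edit away from the misspelled word and keep those found in a dictionary. The tiny `VOCABULARY` and the function names are assumptions for illustration; Inbenta's actual spell correction algorithm is not described in this article.

```python
import string

# Tiny illustrative vocabulary; a real dictionary would be far larger.
VOCABULARY = {"how", "much", "will", "well", "wall", "me", "cost", "to", "call"}


def edits1(word):
    """Return all strings one delete, transpose, replace or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(word):
    """Known words are kept as-is; unknown words get dictionary neighbours."""
    if word in VOCABULARY:
        return [word]
    return sorted(edits1(word) & VOCABULARY)


print(candidates("wll"))
```

Candidate generation alone cannot choose between the alternatives; that decision needs context from the rest of the sentence.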

A second part to the above example

There’s another spelling issue: “francw”. Again, possible corrections include “francs”, “franc” and “France”, although the most plausible one is, of course, the name of the European nation. Deciding which correction is “the most plausible” is easy for humans, but really hard for computers. At Inbenta, our huge English dictionary (plus French, Spanish and more than 25 other dictionaries) contains information on how probable each combination of words is, based on the specifics of the actual knowledge base as well as general features of the language.
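One simple way to approximate "how probable a word combination is" is to count word pairs in the knowledge base and prefer the candidate that most often follows the preceding word. The sketch below assumes that approach; the `KNOWLEDGE_BASE` sentences and the `rank` helper are invented for illustration and do not represent Inbenta's dictionaries.

```python
from collections import Counter

# Hypothetical knowledge-base sentences; a real system would use the
# customer's actual FAQ content plus general language statistics.
KNOWLEDGE_BASE = [
    "what is the price for international calls",
    "how to call to france from a mobile phone",
    "rates to call to france and spain",
    "can i pay in swiss francs",
]

tokens = [sentence.split() for sentence in KNOWLEDGE_BASE]
unigrams = Counter(word for sent in tokens for word in sent)
bigrams = Counter(pair for sent in tokens for pair in zip(sent, sent[1:]))


def rank(previous_word, options):
    """Order candidates by bigram count with the previous word, then by unigram count."""
    return sorted(
        options,
        key=lambda word: (bigrams[(previous_word, word)], unigrams[word]),
        reverse=True,
    )


print(rank("to", ["francs", "franc", "france"]))
```

Because "to france" occurs in this toy knowledge base and "to francs" does not, the country name ranks first for the query fragment "call to francw".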

The solution to the incorrectly-typed word

From this point on, the semantic search engine no longer uses the original mistyped search query; instead, it finds an answer for the corrected version: “How much will me cost to call to France?”
The next step is to resolve any ambiguities in the sentence (in natural language, there are always ambiguities). We’ll use various tools to accomplish this: domain dictionaries, lexical functions, local grammars and syntactic analysis. Performing the correct syntactic and semantic analysis is crucial to finding relevant answers.

How syntactic and semantic analysis works

Through syntactic analysis, Inbenta determines the precise function of every word in this particular question. The result of this analysis is as follows:

[Figure: part of speech assigned to each word of the query]

Further analysis of the sentence depends on many factors, including the domain dictionary, disambiguation rules and context. For example, the word “call” can be a noun with various meanings unrelated to phone calls, such as “urge”, “decision” or “cry”. But as a verb, and in the context of the telecommunications industry, “call” unambiguously refers to calling by phone.
Next, we perform a semantic analysis to determine the relative importance of every word in the sentence. Users often include a lot of information in search requests that isn’t essential to finding a relevant answer. These additional words, known in the search industry as noise, can frequently lead to irrelevant results, but our system determines which words are worth taking into account because they carry meaning.
Based on our current knowledge base and unique semantic-statistic algorithm, we assign the following relative semantic weight to each term in the search query:

[Figure: semantic weight of each term in the search query, as a percentage]

As you can see, not every term in the search query has the same importance. In this particular case, “cost” and “France” contain almost 70% of the whole “semantic weight” of the search query. So now we’re close to having a relevant answer for this question.
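The semantic-statistic algorithm itself is not described here, but the general idea of giving rare, meaningful terms more weight than common ones can be sketched with normalised inverse document frequency (IDF). This is a stand-in technique, not Inbenta's algorithm, and the document frequencies in `DOC_FREQ` are invented, so the resulting weights will not match the percentages quoted above.

```python
import math

# Hypothetical document frequencies: in how many of 2,500 FAQs each term
# appears. The numbers are invented for illustration.
TOTAL_DOCS = 2500
DOC_FREQ = {"how": 1200, "much": 600, "will": 900, "me": 700,
            "cost": 40, "to": 2000, "call": 300, "france": 15}


def semantic_weights(query):
    """Return a normalised IDF weight per distinct term (weights sum to 1)."""
    terms = set(query.lower().split())
    idf = {term: math.log(TOTAL_DOCS / DOC_FREQ[term]) for term in terms}
    total = sum(idf.values())
    return {term: value / total for term, value in idf.items()}


weights = semantic_weights("how much will me cost to call to france")
for term, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{term:8s}{weight:6.1%}")
```

With these made-up frequencies, “france” and “cost” come out as the two heaviest terms, while frequent words such as “to” and “how” contribute very little.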

The role of domain dictionaries and lexical functions

Inbenta maintains an extensive dictionary that contains hundreds of thousands of terms and many kinds of semantic relationships called “lexical functions”. These lexical functions cover simple relations between words, such as synonyms, as well as more complex relations that depend on the context in which the term appears.
Thanks to the domain dictionary and lexical functions, we’re able to recognize phrases that have the same meaning despite using different words to express that meaning. With this ultimate “meaning algorithm” we’re able to know how close or far those sentences are from each other in terms of meaning.

An example of two lexical functions

The term “international” is related to “nation”, and therefore to the names of all the different nations. That tells us that “France” and “international” are somehow related.

[Figure: semantic tree relating “France”, “nation” and “international”]

Also, this particular user asked about “cost”, which is the way users often perceive what they pay for a service. However, the author of the FAQ wrote it using the term “price”, which is pretty much the same concept, yet from the perspective of the one who is paid, not the one who pays. Therefore, “cost” and “price” are related.

[Figure: paraphrase relation between “cost” and “price”]

By applying these two lexical functions, we know that the user query is related to the concepts of ‘pricing’ and ‘international’, even though the query never actually uses these words.
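The two relations described above can be pictured as a small graph that is followed from each query term. In this sketch the relation names (`instance_of_nation`, `derived_adjective`, `converse_perspective`) and the `related_concepts` helper are hypothetical labels chosen for illustration, not Inbenta's lexical function inventory.

```python
# Hypothetical lexical relations, keyed by (relation name, source term).
# The relation names are invented for this sketch.
LEXICAL_FUNCTIONS = {
    ("instance_of_nation", "france"): "nation",
    ("derived_adjective", "nation"): "international",
    ("converse_perspective", "cost"): "price",
}


def related_concepts(term):
    """Follow relations transitively from a term and collect what is reached."""
    found, frontier = set(), [term]
    while frontier:
        current = frontier.pop()
        for (_relation, source), target in LEXICAL_FUNCTIONS.items():
            if source == current and target not in found:
                found.add(target)
                frontier.append(target)
    return found


print(related_concepts("france"))
print(related_concepts("cost"))
```

Starting from “france” the sketch reaches “international” through “nation”, and starting from “cost” it reaches “price”, which mirrors the two links used in the example.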

Constant growth of lexical functions strengthens AI

Our lexical functions grow every day through the work of Inbenta’s computational linguists, who continue to extend the semantic relationships and the complexity of these lexical functions. In this particular case, our patented “semantic coincidence algorithm”, in combination with our extensive lexicon, computes that the user query and this particular FAQ have a “Semantic Coincidence Score” of 57.49%. If no other FAQ has a better Semantic Coincidence Score, the one titled “What is the price for international calls?” is shown first on the results page. It’s key to point out that the author of the FAQ only wrote “What is the price for international calls?”; they didn’t have to enter alternative phrasings, add tags, edit synonym index files or review tedious vocabulary alternatives.
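The patented scoring algorithm is not public, so the following is only a loose analogy: a weighted-overlap score in which each query term contributes its weight if the term, or a related term, appears in the FAQ title. The weights in `QUERY_WEIGHTS`, the `EXPANSIONS` table and the second FAQ title are invented, and the resulting number is not expected to reproduce the 57.49% quoted above.

```python
# Invented term weights and expansions, for illustration only.
QUERY_WEIGHTS = {"how": 0.03, "much": 0.07, "will": 0.03, "me": 0.03,
                 "cost": 0.35, "to": 0.02, "call": 0.14, "france": 0.33}
EXPANSIONS = {"cost": {"price"}, "france": {"international"}, "call": {"calls"}}


def overlap_score(query_weights, title):
    """Sum the weights of query terms that match the title directly or via an expansion."""
    title_terms = set(title.lower().rstrip("?").split())
    score = 0.0
    for term, weight in query_weights.items():
        if ({term} | EXPANSIONS.get(term, set())) & title_terms:
            score += weight
    return score


def best_match(query_weights, titles):
    """Return the title with the highest overlap score."""
    return max(titles, key=lambda title: overlap_score(query_weights, title))


FAQ_TITLES = [
    "How do I reset my voicemail password?",  # hypothetical second FAQ
    "What is the price for international calls?",
]
print(best_match(QUERY_WEIGHTS, FAQ_TITLES))
```

The point of the sketch is that the FAQ author never had to list “cost” or “France” as keywords; the match comes entirely from the related terms.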

What it all adds up to is that you can let us do the heavy lifting, because we believe that knowledge base creators and editors only need to focus on what the content is, not how the content will be found. That gives your knowledge manager more time to write better content.

Inbenta Natural Language Search is the answer

Even when thousands of FAQs are being searched and thousands of users are asking different questions at exactly the same moment, our search technology answers the call. In the example illustrated, our search engine took only 0.0033376097679138 seconds (about 3.3 milliseconds) to find the right answer to the question, with a base of more than 2,500 FAQs to choose from. Look to Inbenta for semantic search with amazing response time.