Using IBM Watson Concept Insights for Natural Language Search

We are trying to implement natural language search using the IBM Watson Concept Insights (CI) service. We want the user to be able to enter a question in natural language and get back the corresponding document(s) from the CI corpus. We are using CI rather than Watson QA to avoid training and to lower Watson infrastructure costs (i.e., to avoid needing a dedicated Watson instance for each use case).

We can create the required corpus through the CI API, but we are not sure which APIs to call, and in what order, to produce the most accurate/precise results.

Our initial thought was as follows:

  • Accept the user's natural language question and POST the text string to the "Identifies concepts in a piece of text" API (listed sixth from the bottom in the CI API reference document) to obtain a list of concepts related to the question.

  • Then perform a GET using the "Performs a conceptual search within the corpus" API (listed at the bottom of the CI API reference document) to obtain a list of related documents from the corpus.
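The two steps above might be wired together roughly as follows. Note the annotateText endpoint is copied from the curl example in this question, but the conceptual-search path and its `ids` parameter are guesses, and should be checked against the CI API reference before use:

```python
# Sketch of the two-step flow: (1) annotate the question text, (2) feed the
# extracted concept ids into a conceptual search over the corpus.
# ASSUMPTIONS: the search path "/corpus/{corpus}" and the "ids" parameter are
# hypothetical; only the annotateText URL comes from this question's examples.
import json
import urllib.parse

BASE = "https://gateway.watsonplatform.net/concept-insights-beta/api/v1"
GRAPH = "/graph/wikipedia/en-20120601"

def build_annotate_request(question: str):
    """Step 1: POST the raw question text to the annotateText function."""
    return ("POST", f"{BASE}{GRAPH}?func=annotateText", question)

def build_search_request(annotations_json: str, corpus: str):
    """Step 2: GET a conceptual search over the corpus, passing the concept
    ids extracted in step 1 (the 'ids' parameter name is a guess)."""
    concept_ids = [a["concept"] for a in json.loads(annotations_json)]
    query = urllib.parse.urlencode({"func": "semanticSearch",
                                    "ids": json.dumps(concept_ids)})
    return ("GET", f"{BASE}/corpus/{corpus}?{query}", None)

# Wire the sample annotateText response from this question into step 2:
sample = ('[{"concept":"/graph/wikipedia/en-20120601/MySQL",'
          '"coords":[[17,22]],"weight":0.85504603}]')
print(build_search_request(sample, "my_account/my_corpus")[1])
```

The actual HTTP calls (e.g. via curl, as in the examples below) would be made with the same basic-auth credentials in both steps.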

The first question is: is this the right way to achieve the goal described in the first paragraph of this post? Should we combine the CI APIs in a different way, or use multiple Watson services to achieve the goal?

If our initial approach is correct, we find that when we submit a simple question (such as "How can I repair MySQL database corruption") to "Identifies concepts in text", we do not receive a complete list of related concepts. For example:

curl -u userid:password -k -d "How can I repair MySQL database corruption" "https://gateway.watsonplatform.net/concept-insights-beta/api/v1/graph/wikipedia/en-20120601?func=annotateText"

returns:

[{"concept":"/graph/wikipedia/en-20120601/MySQL","coords":[[17,22]],"weight":0.85504603}]

However, there are clearly other concepts associated with the example question (repair, corruption, database, etc.).
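For reference, this is how we post-process the annotateText response shown above, pulling out each concept's label and weight so low-confidence concepts can be filtered; a minimal sketch, with field names taken from the sample JSON in this question:

```python
# Parse the annotateText JSON response and keep (label, weight) pairs above
# a confidence threshold. Only one concept comes back for the example
# question, which is the problem being described here.
import json

response = ('[{"concept":"/graph/wikipedia/en-20120601/MySQL",'
            '"coords":[[17,22]],"weight":0.85504603}]')

def extract_concepts(raw: str, min_weight: float = 0.5):
    """Return (label, weight) pairs above a confidence threshold."""
    out = []
    for ann in json.loads(raw):
        label = ann["concept"].rsplit("/", 1)[-1]   # e.g. "MySQL"
        if ann["weight"] >= min_weight:
            out.append((label, ann["weight"]))
    return out

print(extract_concepts(response))   # [('MySQL', 0.85504603)]
```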

In another example, we simply sent the text "repair" to the "Identifies concepts in text" API:

curl -u userid:password -k -d "repair" "https://gateway.watsonplatform.net/concept-insights-beta/api/v1/graph/wikipedia/en-20120601?func=annotateText"


and it returned:

[{"concept":"/graph/wikipedia/en-20120601/Repair","coords":[[0,6]],"weight":0.65392953}]

It seems that the "Repair" concept should also have been returned for the first example. Why does the API return the "Repair" concept when we submit just "repair", but not when we submit "How can I repair MySQL database corruption", which also contains the word "repair"?

Please advise on how best to implement a natural language search feature powered by Watson Concept Insights (possibly in combination with other services if needed).



1 answer


Thank you very much for your question and my apologies for answering it so late.

"Is this the right way to achieve the goal described in the first paragraph of this post? Should we combine the CI APIs in different ways, or use multiple Watson services to achieve the goal?"

Doing the above steps would be the natural way to accomplish what you want. Note, however, that the "annotate text" API currently uses exactly the same technology the system uses to link documents in your corpus to concepts in the underlying knowledge graph, and as such it is oriented toward paragraphs of text rather than individual questions. More precisely, extracting concepts from a small piece of text is generally harder than from a larger one, because the larger text provides more context the system can use to make the right choices. Given this, the annotate-text API is deliberately more conservative on short inputs.

That said, the /v2 API we now offer improves both the speed and the quality of the concept extraction technology, so you may be better off using it to extract topics from natural language questions. Here is what I would do:

1) Clearly display to the user which concepts CI extracted from the natural language input. Our APIs give you the ability to retrieve a short abstract for each concept, which can be used to explain to the user what the concept means - use that.



2) Give the user the option to exclude concepts from the extracted concept list (cross them out).

3) Since the concepts in the graph currently correspond mostly to the notion of a "topic", there is no way to capture more abstract meaning (for example, if the key to the meaning of a question lies in a verb or an adjective rather than a noun, concept extraction would be a poor way to capture it). Watson does have technology focused on question answering, as you pointed out earlier (the Natural Language Classifier is one component of it), so I would take a look at that.
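Suggestions 1) and 2) above could be sketched roughly as follows. The abstract strings and field names here are invented placeholder data; in practice the label and abstract would come from the CI concept-metadata API:

```python
# Sketch of suggestions 1) and 2): show each extracted concept alongside a
# short abstract, then let the user cross concepts off before searching.
# ASSUMPTION: the {"label", "abstract"} dict shape is hypothetical, standing
# in for whatever the CI concept-metadata API actually returns.
def present_concepts(concepts):
    """Return display lines of 'label: abstract' for user review."""
    return [f"{c['label']}: {c['abstract']}" for c in concepts]

def exclude(concepts, rejected_labels):
    """Drop the concepts the user crossed out before running the search."""
    return [c for c in concepts if c["label"] not in rejected_labels]

extracted = [
    {"label": "MySQL", "abstract": "An open-source relational database."},
    {"label": "Repair", "abstract": "The act of restoring something."},
]
print(present_concepts(extracted))
print([c["label"] for c in exclude(extracted, {"Repair"})])  # ['MySQL']
```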

"However, there are clearly other concepts associated with the example question (repair, corruption, database, etc.)."

The answer to this and to the rest of your questions is, in a sense, given above - our intention was to provide the technology for larger texts first, which, as I explained, is the easier task. Since this question was first posted, we have rolled out new annotation technology (/v2), so I would encourage you to try it; it should work somewhat better.

In the longer term, we intend to give users a formal way to specify context for a given application, to increase the chances of extracting relevant concepts. We also plan to let users define custom concepts, since we have observed that some topics of interest cannot be matched in our current graph because they are not on Wikipedia.
