Query overview

IBM Watson® Discovery offers powerful content search capabilities through search queries.

To retrieve data from Discovery after it is ingested, indexed, and enriched, submit a query.

As data is added to Discovery, a representation of each file is stored in the index as a JSON-formatted document. Enrichments that are applied to your collections identify meaningful information in the data and store it in new fields in these documents. To search your data, submit a query to return the most relevant documents and extract the information you're looking for.

Query types

Discovery accepts one of the following supported query types:

Query

Finds documents with values of interest in specific fields in your documents. Queries of this type use Discovery Query Language syntax to define the search criteria.

Parameter name: query

Natural Language Query (NLQ)

Finds answers to queries that are written in natural language. NLQ requests accept a text string value.

Parameter name: natural_language_query

Along with the query that you specify by using one of the supported query types, you can include one or both of the following parameters. The values for these parameters are also specified by using the Discovery Query Language (DQL) syntax:

filter
aggregation

For more information about the Discovery Query Language, see DQL overview.
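
As a minimal sketch of how these parameters fit together, the following Python helper assembles a request body that carries one query type plus the optional filter and aggregation parameters. The helper name build_query_body and the field names in the DQL strings (such as enriched_text.entities.text) are illustrative assumptions, not values taken from your project.

```python
import json
from typing import Optional


def build_query_body(
    query: Optional[str] = None,
    natural_language_query: Optional[str] = None,
    dql_filter: Optional[str] = None,
    aggregation: Optional[str] = None,
) -> dict:
    """Assemble a query request body (hypothetical helper, not part of any SDK)."""
    # Only one query type is accepted per request.
    if query is not None and natural_language_query is not None:
        raise ValueError("Specify query or natural_language_query, not both")
    body = {}
    if query is not None:
        body["query"] = query
    if natural_language_query is not None:
        body["natural_language_query"] = natural_language_query
    # filter and aggregation are optional; both use DQL syntax.
    if dql_filter is not None:
        body["filter"] = dql_filter
    if aggregation is not None:
        body["aggregation"] = aggregation
    return body


# Example field names are assumptions; substitute fields from your own index.
dql_body = build_query_body(
    query="enriched_text.entities.text:IBM",
    dql_filter="enriched_text.entities.type::Organization",
    aggregation="term(enriched_text.entities.type,count:5)",
)
nlq_body = build_query_body(natural_language_query="How do I reset my password?")
print(json.dumps(dql_body, indent=2))
```

The helper rejects a request that sets both query types so that the mistake surfaces before the request is sent.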

Queries that are submitted from the product user interface are natural language queries. A few other supported parameters are specified and given default values based on the project type in use. For more information, see Default query settings.

Discovery does not log query request data. You cannot opt in to request logging.

Choosing the right query type

The following table summarizes the capabilities that are supported for each query type. Use it to help you determine which type of query to submit.

Query types comparison
This table has row and column headers. The row headers identify different goals that you might have when you submit a query. The column headers identify query types. To understand which query type supports your goal, go to the row that describes the goal, and find the column that identifies the query type.
Goal	Natural Language Query (NLQ)	Discovery Query Language (DQL)
Return passages from documents
Highlight terms in responses (unless passages per document is enabled)
Define custom stop words or query expansions
Search specific document fields or enrichments
Use operators, such as boolean clauses in the search
Enable spelling correction
Add curations to return hardcoded answers to certain questions
Use relevancy training
Enable answer finding to return a succinct answer from a passage
Use table retrieval

Query analysis

When you submit a query, the query text string is analyzed. During query analysis, the root (or lemma) of each key term in the query is identified. Any stop words that occur in the original query string are removed, and any synonym expansions that are defined for terms in the original query string are added. This enhanced version of the query is what gets submitted to Discovery.

The same analysis is performed on all queries, whether they are submitted as natural language queries or by using Discovery Query Language syntax.
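
The analysis steps can be illustrated with a toy example. The following sketch is conceptual only: the word lists, the analyze_query function, and the order in which the steps run are assumptions made for illustration, and they do not describe how Discovery implements query analysis.

```python
from typing import List

# Toy resources (assumptions); a real analyzer uses language-specific dictionaries.
STOP_WORDS = {"the", "a", "an", "of", "in", "for", "to", "is", "are"}
LEMMAS = {"cars": "car", "running": "run", "engines": "engine"}
# Expansions are keyed by the term as it appears in the original query string.
SYNONYMS = {"cars": ["automobiles"]}


def analyze_query(text: str) -> List[str]:
    """Drop stop words, reduce terms to lemmas, and append synonym expansions."""
    terms: List[str] = []
    for raw in text.lower().split():
        token = raw.strip(".,?!")
        if not token or token in STOP_WORDS:
            continue  # stop words are removed from the query
        terms.append(LEMMAS.get(token, token))  # fall back to the term itself
        terms.extend(SYNONYMS.get(token, []))  # add any defined expansions
    return terms


analyzed = analyze_query("The engines of running cars")
print(analyzed)  # ['engine', 'run', 'car', 'automobiles']
```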

Query flow

The following diagram shows a conceptual illustration of how a search request is handled by Discovery.

Figure: Flow chart that shows the processes that are used for natural language queries versus Discovery Query Language queries.

The following processes are shown in the flow diagram:

BM25: Uses Best Match 25 (a probabilistic information retrieval algorithm) to compute a relevance score for each document returned by search. The diagram shows that BM25 is applied to document results from the query requests, but it is not limited to query requests. It also is used along with other techniques as part of the relevancy training ranker process that is applied to natural language query results.
Curations: If the natural language query matches a predefined curation query, then certain documents and possibly a hardcoded snippet are returned. There is no query parameter to enable a curation. For curations to be used, you must define them programmatically (Create curation method). The output of any curations is merged with the output of the Relevancy training ranker or QPP results.
Relevancy training: A model that you can optionally define and apply to a project to score documents for relevance. There is no query parameter to enable relevancy training. For relevancy training to be used, you must successfully train the project either programmatically (Create training query method) or by using the product user interface.
QPP: A Query Performance Prediction algorithm that, given a query and a list of top results, produces a score that determines how relevant a document is. Used only if no Relevancy training ranker is available.
filter: The filter parameter can be passed along with query and natural_language_query requests to remove documents that don't meet certain criteria from the result set. The filter is shown as the last step within the document retrieval phase. However, it is used at different times in the flow. Its placement in the diagram is chosen to emphasize the fact that any documents that don't match the filter definition are excluded from the result set. The exclusion applies even to documents that might be specified in a curation.
Passage retrieval: Returns passages from documents when the passages.enabled=true parameter is included with a natural language query request.
Answer finding: When the passages.find_answers=true parameter is included with a natural language query request, returns succinct answers from passages along with the passages that are extracted from documents. If answer finding is enabled, then the final confidence score for each search result is a combination of the confidence scores from answer finding, passage retrieval, and QPP or Reranked search, whichever method is used.
Table retrieval: Returns information from tables in documents when the table_results.enabled=true parameter is included with a natural language query request.
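
The retrieval options in the preceding list can be combined in a single natural language query request. The following sketch assumes that the dotted parameter names (passages.enabled, passages.find_answers, table_results.enabled) map to nested objects in a JSON request body. The build_nlq_body helper is hypothetical, so check the API reference for the exact request shape.

```python
import json
from typing import Optional


def build_nlq_body(
    question: str,
    passages: bool = False,
    find_answers: bool = False,
    table_results: bool = False,
    dql_filter: Optional[str] = None,
) -> dict:
    """Assemble a natural language query body (hypothetical helper)."""
    body = {"natural_language_query": question}
    if passages or find_answers:
        # Answers are returned along with passages, so this sketch assumes
        # that passage retrieval must be enabled whenever answers are requested.
        body["passages"] = {"enabled": True}
        if find_answers:
            body["passages"]["find_answers"] = True
    if table_results:
        body["table_results"] = {"enabled": True}
    if dql_filter is not None:
        # Documents that do not match the filter are excluded from the results.
        body["filter"] = dql_filter
    return body


answer_body = build_nlq_body(
    "What is the warranty period?", find_answers=True, table_results=True
)
print(json.dumps(answer_body, indent=2))
```

No parameter is included for curations or relevancy training because, as the list notes, neither is enabled through a query parameter.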

Query limits

A query is any operation that submits a POST request to the /query endpoint of the API. Such operations include queries that are submitted by using the API. They do not include queries that are submitted from the search bar on the Improve and customize page of the product user interface.

A query is counted only if the request is successful, meaning that it returns a response with HTTP status code 200.

The number of search queries that you can submit per month per service instance depends on your Discovery plan type.

Number of queries per month
Plan	Queries per month per service instance
Cloud Pak for Data	Unlimited
Premium	Unlimited
Enterprise	Unlimited
Plus (includes Trial)	500,000

For Enterprise plans only, your bill labels requests that are generated from both query searches and analyze API calls as "Queries". For more information about Analyze API calls, see Analyze API limits.

The number of queries that can be processed per second per service instance depends on your Discovery plan type.

Number of concurrent queries
Plan	Concurrent queries per service instance
Cloud Pak for Data	Unlimited
Premium	50
Enterprise	5
Plus (includes Trial)	5
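
One way for a client application to stay within these limits is to cap the number of requests that are in flight at the same time. The following sketch treats each limit as a cap on simultaneous requests, as the table title suggests. The run_queries function, the default_workers value, and the stand-in fake_send function are assumptions for illustration; replace fake_send with the code that actually submits a query.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Concurrent query limits per service instance, copied from the table above.
# None means that the plan is listed as unlimited.
CONCURRENT_LIMITS = {
    "Cloud Pak for Data": None,
    "Premium": 50,
    "Enterprise": 5,
    "Plus": 5,
}


def run_queries(plan, queries, send_query, default_workers=8):
    """Run send_query for each query without exceeding the plan's limit."""
    limit = CONCURRENT_LIMITS[plan]
    workers = default_workers if limit is None else limit
    # The pool size bounds how many calls to send_query run at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_query, queries))


# Stand-in for a real request; it records the peak number of in-flight calls.
stats = {"in_flight": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_send(query):
    with stats_lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.01)  # simulate network latency
    with stats_lock:
        stats["in_flight"] -= 1
    return query.upper()


results = run_queries("Enterprise", [f"q{i}" for i in range(20)], fake_send)
print(len(results), "queries completed; peak concurrency:", stats["peak"])
```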

For information about pricing, see Discovery pricing plans.

Estimating query usage

How to estimate the number of queries your application will use per month depends on your use case.

For use cases that focus more on data enrichment and analysis or where the output from the document processing is not heavily searched, you can estimate query numbers based on the total number of documents.
For use cases where many users interact with the application that uses Discovery, you can estimate by multiplying the number of searches per user by the number of expected users. For example, suppose that 50% of the questions that users submit to a virtual assistant are likely to be answered by Discovery. With 10,000 users per month and an average of 3 questions per user, you can expect 15,000 queries per month (10,000 users/mo * 3 queries/user * 50% to Discovery = 15,000).
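
The per-user estimate can be written as a small calculation. The following sketch uses the figures from the parenthetical calculation in the example and compares the result with the Plus plan limit from the queries-per-month table. The function name is an assumption made for illustration.

```python
def estimate_monthly_queries(users_per_month, questions_per_user, share_to_discovery):
    """Estimate monthly queries as users * questions per user * share sent to Discovery."""
    return round(users_per_month * questions_per_user * share_to_discovery)


PLUS_MONTHLY_LIMIT = 500_000  # Plus (includes Trial) plan limit from the table

estimate = estimate_monthly_queries(10_000, 3, 0.5)
print(estimate)                        # 15000
print(estimate <= PLUS_MONTHLY_LIMIT)  # True
```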

Querying with document-level security enabled

IBM Cloud Pak for Data IBM Software Hub

This information applies only to installed deployments.

If you enable document-level security for a collection, only documents that the current user has permission to access are returned in search results. For more information, see Configuring document-level security.

To return search results that adhere to the security restrictions, the current user must meet these requirements:

Have access to your Discovery instance.
Have access to the data source.

If the current user does not meet these requirements, no search results are returned.

The username that is associated with your Discovery instance is used to generate an authorization token. The token is used to authenticate Discovery queries.

To generate each access token, run the following command:

curl -u "{username}:{password}" \
"https://{hostname}:{port}/v1/preauth/validateAuth"

Replace {username} and {password} with the user's Discovery credentials.

Use the bearer token that is associated with the user when you run the query.

curl -H "Authorization: Bearer {token}" \
"https://{hostname}/{instance_name}/v2/projects/{project_id}/collections/{collection_id}/query?version=2019-11-29"
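
If you prefer to script these calls, the two curl commands might be mirrored with the Python standard library as follows. This is a sketch only: the function names and the example host, instance, and ID values are placeholders. The requests are built but not sent, and like the curl commands, they do not set an HTTP method or body explicitly.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(hostname, port, username, password):
    """Mirror the first curl command: basic authentication (curl -u)."""
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    url = f"https://{hostname}:{port}/v1/preauth/validateAuth"
    headers = {"Authorization": "Basic " + credentials.decode("ascii")}
    return urllib.request.Request(url, headers=headers)


def build_query_request(hostname, instance_name, project_id, collection_id,
                        token, version="2019-11-29"):
    """Mirror the second curl command: bearer token plus the version parameter."""
    path = (f"/{instance_name}/v2/projects/{project_id}"
            f"/collections/{collection_id}/query")
    url = f"https://{hostname}{path}?" + urllib.parse.urlencode({"version": version})
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; replace them with the details of your own deployment.
token_request = build_token_request("cpd.example.com", 443, "admin", "secret")
query_request = build_query_request(
    "cpd.example.com", "my-instance", "proj-1", "coll-1", "TOKEN"
)
print(query_request.full_url)
# To send a request, pass it to urllib.request.urlopen(...).
```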