Query overview
IBM Watson® Discovery offers powerful content search capabilities through search queries.
To retrieve data from Discovery after it is ingested, indexed, and enriched, submit a query.
As data is added to Discovery, a representation of each file is stored in the index as a JSON-formatted document. Enrichments that are applied to your collections identify meaningful information in the data and store it in new fields in these documents. To search your data, submit a query to return the most relevant documents and extract the information you're looking for.
Query types
Discovery accepts one of the following supported query types:
- Query
- Finds documents with values of interest in specific fields in your documents. Queries of this type use Discovery Query Language syntax to define the search criteria. Parameter name: query
- Natural Language Query (NLQ)
- Finds answers to queries that are written in natural language. NLQ requests accept a text string value. Parameter name: natural_language_query
Along with the query that you specify by using one of the supported query types, you can include one or both of the following parameters. The values for these parameters are also specified by using the Discovery Query Language (DQL) syntax:
- filter
- aggregation
For more information about the Discovery Query Language, see DQL overview.
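As a rough illustration of how the two query types and the optional parameters fit together, the following Python sketch assembles a request body. The helper name `build_query_body`, the field names, and the DQL expressions are hypothetical examples rather than values taken from this documentation; only the parameter names `query`, `natural_language_query`, `filter`, and `aggregation` come from the text above.

```python
import json


def build_query_body(query=None, natural_language_query=None,
                     dql_filter=None, aggregation=None):
    """Assemble a JSON-serializable body for a query request (illustrative helper)."""
    # The text describes one query type per request, so reject a mix of both.
    if query is not None and natural_language_query is not None:
        raise ValueError("Specify either query or natural_language_query, not both")
    body = {}
    if query is not None:
        body["query"] = query
    if natural_language_query is not None:
        body["natural_language_query"] = natural_language_query
    # filter and aggregation are optional and use DQL syntax for either query type.
    if dql_filter is not None:
        body["filter"] = dql_filter
    if aggregation is not None:
        body["aggregation"] = aggregation
    return body


# Hypothetical field names and DQL expressions, for illustration only.
dql_body = build_query_body(
    query="enriched_text.entities.text:IBM",
    dql_filter="extracted_metadata.file_type:pdf",
    aggregation="term(enriched_text.entities.type)",
)
nlq_body = build_query_body(natural_language_query="How do I reset my password?")

print(json.dumps(dql_body, indent=2))
print(json.dumps(nlq_body, indent=2))
```

The helper only builds the payload; sending it to the service and authenticating are left out on purpose.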
Queries that are submitted from the product user interface are natural language queries. A few other supported parameters are specified and given default values based on the project type in use. For more information, see Default query settings.
Discovery does not log query request data. You cannot opt in to request logging.
Choosing the right query type
The following table summarizes the capabilities that are supported for each query type. Use it to help you determine which type of query to submit.
Goal | Natural Language Query (NLQ) | Discovery Query Language (DQL) |
---|---|---|
Return passages from documents | ||
Highlight terms in responses (unless passages per document is enabled) | ||
Define custom stop words or query expansions | ||
Search specific document fields or enrichments | ||
Use operators, such as boolean clauses in the search | ||
Enable spelling correction | ||
Add curations to return hardcoded answers to certain questions | ||
Use relevancy training | ||
Enable answer finding to return a succinct answer from a passage | ||
Use table retrieval | ||
Query analysis
When you submit a query, the query text string is analyzed. During query analysis, the root (or lemma) of each key term in the query is identified. Any stop words that occur in the original query string are removed and synonym expansions that are defined for any terms that occur in the original query string are added. This enhanced version of the query is what gets submitted to Discovery.
The same analysis is performed on all queries, whether they are submitted as natural language queries or by using Discovery Query Language syntax.
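The analysis steps can be pictured with a toy Python sketch. It is not Discovery's implementation: the stop word list, lemma table, and synonym table are tiny hypothetical stand-ins, and synonyms are keyed by lemma as a simplification.

```python
# Toy lookup tables; hypothetical stand-ins for real linguistic resources.
STOP_WORDS = {"the", "a", "an", "of", "for", "is", "are"}
LEMMAS = {"cars": "car", "batteries": "battery", "charging": "charge"}
SYNONYMS = {"car": ["automobile", "vehicle"]}


def analyze_query(text):
    """Return the expanded term list for a query string (simplified model)."""
    terms = []
    for token in text.lower().split():
        token = token.strip(".,?!")
        # Stop words from the original query string are dropped.
        if not token or token in STOP_WORDS:
            continue
        # Each remaining term is reduced to its root (lemma).
        lemma = LEMMAS.get(token, token)
        terms.append(lemma)
        # Any synonym expansions that are defined for the term are added.
        terms.extend(SYNONYMS.get(lemma, []))
    return terms


print(analyze_query("Charging the batteries of electric cars"))
# ['charge', 'battery', 'electric', 'car', 'automobile', 'vehicle']
```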
Query flow
The following diagram shows a conceptual illustration of how a search request is handled by Discovery.
The following processes are shown in the flow diagram:
- BM25
- Uses Best Match 25 (a probabilistic information retrieval algorithm) to compute a relevance score for each document returned by search. The diagram shows that BM25 is applied to document results from the query requests, but it is not limited to query requests. It also is used along with other techniques as part of the relevancy training ranker process that is applied to natural language query results.
- Curations
- If the natural language query matches a predefined curation query, then certain documents and possibly a hardcoded snippet are returned. There is no query parameter to enable a curation. For curations to be used, you must define them programmatically (Create curation method). The output of any curations is merged with the output of the Relevancy training ranker or QPP results.
- Relevancy training
- A model that you can optionally define and apply to a project to score documents for relevance. There is no query parameter to enable relevancy training. For relevancy training to be used, you must successfully train the project either programmatically (Create training query method) or by using the product user interface.
- QPP
- A Query Performance Prediction algorithm that, given a query and a list of top results, produces a score that determines how relevant a document is. Used only if no Relevancy training ranker is available.
- filter
- The filter parameter can be passed along with query and natural_language_query requests to remove documents that don't meet certain criteria from the result set. The filter is shown as the last step within the document retrieval phase. However, it is used at different times in the flow. Its placement in the diagram is chosen to emphasize the fact that any documents that don't match the filter definition are excluded from the result set. The exclusion applies even to documents that might be specified in a curation.
- Passage retrieval
- Returns passages from documents when the passages.enabled=true parameter is included with a natural language query request.
- Answer finding
- When the passages.find_answers=true parameter is included with a natural language query request, returns succinct answers from passages along with the passages that are extracted from documents. If answer finding is enabled, then the final confidence score for each search result is a combination of the confidence scores from answer finding, passage retrieval, and QPP or Reranked search, whichever method is used.
- Table retrieval
- Returns information from tables in documents when the table_results.enabled=true parameter is included with a natural language query request.
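The retrieval options in the list above might be combined in a natural language query request as in the following Python sketch. It assumes that the dotted parameter names map to nested objects in a JSON request body and that answer finding is requested together with passage retrieval; the helper name `build_nlq_body` is hypothetical.

```python
import json


def build_nlq_body(text, passages=False, find_answers=False, tables=False):
    """Build a natural language query body with optional retrieval features."""
    body = {"natural_language_query": text}
    # Assumption: answers are drawn from passages, so find_answers turns passages on.
    if passages or find_answers:
        body["passages"] = {"enabled": True}
        if find_answers:
            body["passages"]["find_answers"] = True
    if tables:
        body["table_results"] = {"enabled": True}
    return body


example = build_nlq_body("What is the warranty period?",
                         find_answers=True, tables=True)
print(json.dumps(example, indent=2))
```

Curations and relevancy training have no corresponding switch in the request because, as described above, no query parameter enables them.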
Query limits
A query is any operation that submits a POST request to the /query endpoint of the API. Such operations include queries that are submitted by using the API. They do not include queries that are submitted from the search bar on the Improve and customize page of the product user interface.
A query is counted only if the request is successful, meaning it returns a response with HTTP status code 200.
The number of search queries that you can submit per month per service instance depends on your Discovery plan type.
Plan | Queries per month per service instance |
---|---|
Cloud Pak for Data | Unlimited |
Premium | Unlimited |
Enterprise | Unlimited |
Plus (includes Trial) | 500,000 |
For Enterprise plans only, your bill labels requests that are generated from both query searches and analyze API calls as "Queries". For more information about Analyze API calls, see Analyze API limits.
The number of queries that can be processed per second per service instance depends on your Discovery plan type.
Plan | Concurrent queries per service instance |
---|---|
Cloud Pak for Data | Unlimited |
Premium | 50 |
Enterprise | 5 |
Plus (includes Trial) | 5 |
For information about pricing, see Discovery pricing plans.
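A client application could encode the limits from the two tables above and guard itself against exceeding them. The following Python sketch is one possible approach, not a feature of the service: the helper names are hypothetical, `None` stands for "Unlimited", and the semaphore only throttles requests made by this one process.

```python
import contextlib
import threading

# Limits copied from the tables above; None means unlimited.
MONTHLY_QUERY_LIMIT = {
    "Cloud Pak for Data": None, "Premium": None, "Enterprise": None, "Plus": 500_000,
}
CONCURRENT_QUERY_LIMIT = {
    "Cloud Pak for Data": None, "Premium": 50, "Enterprise": 5, "Plus": 5,
}


def remaining_monthly_queries(plan, used):
    """Return how many queries are left this month, or None if unlimited."""
    limit = MONTHLY_QUERY_LIMIT[plan]
    return None if limit is None else max(limit - used, 0)


def make_query_gate(plan):
    """Return a context manager that caps in-flight queries for the plan."""
    limit = CONCURRENT_QUERY_LIMIT[plan]
    if limit is None:
        return contextlib.nullcontext()
    return threading.BoundedSemaphore(limit)


gate = make_query_gate("Plus")
with gate:
    # Send one query here; at most 5 threads can be inside this block at once.
    pass

print(remaining_monthly_queries("Plus", 120_000))  # 380000
```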
Estimating query usage
How to estimate the number of queries your application will use per month depends on your use case.
- For use cases that focus more on data enrichment and analysis or where the output from the document processing is not heavily searched, you can estimate query numbers based on the total number of documents.
- For use cases where many users interact with the application that uses Discovery, you can estimate by calculating the number of searches per user times the number of expected users. For example, suppose that 50% of the questions that users submit to a virtual assistant are likely to be answered by Discovery. With 10,000 users per month and an average of 3 questions per user, you can expect 15,000 queries per month (10,000 users/mo * 3 queries/user * 50% to Discovery = 15,000).
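The per-user estimate is simple arithmetic, shown here as a small Python sketch. The function name and argument names are hypothetical; the numbers are the ones from the calculation in the example above.

```python
def estimate_monthly_queries(users_per_month, questions_per_user, share_to_discovery):
    """Estimate monthly queries as users * questions per user * share sent to Discovery."""
    return round(users_per_month * questions_per_user * share_to_discovery)


# 10,000 users/mo * 3 queries/user * 50% to Discovery
print(estimate_monthly_queries(10_000, 3, 0.5))  # 15000
```

Comparing the result with the monthly limit for your plan shows whether the expected load fits.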
Querying with document-level security enabled
IBM Cloud Pak for Data only
This information applies only to installed deployments.
If you enable document-level security for a collection, only documents that the current user has permission to access are returned in search results. For more information, see Configuring document-level security.
To return search results that adhere to the security restrictions, the current user must meet these requirements:
- Have access to your Discovery instance.
- Have access to the data source.
If the current user does not meet these requirements, no search results are returned.
The username that is associated with your Discovery instance is used to generate an authorization token. The token is used to authenticate Discovery queries.
To generate each access token, run the following command:
curl -u "{username}:{password}" \
"https://{hostname}:{port}/v1/preauth/validateAuth"
Replace {username} and {password} with the user's Discovery credentials.
Use the bearer token that is associated with the user when you run the query.
curl -H "Authorization: Bearer {token}" \
"https://{hostname}/{instance_name}/v2/projects/{project_id}/collections/{collection_id}/query?version=2019-11-29"
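For applications that build these requests in code rather than with curl, the following Python sketch shows the headers and URL that the two commands correspond to. The helper names are hypothetical, no request is sent, and the token is assumed to have been read from the response to the first command, because the response format is not described here.

```python
import base64


def basic_auth_header(username, password):
    """Equivalent of curl -u: HTTP Basic credentials for the token request."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_header(token):
    """Header that carries the user's bearer token on the query request."""
    return {"Authorization": f"Bearer {token}"}


def query_url(hostname, instance_name, project_id, collection_id,
              version="2019-11-29"):
    """Build the query URL with the same path layout as the curl example."""
    return (f"https://{hostname}/{instance_name}/v2/projects/{project_id}"
            f"/collections/{collection_id}/query?version={version}")


# Placeholder values only; substitute the details of your own deployment.
print(basic_auth_header("user", "pass"))
print(bearer_header("example-token"))
print(query_url("example.host", "my-instance", "project-1", "collection-1"))
```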