Release notes for Discovery for IBM Cloud

Learn about features and changes that were included for each release and update of the product software.

This information applies only to managed instances of IBM Watson® Discovery that are hosted on IBM Cloud or that were provisioned with IBM Cloud Pak for Data as a Service. For information about releases and updates for installed deployments, see Release notes for IBM Watson® Discovery Cartridge for IBM Cloud Pak® for Data.

Deprecation announcement of dynamic website web crawl feature

The dynamic website web crawl feature, which is controlled by the Execute JavaScript during crawl switcher in Crawl settings, is deprecated and will be removed by September 2025.

Action required to preserve any existing index

To prevent the loss of your existing index, it is essential to:

Disable the Execute JavaScript during crawl switcher. To disable the switcher, go to the Manage collections page, open the collection that uses dynamic website web crawl, and click the Processing settings tab. Under Specify where you want to crawl, click the edit icon for each URL, and disable the Execute JavaScript during crawl switcher.
Suspend any scheduled crawls that use the dynamic website web crawl feature.

Once the Execute JavaScript during crawl switcher is disabled, it cannot be enabled again.

Consequences of not disabling the switcher

If you do not disable the switcher, it will be automatically disabled by September 2025, and any subsequent scheduled web crawls that use the feature will be suspended. This action prevents the existing index from being replaced with a reduced index. How much the index would be reduced depends on how JavaScript is used in the crawled site. The suspended crawls will not resume until they are reconfigured in the UI.

Impact on scheduled web crawls

Scheduled web crawls that do not use the dynamic website web crawl feature will continue to run as usual, without any interruptions.

29 February 2024

New Intelligent Document Processing (IDP) project type: The IDP project type is the new default project type in Discovery. Use the IDP project type to understand quickly what data is extracted from your documents in a rich document preview and also improve the data by applying enrichments. For more information, see Intelligent Document Processing.

29 January 2024

Enable stemming instead of lemmatization for normalization when you create a collection: You can now select stemming instead of lemmatization to normalize words in the index and queries. For more information, see Enabling stemming for uncurated data.

16 November 2023

APIs for get collection details, list documents, and get document details are now supported in Premium plans of IBM-Cloud managed instances

In Premium plans, the APIs are supported for collections that are created after 16 November 2023. If you want to get information about a collection that was created earlier, trigger a process that runs the conversion step of ingestion on the documents. For example, you can enable the APIs by making changes in the Identify fields, Manage fields, CSV settings, or Processing settings (such as OCR settings) pages, or by applying a Smart Document Understanding model to the older collection.

For more information about the new API, see the API reference documentation.

7 November 2023

Preview data for collections: You can preview a document in a collection. To preview data in the advanced document view, navigate to the Manage collections page, and click Preview data in the collection tile. Alternatively, you can open a collection that you want to preview, and click Preview data.

4 October 2023

The optical character recognition (OCR) feature for Hebrew language text in images is a beta feature in Discovery

When OCR is enabled, text extraction and OCR-identified text extraction have limitations for the Hebrew language. These limitations might include the following:

Inaccurate word order for plain text extraction
Extracted content in the text and html formats presents the text in a different word order
Punctuation and newlines are placed incorrectly in the text
Text order within a word is reversed depending on the collection settings
Missing text, text ordered incorrectly, or both might occur when a page contains plain text and image text.

Export labeled data for an entity extractor

You can export the labeled data for an entity extractor for training or building large language models (LLMs). For more information, see Exporting labeled data for an entity extractor.

Find terms that you want to label as entity examples in a document

You can now search for terms that you want to label as entity examples in a document. You can also find labeled and unlabeled entity examples, and correct any labeling inconsistencies. For more information, see Searching for examples by using keywords.

External enrichment feature to annotate documents with a model of your choice

Through a webhook interface, you can use custom models or advanced foundation models, and other third-party models for enriching your documents in a collection. For more information, see External enrichment API.

The Part of Speech enrichment is no longer available for any project types other than Content Mining

The Part of Speech enrichment had been used for dictionary suggestion. However, dictionary suggestion has been updated and it can now work without the Part of Speech enrichment applied. For Content Mining projects, the Part of Speech enrichment is available as before.

21 September 2023

Updated the tokenizer for all languages

The updated tokenizer might affect the ranking order of results for certain queries. If you observe any ranking differences in your query results, you can reindex the documents in the collection. Discovery tokenizes words both when it ingests and stores data in the index, and at run time when it analyzes queries that are submitted by users. By reindexing the collection, you ensure that your documents are indexed with the same tokenizer that is used for matching queries.

To reindex documents, open the Manage collection page, choose a collection, and navigate to the Enrichments tab. Select a field to enrich, and then clear the field. Next, click Apply changes and reprocess and wait for the documents in the collection to be reprocessed.

15 August 2023

Option to apply or remove a crawl schedule: This option is helpful for easily applying or removing a crawl schedule, and also for stopping a crawl. For more information, see Crawl schedule options.

9 August 2023

You can now specify fields from which to extract content when querying data from the UI: The ability to specify fields allows you to improve the search results when content is not indexed in the default fields. Content might not be indexed in the default fields when you ingest structured files or when you apply a Smart Document Understanding model. For more information, see Excerpt unavailable.
Enrichments in the advanced document view for PDFs are highlighted in distinct colors: When you select multiple enrichments in the advanced document view for PDFs, each enrichment type is highlighted in the document with distinct colors. Overlapping enrichments are also highlighted in a distinct color.

26 July 2023

You can now specify a custom date and time for the crawl schedule: This option is helpful if you want to avoid heavy load on a target system during business hours. For more information, see Crawl schedule options.

10 June 2023

All Entities enrichments use the Entities v2 type system: Natural Language Understanding Entities v1 is no longer supported. IBM Cloud instances that were created before 2 June 2021 and Discovery for IBM Cloud Pak for Data 2.x deployments used version 1 of the Natural Language Understanding Entities type system for English and Korean collections. Now, all collections use only version 2 of the Natural Language Understanding Entities type system.
Classifiers are identified more clearly: The Enrichments page lists classifier enrichments as either text classifier or document classifier enrichments.

16 May 2023

Improved tool for creating Smart Document Understanding (SDU) user-trained models: The SDU tool that you use to annotate documents when you create a user-trained SDU model now uses the React UI framework. This update does not change the behavior of the tool, but does make it more responsive.
You can now define JSON normalizations by using the Collections API: The Create a collection and Update a collection methods now support the addition of conversions and normalizations objects that you can specify to apply normalization operations to the documents in the collection. For example, you can define an operation to copy or merge one field to another in the JSON representation of the documents. The conversions object defines normalization operations that occur during ingestion and the normalizations object defines normalization operations that occur after enrichments are applied. For more information, see the Collections API reference.

31 March 2023

Update to API version

The current API version (v2) is now 2023-03-31. One change was made with this version.

Changed how fields named document_id are handled

If you add a JSON file that contains a field named document_id to a collection, the field is ignored. The system assigns a new unique document ID to the document when it is added to the index. To assign a document ID to a document regardless of its file type, use the Update document method from the API.

Previously, when you uploaded a JSON file with a field named document_id from the product user interface or by using the Add document API method, the document ID from the file was shown as the document_id value in query results. However, a different document ID was assigned to the document, and the assigned ID had to be used for certain other tasks, such as deleting the document. If your application relies on the previous behavior, specify a version number earlier than 2023-03-31, such as 2020-08-30, in your API calls.
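The API version is passed as the version query parameter on each v2 request. The following is a minimal sketch of how a client might pin an earlier version to keep the previous document_id behavior. The service URL, project ID, and collection ID are placeholders, and build_add_document_url is a hypothetical helper for illustration, not part of any IBM SDK.

```python
from urllib.parse import urlencode

# Placeholder values (assumptions); substitute the details of your own instance.
SERVICE_URL = "https://api.us-south.discovery.watson.cloud.ibm.com/instances/INSTANCE_ID"


def build_add_document_url(project_id: str, collection_id: str, version: str) -> str:
    """Return the Add document URL with the API version pinned (hypothetical helper)."""
    path = f"/v2/projects/{project_id}/collections/{collection_id}/documents"
    return f"{SERVICE_URL}{path}?{urlencode({'version': version})}"


# Current behavior: a document_id field inside an uploaded JSON file is ignored.
current = build_add_document_url("PROJECT_ID", "COLLECTION_ID", "2023-03-31")

# Previous behavior: specify any version earlier than 2023-03-31.
legacy = build_add_document_url("PROJECT_ID", "COLLECTION_ID", "2020-08-30")

print(current)
print(legacy)
```

Only the version value differs between the two URLs, so an application that depends on the earlier behavior can be switched over in one place when it is ready.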

2 March 2023

Now you can specify the types of files to add to a collection: When you connect to an external data source, you can limit the types of files to add to the collection from the external data source. For example, you can choose to add only PDF files from a Box data source.

21 February 2023

Optical character recognition v2 technology is used

The latest version (OCR v2) is used automatically when you enable OCR for English, German, French, Spanish, Dutch, Brazilian Portuguese, and Hebrew collections in all IBM Cloud service plans.

The new optical character recognition model was developed by IBM Research to be better at extracting text from scanned documents and other images that have the following limitations:

Low quality images due to incorrect scanner settings, insufficient resolution, bad lighting (such as with mobile capture), loss of focus, unaligned pages, and badly printed documents
Documents with irregular fonts or a variety of colors, font sizes, and backgrounds

The entity extractor limits changed

The number of documents that are allowed in the training data for the Plus plan increased from 100 to 200.

The number of entity types that you can create per plan decreased.

For Premium plans, the limit changed from 75 to 18.
For Enterprise plans, the limit changed from 50 to 18.
For Plus plans, the limit changed from 20 to 12.

The string variation operator now works with phrases

When you include the string variation operator with query input that contains a phrase, the variation is applied to each word in the phrase. For example, "tom cat"~1 matches top hat in addition to tom cat. For more information about Discovery Query Language operators, see Query operators.
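The per-word behavior can be illustrated with a small local sketch. It assumes that a variation of 1 means at most one single-character edit per word (Levenshtein distance) and that words are compared position by position. This is an illustration of the rule described above, not the matching code that Discovery runs.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def phrase_matches(query_phrase: str, candidate: str, max_edits: int) -> bool:
    """Apply the variation limit to each word of the phrase independently."""
    q_words = query_phrase.lower().split()
    c_words = candidate.lower().split()
    if len(q_words) != len(c_words):
        return False
    return all(edit_distance(q, c) <= max_edits for q, c in zip(q_words, c_words))


print(phrase_matches("tom cat", "top hat", 1))  # True: one edit in each word
print(phrase_matches("tom cat", "tom cat", 1))  # True: exact match
print(phrase_matches("tom cat", "big dog", 1))  # False: each word needs 3 edits
```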

10 February 2023

Entity extractor is generally available

The Extract entities enrichment brings the powerful ability to build a custom type system into Discovery. Use the tool to label entity examples within your industry data to build a machine learning model that Discovery can use to recognize meaningful terms for your business. Already built an entity type system in Knowledge Studio? You can use the corpus from Knowledge Studio as a starting point for your Discovery entity extractor training data. For more information, see Entity extractor.

If you created an entity extractor enrichment for testing purposes when the feature was in beta release, now that it is generally available, it will count toward your custom model limit. The entity extractor enrichment incurs charges whether or not it is applied to a collection.

7 February 2023

Support for hourly crawls was removed

You can no longer choose to crawl a data source every hour. If an existing collection is configured to crawl hourly, you will be prompted to change the scheduled crawl the next time you edit the connector settings.

You can no longer enable FAQ extraction for a collection

The checkbox to enable or disable the beta FAQ extraction feature was removed. FAQ extraction was a beta feature that captured question-and-answer pairs from the data source as it was crawled. FAQ extraction generated a new subdocument for each pair and stored the question in the title field and the answer in the text field.

You cannot apply FAQ extraction to new collections.

Any existing collections with FAQ extraction enabled retain FAQ documents in their indexes until the collection is reprocessed. At that time, most of the question-and-answer pair subdocuments are deleted. However, any FAQ subdocuments that were generated from HTML or TXT source files remain. If you want to remove these subdocuments, go to the Manage data page to delete them. Subdocuments that are generated from one parent document all have the same metadata.parent_document_id value.
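Because the subdocuments share a metadata.parent_document_id value, one way to find them programmatically is to filter on that field. The sketch below builds a Query request body that uses the Discovery Query Language exact-match operator (::). The collection ID and parent ID are placeholders, subdocument_query is a hypothetical helper, and the body keys should be checked against the API reference before use.

```python
import json


def subdocument_query(collection_id: str, parent_document_id: str) -> dict:
    """Build a Query body that selects subdocuments of one parent (hypothetical helper)."""
    return {
        "collection_ids": [collection_id],
        # Exact match (::) on the shared parent ID.
        "filter": f'metadata.parent_document_id::"{parent_document_id}"',
        # Only the document IDs are needed to delete the matches afterward.
        "return": ["document_id"],
    }


body = subdocument_query("COLLECTION_ID", "PARENT_ID")
print(json.dumps(body, indent=2))
```

The document IDs that such a query returns identify the subdocuments that can then be deleted from the Manage data page or with the Delete document method.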

If you need a way to extract question-and-answer pairs from source documents that use a consistent style and formatting for questions and answers, you can use the Smart Document Understanding tool to annotate the pairs instead. For more information, see Using Smart Document Understanding.

25 January 2023

Set up a Microsoft SharePoint Online data store connector that has Read permission

When you create a Microsoft SharePoint Online connector to crawl a SharePoint data source by using Open Authentication v2, the enterprise application that is created by Discovery to make the connection requires Read permission only. The enterprise application that was configured for you previously required Write permission.

If you want to update an existing connector so that you can use the new Read permission configuration, you must delete your existing enterprise application first.

For more information, see Microsoft SharePoint Online connector.

FAQ extraction deprecation announcement

The beta FAQ extraction feature that detects and extracts question-and-answer pairs from documents is being removed. Support for the feature will end in 1Q 2023.

6 December 2022

Now you can stop a data source crawl: You can stop a crawl that is in progress or that is scheduled to occur in the future. For more information, see Stopping a crawl.

The following item is a known issue:

Box data source scheduled crawls are not updating documents: Due to a problem in the Box Events API, changes that occur between crawls in documents that are stored in Box are not detected and picked up by the Discovery collection during scheduled recrawls. To ensure that your collection is up-to-date, stop and restart the crawl.

1 December 2022

Plus plan supports fewer entity extractors: The maximum number of entity extractors that you can create with a Plus plan decreased from 6 to 3.

12 November 2022

Discovery users might experience issues with documents in collections where OCR is enabled that were added or processed between 1 November and 11 November 2022

Between 1 November and 11 November 2022, some projects with optical character recognition (OCR) enabled, including Document Retrieval for Contracts projects, experienced problems. The problems were related to a new version of the optical character recognition (OCR v2) feature that was enabled automatically for English, German, French, Spanish, Dutch, Brazilian Portuguese, and Hebrew collections during that timeframe. The new version changes sentence boundaries in ways that can negatively impact other functions, including element identification in contracts and the document labeling view in the entity extractor tool.

If you experience any of these issues with documents that were added or processed during this period, revert the version of OCR that is applied to the documents. Starting on 12 November 2022, OCR v1 is applied to all collections where OCR is enabled. To go back to using OCR v1, make a change that will reprocess the affected documents. For example, you can re-add documents that were added during the timeframe to reprocess them. Or you can reprocess an entire collection.

To reprocess a collection, from the Manage collections page, open the collection, and then go to the Processing settings tab. Expand the More processing settings section, set the OCR switch to Off, and then set it back to On. Click Apply changes and reprocess to reprocess your collection.

2 November 2022

A new and improved optical character recognition technology is available

A new version of optical character recognition technology is now available. This latest version (OCR v2) is used automatically when you enable OCR for English, German, French, Spanish, Dutch, Brazilian Portuguese, and Hebrew collections in all IBM Cloud service plans. The new optical character recognition model was developed by IBM Research to be better at extracting text from scanned documents and other images that have the following limitations:

Low quality images due to incorrect scanner settings, insufficient resolution, bad lighting (such as with mobile capture), loss of focus, unaligned pages, and badly printed documents
Documents with irregular fonts or a variety of colors, font sizes, and backgrounds

1 November 2022

Entity extractor loads the first 40,000 characters from training data documents: Even extra long documents from the collection that you use to define custom entity examples are loaded into the document view of the tool. However, only the first 40,000 characters, which is approximately 15-20 pages, are displayed. The rest of the file content is truncated. You'll know if your document is truncated because a notification is displayed in the document view. For more information, see Entity extractor.
You can set the passages per document setting to be higher than one: A bug was fixed that prevented you from using the search bar settings in the product user interface to increase the maximum number of passages to return per document. For more information, see How passages are derived.
Improved query aggregation documentation: The documentation that describes the aggregation types that you can specify in the query aggregation parameter was updated. For more information, see Query aggregations.
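The passages-per-document setting from the search bar also has a counterpart when you query through the API. The following sketch builds the passages portion of a Query request body. The key names (enabled, max_per_document, fields) reflect the v2 API reference as understood at the time of writing and should be verified there; passage_settings is a hypothetical helper.

```python
import json


def passage_settings(max_per_document: int, fields=None) -> dict:
    """Build the passages object of a Query body (hypothetical helper)."""
    if max_per_document < 1:
        raise ValueError("max_per_document must be at least 1")
    settings = {"enabled": True, "max_per_document": max_per_document}
    if fields:
        # Restrict passage extraction to specific fields when they are given.
        settings["fields"] = list(fields)
    return settings


body = {
    "collection_ids": ["COLLECTION_ID"],  # placeholder
    "natural_language_query": "How do I schedule a crawl?",
    "passages": passage_settings(3, fields=["text", "title"]),
}
print(json.dumps(body, indent=2))
```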

30 September 2022

Lite plans are no longer available from the London data center: Lite plans are discontinued. You cannot create new service instances that use the Lite plan type in any location, including London. Use the new Plus plan and its associated 30-day free trial to explore new features and a simpler way to build that is available with the latest version of the product.

22 September 2022

Plus plan supports more entity extractors: The maximum number of entity extractors that you can create with a Plus plan increased from 3 to 6.
You cannot apply a Smart Document Understanding model to Microsoft Excel files: The quality of structural analysis that can be produced for Excel files is not sufficient. Starting on 22 September 2022, you cannot apply an SDU model to Excel files. This change does not impact Excel files in collections where an SDU model was applied before 22 September 2022.

16 September 2022

In-context document preview is now available for PDF files that are crawled: When you click to view a passage from a search result that is extracted from a PDF document, a document preview page is displayed that shows the returned passage in the context of the original PDF page. The in-context view is available for PDF files to which a Smart Document Understanding model is applied.

15 August 2022

SDKs were updated to reflect the latest API changes.

The following Discovery v2 API changes are now reflected in the SDKs:

Use the new document classifier API to get, add, update, or delete a document classifier.
A new document status API is available. You can use it to get a list of the documents in a collection and to get details about a single document.
You can now get, add, and remove a stop words or expansion list for a collection.
A smart_document_understanding field is returned with the Get collection method. This new field specifies whether an SDU model is enabled for the collection and indicates the model type.
A similar parameter is available from the Query method. Use it to find documents that are similar to documents of interest to you.
The suggested_refinements parameter of the Query method is deprecated. The suggested_refinements parameter was used to identify dynamic facets from Premium plan data.

8 August 2022

Larger documents can be crawled: The maximum file sizes that are allowed for crawled documents increased for Premium plans. It also increased for the Box, IBM Cloud Object Storage, and Salesforce connectors. For more information, see File size limits.

2 August 2022

IAM authentication support was added to the IBM Cloud Object Storage connector: You can now choose to authenticate with the IBM Cloud Identity and Access Management (IAM) service. For more information, see IBM Cloud Object Storage.

28 July 2022

API updates

The following changes were made to the Discovery v2 API.

New fields are available:

A smart_document_understanding field is returned with the Get collection method. This new field specifies whether an SDU model is enabled for the collection and indicates the model type.
A similar parameter is available from the Query method. Use it to find documents that are similar to documents of interest to you.

The suggested_refinements parameter of the Query method is deprecated. The suggested_refinements parameter was used to identify dynamic facets from Premium plan data.
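As a rough illustration of the similar parameter, the sketch below builds a Query request body that asks for documents similar to a set of known documents. The key names inside the similar object (enabled, document_ids, fields) are given as understood from the v2 API reference and should be confirmed there. The IDs are placeholders and similar_documents_query is a hypothetical helper.

```python
import json


def similar_documents_query(collection_id, document_ids, fields=None) -> dict:
    """Build a Query body that finds documents similar to the given ones."""
    similar = {"enabled": True, "document_ids": list(document_ids)}
    if fields:
        # Optionally limit the comparison to specific fields.
        similar["fields"] = list(fields)
    return {"collection_ids": [collection_id], "similar": similar}


body = similar_documents_query("COLLECTION_ID", ["DOC_ID_1", "DOC_ID_2"], fields=["text"])
print(json.dumps(body, indent=2))
```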

Discovery v1 deprecation announcement

Watson Discovery v1 is being deprecated. Existing clients who use Watson Discovery v1 are asked to migrate to Watson Discovery v2 before the end-of-support date of 11 July 2023. End of Support means that no v1 instance will work on or after 11 July 2023. For more information about migration, see Getting the most from Discovery.

11 July 2022

The advanced document view highlights even more enrichments

In addition to the built-in Entities and Keywords enrichments that are recognized by Watson Natural Language Processing models, the advanced document view now highlights the following types of enrichments:

Custom dictionary terms
Terms or numbers that match regular expression patterns that you define
Custom entities and relationships that are defined by Watson Knowledge Studio machine learning and rules-based models
Custom entities that are defined by using the entity extractor tool that is available as a beta feature

For more information about enrichments that you can add to your documents, see Adding domain-specific resources.

30 June 2022

Watson SDK support change

Support for the following SDKs is provided by the Watson community of developers instead of IBM:

Go
Ruby
Swift
Unity

For more information, see Watson SDKs.

1 June 2022

The entity extractor tool is now easier to use: The user interface was redesigned to better support the workflow of adding entity types and labeling examples of them. As part of the new design, the bulk labeling feature now is enabled by default, the documents view is easier to find and use, the suggestions pane is more responsive, and you can track metrics scores across multiple training runs. For more information about the entity extractor, see Customizing the terms that Discovery can recognize.
The entity extractor is now available in more plans and languages: The entity extractor beta feature is now available to users of Plus and Enterprise plans in addition to Premium plans. The extractor enrichment is supported for collections in languages other than English.
When you remove a starting URL from a Web crawl connector its associated documents are deleted: The Web crawl connector was updated. Starting with collections that you create after April 2022, if you remove a starting URL from the Web crawl configuration, any indexed documents that were derived from the content of the web page at that URL are deleted with the next crawl. For more information, see Web crawl.

16 May 2022

Added API methods for working with stop words and expansion lists: You can now get, add, and remove a stop words or expansion list for a collection programmatically. For more information, see the Query modifications methods.
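The sketch below shows what the request bodies for these methods might look like. The top-level keys (stopwords, expansions, input_terms, expanded_terms) follow the Query modifications methods as understood from the API reference. The helper names are hypothetical, and lowercasing the stop words is a choice made for this example rather than a documented requirement.

```python
import json


def stopwords_payload(words) -> dict:
    """Body for a custom stop words list: trimmed, lowercased, de-duplicated, sorted."""
    cleaned = {w.strip().lower() for w in words if w.strip()}
    return {"stopwords": sorted(cleaned)}


def expansion(expanded_terms, input_terms=None) -> dict:
    """One expansion entry; omit input_terms for a bidirectional expansion."""
    entry = {"expanded_terms": list(expanded_terms)}
    if input_terms:
        entry["input_terms"] = list(input_terms)
    return entry


stopwords_body = stopwords_payload(["The", "a ", "the", ""])
expansions_body = {
    "expansions": [
        expansion(["car", "automobile"]),                         # bidirectional
        expansion(["international business machines"], ["ibm"]),  # one direction
    ]
}
print(json.dumps(stopwords_body))
print(json.dumps(expansions_body, indent=2))
```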

13 May 2022

An improved JSON view is available: You can now use keyboard keys to tab through elements in the view. The new JSON view also numbers the occurrences of elements in each JSON object, which makes it easier to keep track of information and to read totals at a glance.

20 April 2022

Analyze API is supported in Enterprise plan deployments

Use the Analyze API to process a JSON file according to a collection's configuration settings, and then return the file for realtime use without storing it in the collection. The Analyze API was supported only in installed deployments previously. For more information, see Analyze API.

A new document status API is available

Use the new document status API to programmatically get a list of the documents in a collection and to get details about a single document. The following notes apply to this release:

The API is supported for collections that are created after 23 March 2022.

If you want to get status information about a collection that was created earlier, trigger a process that runs the conversion step of ingestion on the documents. For example, you can enable the API by making changes in the Identify fields, Manage fields, CSV settings, or Processing settings (such as OCR or FAQ extraction settings) pages, or by applying a Smart Document Understanding model to the older collection.
The API is available only from Plus and Enterprise plan instances.

For more information about the new API, see the API reference documentation.
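A minimal sketch of how the two document status requests might be addressed follows. The paths match the v2 documents endpoints. The count and status query parameters are included as understood from the API reference and should be verified there. The base URL and IDs are placeholders, and both functions are hypothetical helpers.

```python
from urllib.parse import urlencode


def documents_path(project_id: str, collection_id: str, document_id: str = None) -> str:
    """Path for listing documents, or for one document's details when an ID is given."""
    path = f"/v2/projects/{project_id}/collections/{collection_id}/documents"
    return f"{path}/{document_id}" if document_id else path


def list_documents_url(base_url: str, project_id: str, collection_id: str,
                       version: str, **filters) -> str:
    """URL for a GET request that lists the documents in a collection."""
    params = {"version": version, **filters}
    return f"{base_url}{documents_path(project_id, collection_id)}?{urlencode(params)}"


url = list_documents_url("https://example.invalid", "PROJECT_ID", "COLLECTION_ID",
                         "2023-03-31", count=50, status="failed")
print(url)
print(documents_path("PROJECT_ID", "COLLECTION_ID", "DOCUMENT_ID"))
```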

More messages are shown to keep you informed about the status of document processing

An issue was fixed that previously prevented informative messages from being displayed about the status of document conversion and indexing during the ingestion process. Now that the issue is fixed, you might see more messages than usual when you add or reprocess documents. This increase is expected. Nothing you did caused the increase in messages.

6 April 2022

Project tile has a more intuitive menu: The project tile was updated to include an overflow menu that you can use to perform actions such as deleting or renaming a project.

30 March 2022

A new document classifier API is available

Use the new document classifier API to programmatically get, add, update, or delete a document classifier. Document classifier methods are supported on installed instances (IBM Cloud Pak for Data) or IBM Cloud-managed Premium or Enterprise plan instances.

For more information about the new API, see the API reference documentation. For more information about adding a document classifier by using the product user interface, see Classifying documents.

21 March 2022

Visualize enrichments found in your documents

When you click to view the passage from a search result, a document preview page is displayed that shows a representation of the original document where the search result was found. For most document types, you can open a new advanced view of the document to see useful summary information, such as the number of occurrences of any enrichments that are detected in the document. You also can select one of the enrichments to highlight every occurrence of the element within the document text.

Currently, only the Entities and Keywords enrichments are listed.

Improved format of search results from PDF documents

When you click to view a passage from a search result that is extracted from a PDF document, a document preview page is displayed that shows the returned passage in the context of the original PDF page.

The in-context view is available for PDF files to which a Smart Document Understanding model is applied. The rich preview does not work on images, meaning it doesn't work on scanned PDF documents. The in-context view is available for PDFs in all languages; however, the enrichment highlighting might be misaligned in some languages.

Tell us what you think

Share your opinions and ideas with us at any time by clicking the Share feedback button from the page header of the product user interface.

10 March 2022

Manage the data in a collection from the new Manage data page: You can now access the Manage data page for a collection from the Manage collections navigation pane. Go there to see a list of the documents in your collection and get a quick view of information about the documents. You can also delete documents from a collection with just a few clicks. For more information, see Excluding content from query results.

15 February 2022

An alternative authentication mechanism is available for Microsoft SharePoint Online connectors: You can now use Open Authentication to sign in to Microsoft SharePoint directly when you configure a new IBM Cloud connector. The Sign in with Microsoft option that uses Open Authentication to authenticate with the external data source is a beta feature. For more information, see Microsoft SharePoint Online.

7 January 2022

Upgrade from Plus to Enterprise without help: You can perform an in-place upgrade from a Plus plan to an Enterprise plan. For more information, see Upgrading.

6 December 2021

Crawling web pages with dynamic content is now generally available: The Execute JavaScript during crawl feature was introduced as a beta feature, but is now generally available. For more information, see Web crawl.
Capturing the SharePoint ACL information from crawled documents: You can now configure the data source crawl to store ACL information as metadata in the documents that are added to your SharePoint Online collection. For more information, see Microsoft SharePoint Online.
You can add more documents to the training data of the beta entity extractor model: If you added and labeled 20 documents to train a model, and now want to continue to improve the model's performance, you can add more documents. Add the additional documents to the collection that you are using to train the model. After you label the first 20 documents, and the model is up to date with any changes, you can choose to continue labeling documents. The new documents that you added to the collection are loaded. You can label them to augment the training data, and then retrain your model. For more information, see Customizing the terms that Discovery can recognize.
Log out of Discovery: You can log out of the Discovery service instance at any time by clicking Log out from the user profile menu that is available from the page header of the product user interface.

18 November 2021

Enterprise plan is now available everywhere: The Enterprise plan is available from all data center locations. Scale and secure your Discovery application with enterprise-grade support and performance, and address more use cases including contract analysis and content mining to explore insights across documents. For more information, see Discovery pricing plans.

11 November 2021

New locations for Enterprise plan now available: The Enterprise plan is available from the Frankfurt, London, Sydney, and Tokyo locations in addition to the Dallas location.

3 November 2021

New Enterprise plan: Scale and secure your Discovery application with enterprise-grade support and performance and address more use cases, including contract analysis and content mining to explore insights across documents. Currently, the Enterprise plan is available only from the Dallas location. For more information, see Discovery pricing plans.
New beta entity extractor enrichment: The Extract entities enrichment brings the powerful ability to build a custom type system into Discovery. Use the tool to label entity examples within your industry data to build a machine learning model that Discovery can use to recognize meaningful terms for your business. Currently, this beta feature is available for English-language projects that are created in Premium plan service instances only. For more information, see Customizing the terms that Discovery can recognize.
New Helpful links tab: The home page includes a Helpful links tab that has quick links to documentation, a community site, and other resources.
Improved field selection choices: When you apply an enrichment to a field or choose a field to use as the source for a facet, the fields that are displayed for you to choose from now include only fields that are valid choices. Previously, the list included fields that were not valid choices.

14 October 2021

New Discovery home page: A new home page is displayed when you start Discovery and gives you quick access to a product overview video and tours. You can collapse the home page welcome banner to see more projects.
New plan usage section: Stay informed about plan usage and check your usage against the limits for your plan type from the Plan limits and usage page. From the product page header, click the user icon. The Usage section shows a short summary. Click View all to see usage information for all of the plan limit categories.
Change to spelling settings in Search: The spelling correction setting changed from being enabled automatically in new projects to being disabled by default. If you want to alert users when they misspell a term in their query, turn on Spelling suggestions. For more information, see Customizing the search bar.
Improved Guided tours availability: The Guided tours button is now available from the product page header, which makes it accessible from anywhere. Previously, it was available from the My Projects page only.

1 October 2021

Change to Lite and Advanced plans in all locations: Lite and Advanced plans are discontinued. You cannot create new service instances that use the Lite or Advanced plan types in the Dallas, Frankfurt, London, Sydney, Tokyo, and Washington DC locations. Any existing Lite and Advanced plans continue to function properly and continue to be supported. You can upgrade from a Lite plan to an Advanced plan. Use the new Plus plan and its associated 30-day free trial to explore new features and a simpler way to build that is available with the latest version of the product.

24 September 2021

New scoring for NLU enrichments: Relevance and confidence scores are displayed for NLU enrichments that are returned by search. For example, when you open the JSON view of the document preview from a query result, you can see confidence scores for Entities mentions and relevance scores for Keyword mentions.

9 September 2021

New location for Plus plan: The Plus plan is now available from the Sydney location. Use the new Plus plan and its associated 30-day free trial to explore new features and a simpler way to build that is available with the latest version of the product. For more information, see Getting the most from Discovery.
Change to Lite and Advanced plans in most locations: Lite and Advanced plans are discontinued. You cannot create new service instances that use the Lite or Advanced plan types in the Dallas, Frankfurt, London, Sydney, Tokyo, or Washington DC locations. Any existing Lite and Advanced plans continue to function properly and continue to be supported. You can upgrade from a Lite plan to an Advanced plan.

26 August 2021

New locations for the Plus plan: The Plus plan is now available from the London and Washington DC locations, in addition to Dallas, Frankfurt, and Tokyo.
Change to Lite and Advanced plans in some locations: You cannot create new service instances that use the Lite or Advanced plan types in the Dallas, Frankfurt, London, Tokyo, or Washington DC locations. Any existing Lite and Advanced plans continue to function properly and continue to be supported. You can upgrade from a Lite plan to an Advanced plan.
New answer finding feature: Answer finding is now generally available for managed deployments. Use answer finding when you want to return a concise answer to a question. For more information, see Answer finding.

16 August 2021

New locations for the Plus plan: The Plus plan is now available from the Frankfurt and Tokyo locations, in addition to Dallas.
Change to Lite and Advanced plans in some locations: Lite and Advanced plans are no longer offered. You cannot create new service instances that use the Lite or Advanced plan types in the Dallas, Frankfurt, or Tokyo locations. Any existing Lite and Advanced plans continue to function properly and continue to be supported. You can upgrade from a Lite plan to an Advanced plan.

27 July 2021

Improved document size limit: Document size limit is increased. For Premium plan collections, you can now upload files that are up to 50 MB in size instead of 32 MB. For more information, see Document limits.
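
If you upload files programmatically, a pre-upload size check can catch oversized files early. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes, which the release note does not specify.

```python
import os

# Assumption: 1 MB is treated as 1024 * 1024 bytes; confirm in Document limits.
BYTES_PER_MB = 1024 * 1024
PREMIUM_LIMIT_MB = 50  # Premium plan limit stated in this release note


def within_size_limit(size_bytes, limit_mb=PREMIUM_LIMIT_MB):
    """Return True if a file of size_bytes fits within limit_mb."""
    return size_bytes <= limit_mb * BYTES_PER_MB


def file_within_limit(path, limit_mb=PREMIUM_LIMIT_MB):
    """Check a file on disk against the limit before you upload it."""
    return within_size_limit(os.path.getsize(path), limit_mb)
```

Pass a different limit_mb value to check against the limit of another plan type.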

23 July 2021

Improved SharePoint Online connector: The Microsoft SharePoint Online data source connector now accepts any valid Azure Active Directory user ID syntax; the format of the user ID doesn't need to match the <admin_user>@.onmicrosoft.com syntax. For more information, see Microsoft SharePoint Online.

16 July 2021

New beta dynamic website web crawl: The Web crawler can now crawl dynamic websites that use JavaScript to render content. If you enable this beta feature, the time it takes to crawl the site increases. For more information, see Web crawl.

23 June 2021

New Plus plan: Use the new Plus plan and its associated 30-day free trial to explore new features and a simpler way to build that is available with the latest version of the product. Currently, the Plus plan is available from the Dallas location. For more information, see Getting the most from Discovery.
Change to Lite and Advanced plans: Lite and Advanced plans are no longer offered. You cannot create new service instances that use the Lite or Advanced plan types in the Dallas location. Any existing Lite and Advanced plans continue to function properly and continue to be supported. You can upgrade from a Lite plan to an Advanced plan.

Endpoint deprecation reminder

Change to Discovery API endpoint

As part of work done to fully support Identity and Access Management (IAM) authentication, the endpoint that you use to access your Discovery service programmatically is changing. The old endpoint URLs are deprecated and will be retired on 26 May 2021. Update your API calls to use the new URLs.

The pattern for the endpoint URL changed from gateway-{location}.watsonplatform.net/discovery/api/ to api.{location}.discovery.watson.cloud.ibm.com/. The domain, location, and offering identifier are different in the new endpoint. For more information, see Updating endpoint URLs from watsonplatform.net.

If your service instance API credentials use the old endpoint, create a new credential and start using it today. After you update your custom applications to use the new credential, you can delete the old one.
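
To find hardcoded old endpoints in a custom application, a small helper such as the following can rewrite a base URL from the old pattern to the new one. This is a sketch under stated assumptions: the function name is hypothetical, and the location mapping is an illustrative guess that you must supply and verify yourself, because the location identifier differs between the old and new endpoints. See Updating endpoint URLs from watsonplatform.net for the values that apply to your instance.

```python
import re

# Hypothetical mapping from old location tokens to new location identifiers.
# These entries are assumptions for illustration; verify them for your instance.
ASSUMED_LOCATIONS = {
    "fra": "eu-de",
    "lon": "eu-gb",
}

# Matches gateway-{location}.watsonplatform.net/discovery/api/ (scheme optional).
_OLD_BASE = re.compile(
    r"^(?:https://)?gateway-(?P<loc>[a-z0-9-]+)\.watsonplatform\.net/discovery/api/?$"
)


def to_new_endpoint(old_url, locations=ASSUMED_LOCATIONS):
    """Rewrite an old-style base URL to api.{location}.discovery.watson.cloud.ibm.com/."""
    match = _OLD_BASE.match(old_url)
    if match is None:
        raise ValueError(f"not an old-style Discovery endpoint: {old_url}")
    old_loc = match.group("loc")
    if old_loc not in locations:
        # Refuse to guess: an unknown location needs an explicit mapping.
        raise KeyError(f"no mapping supplied for location {old_loc!r}")
    return f"https://api.{locations[old_loc]}.discovery.watson.cloud.ibm.com/"
```

The helper raises an error instead of guessing when a location is not in the mapping, so an unverified location never produces a plausible but wrong URL.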

19 March 2021

Improved Web crawl connector: You can use the Web crawl collection type to connect to content that is stored on an internal company website. For more information, see Web crawl.

4 March 2021

New drag and drop feature when uploading: Upload collections now support dragging and dropping documents before and during document upload. For more information, see Uploading data.

17 December 2020

Improved date and time display on Activity tab: Each collection now displays the Next sync scheduled for date and time on the Activity tab of the Manage collections page.
New beta FAQ extraction: FAQ extraction is now available as a beta feature. It automatically extracts question-and-answer pairs from FAQ (frequently asked questions) documents and web pages so that your application returns more precise answers. For more information, see FAQ extraction. For a statement explaining beta features, see Beta features.

3 December 2020

New Content Intelligence: You can now apply the Contracts enrichment to a Document Retrieval project when you create it. The Contracts enrichment can be used to classify contract terms, parties, effective dates, and more within your documents. For more information, see Document Retrieval for Contracts.

10 November 2020

New Box connector: Crawl Box systems. For more information, see Box.
New SharePoint 2016 On-Premises connector: Crawl SharePoint 2016 On-Premises systems. For more information, see SharePoint 2016 On-Premises.
The Box connector does not run on Safari: For more information, see Box connector.
Metadata conversion: If the metadata property is converted to an array in the index, the document cannot be deleted by using the Delete labeled data API method. For more information, see the API reference.

30 October 2020

New language support for Bosnian, Croatian, Hindi, and Serbian: Basic language support is now available for Bosnian, Croatian, Hindi, and Serbian. For more information, see Language support.
New beta Patterns enrichment: The beta release of Patterns enrichment uses pattern induction to help you teach Discovery to recognize patterns in your data. Pattern induction generates extraction patterns from the examples you specify. After you specify a small number of examples, Discovery will suggest additional rules that you verify to complete the pattern. You can use pattern induction as an enrichment or to create a facet. For more information, see Patterns and Creating a facet by identifying a pattern. For a statement explaining beta features, see Beta features.
Change to Document Retrieval projects: In new Document Retrieval projects, the suggested refinements query setting is now set to false by default. It was previously set to true.

14 September 2020

New pre-trained model for SDU: A new pre-trained model is available in Smart Document Understanding for Document Retrieval projects. This model is ideal if you need to extract data from documents that include a large number of tables. For more information, see Identifying fields.

30 August 2020

Update to API version: The current API version (v2) is now 2020-08-30. The following change was made with this version:
Change to 'options' object: The List enrichments method no longer returns the options object per enrichment. Use the Get enrichment method to return the options object for a single enrichment.
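
Code that read the options object from the List enrichments response can instead call Get enrichment once per listed enrichment. The sketch below assumes two caller-supplied functions, with the hypothetical names list_enrichments and get_enrichment, that wrap the corresponding API methods and return parsed JSON. It also assumes that each listed enrichment carries an enrichment_id field. Adjust both assumptions to match your client.

```python
def collect_enrichment_options(list_enrichments, get_enrichment):
    """Return {enrichment_id: options} by calling Get enrichment per item.

    list_enrichments() -> dict with an "enrichments" list (assumed shape).
    get_enrichment(enrichment_id) -> dict that may include "options".
    """
    options_by_id = {}
    for summary in list_enrichments().get("enrichments", []):
        enrichment_id = summary["enrichment_id"]
        detail = get_enrichment(enrichment_id)
        # As of version 2020-08-30, options come only from Get enrichment.
        options_by_id[enrichment_id] = detail.get("options", {})
    return options_by_id
```

Because this approach makes one extra request per enrichment, consider caching the results if your project has many enrichments.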

16 July 2020

New release for Premium instances: This release is available for Premium instances of Discovery on IBM Cloud created after 16 July 2020. For Premium instances created before that date and for all Lite and Advanced plans, see Getting started with Discovery.
Change to IBM Cloud Premium: The Premium plan is now generally available.
New Project-based interface: The project-based UI includes configurations optimized for three common use cases: Document Retrieval, Conversational Search, and Content Mining. For more information, see Creating projects.
New Content Mining app: This entirely new capability of Watson Discovery allows you to find insights in your data when you may not even know the question to ask. The powerful correlation tooling will help you unlock value from large unstructured data sets. For details, see Analyzing your data with the Content Mining application.
New tables as answers: Snippets of text aren't helpful if they are found in a table, so Discovery instead returns a formatted table as an answer if your question is best answered by a table. For more information, see Table retrieval.
New dynamic faceted search feature: Underspecified queries are common. Dynamic Faceted Search automatically categorizes your search results into intelligence facets without training by understanding how they are used in the sentences. See Facets in Document retrieval projects.
New reusable components: You no longer need to build a Discovery application from scratch. We now ship out of the box with reusable, open source, React components. As you configure your Discovery application, you are using the real components. From there you simply deploy to get a custom Discovery application. See Building and deploying components.
New Domain Vocabulary feature: You can build a facet for your users without a Dictionary. Use Domain Vocabulary to build a powerful facet with our understanding of how the data is used in as little as 5 minutes. See Facets.
New relevancy training: You can train at a project level. Discovery ranks the best answer regardless of the data source/collection. See Improving result relevance with training.
New built-in spelling corrector: Discovery has spelling suggestions built in. See Parameters descriptions.
Improved Autocomplete: Discovery includes autocomplete (type-ahead) for searches, as well as a reusable component for providing this feature to your end users.
New support for 12 languages: Language support for Discovery is now available in 12 additional languages. For the complete list, see Language support.
Cloud Object Storage connector limitation: When connecting to an IBM Cloud® Object Storage data source, only the first 75 buckets for a given credential are displayed.
Current API version: The API version (v2) is 2019-11-29.
Change to features in this release:
Deduplication is not available in this release.
Anomaly Detection is not offered.
IBM Watson® Discovery News is no longer included.
Several Watson Natural Language Understanding enrichments are not available at this time (Entity extraction, Relation extraction, Keyword extraction, Category classification, Concept tagging, Semantic Role extraction, Sentiment analysis, Emotion analysis).
The SharePoint 2016 On-Premises and Box data sources are not available at this time.