Getting started with Watson Discovery

In this tutorial, we introduce IBM Watson® Discovery and walk you through the Discovery sample project. Exploring the sample project is a great way to tour and try out some of the product's features.

Before you begin

Choose the appropriate step to complete for your deployment:

IBM Cloud Pak for Data / IBM Software Hub: Install Discovery. See Installing Discovery for Cloud Pak for Data.
IBM Cloud: Complete the following steps:
1. Sign up for an IBM Cloud account or log in.
2. You can use a Plus plan for 30 days at no cost. However, to create a Plus plan instance of the service, you must have a paid account.
  
  For more information about creating a paid account, see Upgrading your account.
  
  If you decide to discontinue use of the Plus plan and don't want to pay for it, delete the service instance before the 30-day trial period ends.
3. Go to the Discovery resource page in the IBM Cloud catalog and create a Plus plan service instance.

Open Watson Discovery

IBM Cloud

These instructions apply to all managed deployments, including IBM Cloud Pak for Data as a Service instances.

Click the Discovery instance that you created to go to the service dashboard.
On the Manage page, click Launch Watson Discovery.

If you're prompted to log in, provide your IBM Cloud credentials.

IBM Cloud Pak for Data / IBM Software Hub

These instructions apply to installed Discovery deployments:

From the web client main menu, expand Services, and then click Instances.
Find your instance, and then click it to open its summary page.

You can create a maximum of 10 instances per deployment. After you reach the maximum number, the New instance button is not displayed in IBM Cloud Pak for Data.
Click Launch tool.

Open the sample project

A new browser tab or window opens and the My Projects page is displayed.

My Projects page with a single Sample Project tile

To get familiar with the product, you can watch an overview video that is less than 3 minutes long. Click the Watch a video link on the product home page.

In this tutorial, you explore the sample project.

The sample project is a built-in project that is provided as a resource for you to initially explore the product. The sample project is a Document Retrieval project type. Document Retrieval projects are used to search and find the most relevant answers from your data.

Click Sample Project.

The Improve and customize page is displayed.

If you just installed Discovery, the Sample Project needs time to finish processing documents. Wait for processing to finish before you start experimenting. You can check the status of data processing from the Activity page, which is described in the next step.

Sample project Improve and customize page

Learn about the sample collection

Learn about ways you can manage and enhance a collection by exploring the sample collection that is available with the sample project. The sample collection consists of a set of uploaded IBM Support PDF documents.

Click the Manage collections icon on the navigation panel.

Any collections in your project are displayed here. This project has only one collection.

Collections page in the Sample project
Click Sample Collection.

The Activity page is displayed. This page shows the status of the collection. For example, it shows the total number of documents and when it was last updated. If Discovery encounters a problem when a document is uploaded or a data source is crawled, any associated messages are displayed here.

Activities page in the Sample project

After you create a collection, you can come to this page to find information about the processing status of the data in the collection.
Click the Enrichments tab.

The Enrichments page shows you a list of available enrichments. Enrichments make meaningful information easier to find and return in searches. You can apply built-in enrichments to your collection to leverage powerful Natural Language Understanding models that tag terms, such as commonly known keywords.

Enrichments page of the Sample project

The Entities enrichment is applied to the sample collection:

Entities

Recognizes proper nouns such as people, cities, and organizations that are mentioned in the content.

This enrichment is applied automatically to collections that are added to projects of the Document Retrieval type.
For the Entities v2 enrichment, click 1x Selected fields.

A list of available fields is displayed and the text field is selected. This selection means that the Entities enrichment was applied to content that was indexed and added to a field named text when documents from the collection were processed.

Entities enrichment being applied to the text field

From this page, you can apply new enrichments to your collection or change the fields where an enrichment is applied.

A powerful feature of Discovery is that you can add your own custom enrichments, such as dictionaries, patterns, and machine learning models. When you create custom enrichments, they are listed on this page also. You can manage where they are used from here.

For more information about custom enrichments, see Adding domain-specific resources.
Next, apply another enrichment to the collection. Find the Keywords enrichment in the list, and then click Select fields.

The Keywords enrichment recognizes significant, commonly known terms in your content.
Scroll through the list of fields until you find the text field, and select it.

Fields to which you can apply the Keywords enrichment
Click Apply changes and reprocess.

While your documents are being reprocessed to look for and tag keywords, you can continue to explore the tools available for managing a collection.
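You can also script this step. The Discovery v2 REST API can update a collection, which in principle makes the same change. The following Python sketch builds that request but does not send it.

The sketch rests on several assumptions:

- It uses the update-collection endpoint `POST /v2/projects/{project_id}/collections/{collection_id}`.
- It authenticates with HTTP basic auth, using the user name `apikey`, as IBM Cloud instances accept.
- It uses a version date of `2023-03-31`.
- The enrichment IDs are hypothetical placeholders. You would look them up first, for example by listing the enrichments in your project.

The update is assumed to replace the collection's whole enrichment list. For that reason, the sketch passes both the Entities and the Keywords enrichment IDs. Check all of these details against the API reference before you rely on them.

```python
import base64
import json
import urllib.request

# Assumption: a Discovery v2 API version date that your instance accepts.
API_VERSION = "2023-03-31"


def basic_auth_header(api_key):
    """IBM Cloud API keys can be sent as HTTP basic auth with the user name 'apikey'."""
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_collection_enrichment_request(service_url, api_key, project_id,
                                        collection_id, enrichment_ids, fields):
    """Build (but do not send) an update-collection request that applies each
    enrichment in enrichment_ids to the given fields, for example ["text"]."""
    body = {
        "enrichments": [
            {"enrichment_id": enrichment_id, "fields": list(fields)}
            for enrichment_id in enrichment_ids
        ]
    }
    url = (f"{service_url.rstrip('/')}/v2/projects/{project_id}"
           f"/collections/{collection_id}?version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(api_key),
        },
    )


# Hypothetical placeholders: replace them with your own values before sending.
request = build_collection_enrichment_request(
    "https://YOUR_SERVICE_URL", "YOUR_API_KEY", "YOUR_PROJECT_ID",
    "YOUR_COLLECTION_ID", ["ENTITIES_ENRICHMENT_ID", "KEYWORDS_ENRICHMENT_ID"], ["text"],
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it lets you inspect the URL and JSON body before you touch a live collection.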
Click Identify fields.

Most content from a document is indexed in the text field automatically. You might want to index certain types of content in different fields or split up large documents so that the text field contains fewer passages per document. To do so, you can teach Discovery to recognize important fields in your documents by applying a Smart Document Understanding model to your collection.

Smart Document Understanding (SDU) is a technology that learns about the content of a document based on the document's structure. You can apply a prebuilt SDU model or create a custom SDU model.

Smart Document Understanding model options

To create a custom SDU model, you select the User-trained model option, and then annotate fields in your document. (You will not annotate documents as part of this tutorial.)

Smart Document Understanding annotation tool

For more information about SDU, see Using Smart Document Understanding.
Click Manage fields.

The Manage fields page lists the indexed fields. From here, you can add fields to the index or remove fields from it. You can also split large documents into many smaller documents.

Fields in the collection index

For more information about splitting documents, see Splitting documents to make query results more succinct.

Search the sample project

Click the Improve and customize icon from the navigation panel.

The Improve and customize page is where you can try out queries, then add and test customizations to improve the query results for your project. A list of sample queries is displayed to help you get started with submitting test queries.
Click the Run search button for IBM.

Query results are displayed.
From one of the query results, click View passages in document.

A preview of the document where the result was found is shown.
Complete the following steps to explore the search result.
1. Click Open advanced view.
  
  Useful summary information is displayed, such as the number of occurrences of any enrichments that are detected in the document.
2. Select the URL entity to highlight mentions of URLs within the text.
  
  Advanced view that shows entities that were recognized
3. To see how the information from the document is stored in JSON format, click the View as menu from the view header, and select JSON.
  
  A JSON representation of the document is displayed.
  
  JSON representation of the document
  
  You can explore the JSON representation to see information that Discovery captured from the document. For example, if you expand the enriched_text section, and then expand the entities section, you can see mentions of entities that were recognized and tagged by the Entities enrichment.
  
  Shows the enriched_text.entities section of the JSON representation
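You can also read the entity mentions from the JSON representation in a script. The following Python sketch walks the enriched_text section and collects entity mentions, roughly like the summary in the advanced view.

The sketch assumes a particular document shape:

- `enriched_text` is a list of objects.
- Each object has an optional `entities` list.
- Each entity carries a `type` and a `mentions` list of objects, each with a `text` key.

The sample document is hypothetical and heavily trimmed. Compare it with the JSON you see in the product.

```python
from collections import Counter


def entity_mentions(document):
    """Return (entity type, mention text) pairs found in a result document.

    Assumed shape: document["enriched_text"] is a list of objects, each with an
    optional "entities" list whose items have a "type" and a "mentions" list of
    objects that have a "text" key.
    """
    pairs = []
    for enriched in document.get("enriched_text", []):
        for entity in enriched.get("entities", []):
            for mention in entity.get("mentions", []):
                pairs.append((entity.get("type"), mention.get("text")))
    return pairs


def count_mentions_by_type(document):
    """Count entity mentions per entity type."""
    return Counter(entity_type for entity_type, _ in entity_mentions(document))


# Hypothetical, heavily trimmed document for illustration only.
sample_document = {
    "enriched_text": [
        {
            "entities": [
                {"type": "URL", "text": "https://www.ibm.com",
                 "mentions": [{"text": "https://www.ibm.com"}, {"text": "https://www.ibm.com"}]},
                {"type": "Organization", "text": "IBM", "mentions": [{"text": "IBM"}]},
            ]
        }
    ]
}

print(count_mentions_by_type(sample_document))  # Counter({'URL': 2, 'Organization': 1})
```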

Customize the sample project

Now, let's customize the search result view a bit by adding a facet. A facet is a way to organize and classify documents that share similar patterns or content.

From the Improve and customize page, submit the following natural language query:
```
How do I install Discovery?
```
Review the query results that are displayed.

Top Entities facet results

Notice that a Top Entities section is displayed. You can expand the entities and click one of them to filter the query results to show only those results in which the entity is mentioned. The Top Entities section is a built-in facet. It uses information that was added to the documents by the Entities enrichment.

You will add your own facet that uses the Keywords enrichment that you applied to the collection in a previous step.
On the Improvement tools panel, expand Customize display, and then click Facets.

Customize display options
Click New facet, and then click the From existing fields in a collection button.
Choose enriched_text.keywords.mentions.text, change the label to Keywords, and then click Apply.

Creating a Keywords-based facet

Remember the JSON representation of the document that you looked at earlier? Now that the Keywords enrichment is applied to the text field, and the documents are reprocessed, any keyword mentions found in the text field are included in the JSON representation of the document.

The field that you picked for the facet (enriched_text.keywords.mentions.text) reflects where the keyword text is stored in the JSON representation:
```
"enriched_{field_name}": [
  {
    "keywords": [
      {
        "mentions": [
          {
            "text": "Cloud Pak"
          }
        ]
      }
    ]
  }
]
```
The new facet is displayed. You can click a keyword to filter the documents to include only those results that mention the keyword.

Keywords facet
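Discovery computes facet values for you, so you don't need code to use the facet. To make the idea concrete, the following Python sketch mimics a keyword facet on a few documents that were already retrieved. It counts how many documents mention each keyword at the enriched_text.keywords.mentions.text path. It then keeps only the documents that mention a selected keyword.

A few points are assumptions:

- The document shape is assumed.
- The sample documents are hypothetical.
- Discovery's own facet counts might be computed differently.

```python
from collections import Counter


def keyword_mentions(document):
    """Return every enriched_text[*].keywords[*].mentions[*].text value (assumed shape)."""
    return [
        mention.get("text")
        for enriched in document.get("enriched_text", [])
        for keyword in enriched.get("keywords", [])
        for mention in keyword.get("mentions", [])
    ]


def keyword_facet(documents, top_n=10):
    """Count how many documents mention each keyword, most frequent first."""
    counts = Counter()
    for document in documents:
        # A set counts each document once per keyword, however often it repeats.
        counts.update(set(keyword_mentions(document)))
    return counts.most_common(top_n)


def filter_by_keyword(documents, keyword):
    """Keep only the documents that mention the keyword, like clicking a facet value."""
    return [document for document in documents if keyword in keyword_mentions(document)]


# Hypothetical documents for illustration only.
documents = [
    {"document_id": "d1", "enriched_text": [{"keywords": [
        {"mentions": [{"text": "Cloud Pak"}, {"text": "Cloud Pak"}]}]}]},
    {"document_id": "d2", "enriched_text": [{"keywords": [
        {"mentions": [{"text": "Cloud Pak"}]},
        {"mentions": [{"text": "installation"}]}]}]},
    {"document_id": "d3", "enriched_text": [{}]},
]

print(keyword_facet(documents))  # [('Cloud Pak', 2), ('installation', 1)]
print([d["document_id"] for d in filter_by_keyword(documents, "installation")])  # ['d2']
```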

You successfully added a built-in Natural Language Understanding (NLU) enrichment that recognizes keywords in the sample collection documents. Then, you added a facet that uses the Keywords enrichment to let you filter the documents by keyword.

Share the sample project

Click Integrate and deploy from the navigation panel.

From here, you can share your project with colleagues and deploy it.
Follow the on-screen instructions to add a user, and then send login credentials and the provided link to your colleague.

Integrate and deploy page

After you build your own search application and are ready to deploy it, you can use prebuilt user interface components or build a custom application.
- Click API Information. From this page, you can get the project ID for your project. You need the project ID to use the Discovery API. You also need the service instance URL and API key. The credential details are available from the Manage page of your service instance in IBM Cloud.
- Click UI Components to find links to ready-to-use code that you can use to create a full-featured search application faster.
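With the project ID, service URL, and API key, you can query the project from your own code. The following Python sketch uses only the standard library. It builds a query request for the Discovery v2 `POST /v2/projects/{project_id}/query` endpoint and, optionally, sends it.

Check these assumptions against the API reference:

- The version date is assumed.
- The sketch uses basic auth with the user name `apikey`, which IBM Cloud instances accept. Cloud Pak for Data instances typically use a bearer token instead.
- All placeholder values are hypothetical.

IBM also publishes SDKs, such as the `ibm-watson` Python package, that wrap these calls.

```python
import base64
import json
import urllib.request

API_VERSION = "2023-03-31"  # Assumption: a supported Discovery v2 version date.


def build_query_request(service_url, api_key, project_id, natural_language_query, count=10):
    """Build (but do not send) a natural language query request for a project."""
    url = f"{service_url.rstrip('/')}/v2/projects/{project_id}/query?version={API_VERSION}"
    body = {"natural_language_query": natural_language_query, "count": count}
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def run_query(request):
    """Send the request and return the parsed JSON response (needs valid credentials)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical placeholders: copy the real values from the API Information page
# and from the Manage page of your service instance.
request = build_query_request(
    "https://YOUR_SERVICE_URL", "YOUR_API_KEY", "YOUR_PROJECT_ID",
    "How do I install Discovery?",
)
# results = run_query(request)
# for result in results.get("results", []):
#     print(result.get("document_id"))
```

Keeping request construction separate from `run_query` makes it easy to log or inspect a request before you send it with real credentials.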

Add your own content

Now that you know more about some of the product features, you're ready to evaluate the data you want to search.

It's all about the data. Review the content that you own and want your search solution to use.

Supported data sources

The following table shows the supported data sources for each deployment type.

Supported data sources
Data source	IBM Cloud	IBM Cloud Pak for Data
Box
Database (IBM Data Virtualization, IBM Db2, Microsoft SQL, Oracle, Postgres)
FileNet P8
HCL Notes
IBM Cloud Object Storage
Local file system
Salesforce
Microsoft SharePoint Online
Microsoft SharePoint On Premises
Website
Microsoft Windows file system

Not sure what you can build?

For more information about the types of search solutions you can build, see Start getting value from your data.

You can access the product documentation at any time by selecting the Help icon from the page header of the product user interface. The help content is customized to provide information that is related to what you're doing in the product.

No matter what you build, step one is to create a project. Decide which project type best fits your needs.

If none of the existing types is quite right, you can choose None of the above to create a custom project instead.

Project descriptions

Project type use cases
Need	Goal	Project type
I want to extract data to support automation of repetitive document processing tasks.	I want to understand quickly what data is extracted from my documents and improve the data by applying enrichments.	Intelligent Document Processing
Which document contains the answer to my question?	Find meaningful information in sources that contain a mix of structured and unstructured data, and surface it in a stand-alone enterprise search application or in the search field of a business application.	Document Retrieval
Where is the part of the contract that I need for my task?	Quickly extract critical information from contracts.	Document Retrieval for Contracts
I want the chatbot I'm building to use knowledge that I own.	Give a virtual assistant quick access to technical information that is stored in various external data sources and document formats to answer customer questions.	Conversational Search
I want to uncover insights I didn't know to ask about.	Gain insights from pattern analysis or perform root cause analysis.	Content Mining

For more information, see Creating projects.