Getting started with Watson Discovery
In this tutorial, we introduce IBM Watson® Discovery and walk you through the Discovery sample project. Exploring the sample project is a great way to tour and try out some of the product's features.
Open Watson Discovery
IBM Cloud
These instructions apply to all managed deployments, including IBM Cloud Pak for Data as a Service instances.
-
Click the Discovery instance that you created to go to the service dashboard.
-
On the Manage page, click Launch Watson Discovery.
If you're prompted to log in, provide your IBM Cloud credentials.
IBM Cloud Pak for Data
These instructions apply to installed Discovery deployments on IBM Cloud Pak for Data:
-
From the IBM Cloud Pak for Data web client main menu, expand Services, and then click Instances.
-
Find your instance, and then click it to open its summary page.
You can create a maximum of 10 instances per deployment. After you reach the maximum number, the New instance button is not displayed in IBM Cloud Pak for Data.
-
Click Launch tool.
Open the sample project
A new browser tab or window opens, and the My Projects page is displayed.
To get familiar with the product, you can watch a short overview video (under 3 minutes) by clicking the Watch a video link on the product home page.
In this tutorial, you explore the sample project.
The sample project is a built-in project that you can use to explore the product when you first start. The sample project is a Document Retrieval project type. Document Retrieval projects are used to search your data and find the most relevant answers.
-
Click Sample Project.
The Improve and customize page is displayed.
If you just installed Discovery, the sample project needs time to finish processing documents. Wait for processing to finish before you start experimenting. You can check the status of data processing on the Activity page, which is described in the next section.
Learn about the sample collection
Learn about ways you can manage and enhance a collection by exploring the sample collection that is available with the sample project. The sample collection consists of a set of uploaded IBM Support PDF documents.
-
Click the Manage collections icon on the navigation panel.
Any collections in your project are displayed here. This project has only one collection.
-
Click Sample Collection.
The Activity page is displayed. This page shows the status of the collection. For example, it shows the total number of documents and when the collection was last updated. If Discovery encounters a problem when a document is uploaded or a data source is crawled, any associated messages are displayed here.
After you create a collection, you can come to this page to find information about the processing status of the data in the collection.
-
Click the Enrichments tab.
The Enrichments page shows you a list of available enrichments. Enrichments make meaningful information easier to find and return in searches. You can apply built-in enrichments to your collection to use Natural Language Understanding (NLU) models that tag terms, such as commonly known keywords.
The Entities enrichment is applied to the sample collection:
- Entities
- Recognizes proper nouns such as people, cities, and organizations that are mentioned in the content.
This enrichment is applied automatically to collections that are added to projects of the Document Retrieval type.
-
For the Entities v2 enrichment, click 1x Selected fields.
A list of available fields is displayed, and the text field is selected. This selection means that the Entities enrichment was applied to content that was indexed in a field named text when documents from the collection were processed.
From this page, you can apply new enrichments to your collection or change the fields where an enrichment is applied.
A powerful feature of Discovery is that you can add your own custom enrichments, such as dictionaries, patterns, and machine learning models. When you create custom enrichments, they are also listed on this page, where you can manage the fields they are applied to.
For more information about custom enrichments, see Adding domain-specific resources.
-
Next, apply another enrichment to the collection. Find the Keywords enrichment in the list, and then click Select fields.
The Keywords enrichment recognizes significant, commonly known terms in your content.
-
Scroll through the list of fields until you find the text field, and then select it.
-
Click Apply changes and reprocess.
While your documents are being reprocessed to look for and tag keywords, you can continue to explore the tools available for managing a collection.
-
Click Identify fields.
Most content from a document is automatically indexed in the text field. You might want to index certain types of content in separate fields, or split large documents so that the text field contains fewer passages per document. To do so, you can teach Discovery to recognize important fields in your documents by applying a Smart Document Understanding model to your collection.
Smart Document Understanding (SDU) is a technology that learns about the content of a document based on the document's structure. You can apply a prebuilt SDU model or create a custom SDU model.
To create a custom SDU model, you select the User-trained model option, and then annotate fields in your document. (You will not annotate documents as part of this tutorial.)
For more information about SDU, see Using Smart Document Understanding.
-
Click Manage fields.
The Manage fields page lists the indexed fields. From here, you can choose which fields to include in the index. You can also split large documents into many smaller documents.
For more information about splitting documents, see Splitting documents to make query results more succinct.
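To make the idea of splitting concrete, the following is a conceptual sketch only, assuming a document can be modeled as an ordered list of (field name, text) pairs, such as the fields an SDU model might label. The function name split_at_field and the data model are hypothetical and are not part of any Discovery API; Discovery performs the real split when it processes the collection. The sketch starts a new, smaller document at each occurrence of the chosen field.

```python
from typing import List, Tuple

# Hypothetical model: a document is an ordered list of (field_name, text) pairs.
Segment = Tuple[str, str]


def split_at_field(segments: List[Segment], split_field: str) -> List[List[Segment]]:
    """Return child documents, starting a new one at each occurrence of split_field."""
    documents: List[List[Segment]] = []
    current: List[Segment] = []
    for field, text in segments:
        # Each new occurrence of the split field closes the child document in progress.
        if field == split_field and current:
            documents.append(current)
            current = []
        current.append((field, text))
    if current:  # keep the last child document
        documents.append(current)
    return documents


if __name__ == "__main__":
    manual = [
        ("text", "About this guide."),
        ("title", "Installing"),
        ("text", "Download the installer."),
        ("title", "Upgrading"),
        ("text", "Back up your data."),
    ]
    for child in split_at_field(manual, "title"):
        print(child)
```

In this sketch, any content that appears before the first occurrence of the split field becomes its own child document; the product might handle that leading content differently.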
Search the sample project
-
Click the Improve and customize icon from the navigation panel.
The Improve and customize page is where you can try out queries, then add and test customizations to improve the query results for your project. A list of sample queries is displayed to help you get started with submitting test queries.
-
Click the Run search button for the IBM sample query.
Query results are displayed.
-
From one of the query results, click View passages in document.
A preview of the document where the result was found is shown.
-
Do one of the following things to explore the search result.
-
Click Open advanced view.
Useful summary information is displayed, such as the number of occurrences of any enrichments that are detected in the document.
-
Select the URL entity to highlight mentions of URLs within the text.
-
To see how the information from the document is stored in JSON format, click the View as menu from the view header, and select JSON.
A JSON representation of the document is displayed.
You can explore the JSON representation to see information that Discovery captured from the document. For example, if you expand the enriched_text section, and then expand the entities section, you can see mentions of entities that were recognized and tagged by the Entities enrichment.
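If you want to inspect the same information programmatically, the following sketch walks the enriched_text, entities, and mentions sections of a document's JSON. It assumes that enriched_text is a list of objects, each holding an entities list whose items carry a type and a mentions list; the sample document is made up for illustration and is not real output from the sample collection.

```python
import json
from typing import Dict, List, Tuple

# Made-up document that mimics the enriched_text -> entities -> mentions shape.
SAMPLE_DOCUMENT = json.loads("""
{
  "document_id": "example-doc",
  "enriched_text": [
    {
      "entities": [
        {"type": "Organization", "text": "IBM",
         "mentions": [{"text": "IBM"}, {"text": "IBM"}]},
        {"type": "URL", "text": "www.ibm.com",
         "mentions": [{"text": "www.ibm.com"}]}
      ]
    }
  ]
}
""")


def entity_mentions(document: Dict) -> List[Tuple[str, str]]:
    """Return (entity type, mention text) pairs from every enriched_text entry."""
    pairs: List[Tuple[str, str]] = []
    for enriched in document.get("enriched_text", []):
        for entity in enriched.get("entities", []):
            for mention in entity.get("mentions", []):
                pairs.append((entity.get("type", ""), mention.get("text", "")))
    return pairs


if __name__ == "__main__":
    for entity_type, text in entity_mentions(SAMPLE_DOCUMENT):
        print(f"{entity_type}: {text}")
```

The sketch uses .get() with defaults so that a document without enrichments simply yields no pairs instead of raising an error.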
Customize the sample project
Now, let's customize the search result view a bit by adding a facet. A facet is a way to organize and classify documents that share similar patterns or content.
-
From the Improve and customize page, submit the following natural language query:
How do I install Discovery?
-
Review the query results that are displayed.
Notice that a Top Entities section is displayed. You can expand the entities and click one of them to filter the query results to show only those results in which the entity is mentioned. The Top Entities section is a built-in facet. It uses information that was added to the documents by the Entities enrichment.
You will add your own facet that uses the Keywords enrichment that you applied to the collection in a previous step.
-
On the Improvement tools panel, expand Customize display, and then click Facets.
-
Click New facet, and then click the From existing fields in a collection button.
-
Choose enriched_text.keywords.mentions.text, change the label to Keywords, and then click Apply.
Remember the JSON representation of the document that you looked at earlier? Now that the Keywords enrichment is applied to the text field and the documents are reprocessed, any keyword mentions that are found in the text field are included in the JSON representation of the document. The field that you picked for the facet (enriched_text.keywords.mentions.text) reflects where the keyword text is stored in the JSON:
"enriched_{field_name}": [ { "keywords": [ { "mentions": [ { "text": "Cloud Pak" } ] } ] } ]
-
The new facet is displayed. You can click a keyword to filter the documents to include only those results that mention the keyword.
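To illustrate what the Keywords facet does, the following sketch counts keywords and filters documents on the client side. It assumes that each document is a dictionary shaped like the enriched_text.keywords.mentions.text path that the facet uses, and the two documents are made up. Discovery computes the real facet on the server; this is only a model of the behavior, not the product's implementation.

```python
from collections import Counter
from typing import Dict, Iterator, List

# Made-up documents that follow the enriched_text.keywords.mentions.text path.
DOCUMENTS = [
    {"document_id": "a", "enriched_text": [{"keywords": [
        {"text": "Cloud Pak", "mentions": [{"text": "Cloud Pak"}, {"text": "Cloud Pak"}]},
    ]}]},
    {"document_id": "b", "enriched_text": [{"keywords": [
        {"text": "Cloud Pak", "mentions": [{"text": "Cloud Pak"}]},
        {"text": "installation", "mentions": [{"text": "installation"}]},
    ]}]},
]


def keyword_mentions(document: Dict) -> Iterator[str]:
    """Yield every value found at enriched_text.keywords.mentions.text."""
    for enriched in document.get("enriched_text", []):
        for keyword in enriched.get("keywords", []):
            for mention in keyword.get("mentions", []):
                yield mention["text"]


def keyword_facet(documents: List[Dict]) -> Counter:
    """Count how many documents mention each keyword (at most one count per document)."""
    counts: Counter = Counter()
    for document in documents:
        counts.update(set(keyword_mentions(document)))
    return counts


def filter_by_keyword(documents: List[Dict], keyword: str) -> List[Dict]:
    """Keep only documents that mention the keyword, like clicking it in the facet."""
    return [d for d in documents if keyword in set(keyword_mentions(d))]


if __name__ == "__main__":
    print(keyword_facet(DOCUMENTS).most_common())
    print([d["document_id"] for d in filter_by_keyword(DOCUMENTS, "installation")])
```

Counting each keyword once per document, rather than once per mention, reflects the assumption that a facet reports how many documents match each value.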
You successfully added a built-in NLU enrichment that recognizes keywords in the sample collection documents. Then, you added a facet that uses the keywords enrichment to let you filter the documents by keyword.
Share the sample project
-
Click Integrate and deploy from the navigation panel.
From here, you can share your project with colleagues and deploy it.
-
Follow the on-screen instructions to add a user, and then send login credentials and the provided link to your colleague.
After you build your own search application and are ready to deploy it, you can use prebuilt user interface components or build a custom application.
-
Click API Information. From this page, you can get the project ID for your project, which you need to use the Discovery API. You also need the service instance URL and an API key, which are available on the Manage page of your service instance in IBM Cloud.
-
Click UI Components to find links to ready-to-use code that you can use to create a full-featured search application faster.
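As a starting point for calling the API, the following sketch prepares, but does not send, a query request that uses the project ID, service instance URL, and API key. It assumes the Discovery v2 REST convention of a POST to {service_url}/v2/projects/{project_id}/query with a version date parameter, and IBM Cloud basic authentication with the user name apikey. The version date and all placeholder values are assumptions; IBM Cloud Pak for Data deployments typically use a bearer token instead, so check the API reference for your deployment.

```python
import base64
import json
import urllib.parse
import urllib.request

VERSION = "2023-03-31"  # assumed API version date; confirm it in the API reference


def build_query_request(service_url: str, project_id: str, api_key: str,
                        natural_language_query: str,
                        count: int = 5) -> urllib.request.Request:
    """Build a POST request for a natural language query (not sent here)."""
    endpoint = (f"{service_url.rstrip('/')}/v2/projects/"
                f"{urllib.parse.quote(project_id)}/query?"
                + urllib.parse.urlencode({"version": VERSION}))
    body = json.dumps({"natural_language_query": natural_language_query,
                       "count": count}).encode("utf-8")
    # Assumed IBM Cloud basic auth: user name "apikey", password is the API key.
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    request = build_query_request(
        "https://api.us-south.discovery.watson.cloud.ibm.com/instances/YOUR_INSTANCE_ID",
        "YOUR_PROJECT_ID", "YOUR_API_KEY", "How do I install Discovery?")
    print(request.get_method(), request.full_url)
    # To send it, pass the request to urllib.request.urlopen and parse the JSON response.
```

Building the request separately from sending it keeps the credentials handling easy to inspect before you call a live service instance.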
Add your own content
Now that you know more about some of the product features, you're ready to evaluate the data you want to search.
It's all about the data. Review the types of content that you own and want your search solution to use.
Supported data sources
The following table shows the supported data sources for each deployment type.
Data source | IBM Cloud | IBM Cloud Pak for Data |
---|---|---|
Box | ||
Database (IBM Data Virtualization, IBM Db2, Microsoft SQL, Oracle, Postgres) | ||
FileNet P8 | ||
HCL Notes | ||
IBM Cloud Object Storage | ||
Local file system | ||
Salesforce | ||
Microsoft SharePoint Online | ||
Microsoft SharePoint On Premises | ||
Website | ||
Microsoft Windows file system | | |
Not sure what you can build?
For more information about the types of search solutions you can build, see Start getting value from your data.
You can access the product documentation at any time by clicking the Help icon in the page header of the product user interface. The help content is tailored to what you're currently doing in the product.
No matter what you build, step one is to create a project. Decide which project type best fits your needs.
If none of the existing types is quite right, you can choose None of the above to create a custom project instead.
Project descriptions
Need | Goal | Project type |
---|---|---|
I want to extract data to support automation of repetitive document processing tasks. | Quickly understand what data is extracted from my documents, and improve the data by applying enrichments. | Intelligent Document Processing |
Which document contains the answer to my question? | Find meaningful information in sources that contain a mix of structured and unstructured data, and surface it in a stand-alone enterprise search application or in the search field of a business application. | Document Retrieval |
Where is the part of the contract that I need for my task? | Quickly extract critical information from contracts. | Document Retrieval for Contracts |
I want the chatbot I'm building to use knowledge that I own. | Give a virtual assistant quick access to technical information that is stored in various external data sources and document formats to answer customer questions. | Conversational Search |
I want to uncover insights I didn't know to ask about. | Gain insights from pattern analysis or perform root cause analysis. | Content Mining |
For more information, see Creating projects.