Use imported ML models to find custom terms
Use custom Machine Learning models that use rules or context to recognize and tag entities.
Add Machine Learning models that you created with IBM tools that you can use to define your own type system.
The types of models that you can add depend on your deployment:
- IBM Cloud Pak for Data: You can add models that were created with Watson Explorer Content Analytics Studio, or with an instance of IBM Watson® Knowledge Studio that is hosted on IBM Cloud Pak® for Data or IBM Cloud. Starting with the 4.6.2 release, you can also add custom entity extractor models that were created in and exported from another instance of Discovery.
- IBM Cloud: You can add models that were created with an IBM Watson® Knowledge Studio instance that is hosted in IBM Cloud only.
  To use a Knowledge Studio model that was built with Knowledge Studio on IBM Cloud Pak for Data, migrate the ground truth to an IBM Cloud instance of Knowledge Studio, and then retrain the model.
The following types of models are supported:
- Rule-based models created in Knowledge Studio that find entities in documents based on rules that you define. (File format: .pear)
- Machine learning models created in Knowledge Studio that understand the linguistic nuances, meaning, and relationships specific to your industry. (File format: .zip)
- Custom entity extractors that are created in and exported from Discovery. (File format: .ent)
- Sentence classifiers that are created in and exported from Discovery. (File format: .sc)
- IBM Cloud Pak for Data: Custom UIMA text analysis models created in Watson Explorer Content Analytics Studio. (File format: .pear)
For installed deployments, support for importing entity extractor models was added with the 4.6.2 release.
Discovery cannot identify entity subtypes that are defined by a Knowledge Studio model.
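Before you upload a file, it can help to check its extension against the supported formats listed above. The following is a minimal sketch of such a check. The `SUPPORTED_FORMATS` table and the `candidate_model_kinds` helper are hypothetical names introduced for illustration; they are not part of Discovery.

```python
from pathlib import PurePath

# Extension -> model kinds, transcribed from the list of supported models.
# A .pear file can come from two different tools, so it maps to two kinds.
SUPPORTED_FORMATS = {
    ".pear": [
        "Knowledge Studio rule-based model",
        "Watson Explorer Content Analytics Studio UIMA model (IBM Cloud Pak for Data)",
    ],
    ".zip": ["Knowledge Studio machine learning model"],
    ".ent": ["Discovery custom entity extractor"],
    ".sc": ["Discovery sentence classifier"],
}


def candidate_model_kinds(filename: str) -> list[str]:
    """Return the model kinds that match the file extension, or an empty list."""
    suffix = PurePath(filename).suffix.lower()
    return SUPPORTED_FORMATS.get(suffix, [])


print(candidate_model_kinds("claims_model.zip"))
print(candidate_model_kinds("notes.txt"))
```

An empty result means that the extension is not one of the formats in the list. It does not prove that a matching file is a valid model.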
To add a Machine Learning model, complete the following steps:
- Create the model and export it from the tool that you use to create it.
  For more information, see the following documentation:
  - Knowledge Studio for IBM Cloud Pak® for Data
  - Knowledge Studio for IBM Cloud
  - Watson Explorer Content Analytics Studio
    You must export the model from Watson Explorer Content Analytics Studio as a UIMA PEAR file. For more information, see: Creating Custom PEAR Files for use with Lexical Analysis Streams.
- From the Teach domain concepts section of the Improvement tools panel, click Import machine learning models.
- Specify a name for the model, and then choose the language that was used to define the model.
- Click Upload to browse for the file that you exported earlier.
- Click Create.
- Choose the collection and field where you want to apply the enrichments from the model, and then click Apply.
If the model is too large to upload from the product user interface, you can use the Create an enrichment method of the API to import the file.
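As a rough sketch of what preparing such an API call might involve, the following helper builds the request URL and the JSON metadata for the enrichment. Everything specific here is an assumption that this page does not confirm: the `/v2/projects/{project_id}/enrichments` path, the `version` query parameter, the `watson_knowledge_studio_model` type value, and the idea that the metadata is sent as an `enrichment` form part next to a `file` part. The base URL, project ID, and version date in the usage lines are placeholders. Check the Discovery API reference before you rely on any of these values.

```python
import json
from urllib.parse import quote, urlencode


def build_create_enrichment_request(base_url, project_id, name, language, version):
    """Return (url, enrichment_json) for a multipart 'create enrichment' call.

    Assumed, not confirmed by this page: the endpoint path, the `version`
    query parameter, and the `type` identifier used below.
    """
    url = (
        f"{base_url.rstrip('/')}/v2/projects/{quote(project_id, safe='')}/enrichments"
        f"?{urlencode({'version': version})}"
    )
    enrichment = {
        "name": name,
        "type": "watson_knowledge_studio_model",  # assumed type identifier
        "options": {"languages": [language]},
    }
    return url, json.dumps(enrichment)


# Placeholder values for illustration only.
url, metadata = build_create_enrichment_request(
    "https://example.com/discovery/api", "my-project-id", "My model", "en", "2023-03-31"
)
print(url)
print(metadata)
```

The helper does not send anything. An HTTP client would post `metadata` as the assumed `enrichment` form part together with the exported model file.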
Rule-based model example
For example, when a machine learning model is applied as an enrichment to a field, it extracts all entity types in that field that were specified in a Knowledge Studio rule-based model. If the model recognizes entity types such as person, surname, and job title, they are recognized in your documents and tagged.
In the output, the information that is extracted by the Machine Learning enrichment is stored in the enriched_{field_name} array, within the entities array. In this example, the field that is selected for enrichment is text.
{
  "enriched_text": [
    {
      "entities": [
        {
          "path": ".wksrule.entities.PERSON",
          "text": "George Washington",
          "type": "PERSON"
        },
        {
          "path": ".wksrule.entities.GIVENNAME",
          "text": "George",
          "type": "GIVENNAME"
        },
        {
          "path": ".wksrule.entities.SURNAME",
          "text": "Washington",
          "type": "SURNAME"
        },
        {
          "path": ".wksrule.entities.POSITION",
          "text": "politician",
          "type": "POSITION"
        },
        {
          "path": ".wksrule.entities.POSITION",
          "text": "soldier",
          "type": "POSITION"
        },
        {
          "path": ".wksrule.entities.JOBTITLE",
          "text": "President of the United States",
          "type": "JOBTITLE"
        }
      ],
      "text": [
        "George Washington (February 22, 1732 – December 14, 1799) was an American politician and soldier who served as the first President of the United States from 1789 to 1797 and was one of the Founding Fathers of the United States."
      ]
    }
  ]
}
As a result, if someone uses the API to submit a Discovery Query Language query to look for occurrences of the enriched_{field_name}.entities.type:jobtitle enrichment, any passages that discuss a person's job title are returned.
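To make the output structure concrete, the following sketch reads a trimmed copy of the example output, collects the entities of one type, and assembles the query string in the form shown above. The helper names `entities_of_type` and `type_query` are hypothetical. The case-insensitive comparison is a choice made for this local helper and says nothing about how Discovery Query Language matches values.

```python
def entities_of_type(document: dict, field_name: str, entity_type: str) -> list[str]:
    """Collect the text of every entity of the given type in enriched_{field_name}."""
    matches = []
    for enriched in document.get(f"enriched_{field_name}", []):
        for entity in enriched.get("entities", []):
            if entity.get("type", "").lower() == entity_type.lower():
                matches.append(entity["text"])
    return matches


def type_query(field_name: str, entity_type: str) -> str:
    """Build a query string in the enriched_{field_name}.entities.type:{type} form."""
    return f"enriched_{field_name}.entities.type:{entity_type}"


# Trimmed copy of the example output for the text field.
document = {
    "enriched_text": [
        {
            "entities": [
                {"path": ".wksrule.entities.PERSON", "text": "George Washington", "type": "PERSON"},
                {"path": ".wksrule.entities.POSITION", "text": "politician", "type": "POSITION"},
                {"path": ".wksrule.entities.POSITION", "text": "soldier", "type": "POSITION"},
                {"path": ".wksrule.entities.JOBTITLE", "text": "President of the United States", "type": "JOBTITLE"},
            ]
        }
    ]
}

print(entities_of_type(document, "text", "position"))  # ['politician', 'soldier']
print(type_query("text", "jobtitle"))  # enriched_text.entities.type:jobtitle
```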
Machine learning model example
In this example, a machine learning model extracts entity types such as person, organization, and date, and information about relationships between the entities. When this ML model is applied as an enrichment to a field, it uses machine learning to understand the linguistic nuances, meaning, and relationships that are mentioned in the document.
In the output, the information that is extracted by the Machine Learning enrichment is stored in the enriched_{field_name} array, within the entities and the relations arrays. In this example, the field that is selected for enrichment is text.
{
  "enriched_text": [
    {
      "entities": [
        {
          "count": 1,
          "text": "Democratic Party",
          "type": "ORGANIZATION"
        },
        {
          "count": 1,
          "text": "March 15, 1767",
          "type": "DATE"
        },
        {
          "count": 1,
          "text": "President",
          "type": "POSITION"
        },
        {
          "count": 1,
          "text": "Andrew Jackson",
          "type": "PERSON"
        }
      ],
      "relations": [
        {
          "sentence": "Andrew Jackson (March 15, 1767 – June 8, 1845) was an American soldier and statesman who served as the seventh President of the United States from 1829 to 1837 and was the founder of the Democratic Party."
        }
      ]
    }
  ]
}
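A consumer of this output typically walks both arrays. The following sketch groups the entity mentions by type and collects the sentences from the relations array. The `summarize_enrichment` helper is a hypothetical name, and the sample data is a trimmed copy of the output above. A real relations entry might carry more keys than `sentence`; this sketch reads only the key that the example shows.

```python
from collections import defaultdict


def summarize_enrichment(document: dict, field_name: str):
    """Group entity mentions by type and collect relation sentences."""
    by_type = defaultdict(list)
    sentences = []
    for enriched in document.get(f"enriched_{field_name}", []):
        for entity in enriched.get("entities", []):
            # Keep the mention text with its count; default to 1 if count is absent.
            by_type[entity["type"]].append((entity["text"], entity.get("count", 1)))
        for relation in enriched.get("relations", []):
            if "sentence" in relation:
                sentences.append(relation["sentence"])
    return dict(by_type), sentences


document = {
    "enriched_text": [
        {
            "entities": [
                {"count": 1, "text": "Democratic Party", "type": "ORGANIZATION"},
                {"count": 1, "text": "Andrew Jackson", "type": "PERSON"},
            ],
            "relations": [
                {
                    "sentence": "Andrew Jackson (March 15, 1767 – June 8, 1845) was an "
                    "American soldier and statesman who served as the seventh President "
                    "of the United States from 1829 to 1837 and was the founder of the "
                    "Democratic Party."
                }
            ],
        }
    ]
}

by_type, sentences = summarize_enrichment(document, "text")
print(by_type["PERSON"])  # [('Andrew Jackson', 1)]
print(len(sentences))  # 1
```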
Machine learning model limits
The number of Machine Learning (ML) models you can create per service instance depends on your Discovery plan type.
Plan | ML models per service instance |
---|---|
Cloud Pak for Data | Unlimited |
Premium | 10 |
Enterprise | 10 |
Plus (includes Trial) | 3 |
For each Knowledge Studio machine learning model, the maximum number of entities that can be detected is 50.
Advanced rules models
Add an advanced rules model to apply to your collection a text extraction model that was created in and exported from the Advanced Rules Editor of IBM Watson® Knowledge Studio.
Your model must be created with the appropriate Knowledge Studio deployment:
- IBM Cloud Pak for Data: You can add models that were created and exported from the following places:
  - IBM Watson® Knowledge Studio that was built with an IBM Cloud Pak® for Data deployment earlier than the 4.5 release
  - IBM Watson® Knowledge Studio that is hosted on IBM Cloud
  - NLP Editor that is built by contributors to the Center for Open-source Data & AI Technologies
- IBM Cloud: You can add models that were created with an IBM Watson® Knowledge Studio instance that is hosted on IBM Cloud only.
Removal from Knowledge Studio
Support for building models with the beta Advanced Rules Editor in Knowledge Studio ended. Any rules models that were exported from Knowledge Studio prior to the end of support date can continue to be used in Discovery.
End of support dates differ based on the deployment type:
- IBM Cloud: 30 June 2022
- IBM Cloud Pak for Data: release 4.5.1 on 3 August 2022
IBM Cloud: As an alternative to using a model that is generated by the Knowledge Studio Advanced Rules Editor, you can define a rule by adding a Patterns enrichment.
Adding an existing model
To add an advanced rules model, complete the following steps:
- Create the model and export the ZIP file that contains the model resources.
  For more information about how to export the model, see the instructions for your model source.
- From the Teach domain concepts section of the Improvement tools panel, choose Advanced rules model.
- Click Upload.
- Specify a name for the model, and then choose the language that was used to define the model.
- Specify a name for the result field, which is the field in the index where the output of this enrichment is stored.
- Click Upload to browse for the ZIP file that you exported earlier.
- Click Create.
- Choose the collection and field where you want to apply the enrichments from the model, and then click Apply.
Output format for advanced rules
Knowledge Studio uses the Annotation Query Language (AQL) to define the rules in an advanced rules model. Each model is defined by one or more views. Each view is a relational data structure that contains multiple data records. Each record is composed of values in columns that are defined by the view’s schema. To facilitate representing these models, which are custom and therefore have various schemas, a uniform JSON output schema is used.
- Each JSON object represents an Annotation Query Language (AQL) view.
- The name-and-value pairs in the JSON objects represent the names and values of the attributes in the view.
- The tuples in an AQL view are represented as an array of JSON objects, with one object for each tuple in the view.
The following table describes how AQL data types are represented in JSON syntax.
AQL data type | JSON syntax | JSON example |
---|---|---|
Integer | number | 5 |
Float | number | 4.13 |
Boolean | boolean | true |
Text | string | "some string" |
Span | object with the form {"text": String, "location": {"begin": Integer, "end": Integer}} | { "text": "Jane", "location": {"begin": 5, "end": 9} } |
Special case: null value | null | null |
List of Integer | array of number values | [ 1, 2, 3, 4, 5] |
List of Float | array of number values | [ 4.13, 4.5 ] |
List of Boolean | array of boolean values | [ true, true, false] |
List of Text | array of string values | [ "some string", "another string" ] |
List of Span | array of objects with the form {"text": String, "location": {"begin": Integer, "end": Integer}} | [{ "text": "Jane", "location": {"begin": 5, "end": 9} }, { "text": "...", "location": {"begin": 15, "end": 40} }] |
Special case: empty List | array with 0 elements | [ ] |
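The Span representation in the table can be checked against the source text. The following sketch does that for a made-up view. The view name `PersonName`, the attribute names `name` and `score`, the sample text, and the wrapper that maps a view name to its array of tuples are all hypothetical. The sketch also assumes that `begin` and `end` are zero-based character offsets into the enriched field's text with an exclusive `end`, which is consistent with the "Jane" example (5 to 9, four characters) but is not stated on this page.

```python
def check_spans(field_text: str, view_tuples: list, attribute: str) -> list:
    """Return (span_text, matches_source) for each tuple's Span attribute."""
    results = []
    for record in view_tuples:
        span = record.get(attribute)
        if span is None:  # an AQL null value is represented as JSON null
            continue
        begin = span["location"]["begin"]
        end = span["location"]["end"]
        # Assumption: offsets are zero-based with an exclusive end.
        results.append((span["text"], field_text[begin:end] == span["text"]))
    return results


field_text = "Dear Jane, welcome aboard."

# Hypothetical view output: one array of tuples per view name.
output = {
    "PersonName": [
        {"name": {"text": "Jane", "location": {"begin": 5, "end": 9}}, "score": 4.13},
        {"name": None, "score": 0.5},
    ]
}

print(check_spans(field_text, output["PersonName"], "name"))  # [('Jane', True)]
```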
Advanced rules model limits
The number of advanced rules models that you can define per service instance depends on your Discovery plan type.
Plan | Advanced rules models per service instance |
---|---|
Cloud Pak for Data | Unlimited |
Premium | 3 |
Enterprise | 3 |
Plus (includes Trial) | 1 |
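If you automate model imports, a guard based on the two limits tables on this page can prevent a request that is bound to fail. The following is a minimal sketch. The dictionaries transcribe the tables, with `None` standing in for Unlimited, and the `can_add_model` helper is a hypothetical name. How the service itself reports a limit violation is not covered here.

```python
# Limits per service instance, transcribed from the tables on this page.
# None stands for "Unlimited".
ML_MODEL_LIMITS = {
    "Cloud Pak for Data": None,
    "Premium": 10,
    "Enterprise": 10,
    "Plus": 3,  # includes Trial
}
ADVANCED_RULES_LIMITS = {
    "Cloud Pak for Data": None,
    "Premium": 3,
    "Enterprise": 3,
    "Plus": 1,  # includes Trial
}


def can_add_model(plan: str, existing_count: int, limits: dict) -> bool:
    """Return True if one more model fits under the plan's limit."""
    if plan not in limits:
        raise ValueError(f"Unknown plan: {plan}")
    limit = limits[plan]
    return limit is None or existing_count < limit


print(can_add_model("Plus", 3, ML_MODEL_LIMITS))  # False
print(can_add_model("Premium", 2, ADVANCED_RULES_LIMITS))  # True
```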