Machine learning model creation workflow
Create a machine learning model that you can use to identify entities, coreferences, and relationships of interest in new documents.
Understand the typical workflow for creating a machine learning model in Knowledge Studio for IBM Cloud Pak for Data.
The project manager performs every step except Annotate documents, which is performed by human annotators. Because human annotators are often subject matter experts, they might also be consulted when workspace resources, such as the type system, are created.
Figure 1. The workflow for developing a machine learning model
Step | Description |
---|---|
Create a workspace | See Creating a workspace. A workspace contains the resources that are used to create the model, including: - Type system: Upload or create the type system, and define the entity types and relation types that human annotators can apply when annotating text. The project manager typically works with subject matter experts for your domain to define the type system. See Establishing a type system. - Source documents: Create a corpus by uploading sample documents that are representative of your domain content into the workspace. See Adding documents for annotation. Partition the corpus into document sets, specify the percentage of documents that are shared among all document sets, and assign the document sets to human annotators. - Dictionaries: Upload or create dictionaries for annotating text. You can add dictionary entries manually or upload them from a file, and then edit the entries. See Creating dictionaries. |
Optional: Pre-annotate documents | Pre-annotate documents according to the terms in the workspace dictionaries or based on rules that you define. See Bootstrapping annotation. |
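Annotate documents | Human annotators annotate the documents in the document sets that are assigned to them, applying the entity types and relation types that are defined in the type system. Their annotations become the ground truth that is reviewed during adjudication. |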
Adjudicate and promote documents | Accept or reject the ground truth that was generated by human annotators, and adjudicate any annotation differences to resolve conflicts. Evaluating the accuracy and consistency of the human annotation effort might be the responsibility of a senior human annotator or of a user with deeper subject matter expertise than the project manager. See Adjudication. |
Train the model | Create the machine learning model. See Creating a machine learning model. |
Evaluate the model | Evaluate the accuracy of the model. See Evaluating annotations added by the model. Depending on the model's accuracy, you might need to repeat earlier steps, possibly several times, until the model reaches acceptable accuracy. See Analyzing machine learning model performance for ideas about what to update based on common performance issues. |
Publish the model | Export the model. See Using the machine learning model. |
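The Create a workspace step asks you to partition the corpus into document sets and to choose a percentage of documents that all sets share. Knowledge Studio does this through its own interface. The following Python sketch only illustrates the idea. It assumes that the shared documents are copied into every set and that the remaining documents are dealt out round-robin. The function name `partition_corpus`, the seeded shuffle, and the annotator names are hypothetical and do not represent the product's algorithm.

```python
import random
from typing import List


def partition_corpus(
    documents: List[str],
    num_sets: int,
    overlap_percent: float,
    seed: int = 0,
) -> List[List[str]]:
    """Split a corpus into document sets (hypothetical illustration).

    Assumption: overlap_percent of the corpus is copied into every set, and
    the remaining documents are distributed round-robin so each lands in
    exactly one set.
    """
    if num_sets < 1:
        raise ValueError("num_sets must be at least 1")
    if not 0 <= overlap_percent <= 100:
        raise ValueError("overlap_percent must be between 0 and 100")

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = documents[:]
    random.Random(seed).shuffle(shuffled)

    num_shared = round(len(shuffled) * overlap_percent / 100)
    shared, rest = shuffled[:num_shared], shuffled[num_shared:]

    # Every set starts with its own copy of the shared documents.
    sets: List[List[str]] = [list(shared) for _ in range(num_sets)]
    for i, doc in enumerate(rest):
        sets[i % num_sets].append(doc)
    return sets


# Usage: 10 documents, 3 annotators, 20% shared.
docs = [f"doc{i:02d}.txt" for i in range(10)]
doc_sets = partition_corpus(docs, num_sets=3, overlap_percent=20)
assignments = dict(zip(["annotator_a", "annotator_b", "annotator_c"], doc_sets))
for name, doc_set in assignments.items():
    print(name, len(doc_set))
```

Each annotator receives the shared documents, so those documents are where different annotators' work can be compared later.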
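The optional Pre-annotate documents step can label text by using the terms in the workspace dictionaries. The following minimal Python sketch shows dictionary-based pre-annotation in general. It is not Knowledge Studio's pre-annotator. It assumes case-insensitive, whole-word matching and assumes that longer terms take precedence over shorter ones. `Mention` and `pre_annotate` are hypothetical names, and the dictionary entries are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mention:
    start: int        # character offset where the match begins
    end: int          # character offset just past the match
    text: str         # matched text as it appears in the document
    entity_type: str  # type taken from the dictionary entry


def pre_annotate(text: str, dictionary: Dict[str, str]) -> List[Mention]:
    """Label whole-word, case-insensitive occurrences of dictionary terms.

    `dictionary` maps a surface form to an entity type (hypothetical format).
    Terms are tried longest first, so "heart attack" wins over "heart".
    """
    if not dictionary:
        return []
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {t.lower(): etype for t, etype in dictionary.items()}
    return [
        Mention(m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


# Usage with an invented three-entry dictionary.
dictionary = {"aspirin": "DRUG", "heart attack": "CONDITION", "heart": "ANATOMY"}
mentions = pre_annotate("Aspirin may lower the risk of a heart attack.", dictionary)
for m in mentions:
    print(m.entity_type, repr(m.text), m.start, m.end)
```

Annotations like these are only a starting point. Human annotators still review and correct them in the Annotate documents step.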
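In the Adjudicate and promote documents step, annotation differences appear on documents that more than one human annotator labeled. The following minimal Python sketch lists such differences for a reviewer. It assumes that each annotation is an exact `(start, end, entity type)` triple and that any span not applied by every annotator counts as a conflict. `find_conflicts` is a hypothetical helper. Knowledge Studio's own comparison might treat partially overlapping spans differently.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def find_conflicts(annotations: Dict[str, Set[Span]]) -> Dict[Span, List[str]]:
    """Return each span that some, but not all, annotators applied.

    `annotations` maps an annotator name to the spans they labeled in one
    shared document. The result maps each disputed span to the sorted list
    of annotators who applied it; spans everyone agrees on are omitted.
    """
    everyone = set(annotations)
    votes: Dict[Span, List[str]] = {}
    for annotator, spans in annotations.items():
        for span in spans:
            votes.setdefault(span, []).append(annotator)
    return {
        span: sorted(names)
        for span, names in votes.items()
        if set(names) != everyone
    }


# Usage: two annotators agree on "Aspirin" but label the second mention differently.
shared_doc = {
    "annotator_a": {(0, 7, "DRUG"), (32, 44, "CONDITION")},
    "annotator_b": {(0, 7, "DRUG"), (32, 37, "ANATOMY")},
}
for span, who in find_conflicts(shared_doc).items():
    print(span, who)
```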
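The Evaluate the model step measures how accurately the model reproduces the ground truth. Precision, recall, and F1 are common measures for entity extraction. The following minimal Python sketch computes them under an exact-match assumption: a predicted mention counts as correct only if its start offset, end offset, and entity type all match a ground-truth mention. The `score` function and the sample spans are hypothetical. This sketch does not represent Knowledge Studio's own scoring code.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def score(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over labeled spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: the model finds one of three ground-truth mentions and mislabels another.
gold = {(0, 7, "DRUG"), (32, 44, "CONDITION"), (50, 58, "DRUG")}
predicted = {(0, 7, "DRUG"), (32, 37, "ANATOMY")}
p, r, f = score(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Low recall suggests that the model misses mentions, and low precision suggests that it labels text it should not. Either result can point to the earlier steps worth repeating.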