Machine learning model creation workflow

Create a machine learning model that trains a model you can use to identify entities, coreferences, and relationships of interest in new documents.

Understand the typical workflow for creating a machine learning model in Knowledge Studio for IBM Cloud Pak for Data.

All the steps are performed by the project manager, except for the Annotate documents step, which is performed by the human annotator. Because human annotators are often subject matter experts, they might be consulted during the creation of workspace resources, such as the type system, also.

Figure 1. The workflow for developing a machine learning model

Step	Description
Create a workspace	See Creating a workspace. A workspace contains the resources that are used to create the model, including: - Type system: Upload or create the type system, and define the entity types and relation types that human annotators can apply when annotating text. The model process manager typically works with subject matter experts for your domain to define the type system. See Establishing a type system - Source documents: Create a corpus by uploading sample documents that are representative of your domain content into the workspace. See Adding documents for annotation. Partition the corpus into document sets, specify the percentage of documents that are shared among all document sets, and assign the document sets to human annotators. - Dictionaries: Upload or create dictionaries for annotating text. You can choose to manually add dictionary entries or upload entries from a file, and then edit the entries. See Creating dictionaries.
Optional: Pre-annotate documents	Pre-annotate documents according to the terms in the workspace dictionaries or based on rules that you define. See Bootstrapping annotation.
Annotate documents	Admins and project managers can annotate documents directly, without needing to create annotation tasks or go through the adjudication process. See Annotating document sets directly. The project manager generally assigns annotation tasks to human annotators, configures the inter-annotator agreement threshold, and provides annotation guidelines for the human annotators to follow. See Creating an annotation task. Human annotators use the ground truth editor to manually annotate documents. A human annotator identifies mentions of interest in your domain content and labels them with entity types. The human annotator also identifies relationships between mentions (for example, Mary is an employee of IBM) and how the mentions co-reference the same entity (such as an occurrence of "she" that refers to Mary). Refer to Annotating documents.
Adjudicate and promote documents	Accept or reject the ground truth that was generated by human annotators, and adjudicate any annotation differences to resolve conflicts. Evaluating the accuracy and consistency of the human annotation effort might be the responsibility of a senior human annotator or a user with stronger subject matter experience than the project manager. See Adjudication.
Train the model	Create the machine learning model. See Creating a machine learning model.
Evaluate the model	Evaluate the accuracy of the model. See Evaluating annotations added by the model. Depending on model accuracy, this step might result in the need to repeat earlier steps again and again until optimal accuracy is achieved. See Analyzing machine learning model performance for ideas about what to update based on common performance issues.
Publish the model	Export the model. See Using the machine learning model.