Evaluating the assistant

Plus, Enterprise, IBM Cloud Pak for Data, IBM Software Hub

Overview

You can evaluate and analyze the performance of your assistant by uploading a comprehensive, relevant collection of utterances and sending the utterances to your assistant in a test run.

On the Evaluate page of watsonx Assistant, you can upload a collection of sample utterances and send them all to your assistant in a single test run.

Each test utterance within a test run starts a new conversation session.

When a test run completes, you can view a comprehensive evaluation result. It includes the response routing metrics, conversational search scores (if conversational search is enabled), and response details for any utterance in the uploaded collection. It also includes your assistant settings relevant to the test run.

Evaluation is supported for the draft environment only.

Before you begin

To evaluate Conversational search performance, set the Conversational search toggle to On in the Search Integration window. For more information, see Enabling Conversational search.

You can run a maximum of 250 messages per test.

Required format for the CSV file to upload

The CSV file must meet the following criteria before it can be uploaded.

The CSV file must have a single column that includes the text of the utterance to be sent.
Each row is sent to your assistant sequentially.
The CSV has no heading row.
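
The criteria above, together with the 250-message limit from Before you begin, can be checked locally before you upload. The following is a minimal sketch, not part of watsonx Assistant: the function name check_utterance_csv and the wording of the messages are hypothetical, and the check cannot tell whether the first row is a heading row, so that still needs a manual look.

```python
import csv
import io

# Limit taken from "Before you begin": a maximum of 250 messages per test.
MAX_MESSAGES = 250


def check_utterance_csv(csv_text, max_rows=MAX_MESSAGES):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it only verifies the row count and that every
    row parses to exactly one column.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) > max_rows:
        problems.append(f"{len(rows)} rows exceeds the limit of {max_rows}")
    for number, row in enumerate(rows, start=1):
        # An unquoted comma splits the utterance into several columns;
        # a blank line parses to zero columns.
        if len(row) != 1:
            problems.append(f"row {number}: expected 1 column, found {len(row)}")
    return problems
```

For example, check_utterance_csv('Hi, there\n') reports that row 1 has two columns, because the comma is not protected by quotation marks.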

Things to consider in writing a CSV

If your utterance is plain text, you can write it as is. For example, This is a test utterance can be written as: This is a test utterance
If your utterance contains a comma, you must wrap the line in quotation marks. For example, Hi, this is a second utterance must be written as: "Hi, this is a second utterance"
If your quoted utterance contains quotation marks, each of those quotation marks must be prefixed by another quotation mark. For example, I have the "best" plan must be written as: "I have the ""best"" plan"
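
If you generate the file with a script, a standard CSV library usually applies these quoting rules for you. The following is a minimal sketch that uses the csv module from the Python standard library; the function name utterances_to_csv is hypothetical and chosen only for illustration.

```python
import csv
import io


def utterances_to_csv(utterances):
    """Serialize utterances as a single-column CSV with no heading row."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL wraps a field in quotation marks only when it contains
    # a comma, a quotation mark, or a line break, and doubles inner
    # quotation marks.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for text in utterances:
        writer.writerow([text])
    return buffer.getvalue()


sample = [
    "This is a test utterance",
    "Hi, this is a second utterance",
    'I have the "best" plan',
]
csv_text = utterances_to_csv(sample)
print(csv_text, end="")
# This is a test utterance
# "Hi, this is a second utterance"
# "I have the ""best"" plan"
```

To save the result to disk, write csv_text to a file with a .csv extension, for example with open("utterances.csv", "w", newline="", encoding="utf-8").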

Procedure

To evaluate the response settings of your assistant, perform the following steps.

On the watsonx Assistant home page, click Evaluate to open the Evaluate response settings.
Click Add file to select the data. You can upload a test data set in .csv format.
Click Start.

Conversational search scores

Under Conversational search scores, you can view the scores for extractiveness, retrieval confidence, response confidence, average citations per response, and average response length for the whole data set. For more information, see Conversational search analytics.

Settings

Under Settings, you can view your assistant's settings.

Filtering the results

You can filter the results based on the type of response routing. Click the filter icon and choose the type of response that you want to display from the drop-down menu.

In the Response details table, you can see the response confidence for each message by default. To also display the extractiveness and retrieval confidence for each message, click the settings icon and select them from the drop-down menu.

Exporting the results

You can export and save the result of the evaluation. Click the export icon to export the evaluation result table to a .csv file.

The result of the latest test run is preserved under the same retention policy as the chat logs. You can also click the reset icon to delete the result at any time before it expires. The result is deleted for all users.