Evaluating the assistant

Plus, Enterprise, IBM Cloud Pak for Data

Overview

You can evaluate and analyze the performance of your assistant on the Evaluate page of watsonx Assistant. Upload a comprehensive, relevant collection of sample utterances, and then send all of the utterances to your assistant in a single test run.

When a test run completes, you can view a comprehensive evaluation result. The result includes the response routing metrics, the conversational search scores (if conversational search is enabled), and the response details for each utterance in the uploaded collection. It also includes the assistant settings that are relevant to the test run.

Evaluation is supported for the draft environment only.

Before you begin

To evaluate conversational search performance, set the Conversational search toggle to On in the Search Integration window. For more information, see Enabling Conversational search.

Procedure

To evaluate the response settings of your assistant, perform the following steps.

  1. On the watsonx Assistant home page, click Evaluate to open the Evaluate response settings.

  2. Click Add file to select the data. You can upload a test data set in .csv format.

  3. Click Run, and then click Confirm to start the evaluation. The results are displayed when the test run completes.
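
The test data set for step 2 can also be prepared with a script. The following is a minimal sketch, not an official format definition: it assumes that the .csv file holds one utterance per row in a single column with no header row. That layout, the function name `write_utterances_csv`, and the file name `test_utterances.csv` are assumptions made for illustration, so confirm the expected file layout for your instance before you upload.

```python
import csv
from pathlib import Path


def write_utterances_csv(utterances, path):
    """Write unique, non-empty utterances to a CSV file, one per row.

    Assumed layout (hypothetical): a single column, no header row.
    Returns the number of rows written.
    """
    seen = set()
    rows = []
    for text in utterances:
        cleaned = " ".join(text.split())  # trim and collapse whitespace
        key = cleaned.casefold()          # compare case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            rows.append([cleaned])
    # newline="" lets the csv module control line endings and quoting.
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return len(rows)


if __name__ == "__main__":
    samples = [
        "How do I reset my password?",
        "Where is my order, please?",
        "how do I reset my password?",  # duplicate, skipped
    ]
    count = write_utterances_csv(samples, "test_utterances.csv")
    print(f"Wrote {count} utterances")
```

Removing blank and duplicate utterances before the upload keeps the test run focused on distinct inputs, which makes the aggregated scores easier to interpret.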

Conversational search scores

Under Conversational search scores, you can view the scores for extractiveness, retrieval confidence, and response confidence, as well as the average citations per response and the average response length, for the whole data set. For more information, see Conversational search analytics.

Settings

Under Settings, you can view the assistant settings that are relevant to the test run.

Filtering the results

You can filter the results by the type of response routing. Click the filter icon and choose the type of response that you want to display from the drop-down menu.

By default, the Response details table shows the response confidence for each message. To also display the extractiveness and retrieval confidence for each message, click the settings icon and select them from the drop-down menu.

Exporting the results

You can export and save the result of the evaluation. Click the export icon to export the evaluation result table to a .csv file.
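
After the export, the .csv file can be summarized offline, for example to count how many utterances fall under each type of response routing. The following is a minimal sketch under stated assumptions: the column names in the exported file are not documented here, so the function takes the column name as a parameter, and both `count_values` and the example column name `response_type` are hypothetical. Open the exported file first to check its actual header row.

```python
import csv
from collections import Counter


def count_values(path, column):
    """Count how often each value appears in one column of a CSV file.

    The column name must match a header in the file; it is supplied by
    the caller because the export's headers are not assumed here.
    """
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        headers = reader.fieldnames
        if headers is None or column not in headers:
            raise KeyError(f"Column {column!r} not found; available: {headers}")
        return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Hypothetical file and column names, for illustration only.
    counts = count_values("evaluation_results.csv", "response_type")
    for value, total in counts.most_common():
        print(f"{value}: {total}")
```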

The result of the latest test run is retained according to the same retention policy as the chat logs. You can also click the reset icon to delete the result at any time before it expires. The result is deleted for all users.